In this article I would like to share a piece of code that might be useful to some developers.
We can find a lot of C# code that parses the http urls in a given string. But it is difficult to find code that will:
- Accept a url as an argument and parse the site content
- Fetch all urls in the site content and parse the site content of each url
- Repeat the above process until all urls are fetched
Scenario
Taking the website http://valuestocks.in (a stock market site) as an example, I would like to get all the urls inside the website recursively.
Design
The main class is SpiderLogic, which contains all the necessary methods and properties.
![Url1.gif]()
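For orientation, a rough skeleton of the class as it is used in the examples below could look like the following. The member names other than GetUrls() are assumptions on my part, and the OnException signature may differ in the attached source:

using System;
using System.Collections.Generic;

public class SpiderLogic
{
    // Accumulates the urls found by the recursive overload (field name matches the method body below)
    private readonly List<string> _urls = new List<string>();

    // Invoked with any exception caught while downloading or parsing (signature is an assumption)
    public Action<Exception, string> OnException { get; set; }

    public IList<string> GetUrls(string url, bool recursive)
    {
        // Assumption: the two-argument overload treats the url itself as the base url
        return GetUrls(url, url, recursive);
    }

    public IList<string> GetUrls(string url, string baseUrl, bool recursive)
    {
        // Body shown in the Method Body of GetUrls() section below
        throw new NotImplementedException();
    }
}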
The GetUrls() method is used to parse the website and return the urls. There are two overloads of this method.
The first one takes 2 arguments: the url and a Boolean indicating whether recursive parsing is needed.
E.g.: GetUrls("http://www.google.com", true);
The second one takes 3 arguments: the url, the base url and the recursive Boolean.
This overload is intended for the case where the url is a sub level of the base url and the web page contains relative paths. In order to construct valid absolute urls, the second argument is necessary.
E.g.: GetUrls("http://www.whereincity.com/india-kids/baby-names/", "http://www.whereincity.com/", true);
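Put together, a typical call site could look like this (the console output is only for illustration):

var spider = new SpiderLogic();

// Overload 1: url + recursive flag
IList<string> links = spider.GetUrls("http://www.google.com", true);

// Overload 2: url + base url + recursive flag, for pages that use relative paths
IList<string> babyNameLinks = spider.GetUrls(
    "http://www.whereincity.com/india-kids/baby-names/",
    "http://www.whereincity.com/",
    true);

foreach (string link in babyNameLinks)
    Console.WriteLine(link);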
Method Body of GetUrls()
public IList<string> GetUrls(string url, string baseUrl, bool recursive)
{
    if (recursive)
    {
        _urls.Clear();
        RecursivelyGenerateUrls(url, baseUrl);
        return _urls;
    }
    else
        return InternalGetUrls(url, baseUrl);
}
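RecursivelyGenerateUrls() itself is not listed here; a minimal sketch of the idea, assuming it uses the _urls field above and the InternalGetUrls() method shown next, would be:

private void RecursivelyGenerateUrls(string url, string baseUrl)
{
    // Parse the current page and follow every url that has not been collected yet
    foreach (string foundUrl in InternalGetUrls(url, baseUrl))
    {
        if (!_urls.Contains(foundUrl))
        {
            _urls.Add(foundUrl);
            RecursivelyGenerateUrls(foundUrl, baseUrl);
        }
    }
}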
InternalGetUrls()
Another method of interest is InternalGetUrls(), which fetches the content of the url, parses the urls inside it and constructs the absolute urls.
private IList<string> InternalGetUrls(string baseUrl, string absoluteBaseUrl)
{
    IList<string> list = new List<string>();

    Uri uri = null;
    if (!Uri.TryCreate(baseUrl, UriKind.RelativeOrAbsolute, out uri))
        return list;

    // Get the http content
    string siteContent = GetHttpResponse(baseUrl);

    var allUrls = GetAllUrls(siteContent);

    foreach (string uriString in allUrls)
    {
        uri = null;
        if (Uri.TryCreate(uriString, UriKind.RelativeOrAbsolute, out uri))
        {
            if (uri.IsAbsoluteUri)
            {
                // If different domain / javascript: urls are needed, exclude this check
                if (uri.OriginalString.StartsWith(absoluteBaseUrl))
                {
                    list.Add(uriString);
                }
            }
            else
            {
                string newUri = GetAbsoluteUri(uri, absoluteBaseUrl, uriString);
                if (!string.IsNullOrEmpty(newUri))
                    list.Add(newUri);
            }
        }
        else
        {
            if (!uriString.StartsWith(absoluteBaseUrl))
            {
                string newUri = GetAbsoluteUri(uri, absoluteBaseUrl, uriString);
                if (!string.IsNullOrEmpty(newUri))
                    list.Add(newUri);
            }
        }
    }

    return list;
}
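GetHttpResponse() and GetAllUrls() are small helpers that are not reproduced in the article. As a rough illustration only (not necessarily the exact implementation in the attached source), the page could be downloaded with WebClient and the links extracted from href attributes with a regular expression:

// Requires: using System; using System.Collections.Generic;
//           using System.Net; using System.Text.RegularExpressions;

private string GetHttpResponse(string url)
{
    try
    {
        using (var client = new WebClient())
            return client.DownloadString(url);
    }
    catch (Exception ex)
    {
        // Report the failure through the OnException delegate (see the next section)
        if (OnException != null)
            OnException(ex, url);
        return string.Empty;
    }
}

private IList<string> GetAllUrls(string siteContent)
{
    var urls = new List<string>();

    // Capture the value of every href="..." attribute in the html
    foreach (Match match in Regex.Matches(siteContent,
        "href\\s*=\\s*\"([^\"]+)\"", RegexOptions.IgnoreCase))
    {
        urls.Add(match.Groups[1].Value);
    }

    return urls;
}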
Handling Exceptions
There is an OnException delegate that can be used to receive the exceptions that occur while parsing.
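For example, a caller could hook the delegate before starting a crawl (the delegate signature used here is an assumption; adjust it to match the source):

var spider = new SpiderLogic();

// Log failures instead of letting them abort the crawl
spider.OnException = (ex, url) =>
    Console.WriteLine("Failed on {0}: {1}", url, ex.Message);

IList<string> urls = spider.GetUrls("http://valuestocks.in", true);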
Tester Application
A tester Windows application is included with the source code of the article.
You can try executing it.
The form accepts a base url as input; clicking the Go button parses the content of the url and extracts all urls in it. If you need recursive parsing, check the Is Recursive check box.
![Url2.gif]()
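Internally the Go button handler only needs to wire the form inputs to SpiderLogic. A rough sketch follows; the control names are assumptions, not necessarily those in the attached source:

private void goButton_Click(object sender, EventArgs e)
{
    var spider = new SpiderLogic();
    spider.OnException = (ex, url) => MessageBox.Show(url + ": " + ex.Message);

    // urlTextBox, isRecursiveCheckBox and resultListBox are assumed control names
    IList<string> urls = spider.GetUrls(urlTextBox.Text, isRecursiveCheckBox.Checked);

    resultListBox.Items.Clear();
    foreach (string url in urls)
        resultListBox.Items.Add(url);
}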
Next Part
In the next part of the article, I would like to create a url verifier website that verifies all the urls in a website. I agree that a quick search will turn up free providers that do this; my aim is to learn and develop custom code that could be extensible and reusable across multiple projects by the community.