Link Fetcher Service


In this article we will learn how to create a Web Service that fetches all the links from a given URL.

Description

Link fetchers are widely used on the web these days. The one we will create is rather basic, but many websites use a link fetcher to crawl other sites and store interesting material in a database. Google.com is a brilliant example of what a very advanced link fetcher can do.

The Web Service we will create can only handle a single page, and returns only the hyperlinks it finds there. Each link is returned as the complete <a> element, so its class, destination and other attributes are included.

Let us begin

Fire up Visual Studio .NET and create a new C# ASP.NET Web Service. Call it "Linkfetcher".

Enter the Code View and fill out the blanks.

Read the comments inside the code to get a better understanding of what is going on. Basically, this method downloads the page specified in the "links" field. Once the page has been downloaded, its contents are passed to the sortLinks method provided below.

Notice the [WebMethod] attribute at the top of the code. It exposes the method to clients of the Web Service; a method without it cannot be called remotely. You can mark as many methods with [WebMethod] as you wish inside a Web Service.
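
For orientation, here is a minimal sketch of what a download method of this kind might look like. It is not the original listing: the class name LinkFetcherSketch and the method name FetchPage are hypothetical, the parameter is called links only to match the field mentioned above, and WebClient.DownloadString assumes .NET 2.0 or later.

```csharp
using System;
using System.Net;
using System.Web.Services;

// Hypothetical class name, used for illustration only.
public class LinkFetcherSketch : WebService
{
    // [WebMethod] makes this public method callable by clients of the service.
    [WebMethod]
    public string FetchPage(string links)
    {
        try
        {
            // Download the page at the given address as one string.
            using (WebClient client = new WebClient())
            {
                string site = client.DownloadString(links);

                // Assumption: in the article's version, the downloaded text
                // would be handed on to sortLinks at this point.
                return site;
            }
        }
        catch (Exception err)
        {
            // Mirror the article's approach of returning the error text.
            return err.Message;
        }
    }
}
```

Returning the error message as the result keeps the sketch short, but a caller then cannot tell an error apart from page content; a real service might prefer to let the exception surface as a SOAP fault.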

Now type/paste the following sortLinks method into your project, just below the closing curly bracket of the previous method.

private string sortLinks(string site, string findVal)
{
    string myLinks = "";
    string colLink = "";
    // Loop through the site one char at a time. Notice the -3: it keeps
    // Substring(i, 3) below from reading past the end of the string.
    for (int i = 0; i < site.Length - 3; i++)
    {
        try
        {
            // Check 3 chars at a time to see if they match an "<a " tag (ignoring case).
            if (String.Compare(site.Substring(i, 3), "<a ", true) == 0)
            {
                // An "<a " was found. Use a second index j for the inner scan,
                // so that i keeps its original value until we break out of the loop below.
                int j = i;
                // Scan forward, stopping 4 chars before the end of the file,
                // so that Substring(j, 4) never reads past the end of the string.
                while (j <= site.Length - 4)
                {
                    // Search for a closing "</a>" tag.
                    if (String.Compare(site.Substring(j, 4), "</a>", true) == 0)
                    {
                        // If we found one, then add it to our string, and break out of the loop.
                        colLink += "</a>";
                        break;
                    }
                    else
                    {
                        // If we haven't reached a closing "</a>" yet, then add the char to our string.
                        colLink += site[j];
                        j++;
                    }
                }
                // If the page ended before a closing "</a>" was found, discard the partial link.
                if (!colLink.EndsWith("</a>"))
                {
                    colLink = "";
                }
            }
        }
        catch (System.Exception err)
        {
            // If something went wrong along the way, then return the error message.
            return err.Message;
        }
        // If our temporary string colLink has a value after the loop above,
        // we know that it contains a hyperlink.
        if (colLink != "")
        {
            // Check to see if the user specified a query to search for.
            if (findVal.Length > 0)
            {
                // If colLink contains the search query specified in the findVal field
                // (IndexOf returns -1 when there is no match, and 0 is a valid position)
                if (colLink.ToLower().IndexOf(findVal.ToLower()) >= 0)
                {
                    // Then add the link to the myLinks variable, and delimit with a semi-colon.
                    myLinks += colLink + ";";
                }
            }
            else
            {
                // If no query was specified, then add all links to the myLinks variable.
                myLinks += colLink + ";";
            }
        }
        colLink = "";
    }
    // If the myLinks variable contains a value
    if (myLinks != "")
    {
        // Then return the value.
        return myLinks;
    }
    else
    {
        // Otherwise return a message.
        return "No links found matching your request";
    }
}
}
}
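
Because sortLinks returns every match in a single semicolon-delimited string, a caller will usually want to split it into individual links again. The following is a minimal sketch of that step, assuming .NET 2.0 or later; the class name SplitLinksDemo and the sample string are made up for illustration.

```csharp
using System;

// Hypothetical demo class, used for illustration only.
class SplitLinksDemo
{
    static void Main()
    {
        // Sample value in the format sortLinks produces: each link is
        // followed by a semicolon, including the last one.
        string result = "<a href=\"a.html\">A</a>;<a class=\"x\" href=\"b.html\">B</a>;";

        // RemoveEmptyEntries drops the empty string after the trailing semicolon.
        string[] links = result.Split(new char[] { ';' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string link in links)
        {
            Console.WriteLine(link);
        }
    }
}
```

Note that a link whose text or address itself contains a semicolon (for example an &amp; entity) would be split in the wrong place, so a different delimiter or a string[] return type may be more robust.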