How To Extract Data From A Website In ASP.NET Using C#

The need for data in IT services grows day by day. As websites and other online sources multiply, the demand for the data they hold grows with them.

Today’s era is all about technology and information. Data plays an important role in functions such as search engines, analysis, data mining and many more. It is not confined to a single subject area; it is used across departments.

In this tutorial, we will see how to extract data from any website using the HTML Agility Pack in ASP.NET with C#. The data is extracted on the basis of HTML tags: specify the tag you are interested in, and you get the content of every matching element.

INITIAL CHAMBER

Step 1

Open Visual Studio 2010 and create an Empty Website. Give it a suitable name [dataextract_demo].

Step 2

In Solution Explorer, you will see your empty Website. Add a Web Form to it, as shown below.

For Web Form

dataextract_demo (Your Empty Website) -> Right click -> Add New Item -> Web Form. Name it dataextract_demo.aspx.

DESIGN CHAMBER

Step 3

Now, open your dataextract_demo.aspx file and put a GridView on it, so that whatever data is extracted at runtime can be shown in the GridView. You can use any other control of your choice to show the data.
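A minimal sketch of the markup, assuming the default form that Visual Studio generates for a new Web Form. The only detail that matters is the control ID GridView1, because the code-behind in this tutorial binds its data to a control with that name:

```aspx
<form id="form1" runat="server">
    <div>
        <asp:GridView ID="GridView1" runat="server">
        </asp:GridView>
    </div>
</form>
```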

Here, we use the HTML Agility Pack. You can download it from its source on the internet, but we install it through NuGet for easy access.

To do so, open the NuGet Package Manager Console from the Tools menu. If you don’t have this option under Tools, you first have to install NuGet Package Manager from the Visual Studio extensions.

For the HTML Agility Pack, run the following command in the Package Manager Console:

    Install-Package HtmlAgilityPack

CODE CHAMBER

Step 4

Open dataextract_demo.aspx.cs. Here, we will write our server-side code to extract the data, but first include the namespace for the HTML Agility Pack.

Namespace

    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Web;
    using System.Web.UI;
    using System.Web.UI.WebControls;

    public partial class webscrap : System.Web.UI.Page {
        protected void Page_Load(object sender, EventArgs e) {
            refreshdata();
        }

        // Holds the numbered text of every extracted element.
        List<string> scrap = new List<string>();

        private void refreshdata() {
            // Download and parse the page.
            var htmlweb = new HtmlWeb();
            var websource = htmlweb.Load("http://www.csharpcorner.com");

            // Select every anchor tag; the result is null when nothing matches.
            var webtag = websource.DocumentNode.SelectNodes("//a");
            int count = 1;
            if (webtag != null) {
                foreach (var tg in webtag) {
                    // Number each entry and decode HTML entities such as &amp;.
                    string st = count + "." + tg.InnerHtml;
                    string text = HttpUtility.HtmlDecode(st);
                    scrap.Add(text);
                    count++;
                }
            }

            // Show the extracted data in the GridView.
            GridView1.DataSource = scrap;
            GridView1.DataBind();
        }
    }
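The code above adds InnerHtml to the list, so a link that wraps other elements (an image or a bold tag, for example) shows up with its markup. The sketch below is one way to compare InnerHtml with InnerText. It runs against an inline HTML string instead of a live page. The class name AnchorDemo, the helper ExtractAnchors, the textOnly flag and the sample string are all hypothetical names made up for this illustration, and WebUtility.HtmlDecode stands in for HttpUtility.HtmlDecode so that the sketch also runs outside a web project.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public static class AnchorDemo
{
    // Hypothetical helper: returns the numbered content of every <a> element.
    // textOnly = true uses InnerText, textOnly = false uses InnerHtml.
    public static List<string> ExtractAnchors(string html, bool textOnly)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse a string instead of downloading a page

        var results = new List<string>();
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null)
        {
            // No anchor found: return an empty list instead of failing.
            return results;
        }

        int count = 1;
        foreach (var anchor in anchors)
        {
            string raw = textOnly ? anchor.InnerText : anchor.InnerHtml;
            results.Add(count + "." + WebUtility.HtmlDecode(raw));
            count++;
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample markup, used only for this sketch.
        const string sample =
            "<p><a href=\"/home\"><b>Home</b></a> <a href=\"/faq\">FAQ &amp; Help</a></p>";

        foreach (string line in ExtractAnchors(sample, true))
        {
            Console.WriteLine(line);
        }
    }
}
```

With textOnly set to false, the first entry keeps the nested b tag, which is what the GridView in this tutorial would show for such a link.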

Output

For an Anchor tag

[Screenshot: output for the anchor tag]

For H1 tag

[Screenshot: output for the H1 tag]
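To get the H1 output instead of the anchor output, only the XPath expression passed to SelectNodes changes, from "//a" to "//h1". A minimal sketch of that idea, assuming a hypothetical helper ExtractByTag that takes the tag name as a parameter and works on an inline HTML string (the class name TagExtractor and the sample markup are also made up for this illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public static class TagExtractor
{
    // Hypothetical helper: returns the numbered text of every element
    // with the given tag name, for example "a" or "h1".
    public static List<string> ExtractByTag(string html, string tagName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        // "//" + tagName selects matching elements anywhere in the document.
        var nodes = doc.DocumentNode.SelectNodes("//" + tagName);
        if (nodes == null)
        {
            return results; // nothing matched
        }

        int count = 1;
        foreach (var node in nodes)
        {
            results.Add(count + "." + WebUtility.HtmlDecode(node.InnerText));
            count++;
        }
        return results;
    }

    public static void Main()
    {
        // Made-up sample markup, used only for this sketch.
        const string sample =
            "<h1>News &amp; Events</h1><a href=\"/one\">One</a><a href=\"/two\">Two</a>";

        foreach (string line in ExtractByTag(sample, "h1"))
        {
            Console.WriteLine(line);
        }
        foreach (string line in ExtractByTag(sample, "a"))
        {
            Console.WriteLine(line);
        }
    }
}
```

In the page from this tutorial, the same change amounts to replacing "//a" with "//h1" in refreshdata().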

I hope you liked it. Have a good day. Thank you for reading.