John Riker


HTML Agility Pack: get specific URLs

Nov 1 2020 12:15 PM
I'm trying to parse a web page, get the URLs of all the shows listed on it, and then do some work with them. For example, take:
 
https://abc.com/shows/general-hospital
 
I want to grab all the General Hospital episode links listed on this page. Right now each link shows 36m in its name, but I can't rely on that, since a video could be shorter or longer. Is there any way to grab the whole top row of data? The page may change a bit on Monday, since it will then cover part of October and part of November.
 
Right now I have this, which returns the text of every link on the entire page:
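    using System;
    using System.Net;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // Download the raw HTML of the show page.
            var webClient = new WebClient();
            var page = webClient.DownloadString("https://abc.com/shows/general-hospital");

            var doc = new HtmlDocument();
            doc.LoadHtml(page);

            // SelectNodes returns null when nothing matches, so guard before looping.
            var links = doc.DocumentNode.SelectNodes("//a[@href]");
            if (links == null) return;

            // Prints the text of every <a> element that has an href attribute.
            foreach (var link in links)
            {
                Console.WriteLine(link.InnerText);
            }
        }
    }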
So basically, if you look at the page source, I would need to target this block:
    <script data-react-helmet="true" type="application/ld+json">
    [{
        "@context": "http://schema.org/",
        "@type": "ItemList",
        "itemlistElement":
            [ [{
                "@type": "ListItem",
                "position": 1,
                "name": "November 2020",
                "item": []
            }],[{
                "@type": "ListItem",
                "position": 2,
                "name": "October 2020",
                "item": [{
                    "@type": "Thing",
                    "name": "General Hospital 10/30/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/30-general-hospital-103020"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/29/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/29-general-hospital-102920"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/28/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/28-general-hospital-102820"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/27/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/27-general-hospital-102720"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/26/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/26-general-hospital-102620"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/23/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/23-general-hospital-102320"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/22/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/22-general-hospital-102220"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/21/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/21-general-hospital-102120"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/20/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/20-general-hospital-102020"
                },{
                    "@type": "Thing",
                    "name": "General Hospital 10/19/20",
                    "url": "www.abc.com/shows/general-hospital/episode-guide/2020-10/19-general-hospital-101920"
                }]
            }],[{
From each section, I would probably want the name and url of every item.
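One way this could be approached, as a rough sketch rather than a tested solution: select the `application/ld+json` script nodes with HTML Agility Pack, parse their text with `System.Text.Json`, and walk the result to collect every object that has both a `name` and a `url`. The `EpisodeLinks` class and its `Collect` and `FromHtml` methods are hypothetical names made up for this illustration. The sketch also assumes the script body on the live page is valid JSON and keeps the nesting shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using HtmlAgilityPack;

// Hypothetical helper; the class and method names are illustrative only.
static class EpisodeLinks
{
    // Walks the JSON tree and records every object that has string "name" and "url" properties.
    // Month entries (ListItem) carry no "url" in the sample above, so they are skipped.
    public static void Collect(JsonElement element, List<(string Name, string Url)> results)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Array:
                foreach (var child in element.EnumerateArray())
                    Collect(child, results);
                break;

            case JsonValueKind.Object:
                if (element.TryGetProperty("name", out var name) && name.ValueKind == JsonValueKind.String &&
                    element.TryGetProperty("url", out var url) && url.ValueKind == JsonValueKind.String)
                {
                    results.Add((name.GetString(), url.GetString()));
                }
                // Keep descending, since items are nested inside "itemlistElement" and "item".
                foreach (var property in element.EnumerateObject())
                    Collect(property.Value, results);
                break;
        }
    }

    // Finds all JSON-LD script blocks in the HTML and collects name/url pairs from each.
    public static List<(string Name, string Url)> FromHtml(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Name, string Url)>();
        var scripts = doc.DocumentNode.SelectNodes("//script[@type='application/ld+json']");
        if (scripts == null) return results; // SelectNodes returns null when nothing matches.

        foreach (var script in scripts)
        {
            try
            {
                using var json = JsonDocument.Parse(script.InnerText);
                Collect(json.RootElement, results);
            }
            catch (JsonException)
            {
                // Assumption: a block that is not valid JSON is simply skipped.
            }
        }
        return results;
    }
}
```

Calling `EpisodeLinks.FromHtml(page)` with the string returned by `DownloadString` would then give a list to loop over, printing `Name` and `Url` for each entry. Note that the url values in the sample have no scheme, so `https://` would probably need to be prepended before requesting them.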

Answers (1)