Unified Search Result Across Different Sites In Sitecore

I started working with Sitecore just a year ago, so I thought I would start sharing what I have learned, as it may help beginners get up to speed with Sitecore.
One of our clients had multiple sites on different domains and wanted unified search results through their Sitecore site. These sites are not related (let's say Site A and Site B).
To achieve this, we need to bring the third-party data (Site B) into the Sitecore site (Site A) and include it in the search results.

Execution Steps

  1. Create a template to store Site B data in Sitecore (Site A)
  2. Retrieve Site B data from all its pages
To read the data, we first need a list of all public URLs of Site B, since these URLs are then scanned for data.
To get the list of pages, we used the sitemap XML of Site B.
We also used the .NET WebClient to retrieve the data and HtmlAgilityPack as the HTML parser.
Because Site B was a secure (HTTPS) site, we got a security error while reading its data. I set the protocol through the ServicePointManager class (SecurityProtocolType.Tls12) to get rid of that problem.
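    using System.Linq;
    using System.Net;
    using HtmlAgilityPack;

    string sitemapHtml;
    // Force TLS 1.2 for the HTTPS request to Site B
    ServicePointManager.Expect100Continue = true;
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
    using (var wc = new WebClient())
    {
        sitemapHtml = wc.DownloadString("https://SiteB/sitemap.xml");
    }
    // Create an HtmlAgilityPack document object from the downloaded markup
    var sitemapDoc = new HtmlDocument();
    sitemapDoc.LoadHtml(sitemapHtml);
    // Select the nodes that hold the page URLs ("//NodeName" is a placeholder).
    // SelectNodes returns null when nothing matches, so fall back to an empty array.
    HtmlNode[] sitemapNode = sitemapDoc.DocumentNode.SelectNodes("//NodeName")?.ToArray() ?? new HtmlNode[0];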
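Since sitemap.xml is plain XML, the page URLs can also be pulled out with System.Xml.Linq instead of an HTML parser. The following is a minimal sketch, assuming Site B's sitemap follows the standard sitemaps.org format (a urlset containing url/loc elements). SitemapReader and ExtractUrls are hypothetical names used for illustration; they are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapReader
{
    // XML namespace defined by the sitemaps.org protocol (assumed to be used by Site B).
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns every non-empty <loc> value found in the sitemap XML, without duplicates.
    public static List<string> ExtractUrls(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Descendants(Ns + "loc")
                  .Select(loc => loc.Value.Trim())
                  .Where(url => url.Length > 0)
                  .Distinct()
                  .ToList();
    }
}
```

Calling SitemapReader.ExtractUrls(sitemapHtml) on the downloaded string would then give the list of page URLs to scan.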
Now retrieve the site data from each entry in sitemapNode, based on your requirements.
    foreach (HtmlNode item in sitemapNode)
    {
        // get the data you need from this node, based on your requirements
    }
Don't forget to replace any unwanted data with an empty string before saving.
For example, to remove HTML comments (Regex.Replace returns a new string, so assign the result):
    strhtmlData = Regex.Replace(strhtmlData, "<!--.*?-->", string.Empty, RegexOptions.Singleline);
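The clean-up step can be collected into one helper. This is only a sketch: the comment removal comes from the line above, while dropping script/style blocks, stripping the remaining tags and collapsing whitespace are assumptions about what counts as "unwanted data" for search. HtmlCleaner and Clean are hypothetical names.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Reduces an HTML fragment to plain text suitable for storing in a searchable field.
    public static string Clean(string html)
    {
        // Remove HTML comments, including multi-line ones.
        string text = Regex.Replace(html, "<!--.*?-->", string.Empty, RegexOptions.Singleline);
        // Remove script and style blocks together with their contents (assumption).
        text = Regex.Replace(text, @"<(script|style)\b[^>]*>.*?</\1\s*>", string.Empty,
                             RegexOptions.Singleline | RegexOptions.IgnoreCase);
        // Replace any remaining tags with a space so adjacent words do not merge.
        text = Regex.Replace(text, "<[^>]+>", " ");
        // Decode entities such as &amp; into their characters.
        text = WebUtility.HtmlDecode(text);
        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Regular expressions are fine for this kind of rough clean-up; for anything that depends on the document structure, the HtmlAgilityPack nodes are the safer source.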

Store data in Site A

I created a folder in the Site A Sitecore content tree; it holds a child item for each public page of Site B.
High-level flow to add the data:
    Sitecore.Data.Database masterDB = Sitecore.Configuration.Factory.GetDatabase("master");
    // Folder under which an item for each Site B page will be created
    Sitecore.Data.Items.Item parentItem = masterDB.GetItem("folder item ID under which the item for each page will be created");
    parentItem.Editing.BeginEdit();
    var template = masterDB.GetTemplate(SettingsManager.Investors.UrlTemplate);
    // Remove the existing child items before adding the new ones
    parentItem.DeleteChildren();
    this.ExtracUrlDataAndSave(parentItem, template, sitemapNode);
    this.PublishItem(parentItem); // publish items to the web database
    parentItem.Editing.EndEdit();
Include the added Site B data in the search results, either by updating an existing search index or by creating a new one.
Depending on your search index configuration, you need to update its config file. If the index does not include all fields and templates, you first need to allow the newly created template and its fields to be indexed in the configuration file:
    <include hint="list:IncludeField">
        <Title>{4ACD80D8-DC30-41BA-811E-B2224B77CB4B}</Title>
        <Subtitle>{9CB0A314-35AE-4998-AFEE-9F587FAEDE41}</Subtitle>
        <!-- list of all fields -->
        <NewField1>{055FD520-E764-4010-B135-2677D0ADD6BA}</NewField1>
        <NewField2>{740BC3B3-5710-42B2-8BF3-9D0679DAA62C}</NewField2>
    </include>
    <include hint="list:IncludeTemplate">
        <HomePage>{018A497A-05EA-4638-98CD-D7EC4A55471F}</HomePage>
        <!-- list of all allowed templates that can take part in search -->
        <NewTemplate>{F0DF64D5-B1DE-4831-95C7-8977AC751499}</NewTemplate>
    </include>
In the site index config, make sure the crawler information is specified. The crawler details are used when the index is built, so if the path where your data lives is not covered by a crawler, your data will not be searchable.
    <locations hint="list:AddCrawler">
        <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
            <Database>master</Database>
            <Root>/sitecore/content</Root>
        </crawler>
    </locations>
The crawler above makes every item in the content tree searchable. You can also have several indexes, each with a different crawl location, which helps indexes build faster and keeps each index's responsibility separate.
Add the fields to be searched to the index class:
    [IndexField("NewField1")]
    public virtual string Field1 { get; set; }

    [IndexField("NewField2")]
    public virtual string Field2 { get; set; }
Now add these newly added fields to the search predicate query.
Don't forget to rebuild the index where you added these fields, and you are all set to get this data in the search results.