Search Content Source to Website

In this article we create a Search Content Source that points to a website.

What is the Goal?

Our goal is to make the content in the following blog searchable from SharePoint 2013.


Please note that the website above is only an example. You can use your own website, provided it has a valid robots.txt file.
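Since the crawl depends on a valid robots.txt file, it can help to check beforehand which URLs the file permits. The following is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the "ExampleCrawler" user agent name are assumptions made for illustration; they are not the values SharePoint uses.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" is a made-up user agent name.
    for url in ("http://example.com/posts/1", "http://example.com/private/x"):
        print(url, is_crawl_allowed(SAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

To test a real site, paste the contents of its robots.txt into the first argument instead of the sample text.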

Procedure

The following is the procedure involved.

Step 1: Create Content Source

Open Central Administration, then select "Service Applications" > "Search Service Application" > "Content Sources".


Create a new Content Source and enter the following information.


Click OK to save the changes.

Step 2: Crawl

Now choose the Full Crawl option for the content source.


Wait a few minutes for the crawl to complete.


SharePoint accesses the home page through the URL, parses its content, reads the metadata, extracts the URLs it finds and follows them to discover more content, indexing everything along the way.
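The crawl behaviour described above can be illustrated with a small breadth-first crawler. This is only a conceptual sketch, not how the SharePoint crawler is implemented. The PAGES dictionary is a hypothetical in-memory website, and the names PageParser, crawl and max_depth are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory website: URL -> HTML (no network access needed).
PAGES = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.com/a": '<title>Page A</title><a href="/c">C</a>',
    "http://example.com/b": "<title>Page B</title>",
    "http://example.com/c": "<title>Page C</title>",
}


class PageParser(HTMLParser):
    """Collects the page title and the href of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, max_depth):
    """Breadth-first crawl from start_url; returns an index {url: title}."""
    index = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # page not found, nothing to index
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.title  # "index" the page by its title
        if depth == max_depth:
            continue  # hop limit reached, do not follow links further
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return index


if __name__ == "__main__":
    print(crawl("http://example.com/", max_depth=1))
```

With max_depth set to 1, the page at /c is two hops from the start address and is never reached, which mirrors the "hops/depth exceeded" reason that can appear in the crawl log.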

Step 3: View Log

You can check the Content Source for any Crawl Errors or Warnings that prevent the content from showing.


You will get the following page.


Click the links to view each error or warning. You can ignore the non-serious ones.

Step 4: Search

Open the Enterprise Search Center site and type in the following text.


The results should include pages from the blog URL above. This confirms that our Web Content Source is configured correctly.

Challenges

In real-world scenarios things won't always go this smoothly. You may encounter the following issues; I have provided some links to help resolve them.

You can view these errors from the Content Source > View Crawl Log menu.

Crawl log message: "Items might not be crawled due to one of the following reasons: Preventive crawl rule; specified content source hops/depth exceeded; URL has query string parameter; required protocol handler not found; preventive robots directive."

Solution 1: If the URL contains query strings, create a Crawl Rule > http://bit.ly/1k1sIKt

Solution 2: If the source is hosted on the same machine, check the loopback check setting > http://support.microsoft.com/kb/896861/en-us
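For the query string case, a quick way to see which addresses are affected is to test each URL for a query component. The sketch below uses Python's standard urllib.parse module. The function name has_query_string and the example URLs are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def has_query_string(url: str) -> bool:
    """Return True if the URL carries a non-empty query string."""
    return bool(urlsplit(url).query)


if __name__ == "__main__":
    # Hypothetical URLs; replace them with addresses from your own site.
    candidates = [
        "http://example.com/post",
        "http://example.com/post?id=5",
    ]
    for url in candidates:
        print(url, has_query_string(url))
```

URLs for which this returns True are the ones that may need a Crawl Rule before they can be crawled.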

Crawl log message: "The content for this address was excluded by the crawler because this item was marked with a no-index meta-tag. To index this item, remove the meta-tag and recrawl."

Solution 1: If the source is an external website, check its robots.txt > http://bit.ly/PomtFg

Solution 2: If the source is a SharePoint site or library, see http://bit.ly/1i99dBs
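To find out whether a page carries the no-index meta-tag mentioned in the message above, the page HTML can be inspected before recrawling. The following is a minimal sketch using Python's standard html.parser module. The class name RobotsMetaParser, the function is_noindex and the sample HTML are assumptions for illustration. It only looks at `<meta name="robots">` tags and does not cover other ways a page can be excluded.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta-tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical page markup.
    sample = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'
    print(is_noindex(sample))
```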

As a general measure, I recommend applying SharePoint Cumulative Updates and operating system Service Packs to the machines.

References

http://technet.microsoft.com/en-us/library/jj219808(v=office.15).aspx

Summary

In this article we have explored how to create a Web Content Source in SharePoint 2013.