Automation Of SharePoint Search Crawl Using PowerShell Scripts - Part One

Introduction

In this article series, you will learn how to create a new content source in SharePoint Search and automate the crawl using PowerShell scripts.

Prerequisite

This article assumes the reader has already created and configured a Search Service Application successfully.

Description

SharePoint Search in SharePoint 2013 and 2016 can crawl content from multiple sources. Before you can start a crawl, you need to create a content source. There are six types of content sources that you can crawl.

  • SharePoint Sites
  • Web Sites
  • File Shares
  • Exchange Public Folders
  • Line of Business Data
  • Custom Repository

SharePoint Sites - Suitable for any SharePoint site, whether it runs on SharePoint 2007, 2010, 2013, or 2016.

Web Sites - For non-SharePoint websites, whether they are available on the intranet or on the internet.

File Shares - Suitable for any shared file or directory location.

Exchange Public Folders - Use this option to crawl content stored in Microsoft Exchange public folders.

Line of Business Data - External data (for example, from a database) that is exposed in SharePoint through external content types (ECTs), typically defined using SharePoint Designer, can be crawled using this option.

Custom Repository - Any custom connectors that you have created will be available when selecting this option.

You can add a new content source in SharePoint Search once you have created a Search Service Application.
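
For readers who plan to script this later, each of the six types listed above has a counterpart value for the -Type parameter of the New-SPEnterpriseSearchCrawlContentSource cmdlet. The lookup table below is only a sketch of that correspondence; the variable name $contentSourceTypes is made up for illustration and is not part of SharePoint.

```powershell
# Central Administration label -> value passed to -Type.
# $contentSourceTypes is an illustrative name, not a SharePoint object.
$contentSourceTypes = [ordered]@{
    'SharePoint Sites'        = 'SharePoint'
    'Web Sites'               = 'Web'
    'File Shares'             = 'File'
    'Exchange Public Folders' = 'Exchange'
    'Line of Business Data'   = 'Business'
    'Custom Repository'       = 'Custom'
}

# Print one line per type, in the same order as the list above.
foreach ($label in $contentSourceTypes.Keys) {
    '{0,-24} -> -Type {1}' -f $label, $contentSourceTypes[$label]
}
```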

Go to Central Administration. Click on "Manage service applications".

Click on "Search Service Application" that has been configured.


You will be taken to the "Search Service Application - Search Administration" page.

Click on "Content Sources" in the left navigation menu, under the "Crawling" header.


Click on "New Content Source".

It will prompt you to enter the name, select the content source type, and specify the start addresses, crawl settings, crawl schedules, and content source priority.
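
The same form can be filled in from the SharePoint Management Shell. The following is a minimal sketch, assuming the farm has a single Search Service Application; the content source name and the start address are hypothetical placeholders, not values from this article.

```powershell
# Load the SharePoint cmdlets when running outside the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Hypothetical values, for illustration only.
$params = @{
    SearchApplication = $ssa
    Name              = 'Intranet Portal'
    Type              = 'SharePoint'
    StartAddresses    = 'http://intranet.contoso.com'
}

# Create the content source and show the name that was stored.
$contentSource = New-SPEnterpriseSearchCrawlContentSource @params
$contentSource.Name
```

Multiple start addresses can be passed to -StartAddresses as a single comma-separated string.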


Crawl settings differ based on the content source type that you select. For SharePoint Sites, the options are the following.

  • Crawl everything under the hostname for each start address
  • Only crawl the Site Collection of each start address
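
In PowerShell, these two choices correspond to the -SharePointCrawlBehavior parameter. The sketch below assumes a single Search Service Application; the name and start address are hypothetical.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# CrawlVirtualServers = crawl everything under the hostname of each start address.
# CrawlSites          = only crawl the site collection of each start address.
# The name and URL below are hypothetical.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name 'HR Site Collection' `
    -Type SharePoint `
    -StartAddresses 'http://intranet.contoso.com/sites/hr' `
    -SharePointCrawlBehavior CrawlSites
```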

For Web Sites, you can select one of the following options.

  • Only crawl within the server of each start address
  • Only crawl the first page of each start address
  • Custom - specify page depth and server hops

    • Limit Page Depth - Limits how many links deep the crawler follows from each start address. It is unlimited by default.
    • Limit Server Hops - Limits how many times the crawler may move to a different server. It is unlimited by default.
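
The custom page depth and server hop limits can also be set from PowerShell through the -MaxPageEnumerationDepth and -MaxSiteEnumerationDepth parameters. This is a sketch under the same single Search Service Application assumption; the name, URL, and the limits 3 and 1 are arbitrary example values.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Hypothetical web content source: follow links up to 3 pages deep
# and allow at most 1 hop to another server.
New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Name 'Product Documentation' `
    -Type Web `
    -StartAddresses 'http://docs.contoso.com' `
    -MaxPageEnumerationDepth 3 `
    -MaxSiteEnumerationDepth 1
```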

If the content source is File Shares or Exchange Public Folders, you can choose between the following options.

  • Crawl the folder and all subfolders of each start address
  • Only crawl the folder of each start address

If the content source is Line of Business Data, the following options are available.

  • Crawl all external data sources in this Business Data Connectivity Service Application
  • Crawl selected external data source (Will display the external content sources that you have created)



You can create crawl schedules, and if the content source is of the SharePoint Sites type, you can also enable continuous crawl.

Incremental and full crawls are available for all content source types. The difference is that an incremental crawl processes only the content that has changed since the last crawl, whereas a full crawl processes all the content again.
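
Schedules can be attached to an existing content source with Set-SPEnterpriseSearchCrawlContentSource. The sketch below assumes a single Search Service Application and a content source named 'Intranet Portal' (a hypothetical name); the intervals are example values only.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# 'Intranet Portal' is a hypothetical content source name.
$cs = Get-SPEnterpriseSearchCrawlContentSource -Identity 'Intranet Portal' -SearchApplication $ssa

# Incremental crawl every day, repeated every 60 minutes for 1440 minutes (24 hours).
$cs | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Incremental `
    -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 1 `
    -CrawlScheduleRepeatInterval 60 -CrawlScheduleRepeatDuration 1440

# Full crawl every 7 days.
$cs | Set-SPEnterpriseSearchCrawlContentSource -ScheduleType Full `
    -DailyCrawlSchedule -CrawlScheduleRunEveryInterval 7

# Alternative for SharePoint Sites content sources only: continuous crawl.
$cs | Set-SPEnterpriseSearchCrawlContentSource -EnableContinuousCrawls $true
```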

The last setting is Content Source Priority. Select it based on the importance of the content. There are two options: Normal and High.

The Crawl system will prioritize the processing of 'High' priority content sources over 'Normal' priority content sources.
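
From PowerShell, the priority is set with the -CrawlPriority parameter, where 1 means Normal and 2 means High. The sketch assumes a single Search Service Application and reuses the hypothetical content source name 'Intranet Portal'.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: only one Search Service Application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# 1 = Normal, 2 = High. 'Intranet Portal' is a hypothetical name.
Set-SPEnterpriseSearchCrawlContentSource -Identity 'Intranet Portal' `
    -SearchApplication $ssa `
    -CrawlPriority 2
```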

Now, you have created a content source. In the next part, you will see how to automate crawling this content source using PowerShell.