IFilters in SharePoint 2010

An IFilter is a plug-in that allows Indexing Service, SharePoint Portal Server, Search Server, SQL Server, Windows Desktop Search, and other products based on Microsoft Search technology to index various file formats so that their contents become searchable.

The filter extracts a stream of text from a document, discarding all non-textual and formatting information. It produces strings of text and property/value pairs, which it passes to the indexing engine, as shown in the architecture diagram below.

IFilters-in-SharePoint.gif
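As a rough illustration of the text and property/value output described above, the following Python sketch models a filter for a made-up document format. The `Chunk` type, the `toy_filter` function, and the format itself (header lines, a blank line, then a tagged body) are assumptions for illustration only. A real IFilter is a COM component, and this sketch does not use or reproduce the IFilter API.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Chunk:
    """One unit handed to the indexing engine (hypothetical model)."""
    kind: str   # "property" or "text"
    name: str   # property name; empty for text chunks
    value: str


def toy_filter(document: str) -> Iterator[Chunk]:
    """Filter for an invented format: 'Name: value' header lines,
    a blank line, then a body that may contain <tags>."""
    header, _, body = document.partition("\n\n")

    # Header lines become property/value pairs.
    for line in header.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            yield Chunk("property", name.strip(), value.strip())

    # Formatting tags are discarded; only the text is kept.
    plain = re.sub(r"<[^>]+>", " ", body)
    for paragraph in plain.splitlines():
        text = " ".join(paragraph.split())
        if text:
            yield Chunk("text", "", text)
```

Calling `list(toy_filter(doc))` on a small sample yields the property chunks first, followed by one text chunk per non-empty body line with the tags removed.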

In SharePoint 2010 Central Administration, open Search Administration and click File Types. The page shown below lists the file types that the engine will index.

File-Types-in-sharepoint.gif

The crawler uses an IFilter to read each file type when crawling content. Some IFilters read only one file type, whereas others can read several file types.

If you have to crawl a file type that is not supported by an IFilter that is provided with Microsoft SharePoint Server 2010, you must install and register the appropriate IFilter on the crawl server.

The procedure for installing and registering an IFilter varies from one IFilter to another; follow the instructions supplied by its vendor.

If you add a file type without registering an IFilter, only the file properties are included in the index and not the actual content of the item.
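The lookup behaviour described in this section can be sketched in Python as follows. This is only a conceptual model under stated assumptions: `REGISTERED_FILTERS`, `plain_text_filter`, and `index_item` are hypothetical names, and the real crawler resolves IFilters through its own registration mechanism rather than a dictionary. The sketch shows one filter serving several extensions, and an item with no registered filter being indexed by its properties only.

```python
import os
from typing import Callable, Dict, Optional

FilterFn = Callable[[bytes], str]


def plain_text_filter(data: bytes) -> str:
    """Hypothetical filter: treats the file as UTF-8 text."""
    return data.decode("utf-8", errors="replace")


# Hypothetical registry: one filter may serve several file types.
REGISTERED_FILTERS: Dict[str, FilterFn] = {
    ".txt": plain_text_filter,
    ".log": plain_text_filter,
}


def index_item(filename: str, data: bytes,
               registry: Optional[Dict[str, FilterFn]] = None) -> dict:
    """Build an index entry; content stays None when no filter is registered."""
    if registry is None:
        registry = REGISTERED_FILTERS

    # File properties are always indexed.
    entry = {"properties": {"name": filename, "size": len(data)},
             "content": None}

    extension = os.path.splitext(filename)[1].lower()
    filter_fn = registry.get(extension)
    if filter_fn is not None:
        entry["content"] = filter_fn(data)
    return entry
```

With this model, `index_item("plan.dwg", data)` returns an entry whose `content` is `None` until a filter for `.dwg` is added to the registry.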

Indexing DWG and DXF files

As an example of a non-Microsoft file format, to index the text in drawing files you need to install a DWG IFilter plug-in. This plug-in allows Microsoft Search products and services to index DWG files, so that customers can search and organize their content.

A DWG IFilter extracts text found in Block elements such as TEXT, MTEXT, RTEXT and ArcAlignedText. Attributes are fully indexed. The DWG IFilter also outputs information about Layers and Views. Installation of AutoCAD or any other CAD program is not required. After enabling the filter, remember to re-feed (recrawl) your content so that existing files are indexed with it.
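To give a feel for what extracting text from drawing entities involves, here is a minimal Python sketch that pulls TEXT and MTEXT strings out of an ASCII DXF file. It is not how any particular DWG IFilter works: it ignores binary DWG entirely, ignores RTEXT, ArcAlignedText, attributes, layers, and views, and relies on the DXF convention that data is stored as pairs of lines (a group code, then a value), with group code 0 starting an entity and group codes 3 and 1 carrying its text. The function names are hypothetical.

```python
from typing import Iterator, List, Tuple

TEXT_ENTITIES = {"TEXT", "MTEXT"}


def read_pairs(dxf: str) -> Iterator[Tuple[int, str]]:
    """Yield (group code, value) pairs from ASCII DXF content."""
    lines = dxf.splitlines()
    for i in range(0, len(lines) - 1, 2):
        code = lines[i].strip()
        if code.lstrip("-").isdigit():
            yield int(code), lines[i + 1]


def extract_text(dxf: str) -> List[str]:
    """Collect the text strings of TEXT and MTEXT entities."""
    found: List[str] = []
    entity = None
    parts: List[str] = []

    for code, value in read_pairs(dxf):
        if code == 0:
            # A new entity starts: store the text gathered for the previous one.
            if entity in TEXT_ENTITIES and parts:
                found.append("".join(parts))
            entity, parts = value.strip(), []
        elif entity in TEXT_ENTITIES and code in (1, 3):
            # Code 3 holds leading pieces of long MTEXT, code 1 the final piece.
            parts.append(value)

    if entity in TEXT_ENTITIES and parts:
        found.append("".join(parts))
    return found
```

A filter built along these lines would hand each returned string to the indexing engine as a text chunk; entities such as LINE contribute nothing because they carry no text.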

Similarly, third-party IFilters are available for indexing PDF files, RAR files, and Lotus Notes documents.

Advanced Filter Pack in FAST Search Server 2010 for SharePoint

Advanced Filter Pack is a FAST Search Server 2010 for SharePoint feature that enables text and metadata extraction from several hundred file formats, complementing the document formats that are supported by the Microsoft Filter Pack.

By default, the Advanced Filter Pack is disabled.

If a file type is not excluded by the File Types list, it will still be crawled, but the actual content and metadata are extracted only if an IFilter is registered for that file type. Run a small test crawl that includes the file type you want to index, and then consult the Crawl Log; its messages indicate the next steps.

On the Search Administration page, under Crawling, click Crawl Log. Look for the following messages:

  • Unknown document format, skipping conversion

This message indicates that you have to install and register a third-party IFilter.

  • No filter available. Enable the Advanced Filter Pack for more filters

This message indicates that the content and metadata can be extracted by enabling the Advanced Filter Pack.

  • If neither of these messages appears, perform a search and verify that the content is searchable.
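The triage above can be summarized in a short Python sketch. The two message strings are taken from the list above; everything else is an assumption for illustration, including the function name `next_steps` and the idea that the Crawl Log has been exported as plain-text lines (the format of such an export is not specified here).

```python
from typing import Iterable, List

# Message text from the Crawl Log mapped to the next step to take.
NEXT_STEP_BY_MESSAGE = {
    "Unknown document format, skipping conversion":
        "Install and register a third-party IFilter.",
    "No filter available. Enable the Advanced Filter Pack for more filters":
        "Enable the Advanced Filter Pack.",
}

FALLBACK_STEP = "Perform a search and verify that the content is searchable."


def next_steps(crawl_log_lines: Iterable[str]) -> List[str]:
    """Return the recommended steps for a set of crawl log lines
    (hypothetical plain-text export, one entry per line)."""
    lines = list(crawl_log_lines)
    steps = [
        step
        for message, step in NEXT_STEP_BY_MESSAGE.items()
        if any(message in line for line in lines)
    ]
    # Neither message found: fall back to a manual search check.
    return steps or [FALLBACK_STEP]
```

Matching on a substring keeps the sketch independent of whatever URL or timestamp surrounds the message in a given log line.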

SharePoint 2010: Building IFilters for SharePoint 2010 Search and Windows Search

As of Windows 7, you can no longer use managed code to implement an IFilter because for any given process, only one version of the .NET Framework runtime can be loaded at a time. This means that if one IFilter developer uses the 2.0 version of the .NET Framework and another developer uses the 4.0 version, the two IFilters are incompatible.
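The constraint described above can be pictured with a toy Python model. This is not real hosting code: `HostProcess` and `load_filter` are hypothetical names, and the model simply encodes the stated rule that one process loads one runtime version, so a filter that needs a different version cannot be loaded into the same process.

```python
from typing import List, Optional


class HostProcess:
    """Toy model of a filter host that can load one runtime version."""

    def __init__(self) -> None:
        self.runtime_version: Optional[str] = None
        self.loaded_filters: List[str] = []

    def load_filter(self, name: str, required_runtime: str) -> bool:
        """Load a filter if its runtime matches the one already in the process."""
        if self.runtime_version is None:
            # The first managed filter decides which runtime the process loads.
            self.runtime_version = required_runtime
        if required_runtime != self.runtime_version:
            return False  # incompatible with the runtime already loaded
        self.loaded_filters.append(name)
        return True
```

In this model, loading a filter built for version 2.0 and then one built for version 4.0 into the same `HostProcess` succeeds for the first and fails for the second, whichever vendor's filter happens to load first.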

In Windows 7 and later versions, filters that are written in managed code are explicitly blocked. Filters must be written in native code because of potential Common Language Runtime (CLR) versioning issues in the process that hosts multiple add-ins. Although it might be possible to write an IFilter in Microsoft Visual Basic 6.0, that is likely a poor choice given the throughput required to index thousands, or possibly millions, of files (for example, in SharePoint). Therefore, the best option is to implement the IFilter in C++.

The sample code is available at http://code.msdn.microsoft.com/SharePoint-2010-Building-218ee5aa#content