SQL Server Full-Text Searching - Practical Approach

Introduction

This article covers practical examples using Full-Text Search in SQL Server.

In our practice, we mostly use string functions (substring, left, right, etc) and LIKE predicate for searching any text in the database. This is called Simple Text Searching or Basic searching. This type of searching is enough for small texts, but in large content, it will slow our system. It also doesn’t provide more advanced searching techniques. We always try to organize our searches using SARG (Search Arguments), but in most cases, unfortunately, it is not possible to do it efficiently. Searching efficiently in a big text file, or a (n)(var)char(max) -based column requires more effort than what is provided by string functions and LIKE predicate.

Besides simple text searching, T-SQL provides a “more advanced, deep” searching mechanism that uses a built-in Full-Text Search component. This helps you to search not just in (n)(var)char type-based columns, but you can also conduct a search in varbinary, text, XML, image, and FileStream.

What types of searching does Full-text provide?

You can think about Full-Text search as something like a “search engine." It provides below functionalities:

  1. Finding any phrase and word
  2. Finding words/phrases using an abbreviation (Thesaurus file)
  3. Finding synonyms (Thesaurus file)
  4. Finding words near other words
  5. Finding not just simple, but also most relevant phrases
  6. Finding files by their properties
  7. Affecting search by configuring stop words and stop lists
  8. Finding words by their inflectional structure
  9. And much more!

Most of the above list items require a separate article. I will talk about most of them in the next articles. This article covers general information with practical examples.

Getting started

If you want to use Full-text search, it should be installed with SQL SERVER Management Studio.

If this is not installed, simply reinstall it using Management studio wizard.

You can check if it is installed using the ServerProperty built-in function.

/*
Check if full-text search component has already installed.
Response type:
1 - Installed
0 - Not installed yet
*/
SELECT SERVERPROPERTY('IsFullTextInstalled');

First, let’s define some tables for storing a large amount of data.

I will create two columns for a full-text search.

One of these columns is going to be varchar(max), and the other one is a complete file storage(varbinary).

If you want to store a file directly inside the database and use a full-text search, then you should store this file’s type as a separate column. Binary, XML, and image types require that file type have a separate parameter for the full-text index.

CREATE TABLE Articles
(ArticleId int not null identity,
Title nvarchar(2000) not null,
BriefContent nvarchar(max),
DocType varchar(10) not null,
Document varbinary(max) not null,
CONSTRAINT PK_Articles_ArticleId PRIMARY KEY CLUSTERED (ArticleId))

Let's add some rows to the table:

INSERT INTO Articles
(Title, BriefContent, DocType, Document)
SELECT 'Async and wait in .NET','C# 5 introduced a simplified approach, async programming, 
that leverages asynchronous support in the .NET Framework 4.5 and higher, 
.NET Core, and the Windows Runtime. The compiler does the difficult work that the developer
used to do, and your application retains a logical structure that resembles synchronous code. As a result, 
you get all the advantages of asynchronous programming with a fraction of the effort.'
,'docx',BULKCOLUMN FROM OPENROWSET(bulk 'C:\fulltext\async_await.docx',SINGLE_BLOB) as P

INSERT INTO Articles
(Title, BriefContent, DocType, Document)
SELECT 'Collections in .NET','Collections provide a more flexible way to 
work with groups of objects. Unlike arrays, the group of objects you work
with can grow and shrink dynamically as the needs of the application change.
For some collections, you can assign a key to any object that you put into the collection
so that you can quickly retrieve the object by using the key.'
,'docx',BULKCOLUMN FROM OPENROWSET(bulk 'C:\fulltext\collections.docx',SINGLE_BLOB) as P

The next step for using a Full-Text search is creating a full-text catalog.

The full-Text catalog doesn’t belong to any file group. It is a virtual container for organizing full-text indexes. You can think about a full-text catalog as a box for storing full-text indexes. We have a special t-SQL command for them:

/*
Create full text catalog
ACCENT_sensitivity - enable/disable diacritical characters (on/off)
as default - use as a default catalog for indexes
authorization dbo
*/
CREATE FULLTEXT CATALOG fulltextsearchdbCatalog
WITH ACCENT_sensitivity = on
as default
AUTHORIZATION dbo

The above command contains minimal construction items for creating a full-text catalog.

Here are the parts of the query:

  1. WITH ACCENT_SENSITIVITY – If you have diacritical marks/characters in your language or the language you’re using in SQL SERVER, it should be enabled. Otherwise, just choose OFF.
  2. AS DEFAULT – If you don’t want to point to the catalog from full-text indexes, you can mark your catalog as default.
  3. AUTHORIZATION dbo– same for synonyms. Catalogs require authorization for a single user. In our case, we “register” them under dbo.

The last step is creating full-text indexes.

-- Skeleton
CREATE FULLTEXT INDEX ON table_name(configuration_for_index)
KEY INDEX PK_INDEX_FOR_TABLE
ON CATALOG_NAME

--applied skeleton to our table
CREATE FULLTEXT INDEX ON Articles(BriefContent Language 1033, Document Type Column Doctype)
KEY INDEX PK_Articles_ArticleId
ON fulltextsearchdbCatalog

When creating a full-text index, you must provide a primary key of the main table.

As you see, BriefContent is a varchar(max) column, and it has a Language 1033 indicator. 1033 is the code of the English language. This means we are planning to apply the Fulltext configuration related to the English language to this column. On the other hand, Type Column ( DocumentType ColumnDocType ) indicates that the Document column’s type is DocType column.

Now it is time to see Fulltext in practice.

T-SQL provides four functions to work with fulltext searching: Contains, FreeText, ContainsTable and FreeTextTable. Remembering their names is really easy, but each of them “opens different doors,” so to speak, into the searching galaxy. This is part one of the article, and for that reason I’ll demonstrate most used function: CONTAINS.

Contains and FreeText are boolean returning functions. For that reason we can easely use them in the WHERE clause. The below examples demonstrate how to get response using CONTAINS:

When searching a phrase/word combination, empty spaces will throw errors:

To fix this issue use the " " or % symbol.

Below you can find most used variations of Contains:

--if you know part of the word, then use `*`
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'"advanta*"');

--if you want to find phrase, then replace empty symbol with `%` or use " "
SELECT Title, BriefContent, DocType, Document FROM Articles
--WHERE CONTAINS(Document,'advantages%of%asynchronous');
WHERE CONTAINS(Document,'"advantages of asynchronous"');

--if it is `arcade of asynchronous` or it is `advantages of asynchronous`
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'arcade%of%asynchronous OR advantages%of%asynchronous');

--if the word `avoid` is near `performance` . 
--Precedes is not important
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'NEAR(avoid,performance)');

--if the word `avoid` is close to `performance` within four terms
--Precedes is not important
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'NEAR((performance,avoid),4)');

--searches for "performance" within twenty two terms of "avoid" 
-- but only when "performance" precedes "avoid".
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'NEAR((performance,avoid),22,TRUE)');

--Inflection : is a process of word formation in which a word 
--   is modified to express different grammatical categories
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'FORMSOF(INFLECTIONAL,"responsiveness")');

--Find data regarding to thesaurus file
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(Document,'FORMSOF(THESAURUS,axtaracagimizSoz)');

--Find document by "property name = value" combination
SELECT Title, BriefContent, DocType, Document FROM Articles
WHERE CONTAINS(PROPERTY(Document,'property_name'),'PROPVALUE');