Overview Of SharePoint Column Indexing

Introduction

 
Being a SharePoint developer, administrator and architect, we quite often discuss the SharePoint list items loading issue. When it comes to a list items loading performance issue, we straight away go for indexing the columns to speed up the performance. So in this article, I will focus on what exactly happens in the SQL database when we index a column.
 

How does the SharePoint store the list items?

 
Before we head to indexing, we must know the SharePoint list item data structure. This will help to SharePoint architect while designing the application. The SharePoint database is a heavy database designed by Microsoft; it has a complex data structure in the back end. So being a developer or administrator we should never directly talk to the SharePoint database; in fact it's not recommended by Microsoft, for any changes we must go with the SharePoint UI or PowerShell or by any other coding.
 
There is a single table, i.e. "AllUserData" in all content databases which stores all SharePoint list items. I mean this is a common table for all the lists in the site collection. There are lots of internal columns in the table other than the number columns we create in SharePoint list – so what we see in the list is just a few columns. For example: 64 nvarchar, 16 units, 12 floats and so on.
 
If we look at the Microsoft documentation on the "AllUserData" table structure, we will be surprised to see how complex it is. Please click here to know more about "AllUserData" table structure.
 
AllUserTable
 
Overview Of SharePoint Column Indexing
 
As per me this is one of the major reasons of the  slowness. Had Microsoft been designed  in such a way that for each list/library there would  be a dedicated table, then the speed might have been better. Anyway, we need to move on to whatever we have.
 

What is indexing in database table?

 
A database index is a data structure that improves the speed of data retrieval operations from a database table and it works based on a key value pair technique – the core binary search and linear search algorithms are built on the indexing technique. In other words, we can say – an index is a copy of selected columns of data from an original table that can be searched very efficiently, which establishes a direct link to the complete row of data from where it was copied. Again, there are various types of indexing inthe  database like cluster and non cluster, etc. I have not been going into that since the purpose of this article is different. In a short note – indexing is a technique using which we can get the fastest query result from the database table. Indexing costs us additional writes and storage space in the file system.
 
To make it more realistic, we can give the example of the indexing in the Book. Here, the main content inside the book is the main data table and indexed pages which we generally see in the beginning of the each book where it mentions which content or chapter of the book is located in which page, this is just equivalent to hash tables in the database. Using the indexed page number – we can very quickly navigate to a specific chapter or page. Exactly indexing in database table also works in a similar way.
Overview Of SharePoint Column Indexing
 

Without indexing how does the data usually get fetched from a SQL table?

 
Let’s write the below SQL query – this displays all purchase order details from "Purchase Order" table where "OrderName" is 'PO001_Laptop'. If there are 10,000 items (rows) in the SQL table, this query will traverse thru the 10,000 rows one by one to check the below condition, even though the “PO001_Laptop” is not available in many of the rows but still this query will scan through all the rows.
 
Then, finally it displays the result which is a time consuming process and hits the query performance. In this technique we cannot skip even one row, the query has to scan through right from the first row to last row. Imagine if we have millions of records in the database table how long would it take to complete this query?
  1. SELECT * FROM PurchaseOrder 
  2. WHERE OrderName = 'PO001_Laptop'  
For the same query – if we create index on “OrderName” column – as per the above diagram this will create an entry in indexing table (hash table) and pointer will map to the original data row, whenever any query comes to SQL, first this will check the indexing table (hash table) and directly will fetch the matching items from the original data source table and will skip the checking for all unwanted rows where items are not matched. This is how indexing improves the performance of SQL queries.
 

How does indexing column work in SharePoint?

 
The list items are stored in the "AllUserData" table in the SQL. For every defined indexed column,  SharePoint stores the index value for every list item in a separate table, i.e. "NameValuePair" table which we have seen in the above. Let’s say we have 20,0000 items in the list, which means that we have 20,000 rows in “AllUserData” and 20,000 additional rows in the "NameValuePair" table (used for indexing).
 
So the SharePoint list items view requests are served based on the key value in the "NameValuePair" table and pick the appropriate mapping items from the main "AllUserData" table which is the actual data source. This is how the SharePoint indexing column speeds up the query performance.
 
Overview Of SharePoint Column Indexing
 

Is column Indexing recommended in SQL tables for the sake of better performance?

 
So far we have learned what is indexing and how indexed column improves the performance of the query. Now we need to see the the other side of the coin as well – I mean is column indexing recommended?
 
There is no straight answer for this – it all depends on the needs and the way you operate your database table. As we have learned,  the column indexing costs us additional writes and storage space, so if our application requires more insert/update operation, we might want to use tables without indexes, but if it requires more data retrieval operations, we should go for indexing table.
 

Why is SharePoint limited to have 20 indexed columns in a list?

 
There might be other reasons, but this is my analysis. As we have seen the drawback of indexed column is that the indexed column indices require additional space on the disk as the indices are stored together in a table using the MyISAM engine, this file may rapidly reach to the threshold limits of the given file system, which means  it will exhaust all disk space when many columns from the same list are indexed. So we should try to minimize the indexed column number as much as possible.
 

What are the SharePoint columns types that are supported or unsupported for indexing?

 
SUPPORTED COLUMN TYPES
  • Title (but not in a document library)
  • Single line of text
  • Choice field (but not multi choice)
  • Number
  • Currency
  • Date/ Time
  • Lookup (but not multi value)
  • Person or Group (but not multi value)
UNSUPPORTED COLUMN TYPES
  • Multiple lines of text
  • Hyperlink/Picture
  • Custom Field Types
  • Calculated Field
  • Boolean (yes/no)
  • UI version
  • Checked out to
  • Content type ID 

Summary 

 
Therefore, in this article we have learned what exactly the column indexing is in the database and how it improves the performance of SharePoint list view queries and we have covered the below topics,
  • How does the SharePoint store the list items?
  • What is indexing in database table?
  • Usually without indexing how does the data get fetched from a SQL table?
  • How does indexing column work in SharePoint?
  • Column Indexing is recommended in SQL table for the sake of better performance?
  • Why SharePoint is limited to have 20 indexed column in a list?
  • What are the SharePoint columns types are supported or unsupported for indexing?
References
 
https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wssfo3/632a6ea0-1890-48d7-bb10-99cbe80ede2a