SharePoint 2010 Internals - Series 2


Introduction

We will see some more important SharePoint 2010 internals in this paper. These will help the SharePoint developer, reviewer and architect to use the correct design, coding structure, troubleshoot issues, fix bugs and help in performance tuning.

Indexing Columns

Normally to increase performance on data retrieval from a relational database we build indexes on columns that are frequently used in queries to retrieve data. As seen in the previous article (Series 1) - for a content database, SharePoint uses a single database table, AllUserData, to store all items for all SharePoint lists and document library list items. Since every list is stored in this table and every list would have different columns with different index settings it will be impossible to define the database column indices on the custom columns of a SharePoint list on this table. 

So let us see how indexing on columns is done in a list or library work.

A couple to things to note before we proceed -

  • The indexing on list columns are different than the indexing by Search Service Application.
  • The list column indexes are maintained by SharePoint and not by SQL Server. i.e. a SQL Server index is not created when we create an index on a SharePoint list column.

Once we create an index all values will be put in the NameValuePair table as rows and SharePoint will use these entries in a join to the AllUserData table to improve data retrieval time. If there are multiple language packs installed then there will be a NameValuePair table for each language like shown below.

img1.jpg

NameValuePair and NameValuePair [Collation] Table

Depending on the data type of the indexed column the values are managed either by NameValuePair table or NameValuePair[Collation] table. The name depends on the collation order name.

If the default collation order is used then the NameValuePair_Latin1_General_CI_AS is used to store textual list item data for indexed fields i.e. single line text fields and choice fields. All other data types for indexed fields are stored in the NameValuePair table.

These are defined as follows and shown in the figure below:

TABLE NameValuePair (
SiteId uniqueidentifier NOT NULL,
WebId uniqueidentifier NOT NULL,
ListId uniqueidentifier NOT NULL,
ItemId int NOT NULL,
Level tinyint DEFAULT 1 NOT NULL,
FieldId uniqueidentifier NOT NULL,
Value sql_variant
);

All columns are the same in the other table except the Value column which is defined as:

Value nvarchar(255) COLLATE Latin1_General_CI_AS. The datatype of this column depends on the language pack installed on that Web Front End Server.

img2.jpg

  • SiteId: The site collection identifier of the site collection containing the list item.
  • WebId: The site identifier of the site containing the list item.
  • ListId: The list identifier of the list containing the list item.
  • ItemId: The item identifier of the list item.
  • Level: The publishing level of the list item.
  • FieldId: The field identifier of an indexed field of the list item.
  • Value: The value of the indexed field specified by the FieldId column.

Supported Column Types

The following is a list of column types that can be indexed:

  • Single line of text
  • Choice field (but not multichoice)
  • Number
  • Currency
  • Date/ Time
  • Lookup (but not multivalue). MultiValue columns are stored as strings and not as arrays. Values are stored as Value1, Value2, Value 3 and so on. So each value cannot be treated individually and hence indexing on this type is not supported.
  • Person or Group (but not multivalue)
  • Title (but not in a document library)

Unsupported Column Types

The following is a list of column types that cannot be indexed:

  • Multiple lines of text
  • Hyperlink/Picture
  • Custom Field Types
  • Calculated Field
  • Boolean (yes/no)
  • Modified by
  • Modified time
  • UI version
  • Created time
  • Checked out to
  • Content type ID

The number of rows in these tables would depend on the number of indexed columns and the number of items in the list. So if 5 columns are indexed on list A which has 2000 rows, the number of rows in the NameValuePair table will be 2000*5 = 10000 rows. Hence we should choose the indexed columns carefully to maximize query performance while avoiding unnecessary overhead.

For ex: Let's say we have a List called News with the following structure and it has 5 rows in it.

img3.jpg

If we create an index on column Title we see that 5 rows are added in the NameValuePair_Latin1_General_CI_AS table (since the data type is text) as shown below.

img4.jpg

Let us create an index on column the Created of type Date/Time. Two additional columns will be seen in the dropdown when you click Indexed columns.
This time 5 rows will be added in the NameValuePair table.

Compound Index

Compound index is based on two columns. Compound indexes are useful when queries are commonly performed on two columns because a query on just one column might not be selective enough. Compound indexes are not utilized by views. However, they are utilized by metadata navigation. Metadata navigation and filtering includes a process that creates list indices automatically, depending on the fields that are promoted as navigational fields in the list.

  • Single column indices are created on all supported key filter fields, with the exception of the Content Type field and the Choice field.
  • Compound indices are created on all supported combinations of navigation hierarchies and key filters.
  • Compound indexes are even more restricted. Visit this for more details: Metadata Navigation and Filtering.

Performance Considerations

Maintaining the index adds processing to creating, updating, or deleting items from a list, and the index itself requires storage space. A list instance supports a maximum of 20 indices like shown below.

last.jpg

Some SharePoint features such as Metadata navigation and filtering requires indices and cannot be enabled on a list where there is no index slot remaining.

After reading these 2 posts we can now be very clear that SharePoint content database is not a relational database. It should not be used for high-volume transactional processing because of the following reasons:

  • Each data element is stored in a single table.
  • Database indices are managed using a second table that is then joined to the main table.
  • Concurrent access to different lists is difficult since the data comes from the same table.

Hence we should use external databases for relational database purposes.

In the future we would look at some more internals.