Investigating Latches in SharePoint Databases

We all know that SQL Server is the heart of the SharePoint product. Two synchronization objects that affect SQL Server's overall performance are locks and latches. In simple words we can say that locks control the state of data to ensure transactional integrity whereas latches control the location of data, in memory or on disk. These are quite complex constructs and need an in-depth study to understand them. This article is restricted to the understanding of the two latches. 

  • Page Latch: This type of wait occurs on a page when it is already available in memory and the worker thread needs to wait for a page to become available.
  • Page IO Latch: This type of wait occurs due to physical I/O, such as when a page needs to be made available in the buffer pool for reading or writing and SQL Server needs to retrieve it from disk.

To investigate performance problems in SharePoint, there are a few columns in "sys.dm_db_index_operational_stats" as follows that are important:

  • page_latch_wait_count: Cumulative number of times the Database Engine waited, because of latch contention
  • page_latch_wait_in_ms: Cumulative number of milliseconds the Database Engine waited, because of latch contention
  • page_io_latch_wait_count : Cumulative number of times the Database Engine waited on an I/O page latch
  • page_io_latch_wait_in_ms: Cumulative number of milliseconds the Database Engine waited on a page I/O latch

Let's run the following query on SharePoint Administrator and the Content Database. First I am generating some page IO latch that waits by uploading a 50Mb file.

SELECT OBJECT_SCHEMA_NAME(ios.object_id) + '.' + OBJECT_NAME(ios.object_id) as table_name,

i.name as index_name, page_io_latch_wait_count, page_io_latch_wait_in_ms

,CAST(1. * page_io_latch_wait_in_ms / NULLIF(page_io_latch_wait_count ,0) AS decimal(12,2)) AS page_io_avg_lock_wait_ms

FROM sys.dm_db_index_operational_stats (DB_ID(), NULL, NULL, NULL) ios

INNER JOIN sys.indexes i ON i.object_id = ios.object_id AND i.index_id = ios.index_id

WHERE OBJECTPROPERTY(ios.object_id,'IsUserTable') = 1

ORDER BY  page_io_avg_lock_wait_ms DESC

In the Content Database, I get the following results:
 
dbo.AllLists AllLists_PK 4 1506 376.5
dbo.AllWebParts PageUrlID_FK 3 946 315.33
dbo.EventBatches EventBatches_PK 1 261 261
dbo.DocStreams DocStreams_CI 1 249 249
dbo.SolutionResourceUsageWindowed SolutionResourceUsageWindowed_Id 1 243 243
dbo.Features Features_PK 2 402 201
dbo.EventCache EventCache_Id 3 594 198
dbo.SolutionResourceUsageLog SolutionResourceUsageLog_Id 1 195 195
dbo.AllWebParts ListIdUserId_NCI 2 370 185
dbo.AllSites Sites_Id 1 179 179
dbo.AllLists Lists_FullText 1 156 156
dbo.AllDocs AllDocs_ParentId 9 1019 113.22
 
And in the Admin Database, I get the following results:
 
dbo.UserInfo UserInfo_PK 6 4474 745.67
dbo.AllLists AllLists_PK 18 7133 396.28
dbo.AllWebParts ListIdUserId_NCI 17 6015 353.82
dbo.AllDocs Docs_IdLevelUnique 14 4285 306.07
dbo.AllDocs AllDocs_Url 23 5696 247.65
dbo.DatabaseInformation DatabaseInformation_Name 1 244 244
dbo.AllUserData AllUserData_ParentId 53 10577 199.57
dbo.AllLinks Links_Forward 8 1534 191.75
dbo.AllWebs Webs_PK 6 982 163.67
dbo.AllDocs AllDocs_ParentId 20 2992 149.6
dbo.UserInfo UserInfo_SID 6 839 139.83
dbo.AllUserData AllUserData_PK 10 1079 107.9
dbo.Features Features_PK 6 641 106.83
dbo.EventCache EventCache_Id 59 6025 102.12
dbo.WebMembers WebMembers_PK 6 606 101
 
Similarly, you can run the following Page Latch query and check the results.
 

SELECT OBJECT_SCHEMA_NAME(ios.object_id) + '.' + OBJECT_NAME(ios.object_id) as table_name

,i.name as index_name

,page_latch_wait_count

,page_latch_wait_in_ms

,CAST(1. * page_latch_wait_in_ms / NULLIF(page_latch_wait_count ,0) AS decimal(12,2)) AS page_avg_lock_wait_ms

FROM sys.dm_db_index_operational_stats (DB_ID(), NULL, NULL, NULL) ios

INNER JOIN sys.indexes i ON i.object_id = ios.object_id AND i.index_id = ios.index_id

WHERE OBJECTPROPERTY(ios.object_id,'IsUserTable') = 1

ORDER BY 5 DESC
  

 SQl Query
 
So, we get these results. What's next? How does it help in investigating performance problems. Looking at the time required to wait for latches, you can decide if you need to increase the memory or change the buffer pool settings. Are the SQL files saved on a faster drive or if on SANs? Are these SANs fast enough? Probably with SQL Server 2014, in-memory procedures and column store indexes, in the future there will be no need to look at these constructs again, but for that we need to wait for the next SharePoint version.