SharePoint SQL Query to Get Details on Disk Utilization and IO

The SharePoint SQL Server performance depends heavily on physical memory available. SQL Server constantly brings database pages in and out of the buffer pool. This causes significant I/O traffic. The log records need to be flushed to the disk before a transaction can be declared to be committed. SQL Server uses TempDB to store intermediate results, to sort these results, to keep row versions and so on. Hence a good I/O subsystem is critical to the performance of SQL Server.

The sys.dm_io_virtual_file_stats DMV gives you details at the IO for both reads and writes as shown in the figure below. The sys.dm_io_virtual_file_stats DMV will show an IO stall when any wait occurs to access a physical data file. Run this on the WSS_Content database:

SELECT *
FROM sys.dm_io_virtual_file_stats (NULL,NULL)
ORDER BY io_stall_read_ms, io_stall_write_ms

IO stalls are recorded at the file level and you can also obtain the IO Stalls at the database level directly out of the DMV. These values are reset when the server restarts.

Result-of-sys.dm_io_virtual_file_stats.jpg

Figure: Result of sys.dm_io_virtual_file_stats

You can get the database name instead of the ids as above by using the following query and get the result as shown in the figure below.

select db_name(a.database_id) AS [DB Name],

a.io_stall, a.io_stall_read_ms, a.num_of_reads, a.io_stall_write_ms,

a.num_of_writes,

( ( a.size_on_disk_bytes / 1024 ) / 1024.0 ) AS size_on_disk_mb

from sys.dm_io_virtual_file_stats(NULL,NULL)  a

JOIN sys.master_files b ON a.file_id = b.file_id

AND a.database_id = b.database_id

ORDER BY a.io_stall DESC

SQL-Query-to-get-IO-details.jpg

Figure: SQL Query to get I/O details

Your users may experience performance problems such as slow response times and tasks that abort due to timeouts if you have a slow I/O subsystem. You can use the following performance counters to identify I/O bottlenecks. But you should not rely on one counter to determine a bottleneck. You need to look for multiple counters to cross check the validity of your findings. You may need to co-ordinate with you Network administrators to understand the details of each of readings.
 
In the example above look at the two columns:
  1. IO_Stall_Read_ms: This shows the total number of milliseconds spent in a wait to read.
  2. IO_Stall_Write_ms: This shows the total number of milliseconds spent in a wait to write.

You can interpret the numbers in this columns as follows:

  • Less than 10 ms - very good
  • Between 10 - 20 ms - okay
  • Between 20 - 50 ms - slow, needs attention
  • Greater than 50 ms - Serious I/O bottleneck