SQL Server I/O Bottlenecks
A newer version of this post is available at This link
I/O Bottleneck
SQL Server performance highly relies on the Disk performance. SQL Server I/O Bottleneck can be identified through
1. PAGEIOLATCH_xx or WRITELOG wait types in Sysprocesses and other DMV’s
2. I/O taking longer than 15 seconds in SQL Server Errorlog.
{
SQL Server has encountered X
occurrence(s) of I/O requests taking longer than 15 seconds to complete on
file [L:\mssql\data\File.ldf] in database [IOTEST (7). The OS
file handle is 0x000006A4. The offset of the latest long I/O is:
0x000001e616fa00
}
3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats
{
Select wait_type,
waiting_tasks_count,
wait_time_ms
from sys.dm_os_wait_stats
where wait_type like ‘PAGEIOLATCH%’
order by wait_type
}
4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.
{
select
database_id,
file_id,
io_stall,
io_pending_ms_ticks,
scheduler_address
from sys.dm_io_virtual_file_stats(NULL, NULL)t1,
sys.dm_io_pending_io_requests as t2
where t1.file_handle = t2.io_handle
}
Following are common reasons for I/O Bottleneck in SQL Server.
1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.
2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem so the Disks are performing very slow and hence SQL Server is affected.
3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.
How to troubleshoot?
Disk Perfmon counters can be used to identify which of above three is causing I/O Bottleneck.
Disk Bytes /sec — > Total read and write to disk per second in bytes
Process:IO Data Bytes/Sec –> Total read and write to disk per second in bytes by each process.
Buffer Manager: Page Read/sec + Page Writes/sec –>Total read and write to disk per second in bytes by SQL Server process.
Disk sec/Transfer –> Time taken to perform the I/O operation
Ideal value for Disk sec/Transfer is 0.005-0.015 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck. Look for Disk Bytes /sec immediately If it is below 150 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.
If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec < 150 (For San)) or (Disk Bytes /sec < 50 (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))
{
There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.
}
If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50 (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))
{
Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.
If ( Identified process == SQLServer.exe )
{
Identify and tune the queries which is Spawning excessive I/O.
(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text
can be used to identify the query). Use DTA to tune the query
}
If ( Identified process != SQLServer.exe )
{
Engage the owner of application which is spawning excessive I/O
}
}
Many thanks to Joseph Pilov from whom I learned many techniques like the one above.
Thanks
Karthick P.K
How to Analyze "Deadlocked Schedulers" Dumps? - Karthick PK 's Blog - Site Home - MSDN Blogs said
[…] taking longer than 15 seconds" messages in Errorlog before Deadlocked Schedulers Dumps. Refer https://mssqlwiki.com/sqlwiki/sql-performance/io-bottlenecks/ for troubleshooting I/O […]
Akshay said
Hi,
Thanks for this article. I have read your other articles as well. Its hard to find such concise and practical information about SQL Server performance tuning..
Do you have good info/links about calculation of SQL Server IO read/write performance for RAID systems? I believe they are different as compared to a local disk connected to the server or SAN.
Thanks again,
Akshay.
Abhay said
Nice Article. This error itself tells us where the problem is but not who is causing this problem 🙂 …When we go to the SAN guys they show us the picture which proves that there is no issue.
Regards
Abhay