How to Analyze Deadlocked Schedulers Dumps?
Posted by Karthick P.K on June 15, 2010
How to Analyze "Deadlocked Schedulers" Dumps?
Do you see "Deadlocked Schedulers" errors similar to one below and stuck?
From SQL Server Errorlog
**Dump thread – spid = 0, PSS = 0×0000000000000000, EC = 0×0000000000000000
***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0001.txt
* BEGIN STACK DUMP:
* Deadlocked Schedulers
* Short Stack Dump
Stack Signature for the dump is 0x00000000000003D0
New queries assigned to process on Node 0 have not been picked up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 0%. System Idle: 69%.
New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 6%. System Idle: 90%.
We get Deadlocked Schedulers error (New queries assigned to process on Node n have not been picked up by a worker thread in the last 300 seconds) and dump when Scheduler Monitor detects threads(workers) are not Progressing on schedulers of one (or) all nodes for 60 seconds (300 seconds in Denali) and there are new work request (tasks) but there is no idle threads to pick up the the work request and SQL Server could not spawn new thread to process the new work request (Max worker threads reached).
Some of common causes are most of the tasks are waiting on a single resource because of resource bottleneck, excessive blocking, excessive parallelism, very long running Queries executed by all workers, all threads waiting on trace write waits etc..
SQL Server will not accept new connections or cannot process new requests when there is deadlocked scheduler condition (Exception: deadlocked condition on single node).
Note: In systems with multiple nodes (NUMA) If all the threads which belong to schedulers of single node is exhausted (or) Schedulers not progressing on single node can cause deadlocked scheduler condition.
Detailed explanation about dead lock schedulers can be found in This blog from my mentor J.P
Steps to analyze "Deadlocked Schedulers" Dumps.
To analyze the dump download and Install Windows Debugger from This link
Open Windbg . Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)
on command window type
Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.
Verify if symbols are loaded for SQL Server by using the debugger command lmvm
0:002> lmvm sqlservr
start end module name
00000000`01000000 00000000`03679000 sqlservr T (pdb symbols) c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
Loaded symbol image file: sqlservr.exe
Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
Image name: sqlservr.exe
Timestamp: Wed Oct 07 21:15:52 2009 (4ACD6778)
File version: 2005.90.4266.0
Product version: 9.0.4266.0
File flags: 0 (Mask 3F)
File OS: 40000 NT Base
File type: 1.0 App
File date: 00000000.00000000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Type ~*kL 20 and look at the stack of all the threads to find what majority of threads are doing.
Note: If you find most of your threads are waiting on stack which is not listed below please paste the stack in comments session of this blog (or) In MSSQLWIKI face book group we will try to answer you. If you don’t get prompt reply from the community, you may need to open a support ticket with Microsoft.
1. If it is blocking issue and If most of the threads are waiting to acquire a lock you will find the most of the stack similar to one below. (We try to acquire lock and go to wait, since someone is holding a lock)
2. If most of threads are stuck while trying to write profiler events to the destination you might find stack similar to one below
3. If your stack’s are like one below refer http://support.microsoft.com/default.aspx?scid=kb;EN-US;974205
sqlservr!SpinlockBase::Sleep sqlservr!SpinlockBase::SpinToAcquire sqlservr!TSyncHashTable_EntryAccessorsqlservr!CQSIndexStatsMgr::AddNewMissingIndex sqlservr!CIdxSuggestion::Register sqlservr!COptExpr::PqteConvert sqlservr!CPhyOp_Top::PqteConvert sqlservr!COptExpr::PqteConvert sqlservr!COptExpr::PqteConvertTree sqlservr!COptContext::PcxteOptimizeQuery sqlservr!CQuery::Optimize sqlservr!CQuery::PqoBuild sqlservr!CStmtQuery::InitQuery sqlservr!CStmtSelect::Init
4. If you see many stacks like the one below it could be BPOOL memory pressure (or) Lazy writer waiting on I/O
5. If you see many stacks like the one below it should be because of excessive parallelism
6. If you see many stacks like the one below (Many threads waiting to flush log) it should be because of disk bottleneck’s. Check if you see "I/O requests taking longer than 15 seconds" messages in Errorlog before Deadlocked Schedulers Dumps. Refer Troubleshooting sql server I/O for troubleshooting I/O issues.
- How to analyze QL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion
- How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……
- Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
- SQL Server generated Access Violation dumps while accessing oracle linked servers.
- SQL Server Latch & Debugging latch time out
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.