How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……
Posted by Karthick P.K on August 17, 2012
Note: If you are interested only in finding a quick resolution for Non-Yielding scheduler or Non-yielding IOCP Listener dumps or Non-yielding resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps
How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?
This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.
1. SQL Server has its own logical schedulers to schedule the SQL Server workers.
2. The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005
3. Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.
What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?
Answer: Other threads would not get CPU cycles and starve the SQL Server performance.
What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?
SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.
What if the thread did not yield after 4 Milliseconds?
SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.
A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture. Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)
and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.
When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure also we can compare the current stack of the thread with the one which is copied and check if the thread is progressing.
Non-yielding IOCP Listener
An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump
Let us step in to analysis of non-yielding scheduler dump which I got in SQL Server 2012
Sample 1
When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.
{
* *******************************************************************************
*
* BEGIN STACK DUMP:
* 04/16/12 10:09:58 spid 6256
*
* Non-yielding Scheduler
*
* *******************************************************************************
Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.
}
To analyze the dump download and Install Windows Debugger from This link
Step 1:
Open Windbg . Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)
Microsoft (R) Windows Debugger Version 6.11.0001.404 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are available
Comment: ‘Stack Trace’
Comment: ‘Non-yielding Scheduler’ èType of the dump
Symbol search path is: *** Invalid ***
Executable search path is:
Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64
Product: Server, suite: Enterprise TerminalServer SingleUserTS à Windows version and system information
Machine Name:
Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)
System Uptime: 9 days 15:57:03.155
Process Uptime: 0 days 0:06:48.000
……………………………………………………….
……………………………..
Step 2:
on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;
Step 3:
Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.
Step 4:
Verify if symbols are loaded for SQL Server by using the debugger command lmvm
0:146> lmvm sqlservr
start end module name
00000000`ffad0000 00000000`ffb0e000 sqlservr T (pdb symbols) c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb
Loaded symbol image file: sqlservr.exe
Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe
Image name: sqlservr.exe
Timestamp: Fri Apr 06 08:19:38 2012 (4F7F098A)
CheckSum: 00036498
ImageSize: 0003E000
File version: 2011.110.2316.0
Product version: 11.0.2316.0 èSQL Server Version
File flags: 0 (Mask 3F)
File OS: 40000 NT Base
File type: 1.0 App
File date: 00000000.00000000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
Step 5:
Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )
0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback
Thread 006, 1 frame(s) match è Thread ID of scheduler monitor.
* 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f
Step 6:
Switch to scheduler monitor thread using ~[threadID]s command
0:146> ~[006]s
ntdll!NtWaitForSingleObject+0xa:
00000000`76d3135a c3 ret
Step 7:
Use kC or kP command to look at the stack on scheduler monitor thread.
0:006> kP Child-SP RetAddr Call Site
00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa
00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79
00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e
00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24
00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4
00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426
00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb
00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f
00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375
00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307
00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211
00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb
00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9
00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e
00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8
00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299
00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261
00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f
00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8
00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236
Step 8:
Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.
Recollect the below error in SQL error log. Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.
0:006> ~~[0x1cb0]s
ntdll!NtWaitForSingleObject+0xa:
00000000`76d3135a c3 ret
Step 9:
Look at the current stack of Non-yielding thread. using kc command
0:146> kc 10
Call Site
ntdll!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqldk!SOS_Scheduler::SwitchContext
sqldk!SOS_Scheduler::SuspendNonPreemptive
sqldk!WorkDispatcher::DequeueTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure also we can compare the current stack of the thread with the one which is copied and check if the thread is progressing.
Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive and SwitchContex in the thread.
{
SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.
}
Step 10:
Search for the copied stack structure using X commad
0:146> X sqlmin!*copiedStack*
000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>
It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo
Step 11:
We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.
Let us do dd on sqlmin!g_copiedStackInfo (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)
0:146> dd sqlmin!g_copiedStackInfo
000007fe`df11bfe0 00000001 00000000 3369e2e0 00000000
000007fe`df11bff0 0000a998 00000000 00000000 00000000
000007fe`df11c000 00000000 00000000 00000000 00000000
000007fe`df11c010 00000000 00000000 00000000 00000000
000007fe`df11c020 00000000 00000000 00000000 00000000
000007fe`df11c030 0010000b 00001f80 00000033 00000000
000007fe`df11c040 002b0000 00000246 00000000 00000000
000007fe`df11c050 00000000 00000000 00000000 00000000
Step 12:
Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .
0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp èType cast 000007fe`df11bfe0 with CONTEXT . RsP,Rbp and Rip is invalid so this is not valid offset.
ole32!CONTEXT
+0x098 Rsp : 2
+0x0a0 Rbp : 0x5a4d
+0x0f8 Rip : 0
0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp è Type cast 000007fe`df11bff0 with CONTEXT . RIP is invalid
ole32!CONTEXT
+0x098 Rsp : 0x72120000
+0x0a0 Rbp : 0x3369e3cc
+0x0f8 Rip : 0xf2
0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp è Type cast 000007fe`df11c000 with CONTEXT . RIP ,RSP and Rbp is valid. So this should be a valid context.Let us switch to this context and verify
ole32!CONTEXT
+0x098 Rsp : 0x3369e2e0
+0x0a0 Rbp : 0x3369e498
+0x0f8 Rip : 0x76d3139a
Now we know 000007fe`df11c000 is valid context. So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0x20 hence we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.
Step 13:
Switch the context of copied stack using .cxr 000007fe`df11c000 or .cxr sqlmin!g_copiedStackInfo+0X20
0:146> .cxr 000007fe`df11c000
rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d
rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000
rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498
r8=00000000000000b0 r9=0000000084a85310 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000004
r14=00000000000000f2 r15=0000000000000001
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246
ntdll!NtWriteFile+0xa:
00000000`76d3139a c3 ret
Step 14:
Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)
0:146> Kc
*** Stack trace for last set context – .thread/.cxr resets it
Call Site
ntdll!NtWriteFile
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog
sqllang!CErrorReportingManager::SendErrorToErrLog
sqllang!CErrorReportingManager::CwchFormatAndPrint
sqllang!ReportLoginFailure
sqllang!FRedoLogin
sqllang!login
sqllang!process_login_finish
sqllang!process_commands
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same.
Current thread stack which we dumped using the thread ID in SQL Errorlog. 0:146> kc |
Copied thread stack which SQL Server copied to global structure before generating the dump. 0:146> Kc 10 |
ntdll!NtWaitForSingleObject |
KERNELBASE!WriteFile |
KERNELBASE!WaitForSingleObjectEx |
kernel32!WriteFileImplementation |
sqldk!SOS_Scheduler::SwitchContext |
sqllang!CErrorReportingManager::WriteToErrLog |
sqldk!SOS_Scheduler::SuspendNonPreemptive |
sqllang!CErrorReportingManager::SendErrorToErrLog |
sqldk!WorkDispatcher::DequeueTask |
sqllang!CErrorReportingManager::CwchFormatAndPrint |
sqldk!SOS_Scheduler::ProcessTasks |
sqllang!ReportLoginFailure |
sqldk!SchedulerManager::WorkerEntryPoint |
sqllang!FRedoLogin |
sqldk!SystemThread::RunWorker |
sqllang!login |
sqldk!SystemThreadDispatcher::ProcessWorker |
sqllang!process_login_finish |
sqldk!SchedulerManager::ThreadEntryPoint |
sqllang!process_commands |
kernel32!BaseThreadInitThunk |
sqldk!SOS_Task::Param::Execute |
ntdll!RtlUserThreadStart |
sqldk!SOS_Scheduler::RunTask |
sqldk!SOS_Scheduler::ProcessTasks |
|
sqldk!SchedulerManager::WorkerEntryPoint |
|
sqldk!SystemThread::RunWorker |
|
sqldk!SystemThreadDispatcher::ProcessWorker |
|
sqldk!SchedulerManager::ThreadEntryPoint |
Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)
ntdll!NtWriteFile -> WriteFile function is at top of the stack and did not complete in expected time.
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint ->format the error
sqllang!ReportLoginFailure ->Login failed
sqllang!FRedoLogin
sqllang!login ->Login task is processed
From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.
When will writefile operation take long time?
When there is Disk bottleneck. So the obvious solution for this issue is to fix the performance of the disk.
Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.
Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.
To Be continued…………………………
Related posts:
If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/
Thank you,
Karthick P.K |My Facebook Page |My Site| Blog space| Twitter
AZAD MARATHA said
Crisp and Clear Explanation. Thanks Karthick
Meditation: Is IOCP listener actually listening? - SQL Meditation - Site Home - MSDN Blogs said
[…] There are some great tips on debugging non-yielding IOCP listener scenarios on MSSQLWiki blog […]
RV said
KK,
IOCP Listener dump:
2012-12-16 03:15:37.20 Server Using ‘dbghelp.dll’ version ‘4.0.5’
2012-12-16 03:15:37.54 Server **Dump thread – spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000
2012-12-16 03:15:37.59 Server ***Stack Dump being sent to xxxxxxxxxxxxxxxxxx\SQLDump0011.txt
2012-12-16 03:15:37.61 Server * *******************************************************************************
2012-12-16 03:15:37.61 Server *
2012-12-16 03:15:37.61 Server * BEGIN STACK DUMP:
2012-12-16 03:15:37.61 Server * 12/16/12 03:15:37 spid 0
2012-12-16 03:15:37.61 Server *
2012-12-16 03:15:37.61 Server * Non-yielding IOCP Listener
2012-12-16 03:15:37.61 Server *
2012-12-16 03:15:37.61 Server * *******************************************************************************
2012-12-16 03:15:37.61 Server * ——————————————————————————-
2012-12-16 03:15:37.61 Server * Short Stack Dump
2012-12-16 03:15:47.68 Server Stack Signature for the dump is 0x0000000000000038
2012-12-16 03:20:38.10 Server External dump process return code 0x20002001.
The error information has been submitted to Watson error reporting.
2012-12-16 16:15:18.45 spid2s LazyWriter: warning, no free buffers found.
2012-12-16 16:15:20.19 spid2s
Karthick P.K said
Can you please post the stack
Sonu Shah said
Hi Karthick, Can you help me how to proceed with the Exception Analysis analysis for `Stalled IOCP Listener`. (SQL_SOSNonYieldIOCPCallback)
Here is the stack
COMMENT: Stack Trace
NTGLOBALFLAG: 0
PROCESS_BAM_CURRENT_THROTTLED: 0
PROCESS_BAM_PREVIOUS_THROTTLED: 0
CONTEXT: (.ecxr)
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
00000000`00000000 ?? ???
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ff973d66d73 (sqllang!stackTrace+0x0000000000000103)
ExceptionCode: 00000000
ExceptionFlags: 00000000
NumberParameters: 0
PROCESS_NAME: sqlservr.exe
FAULTING_THREAD: 00004a34
STACK_TEXT:
00000000`00000000 00000000`00000000 sqlservr.exe!unknown_error_in_process+0x0
00000000`00000000 00000000`00000000 unknown![.ecxr]+0x0
0000002f`8d1dcb68 00007ff9`92955d04 ntdll!NtWaitForSingleObject+0x0
0000002f`8d1dcb70 00007ff9`8fbb4d7f KERNELBASE!WaitForSingleObjectEx+0x0
0000002f`8d1dcc10 00007ff7`8dab036f sqlservr!CDmpDump::DumpInternal+0x0
0000002f`8d1dcd10 00007ff7`8daaff84 sqlservr!CDmpDump::Dump+0x0
0000002f`8d1dcd50 00007ff9`730c98ad sqllang!SQLDumperLibraryInvoke+0x0
0000002f`8d1dcd90 00007ff9`73d9d481 sqllang!SQLLangDumperLibraryInvoke+0x0
0000002f`8d1dce50 00007ff9`73d9df34 sqllang!CImageHelper::DoMiniDump+0x0
0000002f`8d1dd060 00007ff9`73d6760e sqllang!stackTrace+0x0
0000002f`8d1dea90 00007ff7`8da96b73 sqlservr!SQL_SOSNonYieldIOCPCallback+0x0
0000002f`8d1feb10 00007ff9`6f7514e3 sqldk!SOS_OS::ExecuteNonYieldIOCPCallbacks+0x0
0000002f`8d1feda0 00007ff9`6f6f4273 sqldk!SchedulerMonitor::CheckIoCompletionHangs+0x0
0000002f`8d1fef10 00007ff9`6f6e0370 sqldk!SchedulerMonitor::Run+0x0
0000002f`8d1ff060 00007ff9`6f7ba719 sqldk!SchedulerMonitor::EntryPoint+0x0
0000002f`8d1ff090 00007ff9`6f6d62fd sqldk!SOS_Task::Param::Execute+0x0
0000002f`8d1ff690 00007ff9`6f6d6a08 sqldk!SOS_Scheduler::RunTask+0x0
0000002f`8d1ff700 00007ff9`6f6d67fd sqldk!SOS_Scheduler::ProcessTasks+0x0
0000002f`8d1ff7f0 00007ff9`6f6ff938 sqldk!SchedulerManager::WorkerEntryPoint+0x0
0000002f`8d1ff8c0 00007ff9`6f6ffa40 sqldk!SystemThread::RunWorker+0x0
0000002f`8d1ff8f0 00007ff9`6f6ff1a7 sqldk!SystemThreadDispatcher::ProcessWorker+0x0
0000002f`8d1ff990 00007ff9`6f6ff6f9 sqldk!SchedulerManager::ThreadEntryPoint+0x0
0000002f`8d1ffa40 00007ff9`91f484d4 kernel32!BaseThreadInitThunk+0x0
0000002f`8d1ffa70 00007ff9`92901791 ntdll!RtlUserThreadStart+0x0
SYMBOL_NAME: sqlservr.exe!unknown_error_in_process
MODULE_NAME: sqlservr
IMAGE_NAME: sqlservr.exe
STACK_COMMAND: .ecxr ; kb ; ** Pseudo Context ** Pseudo ** Value: ffffffff ** ; kb
EXCEPTION_CODE_STR: 0
FAILURE_BUCKET_ID: SQLSERVER_NON_YIELDING_IOCP_INVALID_COPIED_STACK_0_sqlservr.exe!unknown_error_in_process
OS_VERSION: 10.0.14393.5066
BUILDLAB_STR: rs1_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 8
IMAGE_VERSION: 2017.140.3335.7
FAILURE_ID_HASH: {c00aed11-3b4e-2876-9e78-3f138b6bd9b3}
Followup: MachineOwner
———
Sonu Shah said
after above: I ran “!findstack sqlservr!SQL_SOSNonYieldIOCPCallback”
and the output was
”
Thread 006, 1 frame(s) match
* 08 0000002f8d1feb00 00007ff96f7514e3 sqlservr!SQL_SOSNonYieldIOCPCallback+0xf3
”
then I ran “~[006]s”
and the output was
”
ntdll!NtWaitForSingleObject+0x14:
00007ff9`92955d04 c3 ret
”
then I ran “kP Child-SP RetAddr Call Site”
and output was
”
Couldn’t resolve error at ‘Child-SP RetAddr Call Site’
“
How to Analyze "Deadlocked Schedulers" Dumps? « MSSQLWIKI said
[…] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]
Ravishankar said
Hi Karthick,
Thank you for the nice explanation on Non-Yielding scheduler
In our clustered environment one of our node was unresponsive caused a minidump
while following your Step 11, I am getting following results, and unable to continue, Can you please give me suggestion to continue from here…
0:088> kc 10
Call Site
ntdll!NtSignalAndWaitForSingleObject
kernel32!SignalObjectAndWait
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::SuspendNonPreemptive
sqlservr!WorkDispatcher::DequeueTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex
kernel32!BaseThreadStart
0:088> X sqlservr!*copiedStack*
00000000`01b311a0 sqlservr!IDmpDump::AddMemory =
00000000`01b33190 sqlservr!CopiedStackInfo::CopiedStackInfo =
00000000`01b306a0 sqlservr!DmpPointer::DmpPointer =
00000000`032dbb30 sqlservr!g_copiedStackInfo =
00000000`01b311c0 sqlservr!DMP_MAKE_POINTER =
00000000`01b306b0 sqlservr!DmpPointer::GetRawPointer =
0:088> dd sqlservr!g_copiedStackInfo
00000000`032dbb30 ???????? ???????? ???????? ????????
00000000`032dbb40 ???????? ???????? ???????? ????????
00000000`032dbb50 ???????? ???????? ???????? ????????
00000000`032dbb60 ???????? ???????? ???????? ????????
00000000`032dbb70 ???????? ???????? ???????? ????????
00000000`032dbb80 ???????? ???????? ???????? ????????
00000000`032dbb90 ???????? ???????? ???????? ????????
00000000`032dbba0 ???????? ???????? ???????? ????????
0:088> dt 00000000`032dbb30 CONTEXT Rip Rsp Rbp
ATL80!CONTEXT
+0x098 Rsp : ??
+0x0a0 Rbp : ??
+0x0f8 Rip : ??
Memory read error 00000000032dbc28
Sql server :2005 SP2
Running on x64 Windows 2003 OS.
Karthick P.K said
use .cxr sqlservr!g_copiedStackInfo+0X20 (or) .cxr 00000000`032dbb50 and then kC
Karthick P.K said
Once you get the stack after kC check if it matches with any of the stack posted in https://mssqlwiki.com/2012/08/21/non-yielding-iocp-listener-non-yielding-scheduler-and-non-yielding-resource-monitor-known-issues-and-fixes/ if you do not find a match post the stack in comments section of same blog
Ravi said
Thanks Karthick, We logged incident with Microsoft and they had provided patch to update. Since then we are not seeing abrupt reboot of our servers.
Sans said
Hi Karthick,
Thanks a lot for your article.
I followed the steps outlined by you and below are the dumps for my SQL 2005 Clustered environment.
======================================================================================
Scenario 1:
0:053> kc 10
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!LC::WaitForMemBlk
sqlservr!SQLServerLogMgr::RealReserveAndAppend
sqlservr!SQLServerLogMgr::ReserveAndAppend
sqlservr!XdesRMReadWrite::GenerateLogRec
sqlservr!XdesRMReadWrite::LogBeginXact
sqlservr!XdesRMReadWrite::MakeActive
sqlservr!XdesRMReadWrite::GenerateLogRec
sqlservr!alloca_probe
sqlservr!EntityVerMgr::ValidateAndUpdateEntityVersion
sqlservr!MDL::FLockObjectId
sqlservr!CMEDCatalogObject::FPersistObject
0:053> kc
*** Stack trace for last set context – .thread/.cxr resets it
Call Site
ntdll!NtWriteFile
kernel32!WriteFile
sqlservr!CErrorReportingManager::WriteToErrLog
sqlservr!CErrorReportingManager::SendErrorToErrLog
sqlservr!CErrorReportingManager::CwchFormatAndPrint
sqlservr!ReportLoginFailure
sqlservr!LoginError
sqlservr!alloca_probe
sqlservr!process_login
sqlservr!process_commands
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!_callthreadstartex
============================================================================
Scenario 2:
0:004> kc 10
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!CDmpDump::DumpInternal
sqlservr!CDmpDump::Dump
sqlservr!CImageHelper::DoMiniDump
sqlservr!stackTrace
sqlservr!SQL_SOSNonYieldSchedulerCallback
sqlservr!SOS_OS::ExecuteNonYieldSchedulerCallbacks
sqlservr!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks
sqlservr!alloca_probe
sqlservr!SchedulerMonitor::CheckSchedulers
sqlservr!SchedulerMonitor::Run
sqlservr!SchedulerMonitor::EntryPoint
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
0:028> kc
*** Stack trace for last set context – .thread/.cxr resets it
Call Site
ntdll!NtWriteFile
kernel32!WriteFile
sqlservr!CErrorReportingManager::WriteToErrLog
sqlservr!CErrorReportingManager::SendErrorToErrLog
sqlservr!CErrorReportingManager::CwchFormatAndPrint
sqlservr!scierrlog
sqlservr!FCB::ReportLongIO
sqlservr!alloca_probe
sqlservr!BPool::LazyWriter
sqlservr!lazywriter
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!_callthreadstartex
=============================================================
Could you please tell me if these are due to any memory issues (or) disk subsystem bottleneck.
In Scenario 1,i can see WaitForMemBlk and LogBeginXact :
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!LC::WaitForMemBlk
sqlservr!SQLServerLogMgr::RealReserveAndAppend
sqlservr!SQLServerLogMgr::ReserveAndAppend
sqlservr!XdesRMReadWrite::GenerateLogRec
sqlservr!XdesRMReadWrite::LogBeginXact
(or) do i need to check for any other issues.
Karthick P.K said
Hello Anu,
Yes they are related to disk and memory bottleneck.
Fix your I/O issues first. Memory pressure on the system can be side effect of disk bottleneck (When there is slow I/O, dirty pages can get accumulated in memory causing memory pressure) . Follow https://mssqlwiki.com/2012/08/27/io-requests-taking-longer-than-15-seconds-to-complete-on-file/ to fix I/O.
If you are using database snapshot if yes then make sure you have http://support.microsoft.com/kb/975860
Sans said
Thanks Karthick
In addition to above I/O message,I am also seeing below error:
Application popup: Windows – Delayed Write Failed : Windows was unable to save all the data for the file D:\Microsoft SQL Server\MSSQL\LOG. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
The data is getting lost in SQL and Agent Logs which i am suspecting an issue with the disk.Could you please suggest if i need to collect the counters to conclude this.
Karthick P.K said
Hello San, Above error is very dangerous. It will corrupt the database. Move the files to different drive.
Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes - Karthick PK 's Blog - Site Home - MSDN Blogs said
[…] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]
Top SQL Server blogs from MSSQLWIKI « MSSQLWIKI said
[…] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]
Eric said
Karthick ,
Is there any way to duplicate the Non-Yielding scheduler issue if I want to do some practice by the windbg?
เกมส์สร้างบ้าน said
After looking over a handful of the articles on your web site, I seriously like your technique of blogging.
I book-marked it to my bookmark website list and will be
checking back in the near future. Take a look at my website
as well and tell me how you feel.
porno said
Article fοllement cultivant !!!
Niagara therapy said
Niagara therapy
How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… « MSSQLWIKI
اتوشارژ said
EDC and IDC: EDC refers to the external development charge and IDC refers to
the infrastructure development charges. If it is the only negative on your report,
you may be approved for a limited new line of credit. Unfortunately, I do not use a
garage, but ensure that it stays within a carport and covered when it is not
being utilised.
Winkle said
Hello all,
We are seeing the below stack and our SQL Server version is 11.0.3460
0:546> .cxr sqlmin!g_copiedStackInfo+0X20
rax=0000000000000001 rbx=00000000b23e1740 rcx=0000001cfe7fb4c0
rdx=0000001112913880 rsi=000000000000174c rdi=00000000b23e1710
rip=000000007746063a rsp=00000000278dde60 rbp=000000000000174c
r8=0000000063bf09a8 r9=00000000278de080 r10=7fffffffffffffff
r11=0000002ee4e126bd r12=0000000000000000 r13=0000000000000000
r14=000000000000b200 r15=0000000045ba1000
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246
ntdll!NtWriteFile+0xa:
00000000`7746063a c3 ret
0:546> kc
*** Stack trace for last set context – .thread/.cxr resets it
Call Site
ntdll!NtWriteFile
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqlmin!DiskWriteAsync
sqlmin!FCB::AsyncWrite
sqlmin!SQLServerLogMgr::FlushLC
sqlmin!SQLServerLogMgr::AppendLogRequest
sqlmin!SQLServerLogMgr::ReserveAndAppend
sqlmin!XdesRMReadWrite::GenerateLogRec
sqlmin!PageRef::ModifyRow
sqlmin!TextPageRef::ModifyTextNode
sqlmin!BlobBase::ModifyDataFragment
sqlmin!BlobBase::ModifyDataNode
sqlmin!Blob::RWBlob
sqlmin!Blob::ReadWriteAt
sqlmin!BlobManager::ReplaceAtWithILB
sqlmin!PrepareIlbForSetData
sqlmin!LongRecord::PrepareNewColumns
sqlmin!LongRecord::SetDataInternal
sqlmin!LongRecord::SetData
Win 2008R2 SP1
Chris Dorch said
Hey Karthick… We started seeing these on SQL 2012 sp1..
Non-Yielding Resource Monitor
ntdll!ZwWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqldk!SOS_Scheduler::SwitchContext
sqldk!SOS_Scheduler::SuspendNonPreemptive
sqldk!EventInternal::Wait
sqldk!SOS_UnfairMutexPair::LongWait
sqldk!CMemThread::GetObjectAccess
sqldk!CMemThread::Free
sqllang!commondelete
sqllang!CSql::`scalar deleting destructor’
sqllang!CSqlHashBkt::ReleaseSql
sqllang!CCompPlanBase::~CCompPlanBase
sqllang!CCompPlan::`vector deleting destructor’
sqllang!CCacheObject::Release
sqllang!CCacheObject::Destroy
sqllang!TCacheStore::RemoveDescriptor
sqllang!TCacheStore::CacheEntryDescriptor::DestroyDescriptor
sqldk!CacheClockHand::MoveInternal
sqldk!CacheClockHand::Move
sqldk!TClockAlgorithm<CacheClockHand >::StartExternalPressure
and this
Non-Yielding Scheduler
sqldk!TVarPageMgr::PbAllocate
sqldk!CMemThread::Alloc
sqllang!CSqlHashBkt::PcsqlCreateSql
sqllang!CSQLStrings::GenerateCSql
sqllang!CSQLStrings::SetInDbccInputBuffer
sqllang!CSQLStrings::DoCacheLookup
sqllang!CSQLSource::Transform
sqllang!CSQLSource::Execute
sqllang!process_request
sqllang!process_commands
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart
Any insight would be helpful!
Thanks in advance!
Balaji said
Very Nice Article ..Thanks for the explanation
How To Analyze Crash Dumps | Information said
[…] How to analyze Non-Yielding scheduler or Non … – MSSQLWIKI – July 30, 2013 at 6:38 PM. Hi Karthick, Thanks a lot for your article. I followed the steps outlined by you and below are the dumps for my SQL 2005 Clustered environment. […]
IRL: Non-yielding IOCP Listener - DallasDBAs.com said
[…] PK from MS has some good posts here and here on […]
Russell Young said
Couldn’t resolve error at ‘xr sqlmin!g_copiedStackInfo+0X20’
0:005> kc 100
# Call Site
00 ntdll!NtWaitForSingleObject
01 KERNELBASE!WaitForSingleObjectEx
02 sqlservr!DmpRemoteDumpRequest
03 sqlservr!DmpRemoteDumpRequest
04 sqllang!SQLDumperLibraryInvoke
05 sqllang!CImageHelper::DoMiniDump
06 sqllang!stackTrace
07 sqlservr!_____SQL______Process______Available
08 sqldk!SOS_OS::ExecuteNonYieldRMCallbacks
09 sqldk!SchedulerMonitor::CheckRMHealth
0a sqldk!SchedulerMonitor::Run
0b sqldk!SchedulerMonitor::EntryPoint
0c sqldk!SOS_Task::Param::Execute
0d sqldk!SOS_Scheduler::RunTask
0e sqldk!SOS_Scheduler::ProcessTasks
0f sqldk!SchedulerManager::WorkerEntryPoint
10 sqldk!SystemThread::RunWorker
11 sqldk!SystemThreadDispatcher::ProcessWorker
12 sqldk!SchedulerManager::ThreadEntryPoint
13 kernel32!BaseThreadInitThunk
14 ntdll!RtlUserThreadStart