MSSQLWIKI

Karthick P.K on SQL Server

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for  Non-Yielding scheduler or Non-yielding IOCP Listener dumps or  Non-yielding  resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005

3.   Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?

Answer: Other threads would not get CPU cycles and starve the SQL Server performance.

 

What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?

SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.

 

A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture.  Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)

and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.

When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump

 

 

Let us step in to analysis of  non-yielding scheduler dump which I got in SQL Server 2012

 

Sample 1

When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.

 

{

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    *   04/16/12 10:09:58 spid 6256

    *

    * Non-yielding Scheduler

    *

     * *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}

 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  èType of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  à Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0 èSQL Server Version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  è Thread ID of scheduler monitor.

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

 

0:006> kP                        Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

 

Step 8:

Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.

 

Recollect the below error in SQL error log.  Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

Step 9:

Look at the current stack of Non-yielding thread. using kc command

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

 

 

Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive  and SwitchContex  in the thread. 

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure using X commad

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo

 

Step 11:

We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.

 

 

Let us do dd on sqlmin!g_copiedStackInfo  (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     èType cast  000007fe`df11bfe0  with CONTEXT  .  RsP,Rbp and Rip is invalid so this is not valid offset.

ole32!CONTEXT

   +0x098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      è Type cast 000007fe`df11bff0  with CONTEXT  . RIP is invalid

ole32!CONTEXT

   +0x098 Rsp : 0x72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   è Type cast 000007fe`df11c000 with CONTEXT  . RIP ,RSP  and Rbp is valid. So this should be a valid context.Let us switch to this context and verify

ole32!CONTEXT

   +0x098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know 000007fe`df11c000 is valid context.  So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0x20 hence  we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.

 

Step 13:

 

Switch the context of copied stack using .cxr 000007fe`df11c000  or .cxr sqlmin!g_copiedStackInfo+0X20

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same. 

 

 

Current thread stack which we dumped using the thread ID in SQL Errorlog.

0:146> kc                                                                                    

Copied thread stack which SQL Server copied to global structure before generating the dump.

0:146> Kc 10

ntdll!NtWaitForSingleObject

KERNELBASE!WriteFile

KERNELBASE!WaitForSingleObjectEx

kernel32!WriteFileImplementation

sqldk!SOS_Scheduler::SwitchContext

sqllang!CErrorReportingManager::WriteToErrLog

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqllang!CErrorReportingManager::SendErrorToErrLog

sqldk!WorkDispatcher::DequeueTask

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqldk!SOS_Scheduler::ProcessTasks

sqllang!ReportLoginFailure

sqldk!SchedulerManager::WorkerEntryPoint

sqllang!FRedoLogin

sqldk!SystemThread::RunWorker

sqllang!login

sqldk!SystemThreadDispatcher::ProcessWorker

sqllang!process_login_finish

sqldk!SchedulerManager::ThreadEntryPoint

sqllang!process_commands

kernel32!BaseThreadInitThunk

sqldk!SOS_Task::Param::Execute

ntdll!RtlUserThreadStart

sqldk!SOS_Scheduler::RunTask

 

sqldk!SOS_Scheduler::ProcessTasks

 

sqldk!SchedulerManager::WorkerEntryPoint

 

sqldk!SystemThread::RunWorker

 

sqldk!SystemThreadDispatcher::ProcessWorker

 

sqldk!SchedulerManager::ThreadEntryPoint

 

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

 

ntdll!NtWriteFile  -> WriteFile function is at top of the stack and did not complete in expected time.          
KERNELBASE!WriteFile       
kernel32!WriteFileImplementation 
sqllang!CErrorReportingManager::WriteToErrLog  ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog  ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint  ->format the error
sqllang!ReportLoginFailure  ->Login failed
sqllang!FRedoLogin        
sqllang!login ->Login task is processed

 

 

From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.

 

When will writefile operation take long time?

When there is Disk bottleneck. So the obvious solution for this issue is to  fix the performance of the disk.

Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • 28 Responses to “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……”

    1. AZAD MARATHA said

      Crisp and Clear Explanation. Thanks Karthick

    2. […] There are some great tips on debugging non-yielding IOCP listener scenarios on MSSQLWiki blog […]

    3. RV said

      KK,

      IOCP Listener dump:

      2012-12-16 03:15:37.20 Server Using ‘dbghelp.dll’ version ‘4.0.5’
      2012-12-16 03:15:37.54 Server **Dump thread – spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000
      2012-12-16 03:15:37.59 Server ***Stack Dump being sent to xxxxxxxxxxxxxxxxxx\SQLDump0011.txt
      2012-12-16 03:15:37.61 Server * *******************************************************************************
      2012-12-16 03:15:37.61 Server *
      2012-12-16 03:15:37.61 Server * BEGIN STACK DUMP:
      2012-12-16 03:15:37.61 Server * 12/16/12 03:15:37 spid 0
      2012-12-16 03:15:37.61 Server *
      2012-12-16 03:15:37.61 Server * Non-yielding IOCP Listener
      2012-12-16 03:15:37.61 Server *
      2012-12-16 03:15:37.61 Server * *******************************************************************************
      2012-12-16 03:15:37.61 Server * ——————————————————————————-
      2012-12-16 03:15:37.61 Server * Short Stack Dump
      2012-12-16 03:15:47.68 Server Stack Signature for the dump is 0x0000000000000038
      2012-12-16 03:20:38.10 Server External dump process return code 0x20002001.
      The error information has been submitted to Watson error reporting.

      2012-12-16 16:15:18.45 spid2s LazyWriter: warning, no free buffers found.
      2012-12-16 16:15:20.19 spid2s

      • Can you please post the stack

        • Sonu Shah said

          Hi Karthick, Can you help me how to proceed with the Exception Analysis analysis for `Stalled IOCP Listener`. (SQL_SOSNonYieldIOCPCallback)

          Here is the stack

          COMMENT: Stack Trace

          NTGLOBALFLAG: 0

          PROCESS_BAM_CURRENT_THROTTLED: 0

          PROCESS_BAM_PREVIOUS_THROTTLED: 0

          CONTEXT: (.ecxr)
          rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
          rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
          rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
          r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
          r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
          r14=0000000000000000 r15=0000000000000000
          iopl=0 nv up di pl nz na pe nc
          cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
          00000000`00000000 ?? ???
          Resetting default scope

          EXCEPTION_RECORD: (.exr -1)
          ExceptionAddress: 00007ff973d66d73 (sqllang!stackTrace+0x0000000000000103)
          ExceptionCode: 00000000
          ExceptionFlags: 00000000
          NumberParameters: 0

          PROCESS_NAME: sqlservr.exe

          FAULTING_THREAD: 00004a34

          STACK_TEXT:
          00000000`00000000 00000000`00000000 sqlservr.exe!unknown_error_in_process+0x0
          00000000`00000000 00000000`00000000 unknown![.ecxr]+0x0
          0000002f`8d1dcb68 00007ff9`92955d04 ntdll!NtWaitForSingleObject+0x0
          0000002f`8d1dcb70 00007ff9`8fbb4d7f KERNELBASE!WaitForSingleObjectEx+0x0
          0000002f`8d1dcc10 00007ff7`8dab036f sqlservr!CDmpDump::DumpInternal+0x0
          0000002f`8d1dcd10 00007ff7`8daaff84 sqlservr!CDmpDump::Dump+0x0
          0000002f`8d1dcd50 00007ff9`730c98ad sqllang!SQLDumperLibraryInvoke+0x0
          0000002f`8d1dcd90 00007ff9`73d9d481 sqllang!SQLLangDumperLibraryInvoke+0x0
          0000002f`8d1dce50 00007ff9`73d9df34 sqllang!CImageHelper::DoMiniDump+0x0
          0000002f`8d1dd060 00007ff9`73d6760e sqllang!stackTrace+0x0
          0000002f`8d1dea90 00007ff7`8da96b73 sqlservr!SQL_SOSNonYieldIOCPCallback+0x0
          0000002f`8d1feb10 00007ff9`6f7514e3 sqldk!SOS_OS::ExecuteNonYieldIOCPCallbacks+0x0
          0000002f`8d1feda0 00007ff9`6f6f4273 sqldk!SchedulerMonitor::CheckIoCompletionHangs+0x0
          0000002f`8d1fef10 00007ff9`6f6e0370 sqldk!SchedulerMonitor::Run+0x0
          0000002f`8d1ff060 00007ff9`6f7ba719 sqldk!SchedulerMonitor::EntryPoint+0x0
          0000002f`8d1ff090 00007ff9`6f6d62fd sqldk!SOS_Task::Param::Execute+0x0
          0000002f`8d1ff690 00007ff9`6f6d6a08 sqldk!SOS_Scheduler::RunTask+0x0
          0000002f`8d1ff700 00007ff9`6f6d67fd sqldk!SOS_Scheduler::ProcessTasks+0x0
          0000002f`8d1ff7f0 00007ff9`6f6ff938 sqldk!SchedulerManager::WorkerEntryPoint+0x0
          0000002f`8d1ff8c0 00007ff9`6f6ffa40 sqldk!SystemThread::RunWorker+0x0
          0000002f`8d1ff8f0 00007ff9`6f6ff1a7 sqldk!SystemThreadDispatcher::ProcessWorker+0x0
          0000002f`8d1ff990 00007ff9`6f6ff6f9 sqldk!SchedulerManager::ThreadEntryPoint+0x0
          0000002f`8d1ffa40 00007ff9`91f484d4 kernel32!BaseThreadInitThunk+0x0
          0000002f`8d1ffa70 00007ff9`92901791 ntdll!RtlUserThreadStart+0x0

          SYMBOL_NAME: sqlservr.exe!unknown_error_in_process

          MODULE_NAME: sqlservr

          IMAGE_NAME: sqlservr.exe

          STACK_COMMAND: .ecxr ; kb ; ** Pseudo Context ** Pseudo ** Value: ffffffff ** ; kb

          EXCEPTION_CODE_STR: 0

          FAILURE_BUCKET_ID: SQLSERVER_NON_YIELDING_IOCP_INVALID_COPIED_STACK_0_sqlservr.exe!unknown_error_in_process

          OS_VERSION: 10.0.14393.5066

          BUILDLAB_STR: rs1_release

          OSPLATFORM_TYPE: x64

          OSNAME: Windows 8

          IMAGE_VERSION: 2017.140.3335.7

          FAILURE_ID_HASH: {c00aed11-3b4e-2876-9e78-3f138b6bd9b3}

          Followup: MachineOwner
          ———

          • Sonu Shah said

            after above: I ran “!findstack sqlservr!SQL_SOSNonYieldIOCPCallback”
            and the output was

            Thread 006, 1 frame(s) match
            * 08 0000002f8d1feb00 00007ff96f7514e3 sqlservr!SQL_SOSNonYieldIOCPCallback+0xf3

            then I ran “~[006]s”
            and the output was

            ntdll!NtWaitForSingleObject+0x14:
            00007ff9`92955d04 c3 ret

            then I ran “kP Child-SP RetAddr Call Site”
            and output was

            Couldn’t resolve error at ‘Child-SP RetAddr Call Site’

    4. […] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]

    5. Ravishankar said

      Hi Karthick,
      Thank you for the nice explanation on Non-Yielding scheduler
      In our clustered environment one of our node was unresponsive caused a minidump
      while following your Step 11, I am getting following results, and unable to continue, Can you please give me suggestion to continue from here…

      0:088> kc 10
      Call Site
      ntdll!NtSignalAndWaitForSingleObject
      kernel32!SignalObjectAndWait
      sqlservr!SOS_Scheduler::SwitchContext
      sqlservr!SOS_Scheduler::SuspendNonPreemptive
      sqlservr!WorkDispatcher::DequeueTask
      sqlservr!SOS_Scheduler::ProcessTasks
      sqlservr!SchedulerManager::WorkerEntryPoint
      sqlservr!SystemThread::RunWorker
      sqlservr!SystemThreadDispatcher::ProcessWorker
      sqlservr!SchedulerManager::ThreadEntryPoint
      msvcr80!endthreadex
      msvcr80!endthreadex
      kernel32!BaseThreadStart

      0:088> X sqlservr!*copiedStack*
      00000000`01b311a0 sqlservr!IDmpDump::AddMemory =
      00000000`01b33190 sqlservr!CopiedStackInfo::CopiedStackInfo =
      00000000`01b306a0 sqlservr!DmpPointer::DmpPointer =
      00000000`032dbb30 sqlservr!g_copiedStackInfo =
      00000000`01b311c0 sqlservr!DMP_MAKE_POINTER =
      00000000`01b306b0 sqlservr!DmpPointer::GetRawPointer =

      0:088> dd sqlservr!g_copiedStackInfo
      00000000`032dbb30 ???????? ???????? ???????? ????????
      00000000`032dbb40 ???????? ???????? ???????? ????????
      00000000`032dbb50 ???????? ???????? ???????? ????????
      00000000`032dbb60 ???????? ???????? ???????? ????????
      00000000`032dbb70 ???????? ???????? ???????? ????????
      00000000`032dbb80 ???????? ???????? ???????? ????????
      00000000`032dbb90 ???????? ???????? ???????? ????????
      00000000`032dbba0 ???????? ???????? ???????? ????????
      0:088> dt 00000000`032dbb30 CONTEXT Rip Rsp Rbp
      ATL80!CONTEXT
      +0x098 Rsp : ??
      +0x0a0 Rbp : ??
      +0x0f8 Rip : ??
      Memory read error 00000000032dbc28

      Sql server :2005 SP2
      Running on x64 Windows 2003 OS.

    6. Sans said

      Hi Karthick,

      Thanks a lot for your article.
      I followed the steps outlined by you and below are the dumps for my SQL 2005 Clustered environment.

      ======================================================================================
      Scenario 1:

      0:053> kc 10
      Call Site
      ntdll!ZwWaitForSingleObject
      kernel32!WaitForSingleObjectEx
      sqlservr!SOS_Scheduler::SwitchContext
      sqlservr!SOS_Scheduler::Suspend
      sqlservr!SOS_Event::Wait
      sqlservr!LC::WaitForMemBlk
      sqlservr!SQLServerLogMgr::RealReserveAndAppend
      sqlservr!SQLServerLogMgr::ReserveAndAppend
      sqlservr!XdesRMReadWrite::GenerateLogRec
      sqlservr!XdesRMReadWrite::LogBeginXact
      sqlservr!XdesRMReadWrite::MakeActive
      sqlservr!XdesRMReadWrite::GenerateLogRec
      sqlservr!alloca_probe
      sqlservr!EntityVerMgr::ValidateAndUpdateEntityVersion
      sqlservr!MDL::FLockObjectId
      sqlservr!CMEDCatalogObject::FPersistObject

      0:053> kc
      *** Stack trace for last set context – .thread/.cxr resets it
      Call Site
      ntdll!NtWriteFile
      kernel32!WriteFile
      sqlservr!CErrorReportingManager::WriteToErrLog
      sqlservr!CErrorReportingManager::SendErrorToErrLog
      sqlservr!CErrorReportingManager::CwchFormatAndPrint
      sqlservr!ReportLoginFailure
      sqlservr!LoginError
      sqlservr!alloca_probe
      sqlservr!process_login
      sqlservr!process_commands
      sqlservr!SOS_Task::Param::Execute
      sqlservr!SOS_Scheduler::RunTask
      sqlservr!SOS_Scheduler::ProcessTasks
      sqlservr!SchedulerManager::WorkerEntryPoint
      sqlservr!SystemThread::RunWorker
      sqlservr!SystemThreadDispatcher::ProcessWorker
      sqlservr!SchedulerManager::ThreadEntryPoint
      msvcr80!_callthreadstartex

      ============================================================================
      Scenario 2:

      0:004> kc 10
      Call Site
      ntdll!ZwWaitForSingleObject
      kernel32!WaitForSingleObjectEx
      sqlservr!CDmpDump::DumpInternal
      sqlservr!CDmpDump::Dump
      sqlservr!CImageHelper::DoMiniDump
      sqlservr!stackTrace
      sqlservr!SQL_SOSNonYieldSchedulerCallback
      sqlservr!SOS_OS::ExecuteNonYieldSchedulerCallbacks
      sqlservr!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks
      sqlservr!alloca_probe
      sqlservr!SchedulerMonitor::CheckSchedulers
      sqlservr!SchedulerMonitor::Run
      sqlservr!SchedulerMonitor::EntryPoint
      sqlservr!SOS_Task::Param::Execute
      sqlservr!SOS_Scheduler::RunTask
      sqlservr!SOS_Scheduler::ProcessTasks

      0:028> kc
      *** Stack trace for last set context – .thread/.cxr resets it
      Call Site
      ntdll!NtWriteFile
      kernel32!WriteFile
      sqlservr!CErrorReportingManager::WriteToErrLog
      sqlservr!CErrorReportingManager::SendErrorToErrLog
      sqlservr!CErrorReportingManager::CwchFormatAndPrint
      sqlservr!scierrlog
      sqlservr!FCB::ReportLongIO
      sqlservr!alloca_probe
      sqlservr!BPool::LazyWriter
      sqlservr!lazywriter
      sqlservr!SOS_Task::Param::Execute
      sqlservr!SOS_Scheduler::RunTask
      sqlservr!SOS_Scheduler::ProcessTasks
      sqlservr!SchedulerManager::WorkerEntryPoint
      sqlservr!SystemThread::RunWorker
      sqlservr!SystemThreadDispatcher::ProcessWorker
      sqlservr!SchedulerManager::ThreadEntryPoint
      msvcr80!_callthreadstartex

      =============================================================

      Could you please tell me if these are due to any memory issues (or) disk subsystem bottleneck.

      In Scenario 1,i can see WaitForMemBlk and LogBeginXact :

      sqlservr!SOS_Scheduler::SwitchContext
      sqlservr!SOS_Scheduler::Suspend
      sqlservr!SOS_Event::Wait
      sqlservr!LC::WaitForMemBlk
      sqlservr!SQLServerLogMgr::RealReserveAndAppend
      sqlservr!SQLServerLogMgr::ReserveAndAppend
      sqlservr!XdesRMReadWrite::GenerateLogRec
      sqlservr!XdesRMReadWrite::LogBeginXact

      (or) do i need to check for any other issues.

      • Hello Anu,

        Yes they are related to disk and memory bottleneck.

        Fix your I/O issues first. Memory pressure on the system can be side effect of disk bottleneck (When there is slow I/O, dirty pages can get accumulated in memory causing memory pressure) . Follow https://mssqlwiki.com/2012/08/27/io-requests-taking-longer-than-15-seconds-to-complete-on-file/ to fix I/O.

        If you are using database snapshot if yes then make sure you have http://support.microsoft.com/kb/975860

        • Sans said

          Thanks Karthick

          In addition to above I/O message,I am also seeing below error:

          Application popup: Windows – Delayed Write Failed : Windows was unable to save all the data for the file D:\Microsoft SQL Server\MSSQL\LOG. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

          The data is getting lost in SQL and Agent Logs which i am suspecting an issue with the disk.Could you please suggest if i need to collect the counters to conclude this.

    7. […] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]

    8. […] How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… […]

    9. Eric said

      Karthick ,
      Is there any way to duplicate the Non-Yielding scheduler issue if I want to do some practice by the windbg?

    10. After looking over a handful of the articles on your web site, I seriously like your technique of blogging.
      I book-marked it to my bookmark website list and will be
      checking back in the near future. Take a look at my website
      as well and tell me how you feel.

    11. porno said

      Article fοllement cultivant !!!

    12. Niagara therapy

      How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps …… « MSSQLWIKI

    13. EDC and IDC: EDC refers to the external development charge and IDC refers to
      the infrastructure development charges. If it is the only negative on your report,
      you may be approved for a limited new line of credit. Unfortunately, I do not use a
      garage, but ensure that it stays within a carport and covered when it is not
      being utilised.

    14. Winkle said

      Hello all,

      We are seeing the below stack and our SQL Server version is 11.0.3460

      0:546> .cxr sqlmin!g_copiedStackInfo+0X20
      rax=0000000000000001 rbx=00000000b23e1740 rcx=0000001cfe7fb4c0
      rdx=0000001112913880 rsi=000000000000174c rdi=00000000b23e1710
      rip=000000007746063a rsp=00000000278dde60 rbp=000000000000174c
      r8=0000000063bf09a8 r9=00000000278de080 r10=7fffffffffffffff
      r11=0000002ee4e126bd r12=0000000000000000 r13=0000000000000000
      r14=000000000000b200 r15=0000000045ba1000
      iopl=0 nv up ei pl zr na po nc
      cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246
      ntdll!NtWriteFile+0xa:
      00000000`7746063a c3 ret
      0:546> kc
      *** Stack trace for last set context – .thread/.cxr resets it
      Call Site
      ntdll!NtWriteFile
      KERNELBASE!WriteFile
      kernel32!WriteFileImplementation
      sqlmin!DiskWriteAsync
      sqlmin!FCB::AsyncWrite
      sqlmin!SQLServerLogMgr::FlushLC
      sqlmin!SQLServerLogMgr::AppendLogRequest
      sqlmin!SQLServerLogMgr::ReserveAndAppend
      sqlmin!XdesRMReadWrite::GenerateLogRec
      sqlmin!PageRef::ModifyRow
      sqlmin!TextPageRef::ModifyTextNode
      sqlmin!BlobBase::ModifyDataFragment
      sqlmin!BlobBase::ModifyDataNode
      sqlmin!Blob::RWBlob
      sqlmin!Blob::ReadWriteAt
      sqlmin!BlobManager::ReplaceAtWithILB
      sqlmin!PrepareIlbForSetData
      sqlmin!LongRecord::PrepareNewColumns
      sqlmin!LongRecord::SetDataInternal
      sqlmin!LongRecord::SetData

      Win 2008R2 SP1

    15. Hey Karthick… We started seeing these on SQL 2012 sp1..

      Non-Yielding Resource Monitor

      ntdll!ZwWaitForSingleObject
      KERNELBASE!WaitForSingleObjectEx
      sqldk!SOS_Scheduler::SwitchContext
      sqldk!SOS_Scheduler::SuspendNonPreemptive
      sqldk!EventInternal::Wait
      sqldk!SOS_UnfairMutexPair::LongWait
      sqldk!CMemThread::GetObjectAccess
      sqldk!CMemThread::Free
      sqllang!commondelete
      sqllang!CSql::`scalar deleting destructor’
      sqllang!CSqlHashBkt::ReleaseSql
      sqllang!CCompPlanBase::~CCompPlanBase
      sqllang!CCompPlan::`vector deleting destructor’
      sqllang!CCacheObject::Release
      sqllang!CCacheObject::Destroy
      sqllang!TCacheStore::RemoveDescriptor
      sqllang!TCacheStore::CacheEntryDescriptor::DestroyDescriptor
      sqldk!CacheClockHand::MoveInternal
      sqldk!CacheClockHand::Move
      sqldk!TClockAlgorithm<CacheClockHand >::StartExternalPressure

      and this

      Non-Yielding Scheduler

      sqldk!TVarPageMgr::PbAllocate
      sqldk!CMemThread::Alloc
      sqllang!CSqlHashBkt::PcsqlCreateSql
      sqllang!CSQLStrings::GenerateCSql
      sqllang!CSQLStrings::SetInDbccInputBuffer
      sqllang!CSQLStrings::DoCacheLookup
      sqllang!CSQLSource::Transform
      sqllang!CSQLSource::Execute
      sqllang!process_request
      sqllang!process_commands
      sqldk!SOS_Task::Param::Execute
      sqldk!SOS_Scheduler::RunTask
      sqldk!SOS_Scheduler::ProcessTasks
      sqldk!SchedulerManager::WorkerEntryPoint
      sqldk!SystemThread::RunWorker
      sqldk!SystemThreadDispatcher::ProcessWorker
      sqldk!SchedulerManager::ThreadEntryPoint
      kernel32!BaseThreadInitThunk
      ntdll!RtlUserThreadStart

      Any insight would be helpful!

      Thanks in advance!

    16. Balaji said

      Very Nice Article ..Thanks for the explanation

    17. […] How to analyze Non-Yielding scheduler or Non … – MSSQLWIKI – July 30, 2013 at 6:38 PM. Hi Karthick, Thanks a lot for your article. I followed the steps outlined by you and below are the dumps for my SQL 2005 Clustered environment. […]

    18. […] PK from MS has some good posts here and here on […]

    19. Russell Young said

      Couldn’t resolve error at ‘xr sqlmin!g_copiedStackInfo+0X20’
      0:005> kc 100
      # Call Site
      00 ntdll!NtWaitForSingleObject
      01 KERNELBASE!WaitForSingleObjectEx
      02 sqlservr!DmpRemoteDumpRequest
      03 sqlservr!DmpRemoteDumpRequest
      04 sqllang!SQLDumperLibraryInvoke
      05 sqllang!CImageHelper::DoMiniDump
      06 sqllang!stackTrace
      07 sqlservr!_____SQL______Process______Available
      08 sqldk!SOS_OS::ExecuteNonYieldRMCallbacks
      09 sqldk!SchedulerMonitor::CheckRMHealth
      0a sqldk!SchedulerMonitor::Run
      0b sqldk!SchedulerMonitor::EntryPoint
      0c sqldk!SOS_Task::Param::Execute
      0d sqldk!SOS_Scheduler::RunTask
      0e sqldk!SOS_Scheduler::ProcessTasks
      0f sqldk!SchedulerManager::WorkerEntryPoint
      10 sqldk!SystemThread::RunWorker
      11 sqldk!SystemThreadDispatcher::ProcessWorker
      12 sqldk!SchedulerManager::ThreadEntryPoint
      13 kernel32!BaseThreadInitThunk
      14 ntdll!RtlUserThreadStart

    Leave a comment