MSSQLWIKI

Karthick P.K on SQL Server

Archive for the ‘Debugging’ Category

Debugging memory Leaks using Debug diagnostic tool.

Posted by Karthick P.K on December 6, 2012

In my previous post (SQL Server memory leak ) I explained how to identify the modules  which are leaking the memory using ‘!heap’ commands.  Sometimes we may not be able to find the cause by displaying the memory using ‘!d’ commands to find the patterns or using search memory commands (s).

In such scenarios  we can use  Debug Diagnostic Tool or UMDH to track memory leaks.   This blog will explain how to identify the memory leaks using Debug diagnostics tools.

 

Download and install Debug Diagnostic Tool from http://www.microsoft.com/en-us/download/details.aspx?id=26798

 

1. Go to ToolsàOptions ->Preferences àSelect Record call stacks immediately when monitoring the leaks.

 

clip_image002

 

2. Go to the rules tab and select add rule.

 

3. Choose Native (non .Net) memory leak and handle leak.

 

4. Select the SQL Server or any process which has to be tracked for memory leak.

 

5. Click next and leave the default options (you can choose auto-unload Leak track when rule is completed or deactivated).

 

6. Click next and Activate the rule now.

 

7. Leaktrack.dll would have loaded to the process for which we are tracking the allocations.

 

8.  Now you can wait for the leak to happen again.

 

{

–If you are learning how to troubleshoot SQL Server memory leak follow the steps which we followed in previous post (https://mssqlwiki.com/2012/12/04/sql-server-memory-leak/)to leak the memory.

 

–Download HeapLeak.dll from this link.

–Create  an extended stored procedure in SQL Server

sp_addextendedproc ‘HeapLeak’,‘C:\HeapLeakdll\HeapLeak.dll’

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

}

 

9. Once you suspect memory is leaked. Go to the rules and take a full user dump by right clicking the Leak rule.

 

clip_image004

 

 

10. After the dump is captured , go to the advanced analysis tab, Add data files and select the dump which we generated.

 

11. Go to ToolsàOptions ->set the symbol path for analysis. Default Microsoft symbol path is below.

 

srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;c:\Release

 

Important: Replace c:\Release with symbol path of dll’s which you have loaded in SQL Server (optional)

 

 

11. In the available analysis script select memory pressure analyzers (memory analysis.asp).

 

12. Click start analysis.

 

clip_image006

 

 

13. Analysis might take a while depending on time it takes to load the symbols. Once the analysis is completed it would generate and open a HTML report.

This HTML report is  stored in C:\Program Files\DebugDiag\Reports\ by default and can be used for later reference.

 

I have attached a sample report which I collected when leaking memory using heapleak.dll in This link. You can use it for reference.

 

 

Report generated by debug diagnostic tool memory pressure analyzer will have the analysis summary and below Table Of Contents

 

sqlservr.exe__…………dmp

   Virtual Memory Analysis Report

   Heap Analysis Report

   Leak Analysis Report

   Outstanding allocation summary

    Detailed module report (Memory)

    Detailed module report (Handles)

 

 

 

14.  Analysis summary is  good portion in the report to start with and would give the module which is leaking the memory. Look at the below report.

 

clip_image008

 

 

15. Report has clearly indicated HeapLeak.dll has 255 MB of outstanding allocations. In heapleak.dll “Sub“ is the function which allocated this memory at offset 23.

 

 

16. Look at the virtual memory summary. It gives complete picture about memory distribution in the virtual address space. In the below summary memory reserved is 1.57 GB which is normal in 32-Bit SQL Server, but native heaps is 272.94 MB which is not normal.

Look at the heap summary there are 50 heaps.

 

 

 

 

 

clip_image010

 

clip_image012

 

17. Now look at the Outstanding allocation summary. It gives top 10 modules by allocation count and allocation size. In below summary HeapLeak has 26,182 allocations with size of 255.6 MB.

Note: In this report it is HeapLeak but in real time it might be any module which is leaking the memory

.

 

clip_image014

 

18. You can  also look at detailed module report(Memory).  It gives the memory allocation from each module along with function  and source line which allocated the memory (If you set the symbols for all the modules loaded).

 

clip_image016

 

 

By now we are sure that sub function in HeapLeak.dll has allocated 255 MB in line number 14 and has not released. The report also gives you the callstack samples that show the code path when functions was doing allocations. Refer This sample HTML report file. 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Posted in Debugging, Memory, Performance, SQL Server memory | Tagged: , , , , , | 5 Comments »

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory it is supposed to de-allocate it and release it back to OS. If it misses to de-allocate the memory due to flaw in code it is called as leak and It can cause memory pressure both to the operating system and application.

 

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink its memory based on the amount of available memory on the system and Max server memory setting in SQLServer.

Many times system admins look at the memory usage of SQLServer and assume SQLServer is leaking memory if they find SQL Server memory usage is high.

This is incorrect SQL Server is server based application and its memory manager is designed in such a way that it will keep growing its memory usage on need (Exception large pages) and will not scale down its usage unless there is low memory notification from Windows. We can control the memory usage of SQL Server using Max server memory setting in SQLServer. This setting limits the Bpool usage of SQL Server and doesn’t control the overall memory usage of SQLServer. There are portions of SQLServer memory that is allocated outside BPOOL (aks: MTL or MTR) we do not have a way to control how much memory SQL Server can use outside bpool, but non bool memory usage will be normally low and can be easily estimated by studying the components running in SQL Server.

Ex: If you want to set SQLServer to use only 10GB RAM on server. Consider how much memory SQL Server might need outside Bpool and set the “max server memory” setting accordingly. In this case if you estimate SQL Server will use 1.5GB outside Bpool then set the Max server memory to 8.5GB.

What can cause SQL Server Memory leak?

SQL Server code has a logic to allocate memory but doesn’t de-allocate it. If any of the components in SQL Server is causing a memory leak in SQL Server it can be identified easily using the DMV’s like sys.dm_os_memory_allocation,sys.dm_os_memory_clerks and sys.dm_os_memory_objects etc., but most of the memory leaks in SQL Server is caused by 3rd party Dll’s which are loaded in SQL Server process.

 

Note: All the memory allocations by Non SQL server Dll’s loaded in SQL Server will happens in “Mem to Leave”(outside the Bpool) and they are called as direct windows allocations (DWA) 

 

When there is out of memory conditions in SQL Server and if you suspect there is a memory leak.First thing to determine is who is consuming the memory. If SQL Server is not using majority of the memory in MemToLeave and still you get Mem to leave errors probably there is a leak and it caused by some DLL’s loaded in

SQL Server. Refer Section 1 (MTL error) in https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

 

Below query can be used to determine actual memory consumption by SQL Server in MTL.

select sum(multi_pages_kb)  from sys.dm_os_memory_clerks

 

If the memory consumption by SQL Server is very low and still if you see SQL Server memory errors like few below then focus on Leaks.

 

Ex:

SQL Server 2000

                WARNING: Failed to reserve contiguous memory of Size= 65536.

                WARNING: Clearing procedure cache to free contiguous memory.

                Error: 17802 “Could not create server event thread.”

                SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

                Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

 

 

 

How to identify and troubleshoot the memory leak?

 

There are multiple ways in windows to identify who is leaking memory in process. We will discuss how to identify the memory leak using  

 

1. Windows debugger 2. Debug diagnostics tools for windows and 3. UMDH in this blog.

 

Let us create a sample DLL to load in SQL server process to leak memory and see how to use the tools I mentioned above to troubleshoot the leak. 

 

Download HeapLeak.dll from This link and install Microsoft Visual C++ 2010 Redistributable Package from this links 32-Bit or 64-Bit to make this DLL work.

 

–Create  an extended stored procedure in SQL Server

exec sp_addextendedproc  'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

 

 

We will also enable below trace flags in SQL Server to automatically generate filter dump when there  is out of memory errors and see how to identify who is leaking.

 

 

dbcc traceon (2551,-1) — 2551 is used to enable filter dump.

go

dbcc traceon (8004,-1) –8004 is used to take memory dump on first occurrence of OOM condition

go

–Note: Both the trace flags listed above are un-documented, So use it at your own risk and there is no guarantee that this trace flags will work in future versions of SQL Server

 

 

Once we enable the trace flag . We have to cause out memory error in SQL Server to generate OOM memory dump. We have leaked around 300 MB of memory from MTL by executing above extended SP 30 times.

 

Let use execute below script which create XML handles. Memory for xml handles is allocated from MTL we will get out of memory errors very soon because extended stored procedure which we executed has already leaked the memory.

(Do not run below XML script directly with out executing HeapLeak  Below script will cause OOM error because of handle created for each execution, but it is accounted as SQL Server allocation so will not help us to understand the  how to debug leaks caused by 3rd party DLL’s)

 

Note: 1. SQL Server memory dump will be generated in SQL Server error log folder. 
2. Size of MTL is 256 MB + Max worker threads *.5  in 32-Bit SQL Server.  So approximately 384 MB unless modified using –g switch.

 

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
 
go 10000

We will receive below error after few executions.

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.

To analyze the dump download and Install Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

 

Step 1 (Load the memory dump file to debugger):

 

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Note : You will find SQLDump000#.mdmp in your SQL Server error log when you get the Exception or assertion.

 

Step 2 (Set the symbol path to Microsoft symbols server):

 

on command window type

 

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

 

Step 3 (Load the symbols from Microsoft symbols server):

 

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

 

 

Step 4 (check if symbols are loaded):

 

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

:028> lmvm sqlservr

start    end        module name

01000000 02ba8000   sqlservr   (pdb symbols)          c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Oct 14 15:35:29 2005 (434F82E9)

    CheckSum:         01B73B9B

    ImageSize:        01BA8000

    File version:     2005.90.1399.0

    Product version:  9.0.1399.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0409.04e4

    CompanyName:      Microsoft Corporation

    ProductName:      Microsoft SQL Server

    InternalName:     SQLSERVR

    OriginalFilename: SQLSERVR.EXE

    ProductVersion:   9.00.1399.06

    FileVersion:      2005.090.1399.00

    FileDescription:  SQL Server Windows NT

    LegalCopyright:   © Microsoft Corp. All rights reserved.

    LegalTrademarks:  Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    Comments:         NT INTEL X86

 

Step 5 : (!address to display the memory information)

 

Use !address command to display the memory information of the process from dump.

 

 

0:028> !address -summary

 

——————– Usage SUMMARY ————————–

    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage

   686a7000 ( 1710748) : 81.58%    81.80%    : RegionUsageIsVAD

     579000 (    5604) : 00.27%    00.00%    : RegionUsageFree

    4239000 (   67812) : 03.23%    03.24%    : RegionUsageImage

     ea6000 (   15000) : 00.72%    00.72%    : RegionUsageStack

      1e000 (     120) : 00.01%    00.01%    : RegionUsageTeb

   122d0000 (  297792) : 14.20%    14.24%    : RegionUsageHeap

          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap

       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb

       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs

       1000 (       4) : 00.00%    00.00%    : RegionUsageEnvironmentBlock

       Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

 

——————– Type SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

     579000 (    5604) : 00.27%   : <free>

    4239000 (   67812) : 03.23%   : MEM_IMAGE

     5fc000 (    6128) : 00.29%   : MEM_MAPPED

   7b242000 ( 2017544) : 96.21%   : MEM_PRIVATE

 

——————– State SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

   1b7bd000 (  450292) : 21.47%   : MEM_COMMIT

     579000 (    5604) : 00.27%   : MEM_FREE

   642ba000 ( 1641192) : 78.26%   : MEM_RESERVE

 

Largest free region: Base 00000000 – Size 00010000 (64 KB)

 

 

Look at the RegionUsageHeap it is around 297792 KB and largest free region is just 64KB. We know SQL Server doesn’t use Heap’s extensively so normally the heap allocated by SQL Server will not go beyond few MB. In this case it is consuming around 290 MB and so other components which use MTL can easily fail.

Let us try to understand why the Heap is around 297792 KB and try to identify if there is  a pattern.

 

Step 6: (Let us use !heap –s to display summary information about the heap)

 

 

0:028> !heap -s

LFH Key                   : 0x672ddb11

  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast

                    (k)     (k)    (k)     (k) length      blocks cont. heap

—————————————————————————–

000d0000 00000002    1024    896    896      6     1     1    0      0   L 

001d0000 00008000      64     12     12     10     1     1    0      0     

002c0000 00001002    1088     96     96      2     1     1    0      0   L 

002e0000 00001002      64     52     52      3     2     1    0      0   L 

007c0000 00001002      64     64     64     56     1     0    0      0   L 

00d10000 00001002     256     24     24      8     1     1    0      0   L 

340b0000 00001002      64     28     28      1     0     1    0      0   L 

340c0000 00041002     256     12     12      4     1     1    0      0   L 

342a0000 00000002    1024     24     24      3     1     1    0      0   L 

34440000 00001002      64     48     48     40     2     1    0      0   L 

61cd0000 00011002     256     12     12      4     1     1    0      0   L 

61d10000 00001002      64     16     16      7     1     1    0      0   L 

61d20000 00001002      64     12     12      4     1     1    0      0   L 

62a90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62b90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62c90000 00001002     256     40     40      7     1     1    0      0   LFH

00770000 00001002      64     16     16      2     2     1    0      0   L 

63820000 00001002      64     24     24      3     1     1    0      0   L 

63830000 00001001   10240  10240  10240    160    21     0    0    bad     

64230000 00001001   10240  10240  10240    160    21     0    0    bad     

64c30000 00001001   10240  10240  10240    160    21     0    0    bad     

65630000 00001001   10240  10240  10240    160    21     0    0    bad     

66030000 00001001   10240  10240  10240    160    21     0    0    bad     

66a30000 00001001   10240  10240  10240    160    21     0    0    bad     

67430000 00001001   10240  10240  10240    160    21     0    0    bad     

68130000 00001001   10240  10240  10240    160    21     0    0    bad     

68b30000 00001001   10240  10240  10240    160    21     0    0    bad     

69530000 00001001   10240  10240  10240    160    21     0    0    bad     

69f30000 00001001   10240  10240  10240    160    21     0    0    bad     

6a930000 00001001   10240  10240  10240    160    21     0    0    bad     

6b330000 00001001   10240  10240  10240    160    21     0    0    bad     

6bd30000 00001001   10240  10240  10240    160    21     0    0    bad     

6c730000 00001001   10240  10240  10240    160    21     0    0    bad     

6d130000 00001001   10240  10240  10240    160    21     0    0    bad     

6db30000 00001001   10240  10240  10240    160    21     0    0    bad     

6e530000 00001001   10240  10240  10240    160    21     0    0    bad     

6ef30000 00001001   10240  10240  10240    160    21     0    0    bad     

6f930000 00001001   10240  10240  10240    160    21     0    0    bad     

70330000 00001001   10240  10240  10240    160    21     0    0    bad     

70d30000 00001001   10240  10240  10240    160    21     0    0    bad     

7a160000 00001001   10240  10240  10240    160    21     0    0    bad     

7ab60000 00001001   10240  10240  10240    160    21     0    0    bad     

7b560000 00001001   10240  10240  10240    160    21     0    0    bad     

7d0d0000 00001001   10240  10240  10240    160    21     0    0    bad     

7e030000 00001001   10240  10240  10240    160    21     0    0    bad     

7ea30000 00001001   10240  10240  10240    160    21     0    0    bad     

67f90000 00001003     256     16     16     14     1     1    0    bad     

71850000 00001003     256      4      4      2     1     1    0    bad     

71890000 00001003     256      4      4      2     1     1    0    bad     

67fd0000 00001002      64     16     16      4     1     1    0      0   L 

718d0000 00001003     256     40     40      3     1     1    0    bad     

71910000 00001003     256      4      4      2     1     1    0    bad     

71950000 00001003     256      4      4      2     1     1    0    bad     

71990000 00001003     256      4      4      2     1     1    0    bad     

67ff0000 00001002      64     16     16      4     1     1    0      0   L 

719d0000 00001003    1792   1352   1352      5     2     1    0    bad     

71a10000 00001003     256      4      4      2     1     1    0    bad     

71a50000 00001003     256      4      4      2     1     1    0    bad     

71a90000 00001002      64     16     16      1     0     1    0      0   L 

—————————————————————————–

 

 

If you look at the above out put you can clearly identify a pattern. There are multiple created and each of them is 10 MB. But how to identify who actually created them?

 

Step 7:

 

Let us pickup one of the heap which is 10 MB and display all the entries (allocations) with in this 10 MB heap using !heap with –h parameter

 

Heap I have picked is 63830000.

 

 

0:028> !heap -h 63830000

Index   Address  Name      Debugging options enabled

19:   63830000

    Segment at 63830000 to 64230000 (00a00000 bytes committed)

    Flags:                00001001

    ForceFlags:           00000001

    Granularity:          8 bytes

    Segment Reserve:      00100000

    Segment Commit:       00002000

    DeCommit Block Thres: 00000200

    DeCommit Total Thres: 00002000

    Total Free Size:      00005048

    Max. Allocation Size: 7ffdefff

    Lock Variable at:     00000000

    Next TagIndex:        0000

    Maximum TagIndex:     0000

    Tag Entries:          00000000

    PsuedoTag Entries:    00000000

    Virtual Alloc List:   63830050

    UCR FreeList:        63830588

    FreeList Usage:      00000000 00000000 00000000 00000000

    FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0      Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

    Heap entries for Segment00 in Heap 63830000

        63830608: 00608 . 00040 [01] – busy (40)

        63830648: 00040 . 02808 [01] – busy (2800)

        641b6698: 02808 . 02808 [01] – busy (2800)

        ……………………………………

        ……………………………………

        ……………………………………

        ……………………………………

       

Step 8: (Let us pickup one of the heap entry (allocation) and try to identify what is in it)

 

 

0:028> db 641b6698

641b6698  01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

641b66a8  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

641b66b8  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

641b66c8  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

641b66d8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66e8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66f8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b6708  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

 

0:028> db 63830648

63830648  01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

63830658  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

63830668  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

63830678  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

63830688  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

63830698  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306a8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306b8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

 

 

Similarly you can dump multiple heap allocations to identify a pattern.

 

Now if you look at the memory dumped you see a string which might help you to identify the DLL which created the heap. There is a pattern in above heaps. All the heap allocations have below string

“Iam filing the heap for demo at MSSQLWIKI.COM”

 

Note : You can use L Size to dump more memory using db or dc command’s example db 63830648 L1500

 

Step 9:

Let us open the DLL which we loaded in SQL Server for testing using notepad and see if there is string which matches the pattern

 

clip_image002

 

 

Yes there is which proves that this DLL’s has caused the leak. In real time you may have to play with different heap allocations to identify the pattern.

 

This is one way to find the leaks from the memory dump after the leak has actually happened. It may not be always easy to find a pattern and identify the modules who allocated the memory, In such scenarios  you may have to track the leak using the tools like debug diagnostic tool, UMDH etc.In the my next blog I will post how to track memory leak using Debug diagnostics tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | Tagged: , , , , | 46 Comments »

SQL Server NUMA load distribution

Posted by Karthick P.K on November 22, 2012

When port affinity is not configured all the connection to SQL Server enters through single port and connections are tied to nodes in round robin basis.

 

We might end with Imbalance of Workload in NUMA systems under below conditions.

 

1. When a connection is tied (or) affinitized to a node, all the work from that connection will be completed on the same node (in which connection is directed)  if plans are serial.  We don’t consider the CPU load across the NUMA to pick up the node for serial plans, We use the same node in which connection is made for serial plan execution. Parallel query would use any NUMA node regardless of node this query came from. When all the queries execute from connections made to same node and if plans are also serial we might end up with overloading one Node while others are not fully used.

 

2. State of each nodes is internally maintained by SQL Server and updated every 2 seconds so there is  remote possibility that all parallel queries end with same node some times and cause spike in one node, while the other nodes are unused.

 

3. When there is imbalance between the number of online schedulers in each node (Ex: 16-CPU in Node1 and 4-CPU in Node2 ) and if all plans are serial (assume we have set Max DOP 1) We might end up with overloading the schedulers in node with least schedulers. while the schedulers on other node is underused, similarly when memory is shared across nodes we share it equally irrespective of number of schedulers on each node so in this case first 16 schedulers would have got  half of memory and 4 schedulers of second node would have got remaining half. So ensure you choose the CPU affinity carefully (Specially when you have installed SQL Server with limited processor license on system with larger number of CPU’s). 

 

 

Image 1:  sys.dm_os_schedulers (6 – CPU’S on node-0 and  1- CPU on node-1. Look at current task count)

 

clip_image002

 

Image 2 (Look at the current and pending task in node 0 and in node 1)

clip_image002[4]

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , | 3 Comments »

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

Posted by Karthick P.K on October 16, 2012

 

I have got few request’s from  SQL Server DBA’s in past to blog about analyzing SQL Server exceptions and assertions . After seeing lot of DBA’s getting stuck when they get EXCEPTION_ACCESS_VIOLATION (or) Assertion in SQL ServersI decided to write this blog.

This blog is published with intention to make DBA’s analyze and resolve EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion before contacting Microsoft support.  Exception and assertion are two different things. SQL handles both assertions and exceptions by writing the current thread’s stack to the Error log and generating a dump.  In simple An exception is an event that occurs during the execution of a program, and requires the execution of code outside the normal flow of control and assertion is the check that the programmer inserted into the code to make sure that some condition is true, If it returns false an assert is raised. SQL handles both assertions and exceptions by writing the current thread’s stack to the Error log and generating a dump, so trouble shooting steps are similar. 

 

You will find messages similar to one below in SQL Serve error logs when you get Exception or EXCEPTION_ACCESS_VIOLATION .

{

Error

External dump process returned no errors.
Using ‘dbghelp.dll’ version ’4.0.5′
SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
* *******************************************************************************
* BEGIN STACK DUMP:
*  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

*   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

*   Access Violation occurred reading address 0000041EA9AE2EF0

* Input Buffer 510 bytes –

ex_terminator – Last chance exception handling

}

You will find messages similar to one below in SQL Server error logs when you get an Assertion.

{

Error

spid323     Error: 17065, Severity: 16, State: 1.

spid323     SQL Server Assertion: File: < .cpp>, line = 2576 Failed Assertion = ‘fFalse’  This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted

SQL Server Assertion: File: <   .cpp>, line=2040 Failed Assertion =

}

To analyze the dump download and Install Windows Debugger from This Link 

 

Step 1 (Load the memory dump file to debugger):

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Note : You will find SQLDump000#.mdmp in your SQL Server error log when you get the Exception or assertion.

Step 2 (Set the symbol path to Microsoft symbols server):

on command window type

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from Microsoft symbols server):

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

 

Step 4 (check if symbols are loaded):

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

0:002> lmvm sqlservr

start             end                 module name

00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)

    CheckSum:         025FEB5E

    ImageSize:        02679000

    File version:     2005.90.4266.0

    Product version:  9.0.4266.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5 (Switch to exception context):

Type .ecxr

Step 6(Get the stack of thread which caused exception or assertion):

Type  kC  1000    //You will get the stack of thread which raised exception or assertion .

I have pasted one of the sample stack below, from the exception dump which I worked recently.  First thing to identify from stack is who is raising the exception. In the below stack look at the portion which is highlighted in red (In each frame before the ! symbol), that is the module which raised the exception (Exe or DLL name ).

If Exe/DLL name is Non Microsoft  module (Exe or DLL name ) then the exception is being caused by a third party component, you will need to work with the company that provided that component to get a solution. lmvm Exe/DLL name will give you the company name. For example: lmvm wininet

If Exe/DLL name is  SQLServr  (or) any other SQL Server modules then the exception is raised by SQL Server, In that case type kC 1000 and paste the stack in comments session of this blog (or) When you start thread in MSDN forums (or) In This face book group. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

Note: When you get Assertion make sure you post message line which contains   SQL Server Assertion: File: <Filename.cpp>, line = 2576 Failed Assertion =  ”  

 

0:000> kC 1000

Call Site

wininet!InternetFreeThreadInfo+0x26

wininet!InternetDestroyThreadInfo+0x40

wininet!DllMain_wininet+0xb5

wininet!__DllMainCRTStartup+0xdb

ntdll!LdrShutdownThread+0x155

ntdll!RtlExitUserThread+0x38

msvcr80!_endthreadex+0x27

msvcr80!_callthreadstartex+0x1e

msvcr80!_threadstartex+0x84

kernel32!BaseThreadInitThunk+0xd

ntdll!RtlUserThreadStart+0x1d

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts:

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, SQL General, SQL Server Engine, Startup failures | Tagged: , , , , , , , , , , , , , , , | 254 Comments »

SQL Server Latch & Debugging latch time out

Posted by Karthick P.K on September 7, 2012

  

 

In a multithreaded process what would happens when a one thread updates a data or index page in memory while second thread is reading the same page?

What will happen when 1st  thread reads a data/index page in memory while 2nd thread is freeing the same page from memory?

Answer: We would end up with data or data structure inconsistency. To avoid inconsistency SQL Server uses Synchronization Mechanisms like Locks,Latches and Spinlocks.

 

We will discuss few key points about latches and how to debug latch timeout dumps in this blog.

What is Latch ?

They control the concurrent access to data pages and structures by multiple threads.  Latches provide physical data consistency of data pages and provide synchronization for data structures. Latches are not controllable by user like locks.

 

Types of the Latch:

Buffer (BUF) Latch

Used to synchronize access to BUF structures and their associated database pages.

 

Buffer “IO” Latch

A subset of BUF latches used when the BUF and associated data/index page is in the middle of an IO operation (Reading page from disk or writing page to disk).

 

Non-Buffer (Non-BUF) Latch

These are latches that are used to synchronize general in-memory data structures generally used by queries/tasks executed by parallel threads, auto grow operations , shrink operations etc. 

 

Latch modes

 

Keep (KP) Latches

Used to ensure that the page is not released from memory while it is in use. 

Shared (SH) Latches

Used for read-only access to data structures and prevent write access by others threads.

This mode allows shared access. 

SH is compatible with KP, SH, and UP.  It should be noted that although in general SH implies read-only access, it is not always the case. For buffer latches SH is the minimum mode required in order to read a data page.

Update (UP) Latches

Allows read access to the data structure(Compatible with SH and KP), but prevents other EX-latch access. 

Used for write operations when torn page detection is off and when AWE is not enabled.

Exclusive (EX) Latches

Prevents any read activity from occurring on the latched structure. EX is only compatible with KP.

Used during read IO during write IO when torn page detection is on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFs from the buffer pool, either by adding them to the free list or unmapping AWE buffers. 

 

 

Latch compatibility

  KP SH UP EX DT
KP Y Y Y Y N
SH Y Y Y N N
UP Y Y N N N
EX Y N N N N
DT N N N N N

 

 

How do you identify Latch contention?

 

Latch contention can be identified using below wait types in sysprocesses.

 

PAGEIOLATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on a physical I/O of a buffer pool page to complete. 

                                            1. PAGEIOLATCH_* are commonly solved by tuning the queries which are performing heavy IO (Commonly by adding, changing and removing indexes (or) statistics to reduce the amount of physical IO).

                                 2. Identifying if there is disk bottleneck and fixing them (Pageiolatch wait times (ex > 30 ms))

                      

                      

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO. 

1.       This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should Look at the wait resource of the spid. The wait_resource is the page number (the format is  dbid:file:pageno)

          that is being accessed. 

2.       We can use DBCC PAGE to identify object or type of the page in which we have the contention. Also it will help us to determine  whether contention  is for allocation, data or text.

3.       If the pages that SQL Server is most frequently waiting on are in tempdb database ,check the wait resource column for a page number in dbid 2. You may be facing tempdb allocation latch contention mentioned in    http://support.microsoft.com/kb/328551

4.       If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table.  In this case we need to choose a different

          clustered index key to spread the work across different pages.

 

LATCH_*:    Non-buf latch waits can be caused by variety of things.  We can use the wait resource column in sysprocesses to determine the type of latch involved(KB 822101). 

1.       A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata Refer KB 929728 for more information.

2.       Auto Grow and auto shrink.

 

 

 

When a latch is requested by thread and If  that latch cannot be granted immediately because of some other thread holding a incompatible latch on same page or data structure then  the requestor must wait for the latch to be grantable.  Warning messages like one below is printed in SQL Server error log and a mini dump with all the threads is captures if the wait interval reaches 5 minutes (waittime 300). The warning message differs for buffer and non-buffer latches.

 

 

844: Time out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p.  Continuing to wait.

 

846: A time-out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.

 

847: Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

 

Break up of above warning

type

The latch mode of the current latch acquire request.  This is a numerical value with the following mapping:  0 – NL (not used); 1 – KP; 2 – SH; 3 – UP; 4 – EX; 5 – DT.

 

task

Task for which we are trying to acquire latch.

 

Waittime

The total time waited for this latch acquire request in seconds.

 

owning task

The address of the Task that owns the latch, if available.

 

bp (Buffer latches only)

The address of the BUF structure corresponding to this buffer latch.

 

page (Buffer latches only.)

The page id for the page currently contained in the BUF structure.

 

database id (Buffer latches only.)

The database id for the page in the BUF.

 

 

Like troubleshooting blocking issues in SQL Server when there is a latch contention or timeout dump identify the owner of latch and troubleshoot why the latch is held by the owner for long time.

 

When there is latch timeout dump you will see a warning message similar to one below. Warning error message printed in SQL server errorlog before the dump is very important to find the owner thread of latch.

 

{

 

2012-01-18 00:52:03.16 spid69      A time-out occurred while waiting for buffer latch — type 4, bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, waittime 300, flags 0x19,

owning task 0x0000000006E08328. Not continuing to wait.

spid21s     **Dump thread – spid = 21, PSS = 0x0000000094622B60, EC = 0x0000000094622B70

spid21s     ***Stack Dump being sent to E:\Data\Disk1\MSSQL.1\MSSQL\LOG\SQLDump0009.txt

spid21s     * *******************************************************************************

spid21s     * BEGIN STACK DUMP:

spid21s     *   02/28/12 00:32:03 spid 21

spid21s     * Latch timeout

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_COUNT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_VIRTUAL_ROOT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

}

From the error message above we can easily understand we are trying to acquire latch on database id: 4, page 1:6088 (6088 page of first file) and we timed out because task 0x0000000006E08328 (owning task 0x0000000006E08328 in warning message)  is holding a latch on it.

Note: Task is simply a work request to be performed by the thread. (such as system tasks, login task, Ghost cleanup task etc.). Threads which execute the task will take required latches on need.

Let us see how to analyze latch timeout dump and get the owning thread of the Latch using the  owning task 0x0000000006E08328.

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

0:002> lmvm sqlservr
start             end                 module name
00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
    CheckSum:         025FEB5E
    ImageSize:        02679000
    File version:     2005.90.4266.0
    Product version:  9.0.4266.0
    File flags:       0 (Mask 3F)
    File OS:          40000 NT Base
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use the below command to search thread stack to identify the thread which has reference to the owning task and it will be the thread which is owning the latch. Replace 0X0000000006E08328 with owning task in your errorlog

~*e .echo ThreadId:; ?? @$tid; r? @$t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? @$t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; s -d @$t1 @$t2 0X0000000006E08328 

ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0  06e08328 00000000 00000000 00000000  (……………
00000000`091fdcb8  06e08328 00000000 091fdd70 00000000  (…….p…….
00000000`091fded0  06e08328 00000000 06e0e798 00000000  (……………
00000000`091fdf38  06e08328 00000000 00000002 00000000  (……………
00000000`091fec60  06e08328 00000000 0168883a 00000000  (…….:.h…..
00000000`091ff260  06e08328 00000000 000007d0 00000000  (……………
00000000`091ff2d0  06e08328 00000000 00000020 00000000  (……. …….
00000000`091ff5f8  06e08328 00000000 800306c0 00000000  (……………
00000000`091ff6c0  06e08328 00000000 00000000 00000000  (……………
00000000`091ff930  06e08328 00000000 00000000 00000001  (……………
00000000`091ff9b8  06e08328 00000000 00000000 00000000  (……………
00000000`091ffa38  06e08328 00000000 00000000 00000000  (……………
00000000`091ffc10  06e08328 00000000 03684080 00000000  (……..@h…..
00000000`091ffc90  06e08328 00000000 00000000 00000000  (……………
ThreadId:
unsigned int 0x9b8
ThreadId:
unsigned int 0x9bc
ThreadId:
unsigned int 0x9c0
……………
…………..

 

Step 6:

From the above out put we see thread 0x9b4 has reference to the pointer of owning task and it will be the thread which is owning the latch. Let us switch to the thread(0x9b4 ) which is executing the owning task and

then go through the stack to see why the thread is owning the latch for long time.

Step 7:

   ~~[0x9b4]s      ==> Switching to the thread (Replace 0x9b4 with your thread id which has reference to the po
ntdll!ZwWaitForSingleObject+0xa:
00000000`77ef047a c3              ret

Step 8:

0:002> kC  ==>  Print the stack
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!BPool::FlushCache
sqlservr!checkpoint2
sqlservr!alloca_probe
sqlservr!ProcessCheckpointRequest
sqlservr!CheckpointLoop
sqlservr!ckptproc
sqlservr!SOS_Task::Param
::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex

From the above stack we can understand that the thread which is owning the latch is executing checkpoint and flushing cache (Dirty buffers) to disk. If flushing buffers to disk (checkpoint) is taking a long time, then obviously there is disk bottleneck.

Similarly for any other latch time out issues first identify the owner thread of latch, read the stack of owner thread to understand the task performed by owner thread and troubleshoot the performance of task performed by owner thread.

If you want to see the stack of thread which is waiting, then pickup the task (task 0x0000000006E096D8 )from latch timeout warning message in errorlog instead of owning task (task 0x0000000006E08328) and use the command mentioned in step 5.

I hope this post will help you to learn and debug the latch timeout issues.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts: Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, SQL General, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , , | 22 Comments »

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see below errors in SQL error along with dumps and stuck?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
*   05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP

*   01/22/09 19:11:16 spid 0

* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.

Refer “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps”  for analyzing the Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor Dumps.

If you are interested in just finding a quick resolution follow the below steps to get the Non-Yield stack from the dump and check if it is matching with any existing known issues in SQL Server. 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg

step 2:

Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 3:

on command window type    
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 5:

Type  .cxr sqlservr!g_copiedStackInfo+0X20   for SQL Server2005 and SQL Server2008/2008R2  (or)   .cxr sqlmin!g_copiedStackInfo+0X20  for SQL Server2012.

Type kc 100 and look at the stack to see if it matches with the stack of any of known issues in SQL Server listed below.

If kc 100 doesn’t display any stack and throws “WARNING: Frame IP not in any known module. Following frames may be wrong”  type .cxr to reset to default scope and try  .cxr sqlservr!g_copiedStackInfo+0X00c (In 32-Bit (X86)  SQL server valid offset for context is 0X00c Look at This blog to see how we identified the offset)

Note:If your stack doesn’t match with any of the stack  listed below then paste the stack in comments session of this blog (or) In This face book group.We will try to find the cause for you. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

 

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB :2344600:FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands 0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties 0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB: 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005,SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB: 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2 or 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB : 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor’

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

Stack 21

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

 

Related blogs: 

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group to get answers for all your SQL Server related questions.

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Debugging, Performance, SQL General, SQL Server Tools | Tagged: , , , , , , , , , , , , , , , , , , , , , , , , , , , | 161 Comments »

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for  Non-Yielding scheduler or Non-yielding IOCP Listener dumps or  Non-yielding  resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005

3.   Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?

Answer: Other threads would not get CPU cycles and starve the SQL Server performance.

 

What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?

SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.

 

A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture.  Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)

and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.

When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump

 

 

Let us step in to analysis of  non-yielding scheduler dump which I got in SQL Server 2012

 

Sample 1

When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.

 

{

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    *   04/16/12 10:09:58 spid 6256

    *

    * Non-yielding Scheduler

    *

     * *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}

 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  èType of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  à Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0 èSQL Server Version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  è Thread ID of scheduler monitor.

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

 

0:006> kP                        Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

 

Step 8:

Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.

 

Recollect the below error in SQL error log.  Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

Step 9:

Look at the current stack of Non-yielding thread. using kc command

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

 

 

Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive  and SwitchContex  in the thread. 

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure using X commad

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo

 

Step 11:

We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.

 

 

Let us do dd on sqlmin!g_copiedStackInfo  (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     èType cast  000007fe`df11bfe0  with CONTEXT  .  RsP,Rbp and Rip is invalid so this is not valid offset.

ole32!CONTEXT

   +0x098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      è Type cast 000007fe`df11bff0  with CONTEXT  . RIP is invalid

ole32!CONTEXT

   +0x098 Rsp : 0x72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   è Type cast 000007fe`df11c000 with CONTEXT  . RIP ,RSP  and Rbp is valid. So this should be a valid context.Let us switch to this context and verify

ole32!CONTEXT

   +0x098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know 000007fe`df11c000 is valid context.  So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0x20 hence  we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.

 

Step 13:

 

Switch the context of copied stack using .cxr 000007fe`df11c000  or .cxr sqlmin!g_copiedStackInfo+0X20

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same. 

 

 

Current thread stack which we dumped using the thread ID in SQL Errorlog.

0:146> kc                                                                                    

Copied thread stack which SQL Server copied to global structure before generating the dump.

0:146> Kc 10

ntdll!NtWaitForSingleObject

KERNELBASE!WriteFile

KERNELBASE!WaitForSingleObjectEx

kernel32!WriteFileImplementation

sqldk!SOS_Scheduler::SwitchContext

sqllang!CErrorReportingManager::WriteToErrLog

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqllang!CErrorReportingManager::SendErrorToErrLog

sqldk!WorkDispatcher::DequeueTask

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqldk!SOS_Scheduler::ProcessTasks

sqllang!ReportLoginFailure

sqldk!SchedulerManager::WorkerEntryPoint

sqllang!FRedoLogin

sqldk!SystemThread::RunWorker

sqllang!login

sqldk!SystemThreadDispatcher::ProcessWorker

sqllang!process_login_finish

sqldk!SchedulerManager::ThreadEntryPoint

sqllang!process_commands

kernel32!BaseThreadInitThunk

sqldk!SOS_Task::Param::Execute

ntdll!RtlUserThreadStart

sqldk!SOS_Scheduler::RunTask

 

sqldk!SOS_Scheduler::ProcessTasks

 

sqldk!SchedulerManager::WorkerEntryPoint

 

sqldk!SystemThread::RunWorker

 

sqldk!SystemThreadDispatcher::ProcessWorker

 

sqldk!SchedulerManager::ThreadEntryPoint

 

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

 

ntdll!NtWriteFile  -> WriteFile function is at top of the stack and did not complete in expected time.          
KERNELBASE!WriteFile       
kernel32!WriteFileImplementation 
sqllang!CErrorReportingManager::WriteToErrLog  ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog  ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint  ->format the error
sqllang!ReportLoginFailure  ->Login failed
sqllang!FRedoLogin        
sqllang!login ->Login task is processed

 

 

From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.

 

When will writefile operation take long time?

When there is Disk bottleneck. So the obvious solution for this issue is to  fix the performance of the disk.

Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , | 28 Comments »

    SQL Server generated Access Violation dumps while accessing oracle linked servers.

    Posted by Karthick P.K on August 8, 2012

     

    When you run queries against Oracle linked servers from SQL Server you see errors like one below and access violation dumps are generated (SQLDump00XX.mdmp files in SQL Server error log folder).

    {

    External dump process returned no errors.
    Using ‘dbghelp.dll’ version ‘4.0.5’
    SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
    * *******************************************************************************
    *
    * BEGIN STACK DUMP:
    *  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

    *   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

    *   Access Violation occurred reading address 0000041EA9AE2EF0

    * Input Buffer 510 bytes –

    }

     

    To analyze the dump download and Install Windows Debugger from This  link

    1. Open Windbg

    2. Choose File menu –> select Open crash dump –>Select the Dump file
    (SQLDump000#.mdmp)

    3. on command window type    
    .sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

    4. Type .reload /f and hit enter. This will force debugger to immediately
    load all the symbols.

    5. Type .ecxr

    6. Type  kL    and look at the stack

     

    {

    ntdll!RtlpFreeUserBlock
    ntdll!RtlFreeHeap

    }

    If you see above frames in the top of the stack and if you are using Oracle Provider for OLE DB – Version: 11.2.0.1 and later.

    There is a known issue with Oracle Provider for OLE DB – Version: 11.2.0.1 and later   when  — STYLE COMMENTS are used in linked server queries ,SP’s Etc .

    Resolution

    1. Remove the – -style comments
    OR
    2) use /* */ for the comments instead of —

     

    If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

     

    Regards

    Karthick P.K

    Posted in Connectivity, Debugging, SQL Server Engine | Tagged: , , , , | 4 Comments »

    How to Analyze Deadlocked Schedulers Dumps?

    Posted by Karthick P.K on June 15, 2010

    How to Analyze "Deadlocked Schedulers" Dumps?

    Do you see "Deadlocked Schedulers" errors similar to one below and stuck?

    From SQL Server Errorlog

    **Dump thread – spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000

    ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0001.txt

    * BEGIN STACK DUMP:

    * Deadlocked Schedulers

    * Short Stack Dump

    Stack Signature for the dump is 0x00000000000003D0

    New queries assigned to process on Node 0 have not been picked  up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time.  Use the "max worker threads" configuration option to increase number  of allowable threads, or optimize current running queries.  SQL Process Utilization: 0%. System Idle: 69%.

    New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 6%. System Idle: 90%.

    Cause

    We get Deadlocked Schedulers error (New queries assigned to process on Node n have not been picked up by a worker thread in the last 300 seconds) and dump when Scheduler Monitor detects threads(workers) are not Progressing on schedulers of one (or) all nodes for 60 seconds (300 seconds in Denali) and there are new work request (tasks) but there is no idle threads to pick up the the work request and SQL Server could not spawn new thread to process the new work request (Max worker threads reached).

    Some of common causes are most of the tasks are waiting on a single resource because of resource bottleneck, excessive blocking, excessive parallelism, very long running Queries executed by all workers, all threads waiting on trace write waits etc..

    SQL Server will not accept new connections or cannot process new requests  when there is deadlocked scheduler condition (Exception: deadlocked condition on single node). 

    Note: In systems with multiple nodes (NUMA)  If all the threads which belong to schedulers of single node is exhausted (or) Schedulers not progressing on single node can cause deadlocked scheduler condition.

    Detailed explanation about dead lock schedulers can be found in This blog from my mentor J.P

    Steps to analyze "Deadlocked Schedulers" Dumps.

    To analyze the dump download and Install Windows Debugger from This  link

    Step 1:

    Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

    Step 2:

    on command window type
    .sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

    Step 3:

    Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

    Step 4:

    Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

    0:002> lmvm sqlservr
    start             end                 module name
    00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
        Loaded symbol image file: sqlservr.exe
        Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
        Image name: sqlservr.exe
        Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
        CheckSum:         025FEB5E
        ImageSize:        02679000
        File version:     2005.90.4266.0
        Product version:  9.0.4266.0
        File flags:       0 (Mask 3F)
        File OS:          40000 NT Base
        File type:        1.0 App
        File date:        00000000.00000000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

    Step 5:

    Type  ~*kL 20   and look at the stack of all the threads  to find what majority of threads are doing.

    Note: If you find most of your threads are waiting on stack which is not listed below please paste the stack in comments session of this blog (or) In MSSQLWIKI  face book group we will try to answer you. If you don’t get prompt reply from the community, you may need to open a support ticket with Microsoft.

    1. If it is blocking issue and If most of the threads are  waiting to acquire a lock you will find the most of the stack similar to one below. (We try to acquire lock and go to wait, since someone is holding a lock)

    ntdll!ZwSignalAndWaitForSingleObject

    kernel32!SignalObjectAndWait

    sqlservr!SOS_Scheduler::SwitchContext

    sqlservr!SOS_Scheduler::Suspend

    sqlservr!SOS_Event::Wait

    sqlservr!LockOwner::Sleep

    sqlservr!lck_lockInternal

    sqlservr!GetLock

    2. If most of threads are stuck while trying to write profiler events to the destination you might find stack similar to one below

    ntdll!ZwSignalAndWaitForSingleObject

    kernel32!SignalObjectAndWait

    sqlservr!SOS_Scheduler::SwitchContext

    sqlservr!SOS_Task::Sleep

    sqlservr!CTraceRowsetIoProvider::GetFreeBuffers

    sqlservr!CTraceWriteRequest::InitForRowsetTrace

    sqlservr!CTraceRowsetIoProvider::InitializeWriteRequest

    sqlservr!CTrace::WriteRecord

    sqlservr!CTraceController::ProduceRecord

    sqlservr!CTraceData::TracePreBatchEvent

    sqlservr!CSQLSource::Execute

    sqlservr!process_request

    sqlservr!process_commands

    sqlservr!SOS_Task::Param::Execute

    sqlservr!SOS_Scheduler::RunTask

    sqlservr!SOS_Scheduler::ProcessTasks

    3. If your stack’s  are like one below refer http://support.microsoft.com/default.aspx?scid=kb;EN-US;974205

    sqlservr!SpinlockBase::Sleep
    sqlservr!SpinlockBase::SpinToAcquire
    sqlservr!TSyncHashTable_EntryAccessorsqlservr!CQSIndexStatsMgr::AddNewMissingIndex
    sqlservr!CIdxSuggestion::Register
    sqlservr!COptExpr::PqteConvert
    sqlservr!CPhyOp_Top::PqteConvert
    sqlservr!COptExpr::PqteConvert
    sqlservr!COptExpr::PqteConvertTree
    sqlservr!COptContext::PcxteOptimizeQuery
    sqlservr!CQuery::Optimize
    sqlservr!CQuery::PqoBuild
    sqlservr!CStmtQuery::InitQuery
    sqlservr!CStmtSelect::Init

    4. If you see many stacks like the one below it could be BPOOL memory pressure (or) Lazy writer  waiting on I/O

    sqlservr!BPool::Steal

    sqlservr!SQLSinglePageAllocator::AllocatePages

    sqlservr!MemoryNode::AllocatePagesInternal

    sqlservr!MemoryClerkInternal::AllocatePages

    sqlservr!IMemObj::PbGetNewPages

    sqlservr!CSlotPageMgr::PbAllocate

    5. If you see many stacks like the one below it should be because of excessive parallelism

    sqlservr!CQScanXProducerNew::Open

    sqlservr!FnProducerOpen

    sqlservr!FnProducerThread

    sqlservr!SubprocEntrypoint

    6. If you see many stacks like the one below (Many threads waiting to flush log) it should be because of disk bottleneck’s. Check if you see "I/O requests taking longer than 15 seconds" messages in Errorlog before Deadlocked Schedulers Dumps. Refer Troubleshooting sql server I/O for troubleshooting I/O issues.

    sqlservr!SOS_Event::Wait

    sqlservr!SQLServerLogMgr::WaitLCFlush

    sqlservr!SQLServerLogMgr::LogFlush

    sqlservr!SQLServerLogMgr::WaitLogFlush

    sqlservr!XdesRMFull::Commit

     

    If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI to post your SQL Server questions to SQL Server experts

    Related posts:

     

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Disclaimer

    The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

    Posted in Debugging, Performance, SQL Server Engine | Tagged: , , , , , , , , , , | 85 Comments »