MSSQLWIKI

Karthick P.K on SQL Server

Archive for August, 2012

SQL Server: Ghost records

Posted by Karthick P.K on August 30, 2012

 

What is Ghost record?

When a record is deleted from a clustered index data page or non-clustered index leaf page or a versioned heap page or a forwarded record is recalled, the record is logically removed by marking them as deleted but not physically removed from the page immediately.

Pages which are marked as deleted but actually not deleted physically are called Ghost Records.

 

Why?

Allowing delete or update operation run faster and to ensure faster rollback if the transaction is cancelled or roll backed.

 

Who will remove the records which are marked for deletion?

Ghostcleanuptask: SQL Server Ghostcleanuptask thread physically removes the records which are marked as deleted.

 

How Ghost cleanup task works?

1.       Ghostcleanuptask thread wakes up every 10 seconds.

2.       Sweep databases one by one starting from master.

3.       Skip the database if it is not able to take a shared lock for database (LCK_M_S) or database is not in Open read/write state.

4.       Scans the PFS pages of the current database to get the pages which has ghost records.

PFS Page: A PFS page occurs once in 8088 pages. SSQL Server will attempt to place a PFS page on the first page of every PFS interval(8088Pages). The only time a PFS page is not the first page in its interval is in the first interval for a file.

In this case, the file header page is first, and the PFS page is second. (Page ID starts from 0 so the first PFS page is at Page ID 1)

clip_image002_thumb

5.       Remove the records which are marked as deleted (ghosted) physically.

 

 

The GHOST_VERSION_RECORD is used in cases where versioned heap records, or BLOB fragments are deleted and when versioned forwarded heap rows are recalled back to their originating page. The records must be ghosted rather than deleted so the versioning chain can be maintained. The GHOST_VERSION_RECORDs cannot be expunged until all transactions that have references to the original data record have finished.

 

 

Let us see how the ghost records are marked for deletion and deleted later using below example.

 

Note: We will be using few trace flags for testing purpose. Use this trace flags and below steps only in your testing servers.

Trace flag 3604 Directs SQL Server to send output some of the command’s to client executing the command. Ex.  When we execute  DBCC PAGE or DBCC TRACESTATUS(-1) we will not see results in SSMS or any other client unless Trace flag 3604 is on

Trace flag   661 Disables ghost record cleanup task.  When this trace flag is enabled Ghost cleanup task will not run. 

Trace flag   662 along with 3605 prints the output of Ghost cleanup task to SQL Server error log ( DBCC TRACEON (662, 3605,-1))

 

 

 

Step1:

//Create a database named Ghost test

create database Ghosttest

go

use Ghosttest

go

 

Step 2:

//Create a table named ‘a’ with clustered and non-clustered index.

CREATE TABLE [dbo].[a](

      [a] [char](3000) NULL,

      [b] [char](3000) NULL,

      [c] [int] NULL

) ON [PRIMARY]

 

CREATE Clustered INDEX [CIX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

CREATE  INDEX [IX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

Go

 

Step 3:

//Insert 1000 records in to the table.

insert into  a values (‘Ghost’,‘Record’,1);

go 1000

 

Step 4:

//Enable trace flag 3604 to see the output of DBCC command from client like SSMS.

 

dbcc traceon (3604,-1)

 

go

 

Step 5:

// Disable Ghost record cleanup task to see how the ghost records are marked in data/index pages.   (Don’t enable in production)

 

dbcc traceon (661,-1)

go

 

Step 6:

//Delete the 1000 records which we inserted in table ‘a’ to generate ghost records (Records marked for deletion but not physically deleted).

 

delete from a

go

 

Step 7:

//Ghost cleanup task first scans the PFS pages in  current database to get the pages which has ghost records. Let us dump the PFS pages to see how the pages with ghost records are marked

// Watch the text which are in red or search for Has Ghost in the output of DBCC page.

DBCC PAGE(‘Ghosttest’,1,1,3)

Go

 

PAGE: (1:1)

 

 

BUFFER:

 

 

BUF @0x00000000FEC0ED80

 

bpage = 0x00000000CA024000           bhash = 0x0000000000000000           bpageno = (1:1)

bdbid = 9                            breferences = 0                      bcputicks = 0

bsampleCount = 0                     bUse1 = 62975                        bstat = 0xc0010b

blog = 0x79797979                    bnext = 0x0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000CA024000

 

m_pageId = (1:1)                     m_headerVersion = 1                  m_type = 11

m_typeFlagBits = 0x1                 m_level = 0                          m_flagBits = 0x0

m_objId (AllocUnitId.idObj) = 99     m_indexId (AllocUnitId.idInd) = 0    Metadata: AllocUnitId = 6488064

Metadata: PartitionId = 0            Metadata: IndexId = 0                Metadata: ObjectId = 99

m_prevPage = (0:0)                   m_nextPage = (0:0)                   pminlen = 0

m_slotCnt = 1                        m_freeCnt = 2                        m_freeData = 8188

m_reservedCnt = 0                    m_lsn = (50:335:27)                  m_xactReserved = 0

m_xdesId = (0:0)                     m_ghostRecCnt = 0                    m_tornBits = 412090936

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED           PFS (1:1) = 0x44 ALLOCATED 100_PCT_FULL

DIFF (1:6) = CHANGED                 ML (1:7) = NOT MIN_LOGGED           

 

PFS: Page Alloc Status  @0x000000002205A000

 

(1:0)        – (1:3)        =     ALLOCATED 100_PCT_FULL                             

(1:4)        – (1:5)        = NOT ALLOCATED   0_PCT_FULL                             

(1:6)        – (1:7)        =     ALLOCATED 100_PCT_FULL                             

(1:8)        -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:9)        -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:10)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:11)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:12)       -              =     ALLOCATED 100_PCT_FULL           IAM Page  Mixed Ext

(1:13)       – (1:14)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:15)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:16)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:17)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:18)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:19)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:20)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:21)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:22)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:23)       – (1:25)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:26)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:27)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:28)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:29)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:30)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:31)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:32)       -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:33)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:34)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:35)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:36)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:37)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:38)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:39)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:40)       – (1:41)       =     ALLOCATED   0_PCT_FULL                             

(1:42)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:43)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:44)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:45)       – (1:46)       = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:47)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:48)       – (1:50)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:51)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:52)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:53)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:54)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:55)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:56)       – (1:61)       =     ALLOCATED   0_PCT_FULL                             

(1:62)       -              = NOT ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:63)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:64)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:65)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:66)       – (1:71)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:72)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:73)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:74)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:75)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:76)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:77)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:78)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:79)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:80)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:81)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:82)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:83)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:84)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:85)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:86)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:87)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:88)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:89)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:90)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:91)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:92)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:93)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:94)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:95)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:96)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:97)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:98)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:99)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:100)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:101)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:102)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:103)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:104)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:105)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:106)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:107)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:108)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:109)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:110)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:111)      – (1:116)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:117)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:118)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:119)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:120)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:121)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:122)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:123)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:124)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:125)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:126)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:127)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:128)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:129)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:130)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:131)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:132)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:133)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:134)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:135)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:136)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:137)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:138)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:139)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:140)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:141)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:142)      – (1:143)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:144)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:145)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:146)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:147)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:148)      -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:149)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:150)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:151)      -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:152)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:153)      – (1:158)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:159)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:160)      – (1:383)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:384)      – (1:385)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:386)      – (1:391)      = NOT ALLOCATED   0_PCT_FULL                             

(1:392)      – (1:497)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:498)      – (1:499)      =     ALLOCATED   0_PCT_FULL                             

(1:500)      – (1:655)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:656)      -              =     ALLOCATED   0_PCT_FULL                             

(1:657)      – (1:663)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:664)      – (1:671)      = NOT ALLOCATED   0_PCT_FULL                             

 

Step 8:

//Dump one of the page which is marked as having ghost record in PFS page. I have picked Page 158 for example.

//Look at m_ghostRecCnt

 

dbcc page(‘Ghosttest’,1,158,3)

go

 

 

PAGE: (1:158)

 

 

BUFFER:

 

 

BUF @0x00000000FEC02CC0

 

bpage = 0x00000000C9E22000           bhash = 0x0000000000000000           bpageno = (1:158)

bdbid = 9                            breferences = 0                      bcputicks = 933

bsampleCount = 1                     bUse1 = 64101                        bstat = 0xc0010b

blog = 0x2159bb79                    bnext = 0x0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000C9E22000

 

m_pageId = (1:158)                   m_headerVersion = 1                  m_type = 1

m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0

m_objId (AllocUnitId.idObj) = 28     m_indexId (AllocUnitId.idInd) = 256 

Metadata: AllocUnitId = 72057594039762944                                

Metadata: PartitionId = 72057594038845440                                 Metadata: IndexId = 1

Metadata: ObjectId = 2105058535      m_prevPage = (1:157)                 m_nextPage = (1:160)

pminlen = 3508                       m_slotCnt = 2                        m_freeCnt = 1054

m_freeData = 7134                    m_reservedCnt = 0                    m_lsn = (37:160:37)

m_xactReserved = 0                   m_xdesId = (0:1983)                  m_ghostRecCnt = 2

m_tornBits = 602498023              

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED          

PFS (1:1) = 0x68 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED           

 

Slot 0 Offset 0x60 Length 3519

 

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Record Size = 3519                  

Memory Dump @0x000000001CEBA060

 

 

//There are 2 ghost records in page 158 (m_ghostRecCnt=2)

 

Step 8:

//Additionally we can query sys.dm_db_index_physical_stats to find the number of ghost records.

 

SELECT object_name(object_id) as Name, record_count, GHOST_RECORD_COUNT,Version_ghost_record_count,INDEX_TYPE_DESC, ALLOC_UNIT_TYPE_DESC

FROM sys.dm_db_index_physical_stats (DB_ID(N‘Ghosttest’), NULL, NULL, NULL , ‘DETAILED’) where index_level=0

go

clip_image004_thumb

 

 

Step 8:

//Let us enable trace flag 662 and 3605  to print the output of ghost cleanup task to SQL error log

  

dbcc traceon (662,3605,-1)

go

 

Step 9:

//Let us disable trace flag 661 to start the Ghost cleanup task again and let us disable 3604  which we enabled to print output of DBCC command in client (SSMS).

 

dbcc traceoff (661,3604,-1)

go

 

Purging page Dbid 9, File 1, Page 144

Purging page Dbid 9, File 1, Page 146

Purging page Dbid 9, File 1, Page 153

Purging page Dbid 9, File 1, Page 154

Purging page Dbid 9, File 1, Page 155

Purging page Dbid 9, File 1, Page 156

Purging page Dbid 9, File 1, Page 157

Purging page Dbid 9, File 1, Page 158

Purging page Dbid 9, File 1, Page 160

Purging page Dbid 9, File 1, Page 161

 

We will see the pages purged by the ghost cleanup task in the SQL Server error log.

Step 10:

//  We can also capture Ghost cleanup task in profiler by selecting SQL Transaction and filter by object name like  “GhostCleanupTask”

image

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in DBCC, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , | 15 Comments »

I/O requests taking longer than 15 seconds to complete on file

Posted by Karthick P.K on August 27, 2012

Do you see warnings like one below in your SQL Server error log?

SQL Server has encountered  x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .

The OS file handle is 0x000006A4. The offset of the latest long I/O is: 0x00000

(or)

BobMgr::GetBuf: Sort Big Output Buffer write not complete after n seconds.

This indicates SQL Server I/O Bottlenecks. SQL Server performance highly relies on the Disk performance.  SQL Server I/O Bottleneck can be identified through

1. PAGEIOLATCH_xx or WRITELOG wait types in Sys.Sysprocesses and other DMV’s

2. I/O taking longer than 15 seconds in SQL Server Error log.

{

SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [ ] in database [IOTEST (7). The OS file handle is 0x000006A4. The offset of the latest long I/O is:
0x000001

}

3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats

{

Select  wait_type,         waiting_tasks_count,         wait_time_ms  from    sys.dm_os_wait_stats where    wait_type like ‘PAGEIOLATCH%’
order by wait_type

}

4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.

{

select     database_id,     file_id,     io_stall,     io_pending_ms_ticks,     scheduler_address from    sys.dm_io_virtual_file_stats(NULL, NULL)t1,         sys.dm_io_pending_io_requests as t2
where    t1.file_handle = t2.io_handle

}

Following are common reasons for I/O Bottleneck in SQL Server.

1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.

2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem (or) Compression and  so the Disks are performing very slow and hence SQL Server is affected.

3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.

 

How to  troubleshoot?

1.  Exclude SQL Server files from antivirus scan.

2. Do not place SQL Server FILES on compressed drives.

3. Distribute SQL Server data files and transaction log files across drives.

4. If the “I/O request taking longer” warning is for tempdb , Enable trace flag 1118 and increase the tempdb data files refer:http://support.microsoft.com/kb/2154845

5. If none of the above resolves the issue collect the below perfmon counters.

 

Perfmon counters can help us in understanding “If disk is slow” or  “SQL Server is spawning more I/O then what disk could handle” or “Some other process is saturating disk with I/O”

Note:It is important to get  throughput of the disk subsystem in MB/SEC before we look at disk counters. Normally it will be more than 150 MB for SAN disk and greater 50 MB for Single disk .When you look at the perfmon counter look at Max value.

 

Avg. Disk sec/Transfer –> Time taken to perform the I/O operation

Ideal value for Disk sec/Transfer is 0.005-0.010 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck.

1. Look for Disk Bytes /sec when Avg. Disk sec/Transfer  is greater than 0.015. If it is below 200 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.

2. If the Disk Bytes /sec  is greater than  200 MB for SAN disk or greater than 50 MB for Single disk when the  Avg. Disk sec/Transfer  is greater than 0.015. Look at the Process:IO Data Bytes/Sec for the same time and identify which process is spawning I/O. If the identified process is not SQL Server involve the team which supports that process. If the  the identified process is SQL Server tune SQL Server queries which are I/O intensive by creating dropping indexes etc.

 

Disk Bytes /sec  –> Total read and write to disk per second in bytes.

Collect the values for each logical disks in which SQL Server files are placed and look at the Max value for this counter ideally it has to be greater than the throughput of the disk subsystem. If you don’t have the throughput for the disk then this value to be greater than 200MB for SAN or greater than 50 MB for single disk.

If it is below the expected value you can consider that your disks are not performing well. Involve the hardware vendor. 

Important: Value for this counter will be low when there is no I/O happening on the drives. So you have to look at the this counter during the time you see I/O warnings or When Disk sec/Transfer >0.010 for the same drive. 

 

Process:IO Data Bytes/Sec –> Total read and write to disk per second in bytes by each process.

Collect this counter for all the processes running on the server. This counter will help us understand if any other process is saturating the disk with excessive I/O.

Example: Let us consider a disk with max throughput of 250MB per second. If antivirus is spawning 200MB of I/O per second and if SQL Server data files are placed in same drive and SQL Server is spawning 150MB obviously there will be I/O waits.

 

Buffer Manager: Page Read/sec + Page Writes/sec –>Total read and write to disk per second in bytes by SQL Server process.

Note: If you are analyzing the .BLG file collected and not live perfmon focus on Maximum value for each counter don’t look at average.  

 

If (Avg. Disk sec/Transfer> ==0.015 ) and ( (Disk Bytes /sec < 150MB (For San)) or (Disk Bytes /sec < 50MB (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))

{

There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.

}

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50MB (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))

{

Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.

If ( Identified process == SQLServer.exe )

{

Identify and tune the queries which is Spawning excessive I/O.

(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text

can be used to identify the query). Use DTA to tune the query

}

If ( Identified process != SQLServer.exe )

{

Engage the owner of application which is spawning excessive I/O

}

}

Many thanks to Joseph Pilov from whom I learned many techniques like the one above.

 

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K  |My Facebook Page |My Site| Blog space| Twitter

Posted in Performance, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , | 22 Comments »

Database Mail errors in SQL Server (Troubleshooting steps)

Posted by Karthick P.K on August 25, 2012

Troubleshooting Database Mail issues in SQL Server

 

 

Use the Database Mail Configuration Wizard, change the Logging Level to Verbose and send a test mail to investigate the point of failure.

 

Right click database mail –View database mail log to see error or we can SELECT * FROM msdb.dbo.sysmail_event_log ;

 

Check the sent_Status column in the sysmail_allitems table. The four values are sent, unsent, retrying and failed.

If the status is sent and the recipients  hasn’t received the email yet, that the Database Mail external program successfully delivered the e-mail message to the SMTP server but it failed to deliver the message to the final recipient. At this point, the SMTP needs to be troubleshooted (perhaps engaged your Exchange or Mail server team)

 

If the status is unsent or retrying, it means that the Database Mail has not yet processed the e-mail message or is in the process of retrying after a failed attempt. This could be due to network conditions, volume of messages, SMTP server issues, etc. If the problem persists, use another profile or another mail host database.

 

If the status is failed, it means that the Database Mail was unable to deliver the message to the SMTP server. Check the sysmail_log table and the destination address. Also be sure that there are no Network or SMTP issues.

 

Send a test email outside SQL Server using below script or Other mail clients and check if the  recipients   are receiving mails. If they do not receive problem is outside SQL Server. Engage Exchange or other mail server teams to identify why we are not able to send emails from below script or Office outlook or Other mail clients.

Set objMessage = CreateObject("CDO.Message")
 objMessage.Subject = "Hello"
 objMessage.From = """SENDER NAME""<e-mail ID>"
 objMessage.To = "To address@mssqlwiki.com"
 objMessage.HTMLBody = "<h1><font face=arial>Hello,<br>How are you?."
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtphost.dns.Mailserver.com"
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 2
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 60
objMessage.Configuration.Fields.Update
objMessage.Send

If the mail has successfully reached  to recipients from above script problem is with in SQL Server mail configuration. 
Verify the following

 

1.Verify if  Service Broker is enabled (select is_broker_enabled from sys.databases where name=‘MSDB’ (0 – disabled, 1- enabled).

To enable service broker on your database run the following query: ALTER DATABASE MSDB SET ENABLE_BROKER

Note: You will be required to have exclusive access to the database while running this statement.  If you do not you will get the following error message: 
Msg 5061, Level 16, State 1, Line 1. ALTER DATABASE failed because a lock could not be placed on database MSDB. Try again later.

                               Msg 5069, Level 16, State 1, Line 1

ALTER DATABASE statement failed.

You will have to stop SQL Server agent to enable broker on MSDB

2.Check if Database mail stored procedures are  enabled (Surface Area Configuration >> “Surface Area Configuration for Features” >> Under MSSQLSERVER, expand Database Engine, and then click Database Mail. >> Ensure that Enable Database Mail stored procedures is selected, and then click Apply).

 

3.Check if the user is part of DatabaseMailUserRole.

 

4.Check what parameters and values are used in configuration by running

   exec msdb..sysmail_help_configure_sp

   A list of default values are given in BOL, topic: “sysmail_help_configure_sp (Transact-SQL)”. To modify a parameter or value you can use the following stored procedure

   exec msdb..sysmail_configure_sp ‘parameter_name’, ‘parameter_value’

 Check if ReadFromConfigurationFile is enabled if yes check if the DatabaseMail90.exe.config file (The default path is < drive >\Program Files\Microsoft SQL   Server\MSSQL.1\MSSQL\Binn) and has proper parameters.

 

5.Verify that the Database Mail executable is located in the correct directory – e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

 

6.Verify that the service account for SQL Server has permission to run the executable, DatabaseMail90.exe, which requires network access to the SMTP servers specified in Database Mail accounts. Therefore, the service account for SQL Server must have permission to access the network, and the SMTP servers must allow connections from the computer that runs SQL Server.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Configuration, Database mail, SQL General | Tagged: , | 5 Comments »

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see below errors in SQL error along with dumps and stuck?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
*   05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP

*   01/22/09 19:11:16 spid 0

* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.

Refer “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps”  for analyzing the Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor Dumps.

If you are interested in just finding a quick resolution follow the below steps to get the Non-Yield stack from the dump and check if it is matching with any existing known issues in SQL Server. 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg

step 2:

Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 3:

on command window type    
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 5:

Type  .cxr sqlservr!g_copiedStackInfo+0X20   for SQL Server2005 and SQL Server2008/2008R2  (or)   .cxr sqlmin!g_copiedStackInfo+0X20  for SQL Server2012.

Type kc 100 and look at the stack to see if it matches with the stack of any of known issues in SQL Server listed below.

If kc 100 doesn’t display any stack and throws “WARNING: Frame IP not in any known module. Following frames may be wrong”  type .cxr to reset to default scope and try  .cxr sqlservr!g_copiedStackInfo+0X00c (In 32-Bit (X86)  SQL server valid offset for context is 0X00c Look at This blog to see how we identified the offset)

Note:If your stack doesn’t match with any of the stack  listed below then paste the stack in comments session of this blog (or) In This face book group.We will try to find the cause for you. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

 

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB :2344600:FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands 0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties 0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB: 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005,SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB: 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2 or 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB : 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor’

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

Stack 21

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

 

Related blogs: 

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group to get answers for all your SQL Server related questions.

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Debugging, Performance, SQL General, SQL Server Tools | Tagged: , , , , , , , , , , , , , , , , , , , , , , , , , , , | 161 Comments »

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for  Non-Yielding scheduler or Non-yielding IOCP Listener dumps or  Non-yielding  resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005

3.   Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?

Answer: Other threads would not get CPU cycles and starve the SQL Server performance.

 

What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?

SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.

 

A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture.  Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)

and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.

When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump

 

 

Let us step in to analysis of  non-yielding scheduler dump which I got in SQL Server 2012

 

Sample 1

When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.

 

{

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    *   04/16/12 10:09:58 spid 6256

    *

    * Non-yielding Scheduler

    *

     * *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}

 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  èType of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  à Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0 èSQL Server Version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  è Thread ID of scheduler monitor.

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

 

0:006> kP                        Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

 

Step 8:

Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.

 

Recollect the below error in SQL error log.  Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

Step 9:

Look at the current stack of Non-yielding thread. using kc command

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

 

 

Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive  and SwitchContex  in the thread. 

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure using X commad

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo

 

Step 11:

We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.

 

 

Let us do dd on sqlmin!g_copiedStackInfo  (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     èType cast  000007fe`df11bfe0  with CONTEXT  .  RsP,Rbp and Rip is invalid so this is not valid offset.

ole32!CONTEXT

   +0x098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      è Type cast 000007fe`df11bff0  with CONTEXT  . RIP is invalid

ole32!CONTEXT

   +0x098 Rsp : 0x72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   è Type cast 000007fe`df11c000 with CONTEXT  . RIP ,RSP  and Rbp is valid. So this should be a valid context.Let us switch to this context and verify

ole32!CONTEXT

   +0x098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know 000007fe`df11c000 is valid context.  So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0x20 hence  we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.

 

Step 13:

 

Switch the context of copied stack using .cxr 000007fe`df11c000  or .cxr sqlmin!g_copiedStackInfo+0X20

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same. 

 

 

Current thread stack which we dumped using the thread ID in SQL Errorlog.

0:146> kc                                                                                    

Copied thread stack which SQL Server copied to global structure before generating the dump.

0:146> Kc 10

ntdll!NtWaitForSingleObject

KERNELBASE!WriteFile

KERNELBASE!WaitForSingleObjectEx

kernel32!WriteFileImplementation

sqldk!SOS_Scheduler::SwitchContext

sqllang!CErrorReportingManager::WriteToErrLog

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqllang!CErrorReportingManager::SendErrorToErrLog

sqldk!WorkDispatcher::DequeueTask

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqldk!SOS_Scheduler::ProcessTasks

sqllang!ReportLoginFailure

sqldk!SchedulerManager::WorkerEntryPoint

sqllang!FRedoLogin

sqldk!SystemThread::RunWorker

sqllang!login

sqldk!SystemThreadDispatcher::ProcessWorker

sqllang!process_login_finish

sqldk!SchedulerManager::ThreadEntryPoint

sqllang!process_commands

kernel32!BaseThreadInitThunk

sqldk!SOS_Task::Param::Execute

ntdll!RtlUserThreadStart

sqldk!SOS_Scheduler::RunTask

 

sqldk!SOS_Scheduler::ProcessTasks

 

sqldk!SchedulerManager::WorkerEntryPoint

 

sqldk!SystemThread::RunWorker

 

sqldk!SystemThreadDispatcher::ProcessWorker

 

sqldk!SchedulerManager::ThreadEntryPoint

 

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

 

ntdll!NtWriteFile  -> WriteFile function is at top of the stack and did not complete in expected time.          
KERNELBASE!WriteFile       
kernel32!WriteFileImplementation 
sqllang!CErrorReportingManager::WriteToErrLog  ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog  ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint  ->format the error
sqllang!ReportLoginFailure  ->Login failed
sqllang!FRedoLogin        
sqllang!login ->Login task is processed

 

 

From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.

 

When will writefile operation take long time?

When there is Disk bottleneck. So the obvious solution for this issue is to  fix the performance of the disk.

Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , | 28 Comments »

    SQL Server generated Access Violation dumps while accessing oracle linked servers.

    Posted by Karthick P.K on August 8, 2012

     

    When you run queries against Oracle linked servers from SQL Server you see errors like one below and access violation dumps are generated (SQLDump00XX.mdmp files in SQL Server error log folder).

    {

    External dump process returned no errors.
    Using ‘dbghelp.dll’ version ‘4.0.5’
    SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
    * *******************************************************************************
    *
    * BEGIN STACK DUMP:
    *  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

    *   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

    *   Access Violation occurred reading address 0000041EA9AE2EF0

    * Input Buffer 510 bytes –

    }

     

    To analyze the dump download and Install Windows Debugger from This  link

    1. Open Windbg

    2. Choose File menu –> select Open crash dump –>Select the Dump file
    (SQLDump000#.mdmp)

    3. on command window type    
    .sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

    4. Type .reload /f and hit enter. This will force debugger to immediately
    load all the symbols.

    5. Type .ecxr

    6. Type  kL    and look at the stack

     

    {

    ntdll!RtlpFreeUserBlock
    ntdll!RtlFreeHeap

    }

    If you see above frames in the top of the stack and if you are using Oracle Provider for OLE DB – Version: 11.2.0.1 and later.

    There is a known issue with Oracle Provider for OLE DB – Version: 11.2.0.1 and later   when  — STYLE COMMENTS are used in linked server queries ,SP’s Etc .

    Resolution

    1. Remove the – -style comments
    OR
    2) use /* */ for the comments instead of —

     

    If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

     

    Regards

    Karthick P.K

    Posted in Connectivity, Debugging, SQL Server Engine | Tagged: , , , , | 4 Comments »