MSSQLWIKI

Karthick P.K on SQL Server

Archive for the ‘SQL General’ Category

Running 1 Million databases on Azure SQL

Posted by Karthick P.K on October 11, 2020

Curious to know how my team manages over 1 million SQL databases across dozens of datacenters in Azure, hands-free, through Spartan and DAMS, to enable the Common Data Service behind the Power Platform and Dynamics 365?

Read on for some never-before-shared details:

https://www.linkedin.com/posts/karthick-pk-1b342727_running-1m-databases-on-azure-sql-for-a-large-activity-6720400296566243328-x6_t

Thank you

Karthick P.K

Posted in SQL General

Transactional Replication Part -2

Posted by Karthick P.K on November 22, 2013

Part 2 of the transactional replication series is a demo of the data flow and of configuring the distributor, publisher, publication, subscription, etc. After watching this video you will be able to relate it to the concepts discussed in the earlier video and configure transactional replication on your own. You will also understand the replication agents (Snapshot Agent, Log Reader Agent and Distribution Agent) and how to monitor them once transactional replication is configured.

How to configure transactional replication By Gaurav Mathur

Posted in SQL General

Transactional Replication Part -1

Posted by Karthick P.K on November 22, 2013

Part 1 of the transactional replication series covers:

1. The architecture of transactional replication and its data flow.
2. The entities involved in transactional replication: the Publisher, Distributor and Subscriber servers, the publication and publication database, the subscription and subscription database, and articles.
3. The replication agents involved in one-way transactional replication (Snapshot Agent, Log Reader Agent and Distribution Agent) and what each one is used for.
4. The steps to configure transactional replication: configuring the distributor, publisher and subscriber, and then the publication and subscription.
5. Any DBA can use this video to learn the transactional replication data flow, how it works, and how to configure it.

After watching the video below, watch the Transactional Replication Part 2 demo video. It shows these concepts in practice and will help you configure replication on your own servers.

Transactional Replication internals and architecture by Gaurav Mathur

Posted in Replication, SQL General

Tempdb latch contention

Posted by Karthick P.K on September 17, 2013

You might see page latch contention in tempdb when you repeatedly drop and create tempdb objects (temp tables, table variables, etc.).

When you notice PAGELATCH_* contention on tempdb (the wait resource in sysprocesses starts with "2:"), check whether the latch wait is on a PFS, GAM or SGAM page. During tempdb latch contention you will see many sessions waiting on PAGELATCH_*, similar to the output below.

In the output below, the sessions are waiting on resource 2:15:121320. Decoded, 2 is the database ID of tempdb, 15 is the file number and 121320 is the page number. 121320 is a multiple of 8088 (15 × 8088), so it is a PFS page. If the page is not a PFS page, check in the same way whether it is a GAM or SGAM page.

Wait type        Wait resource
PAGELATCH_UP     2:15:121320
PAGELATCH_UP     2:15:121320

How to identify whether a page is a PFS, GAM or SGAM page?

PFS page: A PFS page occurs once every 8088 pages. SQL Server places a PFS page on the first page of every PFS interval (8088 pages). The only time a PFS page is not the first page of its interval is in the first interval of a file: the file header is the first page and the PFS page is the second (page IDs start at 0, so the first PFS page is page ID 1). If the page is page 1, or if (page number) / 8088 is a whole number, the page is a PFS page.

GAM page: The first GAM page is page 2 of the data file. The next GAM page is placed 511230 pages after it, at page 511232 (the GAM interval is 511232 pages). If the page is page 2, or if (page number) / 511232 is a whole number, the page is a GAM page.

SGAM page: The first SGAM page is page 3 of the data file, and each later SGAM page immediately follows a GAM page (511233, 1022465, ...). If the page is page 3, or if (page number - 1) / 511232 is a whole number, the page is an SGAM page.
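The page arithmetic above can be checked with a small script. The Python sketch below is an illustration and not a tool from this post. It splits a wait resource such as 2:15:121320 into database ID, file ID and page ID, and then classifies the page. It assumes the commonly cited layout: PFS at page 1 and at every multiple of 8088; GAM at page 2 and at every multiple of 511232; SGAM at page 3 and at every multiple of 511232 plus 1. It also flags 2:1:103 (sys.sysmultiobjrefs). All function names are hypothetical.

```python
from typing import Tuple

# Assumed intervals (commonly cited values): PFS every 8088 pages, GAM/SGAM every 511232 pages.
PFS_INTERVAL = 8088
GAM_INTERVAL = 511232
TEMPDB_ID = 2


def decode_wait_resource(resource: str) -> Tuple[int, int, int]:
    """Split a wait resource like '2:15:121320' into (database_id, file_id, page_id)."""
    db_id, file_id, page_id = (int(part) for part in resource.strip().split(":"))
    return db_id, file_id, page_id


def page_type(page_id: int) -> str:
    """Return 'PFS', 'GAM', 'SGAM' or 'OTHER' for a page number within a data file."""
    if page_id == 1 or (page_id > 0 and page_id % PFS_INTERVAL == 0):
        return "PFS"
    if page_id == 2 or (page_id > 0 and page_id % GAM_INTERVAL == 0):
        return "GAM"
    if page_id == 3 or (page_id > 1 and (page_id - 1) % GAM_INTERVAL == 0):
        return "SGAM"
    return "OTHER"


def classify(resource: str) -> str:
    """Classify a tempdb wait resource; 2:1:103 is reported as sys.sysmultiobjrefs."""
    db_id, file_id, page_id = decode_wait_resource(resource)
    if db_id != TEMPDB_ID:
        return "not tempdb"
    if file_id == 1 and page_id == 103:
        return "sys.sysmultiobjrefs"
    return page_type(page_id)


if __name__ == "__main__":
    for res in ("2:15:121320", "2:1:2", "2:1:103"):
        print(res, classify(res))
```

Page 0 (the file header) is deliberately excluded from the modulo checks. Without that guard, 0 % 8088 == 0 would misreport it as a PFS page.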


How to resolve?

1. Increase the number of tempdb data files and size them equally. As a general rule, if the number of logical processors is less than or equal to 8, use the same number of data files as logical processors. If the number of logical processors is greater than 8, use 8 data files and, if contention continues, increase the number of data files in multiples of 4 (you may not see further improvement once you reach 32 files).

2. Enable trace flag 1118 server-wide (globally, for example as the -T1118 startup parameter).

3. If you still see latch contention on the PFS page after following the two steps above, the only remaining option is to modify your application to limit its tempdb usage.

4. You may also see contention on 2:1:103. Page 103 belongs to the system table sys.sysmultiobjrefs, which manages the relationships between objects created in every database. The only way to reduce contention on this page is to reduce the number of such relationships. For example, creating many temp tables with primary keys can cause this contention, because the relationship between each table and its PK constraint has to be recorded in sys.sysmultiobjrefs.
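The file-count rule of thumb in step 1 is simple enough to sketch. The helper names below are hypothetical. The 32-file cap mirrors only the observation above that gains may stop around there; it is not a hard SQL Server limit.

```python
# Sketch of the tempdb data-file rule of thumb from step 1 (hypothetical helper names).


def initial_tempdb_files(logical_processors: int) -> int:
    """Start with one data file per logical processor, up to 8 files."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return min(logical_processors, 8)


def next_tempdb_file_count(current_files: int, cap: int = 32) -> int:
    """If contention continues with 8 or more files, add files in steps of 4, up to the cap."""
    return min(current_files + 4, cap)


def equal_file_size_mb(total_mb: int, files: int) -> int:
    """Give every tempdb data file the same size; any remainder is left unallocated."""
    return total_mb // files


if __name__ == "__main__":
    files = initial_tempdb_files(16)
    print(files, equal_file_size_mb(8192, files))
    print(next_tempdb_file_count(files))
```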

What’s the best practice?

1. In all your SQL Server instances, create multiple equally sized tempdb data files instead of one large file.

2. Make TF 1118 (uniform extent allocation) the default. The extra space this trace flag requires is minimal, and storage is not that expensive these days.

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, Space management, SQL General, SQL Server Engine

Troubleshooting Transactional replication Latency using Agent Statistics

Posted by Prabhakar Bhaskaran on September 13, 2013

Troubleshooting latency issues in replication is a black box for many DBAs. In this post I will explain how you can leverage agent statistics to troubleshoot latency issues.

Before looking at how to decode the agent statistics, let’s review some basics that will help us troubleshoot replication performance issues more effectively.

The following MSDN diagram depicts the transactional replication architecture in simple manner.

Transactional replication components and data flow

Troubleshooting latency issues is a multi-step approach. The first step is to identify which agent is slow:

  • Log reader Agent (Publisher to Distributor)
  • Distribution Agent (Distributor to Subscriber)

So the problem lies with either the Log Reader Agent or the Distribution Agent, and we can identify which one simply by inserting a tracer token.

Once we find the problematic agent, the next step is to identify which thread within that agent is causing the issue.

Here is a brief overview of the important threads in these replication agents and the work each one does.

Log Reader Agent

Reader thread – Scans the publisher database transaction log using sp_replcmds.

Writer thread – Adds the queued transactions to the distribution database using sp_MSadd_repl_commands.

Distribution Agent

Reader thread – Reads the watermark from the MSreplication_subscriptions table (on the subscriber) and uses it to retrieve pending commands from the distribution database. It does this with the stored procedure sp_MSget_repl_commands.

Writer thread – Uses batched RPC calls to write the changes to the subscriber database.

Now that we understand the threads in the replication agents, let’s assume we have already identified the slow agent by inserting a tracer token. The next step is to dig deeper at the thread level, and this is where the replication agent statistics come to our rescue.

Agent statistics entries are appended to the history tables every 5 minutes by default. They provide a historical view of how the agent has been performing, and the last 3 days of data are kept. You can keep more days by changing the history retention period.

MSlogreader_history

MSdistribution_history

The two tables above are located in the distribution database. The statistics are stored as an XML blob in the comments column of these tables.

Now let’s look at how to decipher this XML data for each agent.

Log Reader Agent statistics

<stats state="1" work="948" idle="351940">
<reader fetch="859" wait="0"/>
<writer write="822" wait="395390"/>
<sincelaststats elapsedtime="300" work="49" cmds="176998" cmdspersec="3543.000000"><reader fetch="17" wait="0"/><writer write="29" wait="350833"/></sincelaststats></stats>

– State = 1 means the stats were recorded after a batch commit

– Work = cumulative time spent by the agent since restart, minus idle time

– Idle = time spent waiting to call sp_replcmds

– Reader fetch = time spent executing sp_replcmds

Reader wait = time spent waiting for the writer to release a buffer

– Writer write = time spent writing commands into the distribution database

Writer wait = time spent waiting for the reader to populate a buffer

Note: Each thread has its own buffer, 40 KB in size.

Here we need to look at the wait times to understand where the bottleneck exists. For example, if the reader thread’s wait time is high, your writer thread is slow, because the reader thread is waiting for the writer to release the buffer. Similarly, if the writer thread’s wait time is high, your reader thread is slow.

The simple way to decode this is:

HIGH wait time on the reader thread = the writer thread is slow (the thread that writes commands to the distribution database)

HIGH wait time on the writer thread = the reader thread is slow (the thread that scans the transaction log)
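The decoding rules above can be applied mechanically to the XML in the comments column. The Python sketch below parses one statistics blob with xml.etree.ElementTree. It reads the reader and writer wait values from the sincelaststats element, falling back to the cumulative values if that element is missing, and reports which thread is the likely bottleneck. Under that assumption it is a crude comparison: it does not decide what counts as a "high" wait. The function name is hypothetical. The Distribution Agent statistics shown later in this post use the same layout, so the same function can be applied to both agents.

```python
import xml.etree.ElementTree as ET

# Samples copied from the statistics shown in this post (with straight quotes).
LOG_READER_STATS = (
    '<stats state="1" work="948" idle="351940">'
    '<reader fetch="859" wait="0"/>'
    '<writer write="822" wait="395390"/>'
    '<sincelaststats elapsedtime="300" work="49" cmds="176998" cmdspersec="3543.000000">'
    '<reader fetch="17" wait="0"/><writer write="29" wait="350833"/>'
    '</sincelaststats></stats>'
)

DISTRIBUTION_SLOW_STATS = (
    '<stats state="1" work="755" idle="354505">'
    '<reader fetch="153" wait="604"/>'
    '<writer write="613" wait="346"/>'
    '<sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000">'
    '<reader fetch="0" wait="515"/><writer write="515" wait="0"/>'
    '</sincelaststats></stats>'
)


def likely_bottleneck(stats_xml: str) -> str:
    """Return 'reader', 'writer' or 'none' for one agent statistics XML blob."""
    root = ET.fromstring(stats_xml)
    interval = root.find("sincelaststats")
    # Prefer the most recent interval; fall back to cumulative values.
    scope = interval if interval is not None else root
    reader_wait = int(scope.find("reader").get("wait"))
    writer_wait = int(scope.find("writer").get("wait"))
    if reader_wait > writer_wait:
        # Reader is waiting for the writer to release a buffer: the writer is slow.
        return "writer"
    if writer_wait > reader_wait:
        # Writer is waiting for the reader to populate a buffer: the reader is slow.
        return "reader"
    return "none"


if __name__ == "__main__":
    print(likely_bottleneck(LOG_READER_STATS))         # reader
    print(likely_bottleneck(DISTRIBUTION_SLOW_STATS))  # writer
```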

Distribution Agent Statistics

<stats state="1" work="154" idle="351464">
<reader fetch="144" wait="11"/>
<writer write="12" wait="338"/>
<sincelaststats elapsedtime="305" work="10" cmds="81262" cmdspersec="8041.000000"><reader fetch="0" wait="9"/><writer write="10" wait="0"/></sincelaststats></stats>

– State = 1 means the stats were recorded after a batch commit

– Work = cumulative time spent by the agent since restart, minus idle time (seconds)

– Idle = time spent waiting to call sp_MSget_repl_commands

– Reader fetch = time spent executing sp_MSget_repl_commands

Reader wait = time spent waiting for the writer to release a buffer

– Writer write = time spent writing commands into the subscriber database

Writer wait = time spent waiting for the reader to populate a buffer

The wait times are decoded in the same way as for the Log Reader Agent:

HIGH wait time on the reader thread = the writer thread is slow (the thread that writes to the subscriber database using batched RPC calls)

HIGH wait time on the writer thread = the reader thread is slow (the thread that retrieves pending commands from the distribution database)

Distributor Writer thread Slow Scenario

These concepts are easier to understand with example statistics. In the case below, I explicitly started a transaction on the subscriber table to simulate blocking on the subscriber side. This made the Distribution Agent’s writer thread wait and built up latency.

This is how the stats looked:

<stats state="1" work="755" idle="354505">
<reader fetch="153" wait="604"/>
<writer write="613" wait="346"/>
<sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000"><reader fetch="0" wait="515"/><writer write="515" wait="0"/></sincelaststats></stats>

We can clearly see that the reader thread wait time is high (515), which means the writer thread is slow, as expected since we simulated blocking on the subscriber side.

Similarly, we can simulate blocking on the replication tables MSrepl_commands and MSrepl_transactions. This makes the Log Reader Agent’s writer thread slow, and the stats will show a high reader thread wait time.

Now that we have isolated the bottleneck to the thread level, we can follow the standard performance troubleshooting approach described in this whitepaper to troubleshoot the slowness of the replication session.

For instance, check out the video in which Joe Sack talks about using Extended Events to troubleshoot distributor writer thread slowness.

In Summary

1. Find which agent is causing the slowness using a tracer token.

2. Leverage the agent statistics to narrow the problem down to the thread level.

3. Follow the standard performance troubleshooting approach to resolve the issue.

Thanks for reading! I hope this helps you troubleshoot replication performance next time.

Posted in Performance, Replication, SQL General

The connection to the primary replica is not active. The command cannot be processed

Posted by Karthick P.K on June 20, 2013

When you configure a SQL Server Always On availability group from Management Studio, it may fail with the errors below while joining the secondary replica to the availability group.

Error 1

{

Joining database on secondary replica resulted in an error.  (Microsoft.SqlServer.Management.HadrTasks)

——————————

ADDITIONAL INFORMATION:

Failed to join the database ‘AG’ to the availability group ‘AG1’ on the availability replica ‘NODE2’. (Microsoft.SqlServer.Smo)

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

——————————

The connection to the primary replica is not active.  The command cannot be processed. (Microsoft SQL Server, Error: 35250)

}

Error 2

{

TITLE: Microsoft SQL Server Management Studio

——————————

Failed to join the instance ‘NODE2’ to the availability group ‘AG1’. (Microsoft.SqlServer.Management.SDK.TaskForms)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=11.0.2100.60+((SQL11_RTM).120210-1917+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

——————————

ADDITIONAL INFORMATION:

Failed to join local availability replica to availability group ‘AG1’.  The operation encountered SQL Server error 41106 and has been rolled back.  Check the SQL Server error log for more details.  When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.2100&EvtSrc=MSSQLServer&EvtID=41158&LinkId=20476

}

You may also get the error below when you join a database to the availability group on a secondary replica using the ALTER DATABASE command below, or synchronization might fail with error 35250.

ALTER DATABASE [AG] SET HADR AVAILABILITY GROUP = [Group name];

Error 1

Msg 35250, Level 16, State 7, Line 1

The connection to the primary replica is not active.  The command cannot be processed.

To resolve the errors above:

1. Ensure the Always On endpoint ([Hadr_endpoint]) is not blocked by a firewall (the default port is 5022).

2. Make sure the startup account of the primary server is added to all secondary servers, and the startup accounts of all secondary servers are added to the primary server (the startup account of each replica must be added to the other replicas).

3. If SQL Server runs under an “NT Service\” account or the Local System account, ensure that the computer account (Domainname\systemname$) of each replica is added to the other replicas.

{

CREATE LOGIN [MSSQLWIKI\node2$] FROM WINDOWS

}

4. On each replica, grant CONNECT on the Always On endpoint to the startup accounts of the other replicas (grant CONNECT on the endpoint even if those startup accounts are already sysadmins).

{

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [MSSQLWIKI\node1$]

}

5. Make sure the SQL Server name (SELECT @@SERVERNAME) matches the host name.

6. Make sure the cluster service startup account is a SQL Server login (more details in this link).
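Step 1 can be checked from each replica with a simple TCP connection test. The Python sketch below is a hypothetical helper, not part of SQL Server. It only shows whether the endpoint port (5022 by default) accepts connections from the machine where it runs. It says nothing about the logins and CONNECT permissions covered by steps 2 to 4.

```python
import socket


def endpoint_reachable(host: str, port: int = 5022, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical replica host names; run this from each replica against the others.
    for replica in ("NODE1", "NODE2"):
        print(replica, endpoint_reachable(replica))
```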

 

 


Posted in Always On, Configuration, Connectivity, Security, SQL General

Inside sys.dm_os_ring_buffers

Posted by Karthick P.K on March 29, 2013

The sys.dm_os_ring_buffers DMV can be used to troubleshoot connectivity errors, track exceptions, and monitor system health, memory pressure, non-yielding/deadlocked schedulers and a lot more.

You can use the scripts below to query data from sys.dm_os_ring_buffers during troubleshooting.

USE master
go
SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
GO
PRINT 'Start Time: ' + CONVERT (varchar(30), GETDATE(), 121)
GO
PRINT ''
PRINT '==== SELECT GETDATE()'
SELECT GETDATE()
PRINT ''
PRINT ''
PRINT '==== SELECT @@version'
SELECT @@VERSION
GO
PRINT ''
PRINT '==== SQL Server name'
SELECT @@SERVERNAME
GO
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - LOGIN TIMERS'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'int') AS [TotalLoginTimeInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'int') AS [LoginTaskEnqueuedInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'int') AS [NetworkWritesInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'int') AS [NetworkReadsInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'int') AS [SslProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'int') AS [SspiProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'int') AS [LoginTriggerAndResourceGovernorProcessingInMilliseconds]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'LoginTimers'
order by a.recordtime 
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - TDS Data'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') AS [PhysicalConnectionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') AS [DisconnectDueToReadError],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') AS [NetworkErrorFoundInInputStream],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') AS [ErrorFoundBeforeLogin],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') AS [SessionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') AS [NormalDisconnect]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'Error'
order by a.recordtime
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SECURITY_ERROR'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, rbf.[timestamp] - tme.ms_ticks, GETDATE()) as [Notification_Time],
cast(record as xml).value('(//SPID)[1]', 'bigint') as SPID,
cast(record as xml).value('(//ErrorCode)[1]', 'varchar(255)') as Error_Code,
cast(record as xml).value('(//CallingAPIName)[1]', 'varchar(255)') as [CallingAPIName],
cast(record as xml).value('(//APIName)[1]', 'varchar(255)') as [APIName],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
where rbf.ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR'
ORDER BY rbf.timestamp ASC
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_EXCEPTION'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as Time_Stamp,
cast(record as xml).value('(//Exception//Error)[1]', 'varchar(255)') as [Error],
cast(record as xml).value('(//Exception/Severity)[1]', 'varchar(255)') as [Severity],
cast(record as xml).value('(//Exception/State)[1]', 'varchar(255)') as [State],
msg.description,
cast(record as xml).value('(//Exception/UserDefined)[1]', 'int') AS [isUserDefinedError],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type], 
cast(record as xml).value('(//Record/@time)[1]', 'int') AS [Record Time],
tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
cross join sys.sysmessages msg
where rbf.ring_buffer_type = 'RING_BUFFER_EXCEPTION' 
and msg.error = cast(record as xml).value('(//Exception//Error)[1]', 'varchar(500)') and msg.msglangid = 1033 
ORDER BY rbf.timestamp ASC

PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_RESOURCE_MONITOR to capture external and internal memory pressure'

SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime], 
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],  
cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],  
cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],  
cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [reversed], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [reversed],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [reversed],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],  
cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],  
cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],  
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],  
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time], 
tme.ms_ticks as [Current Time] 
FROM sys.dm_os_ring_buffers rbf 
cross join sys.dm_os_sys_info tme 
where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW' 
ORDER BY rbf.timestamp ASC


PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SCHEDULER_MONITOR to Monitor system health'

SELECT  CONVERT (varchar(30), GETDATE(), 121) as runtime, DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,    a.* , sys.ms_ticks AS [Current Time]  
FROM   (SELECT x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime) [1]', 'bigint') AS [UserModeTime],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime) [1]', 'bigint') AS [KernelModeTime],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults) [1]', 'bigint') AS [PageFaults],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta) [1]', 'bigint')/1024 AS [WorkingSetDelta],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization) [1]', 'bigint') AS [MemoryUtilization (%workingset)],   
x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]  FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers    
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a  CROSS JOIN sys.dm_os_sys_info sys ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())
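The scripts above convert a ring buffer timestamp to wall-clock time with DATEADD(ms, timestamp - ms_ticks, GETDATE()). ms_ticks from sys.dm_os_sys_info is the current value of the same millisecond tick counter, so the difference, which is negative, is how long ago the record was written. The Python sketch below mirrors that arithmetic for records you have exported. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def ring_buffer_event_time(record_timestamp_ms: int, ms_ticks: int, now: datetime) -> datetime:
    """Mirror DATEADD(ms, timestamp - ms_ticks, GETDATE()) from the scripts above.

    record_timestamp_ms: the ring buffer row's timestamp on the millisecond tick counter
    ms_ticks: sys.dm_os_sys_info.ms_ticks captured at the same moment as `now`
    """
    return now + timedelta(milliseconds=record_timestamp_ms - ms_ticks)


if __name__ == "__main__":
    captured_at = datetime(2013, 3, 29, 12, 0, 0)  # hypothetical GETDATE() value
    # A record written 60,000 ms before the snapshot was taken.
    print(ring_buffer_event_time(1_000_000, 1_060_000, captured_at))  # 2013-03-29 11:59:00
```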


 

 

 


Posted in SQL General, SQL Server Engine, SQL Server memory, SQL Server Tools

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory, it is supposed to deallocate it and release it back to the OS. If a flaw in the code causes it to miss deallocating the memory, this is called a leak, and it can cause memory pressure for both the operating system and the application.

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink SQL Server’s memory based on the amount of memory available on the system and on the max server memory setting.

System administrators often look at SQL Server’s memory usage and, when they find it high, assume that SQL Server is leaking memory.

This is incorrect. SQL Server is a server application, and its memory manager is designed to keep growing its memory usage as needed (the exception is when large pages are used). It will not scale down its usage unless Windows signals a low memory notification. We can control SQL Server’s memory usage with the max server memory setting. However, this setting limits only the buffer pool (BPool) usage of SQL Server, not its overall memory usage. Portions of SQL Server memory are allocated outside the BPool (aka MTL or MTR), and we have no way to control how much memory SQL Server uses outside the BPool. Non-BPool memory usage is normally low, though, and can easily be estimated by studying the components running in SQL Server.

Example: suppose you want SQL Server to use only 10 GB of RAM on the server. Consider how much memory SQL Server might need outside the BPool and set max server memory accordingly. In this case, if you estimate that SQL Server will use 1.5 GB outside the BPool, set max server memory to 8.5 GB.
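The sizing in this example is a simple subtraction. The minimal Python sketch below works in MB to avoid fractional values. The function name is hypothetical, and the outside-BPool figure is your own estimate, not something SQL Server reports.

```python
def max_server_memory_mb(target_total_mb: int, outside_bpool_mb: int) -> int:
    """Max server memory = total memory you want SQL Server to use - estimated non-BPool usage."""
    if not 0 <= outside_bpool_mb < target_total_mb:
        raise ValueError("the outside-BPool estimate must be between 0 and the target")
    return target_total_mb - outside_bpool_mb


if __name__ == "__main__":
    # The example above: a 10 GB target with 1.5 GB estimated outside the BPool.
    print(max_server_memory_mb(10 * 1024, 1536))  # 8704 MB, i.e. 8.5 GB
```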

What can cause SQL Server Memory leak?

A leak happens when code in SQL Server allocates memory but does not deallocate it. If a SQL Server component is causing a memory leak, it can easily be identified using DMVs such as sys.dm_os_memory_allocations, sys.dm_os_memory_clerks and sys.dm_os_memory_objects. However, most memory leaks in SQL Server are caused by third-party DLLs loaded in the SQL Server process.

Note: All memory allocations by non-SQL Server DLLs loaded in SQL Server happen in “Mem to Leave” (outside the BPool), and they are called direct Windows allocations (DWA).

When SQL Server hits out-of-memory conditions and you suspect a memory leak, the first thing to determine is who is consuming the memory. If SQL Server itself is not using the majority of the memory in MemToLeave and you still get MemToLeave errors, there is probably a leak caused by a DLL loaded in SQL Server. Refer to Section 1 (MTL error) in https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

 

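The query below can be used to determine the actual memory consumed by SQL Server itself in MTL (multi-page allocations). The multi_pages_kb column exists in SQL Server 2005 through 2008 R2; SQL Server 2012 replaced it with pages_kb.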

select sum(multi_pages_kb)  from sys.dm_os_memory_clerks

 

If SQL Server's own memory consumption in MTL is very low and you still see SQL Server memory errors like the ones below, focus on leaks.

 

Ex:

SQL Server 2000

                WARNING: Failed to reserve contiguous memory of Size= 65536.

                WARNING: Clearing procedure cache to free contiguous memory.

                Error: 17802 “Could not create server event thread.”

                SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

                Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

 

 

 

How to identify and troubleshoot the memory leak?

 

There are multiple ways in Windows to identify what is leaking memory in a process. In this blog series we will discuss how to identify a memory leak using:

1. Windows Debugger (WinDbg)
2. Debug Diagnostic Tool for Windows
3. UMDH

 

Let us load a sample DLL that leaks memory into the SQL Server process, and see how to use the tools mentioned above to troubleshoot the leak.

 

Download HeapLeak.dll from this link and install the Microsoft Visual C++ 2010 Redistributable Package (32-bit or 64-bit, from these links) so that the DLL works.

 

-- Create an extended stored procedure in SQL Server

exec sp_addextendedproc 'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

go

-- Execute the extended stored procedure 30 times to leak memory

exec HeapLeak

go 30

 

 

We will also enable the trace flags below so that SQL Server automatically generates a filtered dump when an out-of-memory error occurs, and then see how to identify who is leaking.

dbcc traceon (2551,-1) -- 2551 is used to enable a filtered dump.
go
dbcc traceon (8004,-1) -- 8004 is used to take a memory dump on the first occurrence of an OOM condition.
go

-- Note: Both trace flags listed above are undocumented, so use them at your own risk. There is no guarantee that these trace flags will work in future versions of SQL Server.

 

 

Once the trace flags are enabled, we have to cause an out-of-memory error in SQL Server to generate the OOM memory dump. We have already leaked around 300 MB of memory from MTL by executing the extended stored procedure above 30 times.

 

Let us execute the script below, which creates XML handles. Memory for XML handles is allocated from MTL, so we will get out-of-memory errors very soon, because the extended stored procedure we executed has already leaked most of the MTL memory.

(Do not run the XML script below on its own without first executing HeapLeak. The script will eventually cause an OOM error by itself, because a handle is created on each execution, but that memory is accounted as a SQL Server allocation, so it will not help us understand how to debug leaks caused by third-party DLLs.)

 

Note: 1. The SQL Server memory dump will be generated in the SQL Server error log folder.
2. In 32-bit SQL Server, the size of MTL is 256 MB + (max worker threads * 0.5 MB), so approximately 384 MB unless modified using the -g switch.

 

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
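-- sp_xml_removedocument is intentionally not called, so each execution keeps its document handle (and its MTL memory) allocated
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc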
 
go 10000

We will receive the error below after a few executions:

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.
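Why does the error appear so quickly? Applying the MTL formula from the note above gives a rough answer. This sketch assumes 256 worker threads (which produces the ~384 MB figure quoted) and takes roughly 10 MB per HeapLeak execution from the "300 MB in 30 executions" stated earlier; both values are assumptions for illustration, and mtl_size_mb is a hypothetical helper:

```python
def mtl_size_mb(max_worker_threads: int, g_mb: int = 256) -> float:
    """Approximate 32-bit MemToLeave size: the -g value (256 MB by default)
    plus 0.5 MB of thread stack per worker thread."""
    return g_mb + max_worker_threads * 0.5


mtl = mtl_size_mb(256)        # assumption: 256 worker threads
leaked = 30 * 10              # demo: 30 executions of HeapLeak at roughly 10 MB each
print(mtl, mtl - leaked)      # 384.0 84.0
```

With only about 84 MB of MTL left, and that space fragmented (the dump analyzed below shows a largest free region of just 64 KB), the XML handle allocations fail almost immediately.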

To analyze the dump, download and install the Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

 

Step 1 (Load the memory dump file to debugger):

 

Open WinDbg. Choose the File menu -> select Open Crash Dump -> select the dump file (SQLDump000#.mdmp).

 

Note: You will find SQLDump000#.mdmp in your SQL Server error log folder whenever SQL Server generates a dump (for example on an exception, an assertion or, with the trace flags above, an OOM condition).

 

Step 2 (Set the symbol path to Microsoft symbols server):

 

In the command window, type:

 

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

 

Step 3 (Load the symbols from Microsoft symbols server):

 

Type .reload /f and press Enter. This forces the debugger to immediately load all the symbols.

 

 

Step 4 (check if symbols are loaded):

 

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

 

0:028> lmvm sqlservr

start    end        module name

01000000 02ba8000   sqlservr   (pdb symbols)          c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Oct 14 15:35:29 2005 (434F82E9)

    CheckSum:         01B73B9B

    ImageSize:        01BA8000

    File version:     2005.90.1399.0

    Product version:  9.0.1399.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0409.04e4

    CompanyName:      Microsoft Corporation

    ProductName:      Microsoft SQL Server

    InternalName:     SQLSERVR

    OriginalFilename: SQLSERVR.EXE

    ProductVersion:   9.00.1399.06

    FileVersion:      2005.090.1399.00

    FileDescription:  SQL Server Windows NT

    LegalCopyright:   © Microsoft Corp. All rights reserved.

    LegalTrademarks:  Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    Comments:         NT INTEL X86
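Step 4 can also be checked mechanically if you save the debugger output to a text file. The sketch below is a hypothetical helper, not part of WinDbg: it assumes the usual lm layout (start address, end address, module name, then the symbol state in parentheses) and reports whether WinDbg found PDB symbols rather than, for example, export symbols only.

```python
import re
from typing import Optional

# lm row: start address, end address, module name, symbol state in parentheses.
# 64-bit dumps print addresses like 00000000`01000000, so the backtick is allowed.
LM_ROW = re.compile(r"^\s*[0-9a-fA-F`]+\s+[0-9a-fA-F`]+\s+(\S+)\s+\(([^)]*)\)")


def symbol_state(lmvm_output: str, module: str = "sqlservr") -> Optional[str]:
    """Return the symbol state reported for `module`, e.g. 'pdb symbols'."""
    for line in lmvm_output.splitlines():
        match = LM_ROW.match(line)
        if match and match.group(1).lower() == module.lower():
            return match.group(2)
    return None


def has_pdb_symbols(lmvm_output: str, module: str = "sqlservr") -> bool:
    """True when WinDbg loaded PDB symbols (public or private) for the module."""
    state = symbol_state(lmvm_output, module)
    return state is not None and "pdb symbols" in state
```

For the output above, symbol_state returns "pdb symbols", so the symbols are usable for the following steps.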

 

Step 5 (use !address to display the memory information):

 

Use the !address command to display the memory information of the process from the dump.

 

 

0:028> !address -summary

 

-------------------- Usage SUMMARY --------------------------

    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage

   686a7000 ( 1710748) : 81.58%    81.80%    : RegionUsageIsVAD

     579000 (    5604) : 00.27%    00.00%    : RegionUsageFree

    4239000 (   67812) : 03.23%    03.24%    : RegionUsageImage

     ea6000 (   15000) : 00.72%    00.72%    : RegionUsageStack

      1e000 (     120) : 00.01%    00.01%    : RegionUsageTeb

   122d0000 (  297792) : 14.20%    14.24%    : RegionUsageHeap

          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap

       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb

       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs

       1000 (       4) : 00.00%    00.00%    : RegionUsageEnvironmentBlock

       Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

 

-------------------- Type SUMMARY --------------------------

    TotSize (      KB)   Pct(Tots)  Usage

     579000 (    5604) : 00.27%   : <free>

    4239000 (   67812) : 03.23%   : MEM_IMAGE

     5fc000 (    6128) : 00.29%   : MEM_MAPPED

   7b242000 ( 2017544) : 96.21%   : MEM_PRIVATE

 

-------------------- State SUMMARY --------------------------

    TotSize (      KB)   Pct(Tots)  Usage

   1b7bd000 (  450292) : 21.47%   : MEM_COMMIT

     579000 (    5604) : 00.27%   : MEM_FREE

   642ba000 ( 1641192) : 78.26%   : MEM_RESERVE

 

Largest free region: Base 00000000 - Size 00010000 (64 KB)

 

 

Look at RegionUsageHeap: it is around 297792 KB (about 290 MB), and the largest free region is just 64 KB. SQL Server doesn't use Windows heaps extensively, so heap memory allocated by SQL Server normally does not go beyond a few MB. In this case heaps are consuming around 290 MB, so other components that allocate from MTL can easily fail.
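Reading the Usage SUMMARY by eye is fine for one dump; when comparing several dumps, a small parser helps. This is a minimal sketch that assumes the row layout shown above (hex size, KB in parentheses, two percentages, usage name). The function names are hypothetical, and the 50 MB threshold in heap_looks_suspicious is an illustrative stand-in for "more than a few MB", not a documented limit.

```python
import re

# One "Usage SUMMARY" row: hex size, size in KB, Pct(Tots), Pct(Busy), usage name.
USAGE_ROW = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+\(\s*(\d+)\)\s*:\s*[\d.]+%\s+[\d.]+%\s*:\s*(\w+)")


def usage_summary_kb(address_summary: str) -> dict:
    """Map each RegionUsage* name to its size in KB from `!address -summary` text."""
    usage = {}
    for line in address_summary.splitlines():
        match = USAGE_ROW.match(line)
        if match:
            _size_hex, size_kb, name = match.groups()
            usage[name] = int(size_kb)
    return usage


def heap_looks_suspicious(usage: dict, threshold_kb: int = 50 * 1024) -> bool:
    # Hypothetical threshold: the post only says SQL Server's own heap use stays at "a few MB".
    return usage.get("RegionUsageHeap", 0) > threshold_kb
```

The hex column is the same size in bytes, so 0x122d0000 / 1024 gives the 297792 KB shown for RegionUsageHeap. Rows from the Type and State summaries have only one percentage column and are skipped by the pattern.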

Let us try to understand why heap usage is around 297792 KB and identify whether there is a pattern.

 

Step 6 (use !heap -s to display summary information about the heaps):

 

 

0:028> !heap -s

LFH Key                   : 0x672ddb11

  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast

                    (k)     (k)    (k)     (k) length      blocks cont. heap

-----------------------------------------------------------------------------

000d0000 00000002    1024    896    896      6     1     1    0      0   L 

001d0000 00008000      64     12     12     10     1     1    0      0     

002c0000 00001002    1088     96     96      2     1     1    0      0   L 

002e0000 00001002      64     52     52      3     2     1    0      0   L 

007c0000 00001002      64     64     64     56     1     0    0      0   L 

00d10000 00001002     256     24     24      8     1     1    0      0   L 

340b0000 00001002      64     28     28      1     0     1    0      0   L 

340c0000 00041002     256     12     12      4     1     1    0      0   L 

342a0000 00000002    1024     24     24      3     1     1    0      0   L 

34440000 00001002      64     48     48     40     2     1    0      0   L 

61cd0000 00011002     256     12     12      4     1     1    0      0   L 

61d10000 00001002      64     16     16      7     1     1    0      0   L 

61d20000 00001002      64     12     12      4     1     1    0      0   L 

62a90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62b90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62c90000 00001002     256     40     40      7     1     1    0      0   LFH

00770000 00001002      64     16     16      2     2     1    0      0   L 

63820000 00001002      64     24     24      3     1     1    0      0   L 

63830000 00001001   10240  10240  10240    160    21     0    0    bad     

64230000 00001001   10240  10240  10240    160    21     0    0    bad     

64c30000 00001001   10240  10240  10240    160    21     0    0    bad     

65630000 00001001   10240  10240  10240    160    21     0    0    bad     

66030000 00001001   10240  10240  10240    160    21     0    0    bad     

66a30000 00001001   10240  10240  10240    160    21     0    0    bad     

67430000 00001001   10240  10240  10240    160    21     0    0    bad     

68130000 00001001   10240  10240  10240    160    21     0    0    bad     

68b30000 00001001   10240  10240  10240    160    21     0    0    bad     

69530000 00001001   10240  10240  10240    160    21     0    0    bad     

69f30000 00001001   10240  10240  10240    160    21     0    0    bad     

6a930000 00001001   10240  10240  10240    160    21     0    0    bad     

6b330000 00001001   10240  10240  10240    160    21     0    0    bad     

6bd30000 00001001   10240  10240  10240    160    21     0    0    bad     

6c730000 00001001   10240  10240  10240    160    21     0    0    bad     

6d130000 00001001   10240  10240  10240    160    21     0    0    bad     

6db30000 00001001   10240  10240  10240    160    21     0    0    bad     

6e530000 00001001   10240  10240  10240    160    21     0    0    bad     

6ef30000 00001001   10240  10240  10240    160    21     0    0    bad     

6f930000 00001001   10240  10240  10240    160    21     0    0    bad     

70330000 00001001   10240  10240  10240    160    21     0    0    bad     

70d30000 00001001   10240  10240  10240    160    21     0    0    bad     

7a160000 00001001   10240  10240  10240    160    21     0    0    bad     

7ab60000 00001001   10240  10240  10240    160    21     0    0    bad     

7b560000 00001001   10240  10240  10240    160    21     0    0    bad     

7d0d0000 00001001   10240  10240  10240    160    21     0    0    bad     

7e030000 00001001   10240  10240  10240    160    21     0    0    bad     

7ea30000 00001001   10240  10240  10240    160    21     0    0    bad     

67f90000 00001003     256     16     16     14     1     1    0    bad     

71850000 00001003     256      4      4      2     1     1    0    bad     

71890000 00001003     256      4      4      2     1     1    0    bad     

67fd0000 00001002      64     16     16      4     1     1    0      0   L 

718d0000 00001003     256     40     40      3     1     1    0    bad     

71910000 00001003     256      4      4      2     1     1    0    bad     

71950000 00001003     256      4      4      2     1     1    0    bad     

71990000 00001003     256      4      4      2     1     1    0    bad     

67ff0000 00001002      64     16     16      4     1     1    0      0   L 

719d0000 00001003    1792   1352   1352      5     2     1    0    bad     

71a10000 00001003     256      4      4      2     1     1    0    bad     

71a50000 00001003     256      4      4      2     1     1    0    bad     

71a90000 00001002      64     16     16      1     0     1    0      0   L 

-----------------------------------------------------------------------------

 

 

If you look at the output above, you can clearly see a pattern: there are many heaps (28 in this output) with the same flags (00001001), each reserving and committing 10240 KB (10 MB). But how do we identify who actually created them?
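With 28 heaps the pattern is visible by eye, but on a busier server a tally makes it obvious. A minimal sketch, assuming you saved the !heap -s output to text (for example with WinDbg's .logopen/.logclose) and that rows follow the column layout shown above; repeated_heaps is a hypothetical helper:

```python
import string
from collections import Counter


def _is_hex(token: str) -> bool:
    return bool(token) and all(ch in string.hexdigits for ch in token)


def repeated_heaps(heap_s_output: str, min_count: int = 5) -> list:
    """Group `!heap -s` data rows by (flags, reserved KB, committed KB) and return
    the groups seen at least `min_count` times, most frequent first."""
    groups = Counter()
    for line in heap_s_output.splitlines():
        cols = line.split()
        # Data rows: heap address, flags, reserve (KB), commit (KB), ...
        if (len(cols) >= 4 and _is_hex(cols[0]) and _is_hex(cols[1])
                and cols[2].isdigit() and cols[3].isdigit()):
            groups[(cols[1], int(cols[2]), int(cols[3]))] += 1
    return [(key, count) for key, count in groups.most_common() if count >= min_count]
```

On the output above, the top group is flags 00001001 with 10240 KB reserved and committed, which is the set of heaps investigated in the next step.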

 

Step 7:

 

Let us pick one of the 10 MB heaps and display all the entries (allocations) within it using !heap with the -h parameter.

 

The heap I have picked is 63830000.

 

 

0:028> !heap -h 63830000

Index   Address  Name      Debugging options enabled

19:   63830000

    Segment at 63830000 to 64230000 (00a00000 bytes committed)

    Flags:                00001001

    ForceFlags:           00000001

    Granularity:          8 bytes

    Segment Reserve:      00100000

    Segment Commit:       00002000

    DeCommit Block Thres: 00000200

    DeCommit Total Thres: 00002000

    Total Free Size:      00005048

    Max. Allocation Size: 7ffdefff

    Lock Variable at:     00000000

    Next TagIndex:        0000

    Maximum TagIndex:     0000

    Tag Entries:          00000000

    PsuedoTag Entries:    00000000

    Virtual Alloc List:   63830050

    UCR FreeList:        63830588

    FreeList Usage:      00000000 00000000 00000000 00000000

    FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0      Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

    Heap entries for Segment00 in Heap 63830000

        63830608: 00608 . 00040 [01] - busy (40)

        63830648: 00040 . 02808 [01] - busy (2800)

        641b6698: 02808 . 02808 [01] - busy (2800)

        ……………………………………

        ……………………………………

        ……………………………………

        ……………………………………
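A full !heap -h listing can run to thousands of entries. Grouping the busy entries by size shows the dominant allocation size and gives you addresses to feed to db in the next step. A minimal sketch, assuming the entry layout shown above and that the busy size in parentheses is hexadecimal, as WinDbg prints heap sizes in hex; busy_entries_by_size is a hypothetical helper:

```python
import re
from collections import defaultdict

# "address: previous . size [flags] - busy (user size)"; the dash is matched
# loosely because copied output sometimes contains a different dash character.
ENTRY = re.compile(
    r"^\s*([0-9a-fA-F]+):\s+[0-9a-fA-F]+\s+\.\s+[0-9a-fA-F]+\s+"
    r"\[[0-9a-fA-F]+\]\s+\S+\s+busy\s+\(([0-9a-fA-F]+)\)")


def busy_entries_by_size(heap_h_output: str) -> dict:
    """Map user size in bytes (converted from hex) -> list of busy entry addresses."""
    groups = defaultdict(list)
    for line in heap_h_output.splitlines():
        match = ENTRY.match(line)
        if match:
            groups[int(match.group(2), 16)].append(match.group(1))
    return dict(groups)
```

In this heap almost every entry is busy (2800), i.e. 0x2800 = 10240 bytes, so any of those addresses is a good candidate to dump.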

       

Step 8 (pick one of the heap entries (allocations) and identify what is in it):

 

 

0:028> db 641b6698

641b6698  01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69  ........Iam fili

641b66a8  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

641b66b8  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

641b66c8  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM...........

641b66d8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

641b66e8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

641b66f8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

641b6708  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

0:028> db 63830648

63830648  01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69  ........Iam fili

63830658  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

63830668  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

63830678  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM...........

63830688  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

63830698  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

638306a8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

638306b8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

 

 

Similarly you can dump multiple heap allocations to identify a pattern.

 

Now if you look at the dumped memory, you can see a string that might help you identify the DLL that created the heap. There is a pattern in the heaps above: all the heap allocations contain the string below:

“Iam filing the heap for demo at MSSQLWIKI.COM”

 

Note: You can use the L<size> option with the db or dc commands to dump more memory, for example: db 63830648 L1500
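When you compare many dumped allocations, it helps to pull the readable text out of the hex bytes the same way db renders its ASCII column. A minimal sketch; ascii_column and embedded_strings are hypothetical helpers (a tiny version of the classic strings tool), and the 4-character minimum run length is an arbitrary choice:

```python
def _bytes_from_db(hex_bytes: str) -> bytes:
    # db separates the 8th and 9th byte with '-', e.g. "93 01 08 00-49 61"
    return bytes.fromhex(hex_bytes.replace("-", " "))


def ascii_column(hex_bytes: str) -> str:
    """Render bytes like db's ASCII column: printable ASCII as-is, anything else as '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in _bytes_from_db(hex_bytes))


def embedded_strings(hex_bytes: str, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least `min_len` characters long."""
    runs, current = [], bytearray()
    for b in _bytes_from_db(hex_bytes):
        if 0x20 <= b < 0x7F:
            current.append(b)
            continue
        if len(current) >= min_len:
            runs.append(current.decode("ascii"))
        current = bytearray()
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs
```

Feeding it the hex bytes of the first four db lines of either allocation above returns the single string "Iam filing the heap for demo at MSSQLWIKI.COM".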

 

Step 9:

Let us open the DLL that we loaded into SQL Server for testing in Notepad and check whether it contains a string that matches the pattern.
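Notepad works for a single DLL; when several third-party DLLs are loaded in the process, a script can search all of them. A minimal sketch, assuming the string may be stored either as plain ASCII or as UTF-16LE inside the binary; find_string_offsets and dll_contains are hypothetical helpers:

```python
from pathlib import Path


def find_string_offsets(data: bytes, text: str) -> dict:
    """Return byte offsets where `text` occurs in `data`, as ASCII and as UTF-16LE."""
    result = {}
    for label, needle in (("ascii", text.encode("ascii")),
                          ("utf-16le", text.encode("utf-16-le"))):
        offsets, start = [], data.find(needle)
        while start != -1:
            offsets.append(start)
            start = data.find(needle, start + 1)
        result[label] = offsets
    return result


def dll_contains(path: str, text: str) -> bool:
    """True if the file at `path` contains `text` in either encoding."""
    hits = find_string_offsets(Path(path).read_bytes(), text)
    return any(hits.values())
```

For this demo you would call dll_contains(r"C:\HeapLeakdll\HeapLeak.dll", "Iam filing the heap for demo at MSSQLWIKI.COM"), using the path registered with sp_addextendedproc earlier.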

 


Yes, there is, which confirms that this DLL caused the leak. In real scenarios you may have to examine many different heap allocations to identify the pattern.

 

This is one way to find leaks from a memory dump after the leak has actually happened. It may not always be easy to find a pattern and identify the module that allocated the memory. In such scenarios you may have to track the leak using tools like the Debug Diagnostic Tool, UMDH, etc. In my next blog I will show how to track a memory leak using the Debug Diagnostic Tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | 46 Comments »

SQL Server Query optimization

Posted by Karthick P.K on November 6, 2012

SQL Server Query optimization (or) Tuning slow queries in SQL Server.

How do you troubleshoot (or tune) slow queries in SQL Server, resolve the error -2147217871 "Query timeout expired", and make queries run faster?

A query is considered slow when it runs for longer than expected. The total duration of a query can be broken into compile time, CPU time and wait time.

 

Before you start troubleshooting a query that runs for a long duration, identify whether it is slow because it is long-waiting, long-running or long-compiling.

 

Compile time: Time taken to compile the query. Compile time can be identified by looking at:

1. The CompileTime="n" attribute in the XML plan.

2. "SQL Server parse and compile time" in the output when SET STATISTICS TIME ON is enabled.

 

CPU time: Time the query spends on CPU for compilation and execution (roughly, Execution time - Wait time). CPU time can be identified by looking at:

1. The CPU column in Profiler.

2. "CPU time" under "SQL Server Execution Times" when SET STATISTICS TIME ON is enabled.

 

Execution time: Time taken by the query for complete execution (Execution time = CPU time (for compilation + execution) + Wait time). The total duration of the query can be identified by using:

1. The Duration column in Profiler.

2. The elapsed time under "SQL Server Execution Times" when SET STATISTICS TIME ON is enabled.
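The breakdown above can be turned into a rough triage rule. This is a minimal sketch under stated assumptions: QueryTiming and classify are hypothetical names, and the 50% share used to pick a bucket is an illustrative threshold, not something SQL Server reports.

```python
from dataclasses import dataclass


@dataclass
class QueryTiming:
    compile_ms: float   # "SQL Server parse and compile time" or CompileTime in the XML plan
    cpu_ms: float       # CPU column in Profiler / "CPU time" under SQL Server Execution Times
    duration_ms: float  # Duration column in Profiler / elapsed time

    @property
    def wait_ms(self) -> float:
        # Execution time = CPU time + wait time. Parallel plans can report
        # CPU > duration, so clamp at zero instead of returning a negative wait.
        return max(self.duration_ms - self.cpu_ms, 0)


def classify(t: QueryTiming, share: float = 0.5) -> str:
    """Bucket a query as long waiting / long compiling / long running."""
    if t.duration_ms <= 0:
        return "unknown"
    if t.wait_ms / t.duration_ms >= share:
        return "long waiting"
    if t.compile_ms / t.duration_ms >= share:
        return "long compiling"
    return "long running"
```

For the Profiler example later in this post (2 ms compile, 256 ms CPU, 1920 ms duration), QueryTiming(2, 256, 1920) has wait_ms of 1664 and classify returns "long waiting".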

 

What is a long-waiting query?

 

A query is considered long-waiting when it spends most of its time waiting for some resource.

 

How do you identify whether a query is long-waiting?

 

A long-waiting query can be identified by comparing the CPU and Duration columns in Profiler (or CPU time and elapsed time when SET STATISTICS TIME ON is set).

When a query is waiting for a resource (such as a lock, network I/O or page I/O), it does not consume CPU. So if the duration is much higher than the CPU time (the difference between duration and CPU is the wait time), the query has spent a large amount of time waiting for some resource.

               

Let us look at an example of a long-waiting query. I collected a Profiler trace while executing the query below.

 

set statistics io on

set statistics time on

go

-- Place your query here

select top 10000 * from a

go

set statistics io off

set statistics time off

go

 

 

[Screenshot: Profiler trace of the query showing the CPU and Duration columns]

Look at the Duration and CPU columns in Profiler: CPU = 256 and Duration = 1920. So this query spent the majority of its time waiting for some resource.

[Screenshot: STATISTICS TIME and STATISTICS IO output of the query]

Look at the output of STATISTICS TIME and STATISTICS IO in the image above.

SQL Server spent only 2 milliseconds compiling the query and 256 milliseconds on CPU, but the overall duration was 1920 milliseconds, so the query spent most of its time (about 1920 - 256 = 1664 milliseconds) waiting for some resource.

 

Identify the resource this query is waiting on using one of the methods below:

1. While the query is executing, look at the waittype/lastwaittype columns of sysprocesses (or wait_type in sys.dm_exec_requests) for the spid executing it.

2. If there is no other activity on the server, collect sys.dm_os_wait_stats output before and after the query execution and identify the wait. (This will not help when tuning queries that run for a short duration.)

3. Collect an Extended Events (XEvent) session to gather the wait statistics of the individual query.
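Method 2 relies on the fact that sys.dm_os_wait_stats is cumulative since the last restart (or since it was cleared), so only the growth between two snapshots belongs to the query. A minimal sketch, assuming each snapshot is the result of SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats loaded into a dict; wait_delta is a hypothetical helper:

```python
def wait_delta(before: dict, after: dict, top: int = 5) -> list:
    """Diff two sys.dm_os_wait_stats snapshots (wait_type -> wait_time_ms).

    Returns (wait_type, delta_ms) for the wait types that grew, largest first."""
    grown = []
    for wait_type, total_ms in after.items():
        delta = total_ms - before.get(wait_type, 0)
        if delta > 0:
            grown.append((wait_type, delta))
    grown.sort(key=lambda item: item[1], reverse=True)
    return grown[:top]
```

The wait type at the top of the list is the resource to tune, using the guidance below.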

 

Once you identify the resource the query is waiting on, tune that resource. Most of the time, slow queries are waiting on one of the resources below.

 

PAGEIOLATCH_* or WRITELOG: These indicate an I/O bottleneck. Follow the detailed troubleshooting steps in this link to fix the I/O bottleneck. If you find SQL Server issuing excessive I/O, create the necessary indexes.

a. Logical reads + physical reads in the STATISTICS IO output (refer to the image above), or Reads and Writes in Profiler, indicate the I/O issued by this query. If you see very high reads compared with the result set returned by the query, it is an indication of missing indexes or a bad plan. Create the necessary indexes (you can use DTA for index recommendations).

PAGELATCH_*: This wait type in sysprocesses indicates that SQL Server is waiting for access to a database page that is not undergoing physical I/O.

a. This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. Look at the wait resource of the spid: the wait_resource is the page being accessed, in the format dbid:file:pageno.

b. We can use DBCC PAGE to identify the object or the type of page on which we have contention. It also helps to determine whether the contention is for allocation, data or text pages.

c. If the pages SQL Server is most frequently waiting on are in the tempdb database, the wait resource column will show a page in dbid 2 (for example 2:1:1 or 2:1:3). Enable TF 1118, increase the number of tempdb data files and size them equally (you may be facing the tempdb allocation latch contention described in http://support.microsoft.com/kb/328551).

d. If the page is in a user database, check whether the table has a clustered index on a monotonically increasing key, such as an identity, where all threads contend for the same page at the end of the table. In this case, choose a different clustered index key to spread the work across different pages.
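Steps a to c can be partly automated by parsing the wait resource and checking whether the page is an allocation page. A minimal sketch, assuming the commonly cited allocation-page positions (PFS at page 1 and every 8088 pages; GAM at page 2 and SGAM at page 3, repeating every 511232 pages); the helper names are hypothetical, and DBCC PAGE remains the authoritative check:

```python
PFS_INTERVAL = 8088      # one PFS page tracks 8088 pages
GAM_INTERVAL = 511232    # one GAM/SGAM pair tracks about 4 GB of pages


def parse_wait_resource(resource: str) -> tuple:
    """Split a PAGELATCH wait resource 'dbid:file:page' into integers."""
    db_id, file_id, page_id = (int(part) for part in resource.strip().split(":"))
    return db_id, file_id, page_id


def page_kind(page_id: int) -> str:
    if page_id == 0:
        return "FILE HEADER"
    if page_id == 1 or page_id % PFS_INTERVAL == 0:
        return "PFS"
    if page_id == 2 or page_id % GAM_INTERVAL == 0:
        return "GAM"
    if page_id == 3 or (page_id - 1) % GAM_INTERVAL == 0:
        return "SGAM"
    return "other (use DBCC PAGE)"


def is_tempdb_allocation_contention(resource: str) -> bool:
    """True for a tempdb (dbid 2) wait on a PFS, GAM or SGAM page."""
    db_id, _file_id, page_id = parse_wait_resource(resource)
    return db_id == 2 and page_kind(page_id) in {"PFS", "GAM", "SGAM"}
```

A wait resource such as 2:1:1 or 2:1:3 is flagged as tempdb allocation contention, the case that TF 1118 and additional equally sized tempdb files are meant to relieve.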

LATCH_*: Non-buffer latch waits can be caused by a variety of things. We can use the wait resource column in sysprocesses to determine the type of latch involved (KB 822101).

a. A very common LATCH_EX wait is caused by running a Profiler trace or sp_trace_getdata. Refer to KB 929728 for more information.

b. Auto grow and auto shrink while the query is executing.

c. Queries going for excessive parallelism.

Blocking (LCK_M_*): Use the query in this link to identify the blocking, then tune the head blocker.
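Finding the head blocker is a small graph problem: it is a session that blocks others but is not blocked itself. A minimal sketch, assuming you captured the spid and blocked columns of sysprocesses (or session_id and blocking_session_id from sys.dm_exec_requests) into a dict; head_blockers is a hypothetical helper:

```python
def head_blockers(blocked_by: dict) -> set:
    """blocked_by maps spid -> spid blocking it (0 or None if not blocked).

    A head blocker blocks at least one session but is not blocked itself.
    Blockers missing from the dict (e.g. idle sessions) count as not blocked."""
    blockers = {blocker for blocker in blocked_by.values() if blocker}
    return {spid for spid in blockers if not blocked_by.get(spid)}
```

Sessions that block each other in a cycle (a deadlock) have no head blocker by this definition; SQL Server's deadlock monitor resolves those.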

ASYNC_NETWORK_IO (or NETWORKIO): Keep the result set returned by the query small. For detailed troubleshooting, refer to this link.

RESOURCE_SEMAPHORE: Make sure there is no memory pressure on the server. Follow the steps in this link for detailed troubleshooting.

SQL Trace: Stop all the Profiler traces running on the server. Identify the traces running on the server using the query in this link.

CXPACKET: Set max degree of parallelism appropriately, but remember that the CXPACKET wait type is not always a problem.

a. For servers that have eight or fewer processors, use the following configuration, where N equals the number of processors: max degree of parallelism = 0 to N.

b. For servers that have more than eight processors, use the following configuration: max degree of parallelism = 8. Refer to this link.
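The two rules above reduce to a simple upper bound. A minimal sketch; max_dop_upper_bound is a hypothetical helper that encodes only this guideline, not a general sizing formula:

```python
def max_dop_upper_bound(logical_processors: int) -> int:
    """Largest 'max degree of parallelism' allowed by the two rules above:
    up to N for N <= 8 processors (0 is also allowed), otherwise 8."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return min(logical_processors, 8)
```

The chosen value is applied with sp_configure; 'max degree of parallelism' is an advanced option, so run EXEC sp_configure 'show advanced options', 1; RECONFIGURE; before EXEC sp_configure 'max degree of parallelism', 8; RECONFIGURE;.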

SOS_SCHEDULER_YIELD: Identify whether there is a CPU bottleneck on the server. This wait type means that the thread is waiting for CPU.

a. A SQL Server worker thread's quantum target is 4 ms, which means the worker is expected to yield back to the SQL Server scheduler once it exceeds 4 ms. Before it yields, it checks whether there are any other runnable threads. If there are, the thread at the head of the runnable list is scheduled, and the current thread goes to the tail of the runnable list; it is rescheduled when the threads already waiting in the runnable list finish their execution or quantum. The time a thread spends in the runnable list waiting for its next quantum is accounted as SOS_SCHEDULER_YIELD. You will see this wait type when multiple threads are waiting for CPU cycles. Follow the troubleshooting steps mentioned in this link.
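The yield rule described above can be illustrated with a toy single-scheduler simulation. This is a sketch under simplifying assumptions, not SQLOS's real implementation: one scheduler, all workers runnable at time 0, no I/O or preemptive waits, and only time spent back in the runnable list after a yield is counted. simulate_yield_waits is a hypothetical name.

```python
from collections import deque

QUANTUM_MS = 4  # quantum target quoted in the post


def simulate_yield_waits(cpu_demand_ms: dict) -> dict:
    """Toy model: workers (all runnable at t=0, in dict order) share one scheduler.

    Returns, per worker, the ms spent back in the runnable list after yielding,
    i.e. the time the post says is accounted as SOS_SCHEDULER_YIELD."""
    runnable = deque(cpu_demand_ms)
    remaining = dict(cpu_demand_ms)
    yield_wait = {worker: 0 for worker in cpu_demand_ms}
    yielded_at = {}  # worker -> clock value when it went to the tail of the list
    clock = 0
    while runnable:
        worker = runnable.popleft()
        if worker in yielded_at:
            yield_wait[worker] += clock - yielded_at.pop(worker)
        while remaining[worker] > 0:
            slice_ms = min(QUANTUM_MS, remaining[worker])
            clock += slice_ms
            remaining[worker] -= slice_ms
            # Quantum used up with work left: yield only if another worker is runnable.
            if remaining[worker] > 0 and runnable:
                yielded_at[worker] = clock
                runnable.append(worker)
                break
    return yield_wait
```

A single worker needing 12 ms never waits, because nobody else is runnable when its quantum expires. Two workers needing 8 ms each alternate and accumulate 4 ms of yield wait apiece: the wait comes from competing for the CPU, which is why this wait type points to a CPU bottleneck.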

Important: On SQL Server instances with more than one CPU, the CPU time can be higher than the duration. For a parallel plan, CPU time is the sum of the time the query spent on all CPUs, whereas the duration is the actual elapsed time of the query.

 

What is a long-running query?

A query is considered long-running when it spends most of its time on CPU rather than waiting for some resource.

How do you identify whether a query is long-running?

A long-running query can be identified by comparing the CPU and Duration columns in Profiler (or CPU time and elapsed time when SET STATISTICS TIME ON is set). If CPU and duration are close, the query is considered long-running. In that case, identify where the query spends its time: compiling, or after compilation (executing the query). Compare the duration of the query with the compile time (CompileTime in the XML plan, or "SQL Server parse and compile time" when STATISTICS TIME is on; refer to the image above).

High Compile time:

Compare the duration of the query with the compile time (CompileTime in the XML plan, or "SQL Server parse and compile time" when STATISTICS TIME is on). Compile time is normally a few milliseconds. Follow the steps below if you see high compile time:

1. Identify whether you have a large TokenAndPermUserStore cache; refer to http://support.microsoft.com/kb/927396

2. Create the necessary indexes and statistics. Tune the query manually (or in DTA) and apply the recommendations.

3. Reduce the complexity of the query. A query that joins many tables (or has a large number of values in an IN clause) can take a while to compile.

4. You can reduce compilations by using the forced parameterization option (ALTER DATABASE ... SET PARAMETERIZATION FORCED).

High CPU time:

Compare the duration of the query with the compile time (CompileTime in the XML plan, or "SQL Server parse and compile time" when STATISTICS TIME is on). If the compile time is very low compared to the duration, follow the steps below.

1. Update the statistics of the tables and indexes used by the query. (If the statistics are up to date, the estimated rows and actual rows will be approximately the same in the actual execution plan. If there is a huge difference, the statistics are outdated and need to be updated.)

2. Identify whether the query has used a bad plan because of parameter sniffing (ParameterCompiledValue and ParameterRuntimeValue are different in the XML plan). Refer to THIS LINK to learn more about parameter sniffing.

3. If updating the statistics and fixing parameter sniffing doesn't resolve the issue, it is likely that the optimizer cannot create an efficient plan because of missing indexes or statistics. Run the query that is driving the CPU through the Database Engine Tuning Advisor (DTA) and apply the recommendations. (You will find missing index details in the XML plan, but DTA is more thorough.)

4. If the long-running, CPU-consuming query is a linked server query, try changing the security of the linked server so that the linked server login has db_ddladmin, db_owner or sysadmin permissions on the remote server. More details regarding this issue are in THIS LINK.

5. Ensure the optimizer is not aborting early and creating a bad plan. For details, refer to THIS LINK.

6. Ensure the query that is spiking the CPU doesn't use plan guides (the XML plan will have a PlanGuideDB attribute, and sys.plan_guides will have entries) or query hints (INDEX=, OPTION (<join type> JOIN), or join hints such as INNER <join hint> JOIN).

7. Ensure that SET options have not been changed.
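Several of the checks above (CompileTime, ParameterCompiledValue versus ParameterRuntimeValue, PlanGuideDB) read attributes from the XML plan, so they can be scripted. A minimal sketch, assuming you pass the actual execution plan as XML text; inspect_plan is a hypothetical helper, and it does not cover the estimated-versus-actual row comparison from step 1:

```python
import xml.etree.ElementTree as ET

# Namespace used by SQL Server showplan XML documents.
NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def inspect_plan(plan_xml: str) -> dict:
    """Pull compile time, sniffed parameters and plan guide usage out of a plan."""
    root = ET.fromstring(plan_xml)
    # CompileTime (ms) is an attribute of each QueryPlan element.
    compile_ms = [int(qp.get("CompileTime"))
                  for qp in root.iterfind(".//sp:QueryPlan", NS)
                  if qp.get("CompileTime") is not None]
    # Parameter sniffing suspect: the compiled value differs from the runtime value.
    sniffed = []
    for col in root.iterfind(".//sp:ParameterList/sp:ColumnReference", NS):
        compiled = col.get("ParameterCompiledValue")
        runtime = col.get("ParameterRuntimeValue")
        if compiled is not None and runtime is not None and compiled != runtime:
            sniffed.append((col.get("Column"), compiled, runtime))
    # Plan guide in use: the statement carries a PlanGuideDB attribute.
    plan_guides = [stmt.get("PlanGuideDB")
                   for stmt in root.iterfind(".//sp:StmtSimple", NS)
                   if stmt.get("PlanGuideDB")]
    return {"compile_time_ms": compile_ms,
            "sniffed_parameters": sniffed,
            "plan_guide_dbs": plan_guides}
```

A non-empty sniffed_parameters list only tells you the plan was compiled for a different value; whether that plan is actually bad for the runtime value still needs the comparison described in step 2.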

 


Thank you,

Karthick P.K

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

Posted in Performance, SQL General, SQL Query | 11 Comments »

SQL Server 2012 Memory

Posted by Karthick P.K on October 21, 2012

SQL Server 2012 (code-named Denali) makes many changes to the memory manager to govern SQL Server memory consumption more efficiently than earlier versions. The important changes to SQL Server 2012 memory that every DBA should be aware of are documented in this blog. If you are not familiar with the memory architecture of earlier SQL Server versions, I recommend reading THIS ARTICLE before you continue with the changes in the Denali memory manager.

Max Server Memory

In previous versions of SQL Server, "max server memory" controlled the maximum physical memory the single page allocator (BPOOL) could consume in the SQL Server user address space.

Only the single page allocator was part of BPOOL, and max server memory controlled only BPOOL, so the following allocations came from outside BPOOL (max server memory):

1. Multi-page allocations from SQL Server [allocations that request more than 8 KB and require contiguous memory]

2. CLR allocations [including the SQL CLR heaps and their global allocations created during startup]

3. Memory used for thread stacks within the SQL Server process (max worker threads * thread stack size). Thread stack size is 512 KB in 32-bit SQL Server, 904 KB in WOW mode and 2 MB in 64-bit SQL Server.

4. Direct Windows allocations made by non-SQL Server DLLs [including Windows heap usage and direct virtual allocations made by modules loaded into the SQL Server process. Examples: allocations from extended stored procedure DLLs, objects created using OLE Automation procedures (sp_OA calls), and allocations from linked server providers loaded in the SQL Server process]
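Item 3 is easy to quantify. A minimal sketch using the per-thread stack sizes quoted above; thread_stack_mb is a hypothetical helper, the worker counts in the usage note are examples, and your instance's actual value can be read from max_workers_count in sys.dm_os_sys_info:

```python
# Thread stack size per worker, in KB, as quoted in the list above.
STACK_KB = {"x86": 512, "wow64": 904, "x64": 2048}


def thread_stack_mb(max_worker_threads: int, platform: str) -> float:
    """Memory used by worker thread stacks, which is not governed by max server memory."""
    return max_worker_threads * STACK_KB[platform] / 1024
```

For example, 256 workers on 32-bit SQL Server need 128 MB of stack (the same 0.5 MB per worker used in the MTL formula), while 512 workers on 64-bit SQL Server need 1024 MB. Thread stacks stay outside "max server memory" in SQL Server 2012 as well, so this amount still has to be left free when sizing the setting.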

 

The SQL Server 2012 memory manager has combined the single page allocator and the multi-page allocator into an any-size page allocator. As a result, the any-size page allocator now manages allocations that were categorized in the past as single page and multi-page allocations.

1. "max server memory" now controls and includes multi-page allocations.

2. In earlier versions of SQL Server, CLR-allocated memory was outside BPOOL (max server memory). SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

In SQL Server 2012, only the following allocations are not included in "max server memory":

1. Memory allocations for thread stacks within the SQL Server process

2. Memory allocation requests made directly to Windows [Ex: heap and VirtualAlloc allocations from third-party DLLs loaded in the SQL Server process, objects created using OLE Automation procedures (sp_OA), etc.]

These changes allow DBAs to configure and control SQL Server memory more accurately, in accordance with the memory requirements, and with Resource Governor.

-g startup parameter

We used the -g startup option to change the default value of a region in SQL Server user address space known as "Memory-To-Reserve". This region was also known as "memory-to-leave or MTL.  The "Memory-To-Reserve" (or) -g configuration option are relevant only for a 32-bit instance of SQL Server.

In SQL Server versions up to and including SQL Server 2008 R2, multi-page allocations and CLR were part of Memory-To-Reserve (-g). From Denali onward they are part of BPOOL (controlled by "max server memory"). So if you set -g in an earlier version to make room for the multi-page allocator or CLR, you may have to remove it when migrating to Denali.

 

AWE feature removed from SQL Server 2012

The AWE feature was used in earlier versions of 32-bit SQL Server to address more than 4 GB of memory. This feature has been removed in Denali (refer to "AWE deprecation"). So if you need more memory, you may need to migrate to 64-bit SQL Server.

 

Locked pages in memory

Trace flag 845 is no longer required to lock pages in memory. As long as the SQL Server startup account has the "Lock pages in memory" privilege, the Datacenter, Enterprise, Standard and Business Intelligence editions use the AWE allocator APIs for memory allocations in BPOOL, and these allocations are locked.

 
Dynamic virtual address space management

In earlier 32-bit versions of SQL Server, BPOOL was reserved at startup and the remaining address space was left for MTL (Memory-To-Reserve, or Memory-To-Leave). In Denali, virtual address space management is dynamic (nothing is reserved at startup), so it is possible for third-party components to use more memory than what is configured in the -g parameter.

 
SQLCLR loaded at startup

In earlier SQL Server versions, common language runtime (CLR) functionality was initialized inside the SQL Server process when the first SQL CLR procedure or function was invoked. SQL Server 2012 performs SQL CLR initialization at startup, independent of the 'clr enabled' configuration option.

You will notice the following messages in the SQL Server error log during server startup:

2012-10-18 15:23:13.250 spid8s       Starting up database 'master'.

2012-10-18 15:23:13.930 Server       CLR version v4.0.30319 loaded.

Total Physical memory and memory model used

The total physical memory available on the server and the memory model in use are logged in the SQL Server error log:

2012-10-18 15:23:06.690 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.

2012-10-18 15:23:06.700 Server       Using locked pages in the memory manager

2012-10-22 15:32:20.450 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.
2012-10-22 15:32:20.450 Server       Using conventional memory in the memory manager.
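If you have many instances to check, the two messages above can be picked out of an error log with a simple text scan. This is a minimal Python sketch that assumes the messages are worded exactly as in the excerpt above; it reports the values from the last startup found in the lines it is given.

```python
import re

RAM_RE = re.compile(r"Detected (\d+) MB of RAM")
MODEL_RE = re.compile(r"Using (locked pages|conventional memory) in the memory manager")


def memory_model(errorlog_lines):
    """Return (ram_mb, model) for the most recent startup in the given lines."""
    ram_mb, model = None, None
    for line in errorlog_lines:
        ram = RAM_RE.search(line)
        if ram:
            # A new "Detected ... MB of RAM" line marks a new startup.
            ram_mb, model = int(ram.group(1)), None
        mode = MODEL_RE.search(line)
        if mode:
            model = mode.group(1)
    return ram_mb, model


SAMPLE = [
    "2012-10-18 15:23:06.690 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.",
    "2012-10-18 15:23:06.700 Server       Using locked pages in the memory manager",
    "2012-10-22 15:32:20.450 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.",
    "2012-10-22 15:32:20.450 Server       Using conventional memory in the memory manager.",
]

if __name__ == "__main__":
    print(memory_model(SAMPLE))  # (131067, 'conventional memory')
```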

 

DMV and Performance counter changes

In earlier versions of SQL Server, most of the DMVs used single_pages_kb and multi_pages_kb to report allocations made by SQL Server inside and outside BPOOL. In SQL Server 2012 they are combined into a single pages_kb column. More details in this link.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

 

Posted in Memory, SQL General, SQL Server Engine, SQL Server memory | 19 Comments »

SQL Server Exception, EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

Posted by Karthick P.K on October 16, 2012

 

I have received a few requests from SQL Server DBAs to blog about analyzing SQL Server exceptions and assertions. After seeing many DBAs get stuck when they hit an EXCEPTION_ACCESS_VIOLATION or an assertion in SQL Server, I decided to write this post.

This post is intended to help DBAs analyze and resolve EXCEPTION_ACCESS_VIOLATION errors and SQL Server assertions before contacting Microsoft support. Exceptions and assertions are two different things. In simple terms, an exception is an event that occurs during the execution of a program and requires the execution of code outside the normal flow of control, while an assertion is a check that the programmer inserted into the code to make sure that some condition is true; if the condition is false, an assertion is raised. SQL Server handles both assertions and exceptions by writing the current thread's stack to the error log and generating a dump, so the troubleshooting steps are similar.

 

You will find messages similar to the one below in the SQL Server error log when you get an exception or an EXCEPTION_ACCESS_VIOLATION.

{

Error

External dump process returned no errors.
Using 'dbghelp.dll' version '4.0.5'
SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
* *******************************************************************************
* BEGIN STACK DUMP:
*  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

*   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

*   Access Violation occurred reading address 0000041EA9AE2EF0

* Input Buffer 510 bytes -

ex_terminator - Last chance exception handling

}

You will find messages similar to the one below in the SQL Server error log when you get an assertion.

{

Error

spid323     Error: 17065, Severity: 16, State: 1.

spid323     SQL Server Assertion: File: < .cpp>, line = 2576 Failed Assertion = 'fFalse'  This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted.

SQL Server Assertion: File: <   .cpp>, line=2040 Failed Assertion =

}
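Because both cases produce a dump, it can help to first check which one you are dealing with. The Python sketch below classifies an error log excerpt using regular expressions fitted to the two message shapes shown above; the patterns are assumptions derived from these samples and may not match every SQL Server build, and the file name example.cpp in the usage line is a hypothetical placeholder.

```python
import re

EXCEPTION_RE = re.compile(r"generated fatal exception ([0-9A-Fa-f]{8}) (\w+)")
ASSERTION_RE = re.compile(
    r"SQL Server Assertion: File: <([^>]*)>, line\s*=\s*(\d+)\s+Failed Assertion\s*=\s*'([^']*)'"
)


def classify_dump_reason(errorlog_text: str) -> dict:
    """Return whether the excerpt describes an exception or an assertion."""
    m = EXCEPTION_RE.search(errorlog_text)
    if m:
        return {"kind": "exception", "code": m.group(1).lower(), "name": m.group(2)}
    m = ASSERTION_RE.search(errorlog_text)
    if m:
        return {"kind": "assertion", "file": m.group(1),
                "line": int(m.group(2)), "expression": m.group(3)}
    return {"kind": "unknown"}


if __name__ == "__main__":
    print(classify_dump_reason(
        "SqlDumpExceptionHandler: Process 510 generated fatal exception "
        "c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process."))
    # "example.cpp" is a hypothetical file name.
    print(classify_dump_reason(
        "SQL Server Assertion: File: <example.cpp>, line = 2576 Failed Assertion = 'fFalse'"))
```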

To analyze the dump, download and install the Windows Debugger (WinDbg) from this link.

 

Step 1 (Load the memory dump file into the debugger):

Open WinDbg. Choose File menu -> Open Crash Dump -> select the dump file (SQLDump000#.mdmp).

Note: The SQLDump000#.mdmp file name is written to your SQL Server error log when you get the exception or assertion.

Step 2 (Set the symbol path to the Microsoft symbol server):

In the command window, type:

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from the Microsoft symbol server):

Type .reload /f and press Enter. This forces the debugger to load all the symbols immediately.

 

Step 4 (check if symbols are loaded):

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

0:002> lmvm sqlservr

start             end                 module name

00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)

    CheckSum:         025FEB5E

    ImageSize:        02679000

    File version:     2005.90.4266.0

    Product version:  9.0.4266.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5 (Switch to exception context):

Type .ecxr

Step 6 (Get the stack of the thread that caused the exception or assertion):

Type kC 1000. This prints the stack of the thread that raised the exception or assertion.

I have pasted a sample stack below from an exception dump I worked on recently. The first thing to identify from the stack is which module is raising the exception. In each frame, the part before the ! symbol (highlighted in red in the original post) is the module name (EXE or DLL name); look at the module at the top of the stack to see which one raised the exception.

If the module is not a Microsoft module, the exception is being caused by a third-party component, and you will need to work with the company that provided that component to get a solution. lmvm followed by the module name shows the company name. For example: lmvm wininet

If the module is sqlservr (or any other SQL Server module), the exception was raised by SQL Server. In that case, run kC 1000 and paste the stack in the comments section of this blog, in a thread on the MSDN forums, or in this Facebook group. If you don't get a prompt reply from the community, you may need to open a support ticket with Microsoft.

Note: When you get an assertion, make sure you also post the error log line that contains SQL Server Assertion: File: <Filename.cpp>, line = 2576 Failed Assertion =

 

0:000> kC 1000

Call Site

wininet!InternetFreeThreadInfo+0x26

wininet!InternetDestroyThreadInfo+0x40

wininet!DllMain_wininet+0xb5

wininet!__DllMainCRTStartup+0xdb

ntdll!LdrShutdownThread+0x155

ntdll!RtlExitUserThread+0x38

msvcr80!_endthreadex+0x27

msvcr80!_callthreadstartex+0x1e

msvcr80!_threadstartex+0x84

kernel32!BaseThreadInitThunk+0xd

ntdll!RtlUserThreadStart+0x1d
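The "look before the ! symbol" rule above is easy to apply mechanically to text copied from kC output. This Python sketch assumes frames in the module!function+offset form shown above; it lists the module of each frame and returns the one at the top of the stack. You would still confirm the vendor of that module with lmvm.

```python
def frame_modules(kc_output: str) -> list:
    """Module name (the part before '!') for each frame line in kC output."""
    modules = []
    for line in kc_output.splitlines():
        line = line.strip()
        # Skip the debugger prompt line and the "Call Site" header.
        if "!" in line and not line.startswith(("0:", "Call Site")):
            modules.append(line.split("!", 1)[0])
    return modules


def top_module(kc_output: str):
    """Module of the top-most frame in the exception stack, or None if no frames."""
    modules = frame_modules(kc_output)
    return modules[0] if modules else None


SAMPLE_STACK = """0:000> kC 1000
Call Site
wininet!InternetFreeThreadInfo+0x26
wininet!InternetDestroyThreadInfo+0x40
ntdll!LdrShutdownThread+0x155
msvcr80!_endthreadex+0x27
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d"""

if __name__ == "__main__":
    print(top_module(SAMPLE_STACK))
    print(frame_modules(SAMPLE_STACK))
```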

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, SQL General, SQL Server Engine, Startup failures | 254 Comments »

SQL Server Parameter sniffing

Posted by Karthick P.K on October 8, 2012

When a stored procedure, a prepared query or a query submitted through sp_executesql is compiled for the first time, the parameter values supplied with that execution call are used for cardinality estimation, to optimize the statements and create the query plan. This is known as parameter sniffing because the optimizer "sniffs" the current parameter values during compilation.

If these values are typical and the data in the underlying tables is evenly distributed, all calls to the stored procedure benefit from this query plan because it is reused. However, parameter sniffing can cause problems if the sniffed value is not typical of the values actually used, or if the data in the underlying tables is heavily skewed: the plan generated for the sniffed value may not be optimal for a value passed in later, and because the plan is reused, performance can degrade.
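The mechanism can be illustrated with a toy model. The Python sketch below is not how the SQL Server optimizer costs plans: the plan names, the 1% selectivity threshold and the one-entry-per-procedure cache are hypothetical simplifications. It only shows that the value seen on the first call decides the cached plan, and that later calls reuse it unless a recompile is requested. The row counts match the demo table used in this post.

```python
# Row counts per country, matching the demo table in this post.
ROWS = {"USA": 10000, "BRAZIL": 1}
TOTAL_ROWS = sum(ROWS.values())

plan_cache = {}  # procedure name -> (sniffed value, plan)


def choose_plan(estimated_rows: int) -> str:
    # Hypothetical costing rule: seek + lookup only for very selective values.
    if estimated_rows < 0.01 * TOTAL_ROWS:
        return "index seek + RID lookup"
    return "table scan"


def execute(proc: str, param: str, recompile: bool = False):
    """Return (value the plan was compiled for, plan used)."""
    if recompile:
        # Like WITH RECOMPILE: compile for the current value and do not cache.
        return param, choose_plan(ROWS.get(param, 0))
    if proc not in plan_cache:
        # First call: the plan is compiled for the sniffed value and cached.
        plan_cache[proc] = (param, choose_plan(ROWS.get(param, 0)))
    return plan_cache[proc]


if __name__ == "__main__":
    print(execute("sniffing", "BRAZIL"))
    print(execute("sniffing", "USA"))  # reuses the plan compiled for BRAZIL
    print(execute("sniffing", "USA", recompile=True))
```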

Consider the following scenario: we have a table with two columns (country and somecolumn). The table has 10001 rows: 10000 rows have USA in the country column and 1 row has BRAZIL.

 

The table has a nonclustered index called NC on the country column.

 

 

create table data(country char(10),somecolumn char(10))

go

insert into data values ('BRAZIL','somedata')

go

insert into data values ('USA','somedata')

go 10000

 

CREATE NONCLUSTERED INDEX [NC] ON [dbo].[data]

(

[country] ASC

)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

GO

 

create proc sniffing @p1 char(10)

as

begin

select country,somecolumn from data where country=@p1

end

Go

 

 

 

--Let us execute stored procedure sniffing with parameter BRAZIL.

exec sniffing 'BRAZIL'

go

[Screenshot: execution plan for exec sniffing 'BRAZIL']

The optimizer picked an index seek on the nonclustered index and an RID lookup on the table.

 

What happens when we execute the same procedure with parameter 'USA'? Because the plan is already created and cached for 'BRAZIL', it is reused, and the plan generated for BRAZIL is not an optimal plan for USA.

exec sniffing 'USA'

go

[Screenshot: execution plan for exec sniffing 'USA', reusing the plan compiled for 'BRAZIL']

How do you identify whether the optimizer is using a plan compiled for the sniffed parameter values rather than the current parameter values?

 

Let us turn on SET STATISTICS XML:

 

 

set statistics xml on

 

exec sniffing 'USA'

go

 

In the XML plan, look at the ParameterCompiledValue and ParameterRuntimeValue attributes.

 

Below is an extract from the XML plan. It shows that the plan was compiled for parameter BRAZIL (ParameterCompiledValue) and is being used for parameter USA (ParameterRuntimeValue):

{

<ColumnReference Column="@p1" ParameterCompiledValue="'BRAZIL    '" ParameterRuntimeValue="'USA       '" />

}
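For plans captured in bulk, the same check can be scripted. This Python sketch uses the standard library's xml.etree.ElementTree to list every ColumnReference element that carries both ParameterCompiledValue and ParameterRuntimeValue and flags the ones where they differ. The sample plan is a trimmed, hypothetical fragment built around the attribute values shown above; real showplan XML contains many more elements.

```python
import xml.etree.ElementTree as ET


def parameter_values(plan_xml: str) -> list:
    """(column, compiled value, runtime value, differs) for each sniffed parameter."""
    root = ET.fromstring(plan_xml)
    results = []
    for elem in root.iter():
        # Showplan elements are namespaced, so match on the local tag name.
        if elem.tag.split("}")[-1] != "ColumnReference":
            continue
        compiled = elem.get("ParameterCompiledValue")
        runtime = elem.get("ParameterRuntimeValue")
        if compiled is not None and runtime is not None:
            results.append((elem.get("Column"), compiled, runtime, compiled != runtime))
    return results


SAMPLE_PLAN = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <ParameterList>
    <ColumnReference Column="@p1" ParameterCompiledValue="'BRAZIL    '" ParameterRuntimeValue="'USA       '" />
  </ParameterList>
</ShowPlanXML>"""

if __name__ == "__main__":
    for column, compiled, runtime, differs in parameter_values(SAMPLE_PLAN):
        print(column, compiled, runtime, "MISMATCH" if differs else "same")
```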

 

 

 

We will also see a large difference between the estimated and actual row counts if parameter sniffing is affecting the plan.

(Remember that outdated statistics can also cause the optimizer to estimate row counts incorrectly, so a difference between estimated and actual rows does not by itself mean that parameter sniffing is the cause.)

 

What would the optimal plan have been for parameter 'USA'?

 

 

Let us execute the same procedure with the RECOMPILE option:

 

 

exec sniffing 'USA' with recompile

go

[Screenshot: execution plan for exec sniffing 'USA' with recompile]

How to fix Parameter sniffing?

 

1. WITH RECOMPILE: specify it when you create the stored procedure, so that the procedure is recompiled every time it is called. Use this method only if the compile time is small compared with the execution time of the bad plan.

   Ex: create proc sniffing @p1 char(10) with recompile

 

2. OPTION (RECOMPILE): add it to the statement that is affected by parameter sniffing. If the procedure has multiple statements, only that statement is recompiled.

 

3. OPTIMIZE FOR hint: instructs the query optimizer to use a particular value for a local variable when the query is compiled and optimized. Alternatively, OPTIMIZE FOR UNKNOWN instructs the query optimizer to use statistical data instead of the initial values for all local variables when the query is compiled and optimized. The value is used only during query optimization; the actual values are used during execution.

{

alter proc sniffing @p1 char(10)  as
begin
select country,somecolumn from data where country=@p1
option (optimize for (@p1 = 'USA'))

--option (optimize for (@p1 unknown))
end
}

 

4. Assign the incoming parameter values to local variables and use the local variables in the query. This is an option if you are on SQL Server 2000, which does not have the OPTIMIZE FOR hint.

      Ken Henderson has blogged about it in http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx

 

5. Trace flag 4136, introduced in SQL Server 2008 R2 Cumulative Update 2, SQL Server 2008 SP1 Cumulative Update 7 and SQL Server 2005 SP3 Cumulative Update 9, can be used to disable the parameter sniffing process. More details at http://support.microsoft.com/kb/980653

 

 

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Optimizer, Performance, SQL General, SQL Server Engine | 6 Comments »

Optimizer Timeout or Optimizer memory abort

Posted by Karthick P.K on October 7, 2012

Optimizer Timeout

When the query processor finds itself spending a lot of time optimizing a query, it may stop the optimization process abruptly and choose the best plan found so far. This ensures that the optimizer doesn't end up optimizing forever. This is called an optimizer timeout (it is based on the number of plans considered relative to the cost of the best plan so far).

Optimizer memory abort

As queries become more complex, the number of potential plans to consider can quickly grow into the thousands. The optimizer has a limit on the memory it is allowed to use; when it reaches that limit, optimization ends with an optimizer memory abort.

When a timeout or memory abort happens, the optimizer chooses the best plan from those generated up to that point. That plan might be far from optimal, so the query can take a long time to execute and consume a lot of resources.

On SQL Server 2000 and earlier, the only way to detect this condition is to compile the query with trace flag 8675. If one of these conditions occurs, the output shows a timeout abort or a memory abort, similar to the following:

 

End of simplification, time: 2.869 net: 2.869 total: 2.869 net: 2.869

end exploration, tasks: 200094 no total cost time: 16.17 net: 16.169 total: 19.04 net: 19.039

*** Optimizer time out abort at task 614400 ***

*** Optimizer time out abort at task 614400 ***

 

Msg 8623, Level 16, State 1, Line 3

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

End of simplification, time: 0.156491 elapsed: 0.156491

end exploration, tasks: 1614 no total cost time: 0.552436 elapsed: 0.708927

end search(0),  cost: 1275.32 tasks: 3888 time: 0.195008 elapsed: 0.903935

end exploration, tasks: 7596 Cost = 1275.32 time: 0.548032 elapsed: 1.45197

end search(1),  cost: 1263.15 tasks: 21985 time: 2.30564 elapsed: 3.75761

*** Optimizer memory usage abort ***

End of optimization,  elapsed: 2.98304

From SQL Server 2005 onward, to determine whether the query optimizer timed out or exceeded its memory limit, search the XML plan output for StatementOptmEarlyAbortReason="TimeOut" or StatementOptmEarlyAbortReason="MemoryLimitExceeded".
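The same search is easy to script over saved plan files. This is a minimal Python sketch, assuming the plan XML is already loaded into a string; it returns the StatementOptmEarlyAbortReason value of every element that has one. The sample XML is a trimmed, hypothetical fragment, not a complete showplan.

```python
import xml.etree.ElementTree as ET


def early_abort_reasons(plan_xml: str) -> list:
    """StatementOptmEarlyAbortReason values (e.g. TimeOut, MemoryLimitExceeded) in a plan."""
    root = ET.fromstring(plan_xml)
    return [
        elem.get("StatementOptmEarlyAbortReason")
        for elem in root.iter()
        if elem.get("StatementOptmEarlyAbortReason") is not None
    ]


SAMPLE_PLAN = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements>
    <StmtSimple StatementText="select 1" />
    <StmtSimple StatementText="select ..." StatementOptmEarlyAbortReason="TimeOut" />
  </Statements></Batch></BatchSequence>
</ShowPlanXML>"""

if __name__ == "__main__":
    print(early_abort_reasons(SAMPLE_PLAN))
```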

 

We can prevent the optimizer from timing out and picking a bad plan by enabling trace flag 8780 (-T8780). This increases the time limit before the timeout occurs.

Note: Don't enable this trace flag at the server level. Enable it only for the session that runs the query, and check whether the optimizer picks a better plan. If it does, the right approach is to tune the query manually or with the Database Engine Tuning Advisor (DTA) and apply the recommendations. You can use this trace flag until you apply the recommendations made by DTA.

 

If you experience an "Optimizer memory usage abort", use the "SQLServer:Memory Manager\Optimizer Memory (KB)" counter to see the amount of memory used for compilation.

select * from sys.dm_os_memory_clerks where type='MEMORYCLERK_SQLOPTIMIZER' shows the overall memory used by the optimizer.

Starting with SQL Server 2005 SP2, we can also use the CompileMemory attribute in the XML plan output, which shows the compile memory used by an individual plan. If you find that the memory available to the optimizer is very low, identify what is causing memory contention in SQL Server and tune it.

I will discuss compile memory in detail when I blog about Resource_semaphore  wait types.

 

Note: You may also receive the error below because of a few known issues documented in KB articles 982376, 946020, 926773 and 917888. If none of those fixes resolve the issue, you may have to follow the steps documented above.

{

"Msg 8623, Level 16, State 1, Line 1

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information."

}

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Performance, SQL General, SQL Query, SQL Server Engine, SQL Server memory | 24 Comments »

SQL Server Latch & Debugging latch time out

Posted by Karthick P.K on September 7, 2012

  

 

In a multithreaded process, what would happen if one thread updated a data or index page in memory while a second thread was reading the same page?

What would happen if the first thread read a data or index page in memory while a second thread was freeing the same page from memory?

Answer: We would end up with inconsistent data or data structures. To avoid inconsistency, SQL Server uses synchronization mechanisms such as locks, latches and spinlocks.

 

In this post we will discuss a few key points about latches and how to debug latch timeout dumps.

What is a latch?

Latches control concurrent access to data pages and in-memory structures by multiple threads. They provide physical consistency of data pages and synchronization for data structures. Unlike locks, latches cannot be controlled by the user.

Types of latches:

Buffer (BUF) Latch

Used to synchronize access to BUF structures and their associated database pages.

 

Buffer “IO” Latch

A subset of BUF latches used when the BUF and its associated data/index page are in the middle of an IO operation (reading the page from disk or writing it to disk).

 

Non-Buffer (Non-BUF) Latch

These latches are used to synchronize general in-memory data structures, typically used by queries/tasks executed by parallel threads, autogrow operations, shrink operations, etc.

 

Latch modes

 

Keep (KP) Latches

Used to ensure that the page is not released from memory while it is in use. 

Shared (SH) Latches

Used for read-only access to data structures; prevents write access by other threads.

This mode allows shared access. 

SH is compatible with KP, SH and UP. Note that although SH generally implies read-only access, this is not always the case. For buffer latches, SH is the minimum mode required to read a data page.

Update (UP) Latches

Allows read access to the data structure (compatible with SH and KP) but prevents EX latch access by others.

Used for write operations when torn page detection is off and when AWE is not enabled.

Exclusive (EX) Latches

Prevents any read activity from occurring on the latched structure. EX is only compatible with KP.

Used during read IO, and during write IO when torn page detection is on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFs from the buffer pool, either by adding them to the free list or unmapping AWE buffers. 

 

 

Latch compatibility

  KP SH UP EX DT
KP Y Y Y Y N
SH Y Y Y N N
UP Y Y N N N
EX Y N N N N
DT N N N N N
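The compatibility table can be encoded directly as a lookup. In this Python sketch the rows and columns follow the KP, SH, UP, EX, DT order of the table above; the table is symmetric, so it does not matter which side you treat as the held mode. It is only a restatement of the table, not SQL Server's internal implementation.

```python
MODES = ("KP", "SH", "UP", "EX", "DT")
# Same rows and columns as the latch compatibility table above.
MATRIX = (
    "YYYYN",  # KP
    "YYYNN",  # SH
    "YYNNN",  # UP
    "YNNNN",  # EX
    "NNNNN",  # DT
)


def compatible(held: str, requested: str) -> bool:
    """True if a latch in mode `requested` can be granted while `held` is held."""
    return MATRIX[MODES.index(held)][MODES.index(requested)] == "Y"


if __name__ == "__main__":
    for held in MODES:
        grantable = [m for m in MODES if compatible(held, m)]
        print(held, "allows", grantable or "nothing")
```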

 

 

How do you identify Latch contention?

 

Latch contention can be identified using the following wait types in sysprocesses:

 

PAGEIOLATCH_*: This wait type in sysprocesses indicates that SQL Server is waiting for a physical I/O of a buffer pool page to complete.

1. PAGEIOLATCH_* waits are commonly resolved by tuning the queries that perform heavy IO (usually by adding, changing or removing indexes or statistics to reduce the amount of physical IO).

2. Identify whether there is a disk bottleneck and fix it (for example, PAGEIOLATCH wait times above 30 ms).

PAGELATCH_*: This wait type in sysprocesses indicates that SQL Server is waiting for access to a database page, but the page is not undergoing physical IO.

1. This problem is normally caused by a large number of sessions trying to access the same physical page at the same time. Look at the wait resource of the spid: it is the page being accessed, in the format dbid:file:pageno.

2. We can use DBCC PAGE to identify the object or the type of page on which we have contention. It also helps us determine whether the contention is for allocation, data or text pages.

3. If the pages that SQL Server most frequently waits on are in tempdb, the wait resource column shows a page number in dbid 2. You may be facing the tempdb allocation latch contention described in http://support.microsoft.com/kb/328551

4. If the page is in a user database, check whether the table has a clustered index on a monotonically increasing key such as an identity, where all threads contend for the same page at the end of the table. In this case, choose a different clustered index key to spread the work across different pages.

 

LATCH_*: Non-buffer latch waits can be caused by a variety of things. We can use the wait resource column in sysprocesses to determine the type of latch involved (KB 822101).

1. A very common LATCH_EX wait is caused by running a profiler trace or sp_trace_getdata; refer to KB 929728 for more information.

2. Autogrow and autoshrink.

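The triage above starts from two sysprocesses columns: the wait type and the wait resource. This Python sketch assumes you have already copied those values out (for example from the lastwaittype and waitresource columns); it maps a wait type name to the latch categories described above and splits a page wait resource in dbid:file:pageno form, flagging tempdb (dbid 2). The category strings are labels chosen for this example.

```python
def latch_category(wait_type: str) -> str:
    """Map a wait type name to the latch categories discussed in this post."""
    wait_type = wait_type.strip().upper()
    if wait_type.startswith("PAGEIOLATCH_"):
        return "buffer latch, page is in physical I/O"
    if wait_type.startswith("PAGELATCH_"):
        return "buffer latch, no physical I/O"
    if wait_type.startswith("LATCH_"):
        return "non-buffer latch"
    return "not a latch wait"


def parse_page_resource(wait_resource: str) -> dict:
    """Split a 'dbid:file:pageno' wait resource into its parts."""
    dbid, file_id, page_no = (int(part) for part in wait_resource.strip().split(":"))
    return {"dbid": dbid, "file": file_id, "page": page_no, "in_tempdb": dbid == 2}


if __name__ == "__main__":
    print(latch_category("PAGELATCH_UP"))
    print(parse_page_resource("2:1:1"))
```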
 

 

 

When a thread requests a latch that cannot be granted immediately because another thread holds an incompatible latch on the same page or data structure, the requestor must wait for the latch to become grantable. If the wait reaches 5 minutes (waittime 300), a warning message like the ones below is printed in the SQL Server error log and a mini dump with all the threads is captured. The warning message differs for buffer and non-buffer latches.

 

 

844: Time out occurred while waiting for buffer latch -- type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p.  Continuing to wait.

 

846: A time-out occurred while waiting for buffer latch -- type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.

 

847: Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

 

Breakdown of the fields in the warnings above:

type

The latch mode of the current latch acquire request.  This is a numerical value with the following mapping:  0 – NL (not used); 1 – KP; 2 – SH; 3 – UP; 4 – EX; 5 – DT.

 

task

The task that is trying to acquire the latch.

 

Waittime

The total time waited for this latch acquire request in seconds.

 

owning task

The address of the Task that owns the latch, if available.

 

bp (Buffer latches only)

The address of the BUF structure corresponding to this buffer latch.

 

page (Buffer latches only.)

The page id for the page currently contained in the BUF structure.

 

database id (Buffer latches only.)

The database id for the page in the BUF.
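Pulling these fields out by hand is error-prone, so here is a minimal Python sketch that parses a buffer latch timeout message (the 844/846 format above) with a regular expression and maps type to the latch mode using the mapping listed above. The pattern is an assumption fitted to the format strings shown in this post; the sample is the error log message quoted below, joined onto one line.

```python
import re

# Latch mode numbers as listed under "type" above.
LATCH_MODES = {0: "NL", 1: "KP", 2: "SH", 3: "UP", 4: "EX", 5: "DT"}

# Pattern fitted to the 844/846 buffer latch message format shown in this post.
BUF_LATCH_RE = re.compile(
    r"type (?P<type>\d+), bp (?P<bp>[0-9A-Fa-f]+), page (?P<file>\d+):(?P<page>\d+), .*?"
    r"database id: (?P<dbid>\d+), .*?task 0x(?P<task>[0-9A-Fa-f]+) : \d+, "
    r"waittime (?P<waittime>\d+), .*?owning task 0x(?P<owner>[0-9A-Fa-f]+)"
)


def parse_buffer_latch_timeout(message: str):
    """Return the key fields of a buffer latch timeout message, or None if it does not match."""
    m = BUF_LATCH_RE.search(message)
    if m is None:
        return None
    return {
        "mode": LATCH_MODES[int(m.group("type"))],
        "file_id": int(m.group("file")),
        "page_id": int(m.group("page")),
        "database_id": int(m.group("dbid")),
        "waiting_task": "0x" + m.group("task"),
        "waittime_s": int(m.group("waittime")),
        "owning_task": "0x" + m.group("owner"),
    }


SAMPLE = (
    "A time-out occurred while waiting for buffer latch -- type 4, bp 00000000ECFDAA00, "
    "page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, "
    "task 0x0000000006E096D8 : 0, waittime 300, flags 0x19, "
    "owning task 0x0000000006E08328. Not continuing to wait."
)

if __name__ == "__main__":
    print(parse_buffer_latch_timeout(SAMPLE))
```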

 

 

As with blocking issues in SQL Server, when there is latch contention or a latch timeout dump, identify the owner of the latch and troubleshoot why the owner is holding the latch for a long time.

 

When there is a latch timeout dump, you will see a warning message similar to the one below. The warning printed in the SQL Server error log before the dump is very important for finding the owner thread of the latch.

 

{

 

2012-01-18 00:52:03.16 spid69      A time-out occurred while waiting for buffer latch -- type 4, bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, waittime 300, flags 0x19, owning task 0x0000000006E08328. Not continuing to wait.

spid21s     **Dump thread – spid = 21, PSS = 0x0000000094622B60, EC = 0x0000000094622B70

spid21s     ***Stack Dump being sent to E:\Data\Disk1\MSSQL.1\MSSQL\LOG\SQLDump0009.txt

spid21s     * *******************************************************************************

spid21s     * BEGIN STACK DUMP:

spid21s     *   02/28/12 00:32:03 spid 21

spid21s     * Latch timeout

Timeout occurred while waiting for latch: class 'ACCESS_METHODS_HOBT_COUNT', id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

Timeout occurred while waiting for latch: class 'ACCESS_METHODS_HOBT_VIRTUAL_ROOT', id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

}

From the error message above, we can see that we are trying to acquire an EX latch (type 4) on database id 4, page 1:6088 (page 6088 of file 1), and that we timed out because task 0x0000000006E08328 (owning task 0x0000000006E08328 in the warning message) is holding a latch on it.

Note: A task is simply a unit of work to be performed by a thread (such as a system task, login task, ghost cleanup task, etc.). The threads that execute tasks acquire latches as needed.

Let us see how to analyze the latch timeout dump and find the owning thread of the latch using the owning task 0x0000000006E08328.

To analyze the dump, download and install the Windows Debugger (WinDbg) from this link.

Step 1:

Open WinDbg. Choose File menu -> Open Crash Dump -> select the dump file (SQLDump000#.mdmp).

Step 2:

In the command window, type:
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and press Enter. This forces the debugger to load all the symbols immediately.

Step 4:

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

0:002> lmvm sqlservr
start             end                 module name
00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
    CheckSum:         025FEB5E
    ImageSize:        02679000
    File version:     2005.90.4266.0
    Product version:  9.0.4266.0
    File flags:       0 (Mask 3F)
    File OS:          40000 NT Base
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use the command below to search each thread's stack for a reference to the owning task; the thread that has the reference is the one that owns the latch. Replace 0X0000000006E08328 with the owning task from your error log.

~*e .echo ThreadId:; ?? @$tid; r? @$t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? @$t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; s -d @$t1 @$t2 0X0000000006E08328 

ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0  06e08328 00000000 00000000 00000000  (……………
00000000`091fdcb8  06e08328 00000000 091fdd70 00000000  (…….p…….
00000000`091fded0  06e08328 00000000 06e0e798 00000000  (……………
00000000`091fdf38  06e08328 00000000 00000002 00000000  (……………
00000000`091fec60  06e08328 00000000 0168883a 00000000  (…….:.h…..
00000000`091ff260  06e08328 00000000 000007d0 00000000  (……………
00000000`091ff2d0  06e08328 00000000 00000020 00000000  (……. …….
00000000`091ff5f8  06e08328 00000000 800306c0 00000000  (……………
00000000`091ff6c0  06e08328 00000000 00000000 00000000  (……………
00000000`091ff930  06e08328 00000000 00000000 00000001  (……………
00000000`091ff9b8  06e08328 00000000 00000000 00000000  (……………
00000000`091ffa38  06e08328 00000000 00000000 00000000  (……………
00000000`091ffc10  06e08328 00000000 03684080 00000000  (……..@h…..
00000000`091ffc90  06e08328 00000000 00000000 00000000  (……………
ThreadId:
unsigned int 0x9b8
ThreadId:
unsigned int 0x9bc
ThreadId:
unsigned int 0x9c0
……………
…………..
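With many threads, the search output gets long. This Python sketch scans text copied from the Step 5 output and returns the ThreadId values whose stack contained the owning task pointer. It assumes the 64-bit layout shown above, where s -d prints the pointer as two DWORDs with the low DWORD first; the sample below is trimmed from that output.

```python
def threads_referencing(search_output: str, owning_task: str) -> list:
    """Thread ids (as printed after 'unsigned int') with stack hits for owning_task."""
    value = owning_task.lower()
    if value.startswith("0x"):
        value = value[2:]
    value = value.zfill(16)
    high, low = value[:8], value[8:]  # s -d shows the low DWORD first on x64

    threads, current = [], None
    for line in search_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[:2] == ["unsigned", "int"]:
            current = parts[2]  # the thread whose stack is being searched
        elif (len(parts) >= 3 and "`" in parts[0]
              and parts[1] == low and parts[2] == high
              and current is not None and current not in threads):
            threads.append(current)
    return threads


SAMPLE = """ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0  06e08328 00000000 00000000 00000000  (...............
00000000`091fdcb8  06e08328 00000000 091fdd70 00000000  (.......p.......
ThreadId:
unsigned int 0x9b8"""

if __name__ == "__main__":
    print(threads_referencing(SAMPLE, "0X0000000006E08328"))
```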

 

Step 6:

From the output above, we see that thread 0x9b4 has references to the owning task pointer, so it is the thread that owns the latch. Let us switch to thread 0x9b4, which is executing the owning task,

and then go through its stack to see why the thread has held the latch for a long time.

Step 7:

   ~~[0x9b4]s      ==> Switching to the thread (replace 0x9b4 with the id of your thread that has a reference to the owning task pointer)
ntdll!ZwWaitForSingleObject+0xa:
00000000`77ef047a c3              ret

Step 8:

0:002> kC  ==>  Print the stack
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!BPool::FlushCache
sqlservr!checkpoint2
sqlservr!alloca_probe
sqlservr!ProcessCheckpointRequest
sqlservr!CheckpointLoop
sqlservr!ckptproc
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex

The stack above shows that the thread owning the latch is executing a checkpoint and flushing dirty buffers from the cache to disk. If flushing buffers to disk (checkpoint) takes a long time, that usually points to a disk I/O bottleneck.

Use the same approach for any other latch timeout issue:

1. Identify the thread that owns the latch.
2. Read its stack to understand what task it is performing.
3. Troubleshoot the performance of that task.

To see the stack of the thread that is waiting for the latch, take the waiting task from the latch timeout warning message in the errorlog (task 0x0000000006E096D8) instead of the owning task (task 0x0000000006E08328), and use the command from Step 5.

I hope this post helps you learn about and debug latch timeout issues.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts: Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, SQL General, SQL Server Engine, SQL Server I/O

Database Mail errors in SQL Server (Troubleshooting steps)

Posted by Karthick P.K on August 25, 2012

Troubleshooting Database Mail issues in SQL Server

 

 

Use the Database Mail Configuration Wizard, change the Logging Level to Verbose and send a test mail to investigate the point of failure.

 

Right-click Database Mail and select View Database Mail Log to see the errors, or run SELECT * FROM msdb.dbo.sysmail_event_log;

 

Check the sent_status column in msdb.dbo.sysmail_allitems. Its four possible values are sent, unsent, retrying and failed.

If the status is sent but the recipients have not received the email, the Database Mail external program delivered the message to the SMTP server, but the message did not reach the final recipient. At this point the SMTP server needs to be troubleshot, so engage your Exchange or mail server team.

 

If the status is unsent or retrying, Database Mail has either not yet processed the message or is retrying after a failed attempt. Possible causes include network conditions, the volume of messages and SMTP server issues. If the problem persists, use another profile or another mail host database.

 

If the status is failed, Database Mail was unable to deliver the message to the SMTP server. Check the sysmail_log table and the destination address, and make sure there are no network or SMTP issues.
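The status checks above amount to a small decision table. The Python sketch below is our own summary and is not part of Database Mail. It counts a list of sent_status values and attaches the suggested next step to each status. The input could be the result of SELECT sent_status FROM msdb.dbo.sysmail_allitems, exported however you prefer. The function name and the wording of the steps are illustrative.

```python
from collections import Counter

# Next step for each sent_status value of msdb.dbo.sysmail_allitems,
# paraphrased from the checks described above.
NEXT_STEP = {
    "sent": "Reached the SMTP server: troubleshoot SMTP/Exchange delivery.",
    "unsent": "Not processed yet: check network, message volume, SMTP server.",
    "retrying": "Retrying after a failure: check network, message volume, SMTP server.",
    "failed": "Could not reach the SMTP server: check sysmail_log, address, network.",
}


def triage(statuses):
    """Count sent_status values and attach the suggested next step to each."""
    counts = Counter(s.strip().lower() for s in statuses)
    unknown = set(counts) - set(NEXT_STEP)
    if unknown:
        raise ValueError(f"unexpected sent_status values: {sorted(unknown)}")
    return {status: (n, NEXT_STEP[status]) for status, n in counts.items()}
```

For example, triage(["sent", "failed", "failed"]) reports two failed items, so you would start with the failed branch.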

 

Send a test email from outside SQL Server, using the script below or another mail client, and check whether the recipients receive it. If they do not, the problem is outside SQL Server. Engage the Exchange or mail server team to find out why mail cannot be sent from the script, Office Outlook or other mail clients.

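' Build a test message (replace the placeholders with real addresses)
Set objMessage = CreateObject("CDO.Message")
objMessage.Subject = "Hello"
objMessage.From = """SENDER NAME""<e-mail ID>"
objMessage.To = "To address@mssqlwiki.com"   ' replace with the recipient's address
objMessage.HTMLBody = "<h1><font face=arial>Hello,<br>How are you?</font></h1>"
' sendusing = 2 (cdoSendUsingPort): send through the SMTP server below over the network
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
' Name or IP address of your SMTP server
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtphost.dns.Mailserver.com"
' SMTP port (25 is the standard SMTP port)
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
' smtpauthenticate: 0 = anonymous, 1 = basic, 2 = NTLM (current Windows credentials)
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 2
' Set to True if the SMTP server requires SSL
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
' Connection timeout in seconds
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 60
' Save the configuration and send the message
objMessage.Configuration.Fields.Update
objMessage.Send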
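For comparison, here is a rough Python equivalent of the CDO script that uses only the standard library (smtplib and email). It rests on one assumption that differs from the script above. It connects without TLS or authentication, whereas the script uses NTLM authentication (smtpauthenticate = 2), which smtplib does not provide natively. If your server requires a login or TLS, call smtp.starttls() and smtp.login(user, password) before sending. The function names, host and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a small text + HTML test message similar to the CDO script's."""
    msg = EmailMessage()
    msg["Subject"] = "Hello"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Hello, how are you?")  # plain-text part
    msg.add_alternative(
        "<h1><font face=arial>Hello,<br>How are you?</font></h1>", subtype="html"
    )
    return msg


def send_test_message(msg: EmailMessage, host: str, port: int = 25,
                      timeout: float = 60) -> None:
    """Send over plain SMTP with no TLS and no login (like smtpusessl = False)."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)
```

For example, call send_test_message(build_test_message("sender@example.com", "recipient@example.com"), "smtphost.dns.Mailserver.com"). A connection timeout or an smtplib.SMTPRecipientsRefused exception suggests a problem with the mail server or the network rather than with SQL Server.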

If the mail reaches the recipients when sent from the script above, the problem lies in the SQL Server Database Mail configuration.
Verify the following:

 

1. Verify that Service Broker is enabled in msdb: select is_broker_enabled from sys.databases where name = 'msdb' (0 = disabled, 1 = enabled).

To enable Service Broker on msdb, run the following statement: ALTER DATABASE MSDB SET ENABLE_BROKER

Note: The statement requires exclusive access to the database. If other connections are open, you will get the following error messages:

Msg 5061, Level 16, State 1, Line 1
ALTER DATABASE failed because a lock could not be placed on database MSDB. Try again later.
Msg 5069, Level 16, State 1, Line 1
ALTER DATABASE statement failed.

Stop the SQL Server Agent service before enabling Service Broker on msdb, because SQL Server Agent keeps connections open to msdb.

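2. Check whether the Database Mail stored procedures are enabled. In Surface Area Configuration, open "Surface Area Configuration for Features", expand Database Engine under MSSQLSERVER and click Database Mail. Make sure Enable Database Mail stored procedures is selected, then click Apply. SQL Server 2008 and later no longer ship the Surface Area Configuration tool; there, the equivalent setting is the 'Database Mail XPs' option of sp_configure.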

 

3. Check whether the user is a member of the DatabaseMailUserRole role in msdb.

 

4. Check which parameters and values the configuration uses by running:

   exec msdb..sysmail_help_configure_sp

   The default values are listed in Books Online under the topic "sysmail_help_configure_sp (Transact-SQL)". To modify a parameter value, use the following stored procedure:

   exec msdb..sysmail_configure_sp 'parameter_name', 'parameter_value'

 Check whether ReadFromConfigurationFile is enabled. If it is, check that the DatabaseMail90.exe.config file exists and has the proper parameters. The default path is < drive >\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn.

 

5. Verify that the Database Mail executable (DatabaseMail90.exe) is located in the correct directory, e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn.

 

6. Verify that the SQL Server service account has permission to run DatabaseMail90.exe. The executable needs network access to the SMTP servers specified in the Database Mail accounts. The service account must therefore have permission to access the network, and the SMTP servers must allow connections from the computer that runs SQL Server.

 


 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Configuration, Database mail, SQL General

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see any of the errors below in the SQL Server error log, along with memory dumps and an unresponsive server?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
*   05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP:
*   01/22/09 19:11:16 spid 0
* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.
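The non-yield message packs several numbers into one line. The small Python sketch below extracts them and computes how much of the interval the worker spent on CPU. parse_nonyield is a hypothetical helper, and its regex was written against the single sample message above, so other builds may need adjustments. As a rough rule of thumb, a share close to 1 suggests the worker was busy executing (for example, looping). A share close to 0 suggests it was blocked, for example waiting on an operating system call. The stack from the dump is what settles the question.

```python
import re

NONYIELD_RE = re.compile(
    r"Worker (?P<worker>0x[0-9A-Fa-f]+) appears to be non-yielding "
    r"on Scheduler (?P<scheduler>\d+)\."
    r".*?kernel (?P<kernel_ms>\d+) ms, user (?P<user_ms>\d+) ms\."
    r".*?Process Utilization (?P<process_pct>\d+)%\."
    r".*?System Idle (?P<idle_pct>\d+)%\."
    r".*?Interval: (?P<interval_ms>\d+) ms"
)


def parse_nonyield(message):
    """Return the fields of a non-yield message as a dict, or None."""
    m = NONYIELD_RE.search(message)
    if m is None:
        return None
    # Keep the worker address as text; convert every other field to int.
    info = {key: value if key == "worker" else int(value)
            for key, value in m.groupdict().items()}
    cpu_ms = info["kernel_ms"] + info["user_ms"]
    info["cpu_share"] = cpu_ms / info["interval_ms"]
    return info


msg = ("Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 "
       "appears to be non-yielding on Scheduler 4. Thread creation time: "
       "12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. "
       "Process Utilization 11%. System Idle 83%. Interval: 71227 ms.")
info = parse_nonyield(msg)
print(info["scheduler"], info["user_ms"], round(info["cpu_share"], 2))  # 4 60516 0.85
```

In this sample almost all of the 71-second interval was user-mode CPU time, which fits a worker that kept running without yielding.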

Refer to “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps” to learn how to analyze Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor dumps.

If you only want a quick resolution, follow the steps below to get the non-yielding stack from the dump and check whether it matches any known issue in SQL Server.

To analyze the dump, download and install the Windows Debugger (WinDbg) from this link.

Step 1:

Open Windbg

Step 2:

On the File menu, select Open Crash Dump, and then select the dump file (SQLDump000#.mdmp).

Step 3:

In the command window, type:
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and press Enter. This forces the debugger to load all symbols immediately.

Step 5:

Type .cxr sqlservr!g_copiedStackInfo+0X20 for SQL Server 2005 and SQL Server 2008/2008 R2, or .cxr sqlmin!g_copiedStackInfo+0X20 for SQL Server 2012.

Type kc 100 and check whether the stack matches the stack of any of the known issues in SQL Server listed below.

If kc 100 does not display any stack and reports “WARNING: Frame IP not in any known module. Following frames may be wrong”, type .cxr to reset to the default scope. Then try .cxr sqlservr!g_copiedStackInfo+0X00c instead: on 32-bit (x86) SQL Server the valid offset for the context is 0X00c. Look at this blog to see how we identified the offset.
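Steps 3 to 5 differ only in the module name and the context offset. The hypothetical Python helper below is not a Microsoft tool. It only assembles the commands as text so you can paste them into the WinDbg command window, and the version keys and symbol cache folder are assumptions you can change. Applying the 32-bit 0x00c offset to sqlmin on SQL Server 2012 is also an assumption, because the note above only states the offset for 32-bit SQL Server in general.

```python
# Module that holds g_copiedStackInfo, per Step 5 above.
MODULE_BY_VERSION = {
    "2005": "sqlservr",
    "2008": "sqlservr",
    "2008r2": "sqlservr",
    "2012": "sqlmin",
}


def nonyield_commands(version, is_x86=False, symbol_cache=r"c:\Websymbols"):
    """Return the WinDbg commands of Steps 3-5 for one SQL Server version."""
    key = version.lower().replace(" ", "")
    if key not in MODULE_BY_VERSION:
        raise ValueError(f"unsupported version: {version!r}")
    offset = "0x00c" if is_x86 else "0x20"  # 32-bit offset from the note above
    return [
        f".sympath srv*{symbol_cache}*http://msdl.microsoft.com/download/symbols",
        ".reload /f",
        f".cxr {MODULE_BY_VERSION[key]}!g_copiedStackInfo+{offset}",
        "kc 100",
    ]


for command in nonyield_commands("2008 R2"):
    print(command)
```

For SQL Server 2008 R2 on x64, this prints the .sympath line, .reload /f, .cxr sqlservr!g_copiedStackInfo+0x20 and kc 100, which are the same commands as Steps 3 to 5.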

Note: If your stack does not match any of the stacks listed below, paste it in the comments section of this blog or in this Facebook group, and we will try to find the cause for you. If you do not get a prompt reply from the community, you may need to open a support ticket with Microsoft.
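Comparing a long kc output against the stacks listed below by hand is error-prone. The Python sketch below looks for a few distinctive frames. The "signature" frames were picked by us from the stacks below and mapped to the KB listed under each one, so a hit is only a hint to read that KB, not a diagnosis. The helper names are hypothetical. The sketch tolerates a leading frame number and a trailing offset such as +0x12a on each line.

```python
import re

# Distinctive frames picked (by us) from the stacks below, mapped to the KB
# listed under each stack. A hit is a hint to read the KB, not a diagnosis.
KNOWN_SIGNATURES = {
    "sqlservr!COptExpr::DetachPointersIntoMemo": "KB 2344600",
    "sqlservr!I8CharindexStrBhI8": "KB 2633357",
    "sqlservr!CConstraintItvl::PcnstrItvlUnion": "KB 982376",
    "sqlservr!COptExpr::AdjustParallelPlan": "KB 943060",
    "sqlservr!CXid::GetBlockingTask": "KB 956854",
    "sqlservr!COledbConnect::GetProviderOptions": "KB 2468047",
    "sqlservr!CTraceTvpData::~CTraceTvpData": "KB 2520808",
    "sqlservr!DropDeferredWorkTables": "KB 2505256",
    "sqlservr!BackupHistory::GenerateBackupSet": "KB 917971",
}

# Optional leading frame number (e.g. "00 ") printed by some debugger versions.
FRAME_NO = re.compile(r"^\s*[0-9a-fA-F]+\s+(?=\S+!)")
# Trailing offset such as "+0x12a" or " 0x12a".
OFFSET = re.compile(r"[+\s]0x[0-9a-fA-F]+\s*$")


def normalize_frame(line):
    """Reduce a kc output line to module!function."""
    return OFFSET.sub("", FRAME_NO.sub("", line)).strip()


def match_known_issues(kc_output):
    """Return the KBs whose signature frame appears in the stack, in stack order."""
    hits = []
    for line in kc_output.splitlines():
        if "!" not in line:
            continue  # skip headers such as "Call Site"
        kb = KNOWN_SIGNATURES.get(normalize_frame(line))
        if kb and kb not in hits:
            hits.append(kb)
    return hits
```

Paste the kc 100 output into a string and call match_known_issues on it. An empty list means none of these signatures matched, and you should still compare the full stack with the list below.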

 

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2344600 FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands+0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties+0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005, SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB 2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2 or KB 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor'

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2


 

Related blogs: 

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

 


 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confer no rights.

Posted in Debugging, Performance, SQL General, SQL Server Tools

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you only want a quick resolution for Non-Yielding Scheduler, Non-yielding IOCP Listener or Non-yielding Resource Monitor dumps, jump to THIS LINK. Continue reading this article if you would like to learn how to analyze Non-Yielding Scheduler dumps and Non-yielding IOCP Listener dumps.

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog focuses on analyzing Non-Yielding Scheduler dumps, not on explaining how non-yield detection works. To understand how non-yield detection works, read http://technet.microsoft.com/en-us/library/cc917684.aspx. Let us recall a few key points before we get into the analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005 and later.

3.   The logical scheduler makes workers non-preemptive within the database engine: a worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the thread that owns the scheduler executes for a long time (or forever) without yielding, so the other threads waiting on the scheduler never get their quantum?

Answer: The other threads do not get CPU cycles, and SQL Server performance suffers.

 

What if the thread cannot finish its work within its quantum (4 milliseconds), for example in a large loop?

The quantum target of a SQL Server worker thread is 4 ms. The worker is expected to yield back to the SQL Server scheduler when it exceeds 4 ms. It is rescheduled after the other workers already waiting in the SOS scheduler's runnable list finish their execution or their quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has a scheduler monitor to track this. The scheduler monitor checks for the non-yield condition every 5 seconds, and each time it performs a basic check: is the thread executing for more than 4 ms without yielding? When the basic check evaluates to true, tracking of the worker begins. If the thread does not yield within 10 seconds (the non-yield threshold) after tracking begins, the threshold check becomes true. So approximately 15 seconds elapse between the last yield on the scheduler and the time the threshold check becomes true, and tracking continues after that.
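The timing described above can be checked with a small Python model. This is a simplification under stated assumptions, and the real scheduler monitor internals are more involved. The model assumes:

- the monitor wakes exactly every 5 seconds;
- the basic check is "more than 4 ms since the last yield";
- the 10-second threshold is measured between monitor wakeups.

Times are integer milliseconds to avoid floating-point noise, and the function name is hypothetical.

```python
def ms_until_threshold(last_yield_ms, tick_ms=5000, basic_ms=4, threshold_ms=10000):
    """Milliseconds from the last yield until the threshold check becomes true.

    Simplified model: the scheduler monitor runs at multiples of tick_ms.
    """
    # First monitor run after the worker's last yield.
    tick = (last_yield_ms // tick_ms + 1) * tick_ms
    # Basic check: the worker must have run more than basic_ms without yielding.
    while tick - last_yield_ms <= basic_ms:
        tick += tick_ms
    tracking_start = tick
    # Threshold check: still no yield threshold_ms after tracking began.
    while tick - tracking_start < threshold_ms:
        tick += tick_ms
    return tick - last_yield_ms


delays = [ms_until_threshold(y) for y in range(5000)]
print(min(delays), max(delays))  # 10005 15004
```

Under this model, the delay falls between roughly 10 and 15 seconds, depending on where the last yield lands in the monitor's 5-second cycle. The "approximately 15 seconds" above is therefore best read as the upper end.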

 

A dump is taken when a specific non-yield occurrence has reached 60 seconds in total duration. Once a 17883 mini-dump has been captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. 17883 error messages continue to be reported regardless of mini-dump capture. When -T1262 is enabled, a mini-dump is captured when the non-yield threshold check becomes true (about 15 seconds) and at subsequent 60-second intervals for the same non-yield occurrence. A new non-yield occurrence causes dump captures to occur again.
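The capture rules above can be summarized in a simplified Python model. It covers only mini-dump capture, not 17883 message reporting, which continues regardless. It reads "subsequent 60-second intervals" as 15, 75, 135, ... seconds into the occurrence; that spacing is my interpretation. The function name and parameters are hypothetical.

```python
def dump_times(nonyield_s, tf1262=False, dumped_since_startup=False):
    """Seconds into one non-yield occurrence at which a 17883 mini-dump is captured."""
    if tf1262:
        # -T1262: dump when the threshold check becomes true (~15 s),
        # then every 60 s for the same occurrence (assumed spacing)
        return list(range(15, int(nonyield_s) + 1, 60))
    if dumped_since_startup or nonyield_s < 60:
        # without -T1262: one dump per process lifetime, only after 60 s
        return []
    return [60]

print(dump_times(200))               # [60]
print(dump_times(200, tf1262=True))  # [15, 75, 135, 195]
```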

When SQL Server decides to take a mini-dump for a non-yield occurrence, it first copies the CONTEXT of the non-yielding thread to a global structure and then initiates the dump. This is needed because the non-yielding thread may already have yielded by the time SQLDumper captures the dump. To get an exact snapshot of the thread at the time of the non-yield, we rely on the CONTEXT saved in the global structure. We can also compare the thread's current stack with the copied one to check whether the thread is making progress.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, except that it counts completed I/O completion routines instead of the number of yields. The scheduler monitor takes a dump when it notices that the IOCP listener has not made progress for 10 seconds. A non-yielding IOCP listener dump is analyzed the same way as a non-yielding scheduler dump.

 

 

Let us step into the analysis of a non-yielding scheduler dump that I captured on SQL Server 2012.

 

Sample 1

When a non-yielding scheduler dump is generated, the following error message is logged in the SQL Server error log and a SQLDump000n.mdmp file is created in the log folder.

 

{
* *******************************************************************************
*
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
*
* Non-yielding Scheduler
*
* *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.
}

 

To analyze the dump, download and install the Windows Debugger (WinDbg) from this link.

Step 1:

Open WinDbg. From the File menu, select Open Crash Dump, then select the dump file (SQLDump000#.mdmp).

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  <-- type of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  <-- Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

In the command window, type:
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and press Enter. This forces the debugger to load all symbols immediately.

Step 4:

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0  <-- SQL Server version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use the !findstack command to find the scheduler monitor thread (the thread with sqlservr!SQL_SOSNonYieldSchedulerCallback on its stack):

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  <-- thread ID of the scheduler monitor

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to the scheduler monitor thread by using the ~[threadID]s command:

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use the kc or kP command to look at the stack of the scheduler monitor thread:

 

0:006> kP

Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

 

Step 8:

Switch to the thread that is reported as non-yielding in the SQL Server error log by using the ~~[ThreadID]s command.

Recall the error in the SQL Server error log: Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret
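When several 17883 messages are logged, the thread ID can be extracted from the error log line programmatically to build the `~~[ThreadID]s` command shown in Step 8. Below is a minimal Python sketch. The regular expression is written against the SQL Server 2012 message shown above and is an assumption for other versions. `parse_nonyield` is a hypothetical helper name.

```python
import re

# Pattern for the 17883 message shown above (SQL Server 2012 wording; assumed elsewhere).
NONYIELD_RE = re.compile(
    r"Process \d+:\d+:\d+ \((0x[0-9A-Fa-f]+)\) Worker (0x[0-9A-Fa-f]+) "
    r"appears to be non-yielding on Scheduler (\d+)"
)

def parse_nonyield(line):
    """Extract the OS thread ID, worker address and scheduler from a 17883 message."""
    match = NONYIELD_RE.search(line)
    if match is None:
        return None
    thread_id, worker, scheduler = match.groups()
    return {
        "thread_id": thread_id,
        "worker": worker,
        "scheduler": int(scheduler),
        "windbg": f"~~[{thread_id}]s",  # switch to the thread by its OS thread ID
    }

msg = ("Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be "
       "non-yielding on Scheduler 0. Thread creation time: 12979065797278.")
print(parse_nonyield(msg)["windbg"])  # ~~[0x1cb0]s
```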

 

Step 9:

Look at the current stack of the non-yielding thread by using the kc command:

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recall what we discussed earlier in this post. Before initiating the dump, SQL Server copies the CONTEXT of the non-yielding thread to a global structure, because the thread may already have yielded by the time SQLDumper captures the dump. The thread's current stack therefore may not show the non-yield. To see the exact snapshot, we must use the CONTEXT saved in the global structure, and we can compare it with the current stack to check whether the thread is progressing.

 

 

The stack above cannot be the non-yielding stack, because it contains SuspendNonPreemptive and SwitchContext. The worker has given up the scheduler and is waiting (here, in DequeueTask, for a new task).

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure by using the x command:

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

The symbol is sqlmin!g_copiedStackInfo in this dump because the dump is from SQL Server 2012. In earlier versions of SQL Server, it is sqlservr!g_copiedStackInfo.
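Because the module name depends on the version, it can help to derive the search commands from the "Product version" that lmvm printed in Step 4. The Python sketch below follows the rule stated above: sqlmin for 11.x, sqlservr for 2008/2005/2000. Treating versions 12.x and later like 2012 is an assumption, not something this dump shows. The helper names are hypothetical.

```python
def copied_stack_symbol(product_version):
    """Pick the module holding g_copiedStackInfo from lmvm's 'Product version'.

    11.x (SQL Server 2012) -> sqlmin; 10.x/9.x/8.x -> sqlservr.
    Assumption: 12.x and later behave like 2012.
    """
    major = int(product_version.split(".")[0])
    module = "sqlmin" if major >= 11 else "sqlservr"
    return f"{module}!g_copiedStackInfo"

def windbg_commands(product_version):
    symbol = copied_stack_symbol(product_version)
    module = symbol.split("!")[0]
    return [f"x {module}!*copiedStack*", f"dd {symbol}"]

print(windbg_commands("11.0.2316.0"))
# ['x sqlmin!*copiedStack*', 'dd sqlmin!g_copiedStackInfo']
```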

 

Step 11:

We know the copied CONTEXT is stored in g_copiedStackInfo, but how do we find the offset of the valid CONTEXT within this structure? If the CONTEXT is valid, the Rip, Rbp and Rsp registers must hold valid values when the dump is from an x64 system. For an x86 system, Eip, Ebp and Esp must be valid.

 

 

Let us dump sqlmin!g_copiedStackInfo with dd (remember that it is sqlservr!g_copiedStackInfo in SQL Server 2008/2005/2000):

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Type-cast each candidate address as a CONTEXT with dt and check whether the Rip, Rbp and Rsp registers are valid. This dump is from 64-bit SQL Server, so we use Rip, Rbp and Rsp. For a dump from an x86 system, use Eip, Ebp and Esp.

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     <-- type-cast 000007fe`df11bfe0 as CONTEXT. Rsp, Rbp and Rip are invalid, so this is not a valid offset.

ole32!CONTEXT

   +0x098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      <-- type-cast 000007fe`df11bff0 as CONTEXT. Rip is invalid.

ole32!CONTEXT

   +0x098 Rsp : 0x72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   <-- type-cast 000007fe`df11c000 as CONTEXT. Rip, Rsp and Rbp are valid, so this should be a valid context. Let us switch to this context and verify.

ole32!CONTEXT

   +0x098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know that 000007fe`df11c000 holds a valid context. Since 000007fe`df11c000 - sqlmin!g_copiedStackInfo = 0x20, we can use .cxr sqlmin!g_copiedStackInfo+0x20 directly to switch to the context of the copied stack.
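The scan in Steps 11 and 12 can be expressed as a small Python sketch that applies rough plausibility rules to the Rip/Rsp/Rbp values printed by dt, then computes the offset from g_copiedStackInfo. The rules are my own heuristics, not SQL Server's or WinDbg's:

- Rip is outside the first 64 KB, which Windows never maps.
- Rsp is non-zero and 8-byte aligned.
- Rbp is within 1 MB of Rsp. This test is weak, because x64 code does not always use Rbp as a frame pointer.

Confirming with .cxr and kc, as in the next steps, is still required.

```python
BASE = 0x000007FEDF11BFE0  # address of sqlmin!g_copiedStackInfo in this dump

# (Rip, Rsp, Rbp) as printed by "dt <address> CONTEXT Rip Rsp Rbp" above
CANDIDATES = {
    0x000007FEDF11BFE0: (0x0, 0x2, 0x5A4D),
    0x000007FEDF11BFF0: (0xF2, 0x72120000, 0x3369E3CC),
    0x000007FEDF11C000: (0x76D3139A, 0x3369E2E0, 0x3369E498),
}

def plausible_x64_context(rip, rsp, rbp, max_frame=1 << 20):
    """Heuristic only: plausible user-mode x64 register values."""
    return (
        rip >= 0x10000                   # first 64 KB is never mapped on Windows
        and rsp != 0 and rsp % 8 == 0    # stack pointer should be 8-byte aligned
        and abs(rbp - rsp) < max_frame   # assumption: Rbp, if used, stays near Rsp
    )

def find_context_offset(base, candidates):
    """Return the offset of the first plausible CONTEXT from base, or None."""
    for address in sorted(candidates):
        if plausible_x64_context(*candidates[address]):
            return address - base
    return None

offset = find_context_offset(BASE, CANDIDATES)
print(f".cxr sqlmin!g_copiedStackInfo+{offset:#x}")  # .cxr sqlmin!g_copiedStackInfo+0x20
```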

 

Step 13:

 

Switch to the context of the copied stack by using .cxr 000007fe`df11c000 or .cxr sqlmin!g_copiedStackInfo+0x20:

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of the copied context by using kP or kc (kc displays a clean stack trace: each line includes only the module name and the function name).

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see whether the thread has progressed since the non-yield condition. The stacks look completely different, so the non-yielding thread has progressed and completed its work; it is doing new work now. To understand why the thread was non-yielding, look at the copied stack, not the current one (unless both stacks are the same).

 

 

Current thread stack, dumped using the thread ID from the SQL Server error log:

0:146> kc

ntdll!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
sqldk!SOS_Scheduler::SwitchContext
sqldk!SOS_Scheduler::SuspendNonPreemptive
sqldk!WorkDispatcher::DequeueTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk
ntdll!RtlUserThreadStart

Copied thread stack, which SQL Server saved to the global structure before generating the dump:

0:146> Kc 10

ntdll!NtWriteFile
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog
sqllang!CErrorReportingManager::SendErrorToErrLog
sqllang!CErrorReportingManager::CwchFormatAndPrint
sqllang!ReportLoginFailure
sqllang!FRedoLogin
sqllang!login
sqllang!process_login_finish
sqllang!process_commands
sqldk!SOS_Task::Param::Execute
sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
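The comparison of the current and copied stacks can be written down as a small Python sketch. It takes two frame lists (top frame first, as kc prints them), reports whether they differ, and flags whether the current stack shows the SwitchContext/SuspendNonPreemptive frames discussed earlier. Only the top frames are listed below for brevity. The helper names are hypothetical. Identical stacks do not prove the thread is stuck, since a loop can return to the same frames, so treat the result as a hint.

```python
# Frames indicating the worker has given up the scheduler and is waiting.
SCHEDULER_WAIT_FRAMES = {
    "sqldk!SOS_Scheduler::SwitchContext",
    "sqldk!SOS_Scheduler::SuspendNonPreemptive",
}

def is_waiting_on_scheduler(stack):
    return any(frame in SCHEDULER_WAIT_FRAMES for frame in stack)

def compare_stacks(current, copied):
    """Compare the current stack (kc) with the copied stack (.cxr, then kc)."""
    progressed = current != copied
    return {
        "progressed": progressed,
        "current_waiting": is_waiting_on_scheduler(current),
        "analyze": "copied" if progressed else "either",
    }

current = [
    "ntdll!NtWaitForSingleObject",
    "KERNELBASE!WaitForSingleObjectEx",
    "sqldk!SOS_Scheduler::SwitchContext",
    "sqldk!SOS_Scheduler::SuspendNonPreemptive",
    "sqldk!WorkDispatcher::DequeueTask",
]
copied = [
    "ntdll!NtWriteFile",
    "KERNELBASE!WriteFile",
    "kernel32!WriteFileImplementation",
    "sqllang!CErrorReportingManager::WriteToErrLog",
]
print(compare_stacks(current, copied))
# {'progressed': True, 'current_waiting': True, 'analyze': 'copied'}
```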

 

Now let us read the copied stack to understand what caused the non-yield condition (read it from bottom to top):

 

ntdll!NtWriteFile  -> the WriteFile call at the top of the stack did not complete in the expected time
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog  -> writes the error to the error log
sqllang!CErrorReportingManager::SendErrorToErrLog  -> sends the error to the SQL Server error log
sqllang!CErrorReportingManager::CwchFormatAndPrint  -> formats the error message
sqllang!ReportLoginFailure  -> login failed
sqllang!FRedoLogin
sqllang!login  -> the login task is processed

 

 

From the stack above, we can see that SQL Server was writing login-failure information to the SQL Server error log synchronously, and the WriteFile call took a long time and did not return. That is why the non-yielding scheduler dump was generated.

 

When would a WriteFile operation take a long time?

When there is a disk bottleneck. So the obvious solution for this issue is to fix the disk performance.

There can be many other reasons for a non-yield condition, so examine the stack of your own non-yielding scheduler dump with the method above to work out what caused it.

Also refer to this link to check whether your stack matches any of the known issues in SQL Server.

To be continued…

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , | 28 Comments »

    SQL Server performance degraded in 32-Bit SQL Server after adding additional RAM.

    Posted by Karthick P.K on May 18, 2012

     

    Did you know that adding RAM can sometimes degrade SQL Server performance?

    I am not going to write about how the optimizer can sometimes choose suboptimal plans when the system has a large amount of memory. Instead, we will see how the memory available to other memory clerks (aka stolen memory) can shrink when the system has a large amount of physical memory and AWE is enabled.

     

    If the performance of 32-bit SQL Server degrades after you add RAM, or if you see SQL Server memory errors like the ones below after adding RAM, the cause could be large BUF structures that reduced the size of the BPool.

     

    Errors:

    SQL Server 2005/2008

     

    Buffer Pool errors:

        BPool::Map: no remappable address found.

     

    Either BPool or MemToLeave errors:

        Error: 17803 “Insufficient memory available..”

        Buffer Distribution:  Stolen=7901 Free=0 Procedures=1 Inram=201842 Dirty=0 Kept=572…

     

    Extract from SQL Server memory design

    {

    The SQL Server user address space is divided into two regions: MemToLeave and the Buffer Pool.

    SQL Server determines the sizes of MemToLeave (MTL) and the Buffer Pool (BPool) during startup as follows.

    MTL (MemToLeave) = (Stack size * max worker threads) + additional space to load DLLs

    Stack size = 512 KB per thread for 32-bit SQL Server (904 KB under WOW)

    i.e. MTL = (256 * 512 KB) + 256 MB = 384 MB

    Additional space to load DLLs = 256 MB (from SQL Server 2000 onward). This space is used by COM objects, extended stored procedures and linked servers loaded in the SQL Server process.

    Note: The additional space to load DLLs can be changed with the -g startup parameter.

    On any machine with four or fewer processors, max worker threads defaults to 256 for 32-bit SQL Server (unless you change it with sp_configure).

    The SQL Server Buffer Pool is the smaller of physical RAM and (user-mode address space (2 GB or 3 GB) – MTL), minus the BUF structures:

    BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

     

    }
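The MTL part of the formula above can be checked in Python. The 512 KB stack size and the 256 MB DLL reservation (-g) come from the text. The default of 256 + 8 per processor beyond four is the documented 32-bit default for SQL Server 2005/2008 as I recall it. It is not stated in this post, so verify it against your version's documentation. Function names are hypothetical.

```python
def default_max_worker_threads_32bit(cpus):
    """Default 'max worker threads' for 32-bit SQL Server 2005/2008 (assumed rule)."""
    return 256 if cpus <= 4 else 256 + (cpus - 4) * 8

def mtl_mb(max_worker_threads, stack_kb=512, dll_space_mb=256):
    """MemToLeave = thread stacks + space reserved to load DLLs (-g, default 256 MB)."""
    return max_worker_threads * stack_kb / 1024 + dll_space_mb

print(mtl_mb(default_max_worker_threads_32bit(4)))  # 384.0
```

For example, raising -g to 512 gives `mtl_mb(256, dll_space_mb=512)`, which is 640 MB. Every extra MB of MTL comes out of the 2 GB (or 3 GB) that is left for the BPool.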

     

     

    When AWE is enabled in 32-bit SQL Server, M_pbuf (part of the BUF structures mentioned earlier) is calculated and allocated for the entire physical memory on the system, regardless of the max server memory setting. This allows max server memory to be adjusted without restarting SQL Server.

    SQL Server requires 8 MB of M_pbuf for every 1 GB of RAM on the server.

    A machine with 64 GB of RAM can therefore consume 64 (GB of RAM) * 8 MB (M_pbuf per GB) = 512 MB for the BUF array alone.

    This adversely affects the amount of BPool available to SQL Server.

    Going back to the earlier BPool formula, the size of the BPool for 32-bit SQL Server with AWE enabled and 64 GB of RAM would be:

     

    BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

     

    BPool= Minimum (64GB, (2GB-384MB)) – BUF structures (512+ MB)

     

    That is 1664 MB – (512+ MB), so the BPool becomes approximately 1 GB. Because the BPool becomes so small, we might end up with memory errors.
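The same arithmetic can be expressed in Python using the BPool formula above. This sketch counts only M_pbuf (8 MB per GB of RAM) as the BUF structures, so the real BPool is somewhat smaller than the result, matching the "512+ MB" in the text. It takes MTL as 384 MB from the earlier calculation and assumes a 2 GB user address space (no /3GB).

```python
MTL_MB = 384                  # MemToLeave from the calculation above
USER_ADDRESS_SPACE_MB = 2048  # 2 GB user-mode address space (no /3GB)
M_PBUF_MB_PER_GB = 8          # per the text: 8 MB of M_pbuf per GB of RAM

def bpool_mb(physical_mb, user_space_mb, mtl_mb, buf_mb):
    """BPool = Minimum(Physical memory, User address space - MTL) - BUF structures."""
    return min(physical_mb, user_space_mb - mtl_mb) - buf_mb

ram_gb = 64
buf = ram_gb * M_PBUF_MB_PER_GB  # with AWE, BUFs are sized for all physical RAM
print(buf)                                                          # 512
print(bpool_mb(ram_gb * 1024, USER_ADDRESS_SPACE_MB, MTL_MB, buf))  # 1152
```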

     

    Note: In 32-bit SQL Server, only data and index pages can be placed in AWE memory, so the memory available for other SQL Server memory objects is still limited to the BPool and MTL.

     

    How to resolve this issue?

    Remove a few GB of RAM from the server :) if you can convince your management that removing RAM will improve performance.

    (Or)

    Use startup trace flag 836, which indicates that the BUF structures should be allocated only for the configured max server memory setting. Enable trace flag 836 and reduce the max server memory setting of SQL Server.

    (Or)

    Enable /3GB. This increases the user address space by 1 GB, which can grow the SQL Server BPool by about 1 GB and relieve BPool pressure.

    Note: When the physical RAM in the system exceeds 16 GB and the /3GB switch is used, the operating system will ignore the additional RAM until the /3GB switch is removed.
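To compare the options above, here is a simplified Python model based on the same BPool formula, with BUF counted as 8 MB per GB. It makes three assumptions that are not from the post:

- With TF 836, max server memory is lowered to a hypothetical 16 GB.
- With /3GB, SQL Server sees only the 16 GB the OS can use, and the BUFs are sized for that.
- MTL stays at 384 MB in every case.

```python
MTL_MB = 384
BUF_MB_PER_GB = 8

def bpool_mb(physical_mb, user_space_mb, mtl_mb, buf_mb):
    """BPool = Minimum(Physical memory, User address space - MTL) - BUF structures."""
    return min(physical_mb, user_space_mb - mtl_mb) - buf_mb

def scenario(ram_gb, user_space_mb, buf_sized_for_gb):
    return bpool_mb(ram_gb * 1024, user_space_mb, MTL_MB, buf_sized_for_gb * BUF_MB_PER_GB)

scenarios = {
    "baseline": scenario(64, 2048, 64),        # AWE, BUFs sized for all 64 GB
    "tf836_max_16gb": scenario(64, 2048, 16),  # TF 836, max server memory 16 GB (hypothetical)
    "3gb_switch": scenario(16, 3072, 16),      # /3GB: 3 GB user space, OS uses at most 16 GB
}
print(scenarios)
# {'baseline': 1152, 'tf836_max_16gb': 1536, '3gb_switch': 2560}
```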

     

     

     

    Related posts

    Troubleshooting steps for all SQL Server Memory errors

    Trouble shooting working set trim “A significant part of SQL Server process memory has been paged out”

    SQL Server lock pages in memory should I use it?

    SQL Server memory leak

    What is new in SQL Server 2012 Memory

    How to set max server memory and min server memory

     


    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Disclaimer
    The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

    Posted in Memory, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , | 6 Comments »

    Copy database wizard or replication setup might fail due to broken dependency

    Posted by Karthick P.K on May 4, 2012

    The Copy Database Wizard fails with the error below while creating views or user-defined functions if the dependency list of the objects is broken.

    Error:

    failed with the following error: “Invalid object name ‘dbo. .”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

    helpFile= helpContext=0 idofInterfaceWithError={8BDFE893-E9D8-4D23-9739-DA807BCDC2AC}

    StackTrace: at Microsoft.SqlServer.Management.Dts.DtsTransferProvider.ExecuteTransfer()

    at Microsoft.SqlServer.Management.Smo.Transfer.TransferData()

    at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.TransferDatabasesUsingSMOTransfer()

    Replication setup also fails while applying the schema scripts if the dependency list of the objects is broken.

    Error:

    The schema script ‘XXX_4.sch’ could not be propagated to the subscriber. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147201001)

    Get help: http://help/MSSQL_REPL-2147201001

    Unable to replicate a view or function because the referenced objects or columns are not present on the Subscriber. (Source: MSSQL_REPL, Error number: MSSQL_REPL20164)

    Get help: http://help/MSSQL_REPL20164

    Invalid object name ‘. (Source: MSSQLServer, Error number: 208)

    Get help: http://help/208

    -- The script below fixes the broken dependencies of all the objects
    ----------------------------------------------------------------------------
    --List of objects for which referenced objects are missing.
    --ex: View created on table XYZ and table XYZ is dropped
    ----------------------------------------------------------------------------
    SELECT OBJECT_NAME (referencing_id),referenced_database_name, referenced_schema_name, referenced_entity_name
    FROM sys.sql_expression_dependencies
    WHERE referenced_entity_name not in (select name from sysobjects)
    
    create table #t_excluded_modules (module_name sysname)
    go
    
    create table #t_modules_refreshed_in_end (module_name sysname)
    
    go
    
    ------------------------------------------------------------------------------
    --
    -- get the list of modules whose dependencies have to be refreshed
    --
    -- Comment:
    -- in the list we're not considering procedures or triggers
    -- because they can be created in any order, which means they can be refreshed
    -- in any order
    --
    ------------------------------------------------------------------------------
    create table #t_user_views_or_tables (module_id int)
    insert into #t_user_views_or_tables(module_id)
    		select object_id from sys.objects where
    			type in ('V', 'FN', 'IF', 'TF')
    			and name not like 'MSMerge%'
    			and is_ms_shipped <> 1
    			and name not in (select * from #t_modules_refreshed_in_end)
    			and name not in (select * from #t_excluded_modules)
    
    insert into #t_user_views_or_tables(module_id)
    		select object_id from sys.objects where
    			name in (select * from #t_modules_refreshed_in_end)
    
    ----------------------------------------
    --
    -- get the dependency table
    -- |---------------------------------|
    -- |  referencing_id | referenced_id |
    -- | ----------------|---------------|
    -- |      XXX        |     XXX       |
    -- | ----------------|---------------|
    --
    ----------------------------------------
    Declare @module int, @message varchar(1000), @str nvarchar(1000)
    create table #t_dependency_table (referencing_id int, referenced_id int)
    DECLARE modules_cursor CURSOR FOR SELECT module_id FROM #t_user_views_or_tables
    open modules_cursor
    fetch next from modules_cursor into @module
    
    IF @@FETCH_STATUS <> 0
    	PRINT '            <<None>>	No module to refresh'
    
    while @@FETCH_STATUS = 0
    	begin
    		select @str = quotename(schema_name(objectproperty(@module, 'schemaid'))) + '.' + quotename(object_name(@module))
    		select @message = '            trying to refresh ' + @str
    		print @message
    		exec sys.sp_refreshsqlmodule @str
    		select @message = '            ' + @str + ' was refreshed'
    		print @message
    
    		insert into #t_dependency_table (referencing_id, referenced_id)
    			select distinct object_id as referencing_id, referenced_major_id as referenced_id
    				from sys.sql_dependencies
    				where object_id <> referenced_major_id      -- to avoid self recursion for functions
    					  and object_id = @module
    
    		fetch next from modules_cursor into @module
    	end
    
    close modules_cursor
    deallocate modules_cursor
    
    -------------------------------------------------------------------------------------------
    --
    -- get the bottom of the dependency list, i.e. the independent modules
    -- i.e.
    -- get the list of referenced_ids in the dependency table which
    -- don't occur in the referencing_ids column
    --
    -- Comment:
    -- if there are circular dependencies then the few modules which form a circular dependency
    -- would be ignored in the independent modules list
    --
    -------------------------------------------------------------------------------------------
    create table #t_independent_modules (modules int)
    
    insert into #t_independent_modules (modules)
    	select #t_dependency_table.referenced_id from
    			#t_dependency_table left outer join #t_dependency_table t2
    			on #t_dependency_table.referenced_id = t2.referencing_id
    	where
    			t2.referencing_id is NULL
    
    -------------------------------------------------------------------------------------------
    --
    -- build the ordered list of dependencies starting with the independent modules
    -- in the beginning first few rows, ones dependent on it in the following rows and so on...
    --
    -- there can be tricky cases of dependencies such as
    -- V1 -> V2 -> V3
    --  |           ^
    --  +-----------+
    --
    -- in the above example the refresh order would be V3, V1, V2, V1.
    -- note that V1 is being refreshed twice, the last refresh of V1 after V2 is important.
    --
    -------------------------------------------------------------------------------------------
    create table #t_final_dependency_list (id_num int IDENTITY(1,1), modules int)
    
    while exists (select * from #t_independent_modules)
    	begin
    		-- append the set of independent modules into a list
    		insert into #t_final_dependency_list select * from #t_independent_modules
    
    		-- get the set of dependent modules
    		select distinct #t_dependency_table.referencing_id into #temp_table
    			from #t_dependency_table
    			where #t_dependency_table.referenced_id in (select * from #t_independent_modules)
    
    		-- clear up the list of independent modules
    		truncate table #t_independent_modules
    
    		-- the dependent modules now become the independent modules
    		insert into #t_independent_modules select * from #temp_table
    
    		-- delete the dependent modules list
    		drop table #temp_table
    	end
    
    -----------------------------------------------------------
    --
    -- refresh the modules once more but in the right order now
    --
    -----------------------------------------------------------
    declare modules_cursor_final cursor for
    	select modules from #t_final_dependency_list order by #t_final_dependency_list.id_num
    
    open modules_cursor_final
    fetch next from modules_cursor_final into @module
    
    IF @@FETCH_STATUS <> 0
    	PRINT '            <<None>>	No module to refresh'
    
    while @@FETCH_STATUS = 0
    	begin
    		if (select type from sys.objects where object_id = @module) in ('V', 'FN', 'IF', 'TF')
    			and (select is_schema_bound from sys.sql_modules where object_id = @module) = 0
    			begin
    				select @str = quotename(schema_name(objectproperty(@module, 'schemaid'))) + '.' + quotename(object_name(@module))
    				select @message = '            trying to finally, once more, refresh ' + @str
    				print @message
    				exec sys.sp_refreshsqlmodule @str
    				select @message = '            ' + @str + ' was finally refreshed once again'
    				print @message
    			end
    
    		fetch next from modules_cursor_final into @module
    	end
    
    close modules_cursor_final
    DEALLOCATE modules_cursor_final
    
    -----------
    --
    -- cleanup
    --
    -----------
    drop table #t_excluded_modules
    drop table #t_modules_refreshed_in_end
    drop table #t_user_views_or_tables
    drop table #t_dependency_table
    drop table #t_independent_modules
    drop table #t_final_dependency_list
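The ordering logic of the script can be mirrored in a short Python sketch to check its behaviour. It starts with referenced ids that never appear as referencing ids, then repeatedly appends every module that references the previous level. On the V1 -> V2 -> V3 example from the script's comment, it reproduces the documented refresh order. The function name is hypothetical. The `max_rounds` guard is an addition of this sketch, covering cycles that would otherwise keep producing new levels.

```python
def refresh_order(dependencies, max_rounds=1000):
    """Order modules for refresh like the T-SQL script.

    dependencies: (referencing_id, referenced_id) pairs, like #t_dependency_table.
    """
    referencing = {src for src, _ in dependencies}
    # independent modules: referenced, but never referencing anything
    level = sorted({dst for _, dst in dependencies if dst not in referencing})
    order = []
    for _ in range(max_rounds):           # guard added in this sketch
        if not level:
            return order
        order.extend(level)               # append the current level to the list
        current = set(level)
        # modules that reference the current level become the next level
        level = sorted({src for src, dst in dependencies if dst in current})
    raise RuntimeError("dependency levels did not converge (cycle?)")

# Example from the script comment: V1 -> V2 -> V3, and V1 -> V3
deps = [("V1", "V2"), ("V2", "V3"), ("V1", "V3")]
print(refresh_order(deps))  # ['V3', 'V1', 'V2', 'V1']
```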
    

    Posted in Copy database wizard, Replication, SQL General | Tagged: , , , , | 6 Comments »

    SQL Server Agent is taking long time to start

    Posted by Karthick P.K on April 19, 2012

     

    SQL Server Agent might take a long time to start because of slow communication with certificate authorities.

     

    If you enable verbose logging for SQL Server Agent (-v) and look at the SQL Server Agent log, you will notice that the ‘ANALYSISQUERY’ subsystem took a long time to load (about four and a half minutes in the log below):

    2012-02-15 15:42:42 – ? [124] Subsystem ‘QueueReader’ successfully loaded (maximum concurrency: 800)

    2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISQUERY’ successfully loaded (maximum concurrency: 800)

    2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISCOMMAND’ successfully loaded (maximum concurrency: 800)
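On a longer verbose agent log, the slow subsystem can be spotted by measuring the gap before each "successfully loaded" line. Below is a minimal Python sketch. It assumes subsystems load one after another, so the gap before a line is attributed to that line's subsystem. The regular expression accepts the curly quotes seen in this log as well as straight ones. `slowest_subsystem` is a hypothetical helper.

```python
import re
from datetime import datetime

SUBSYSTEM_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?Subsystem [‘'](.+?)[’'] successfully loaded"
)

def slowest_subsystem(lines):
    """Return (seconds, subsystem) for the longest gap before a subsystem finished loading."""
    loaded = []
    for line in lines:
        match = SUBSYSTEM_RE.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            loaded.append((when, match.group(2)))
    gaps = [((t2 - t1).total_seconds(), name)
            for (t1, _), (t2, name) in zip(loaded, loaded[1:])]
    return max(gaps) if gaps else None

log = [
    "2012-02-15 15:42:42 – ? [124] Subsystem ‘QueueReader’ successfully loaded (maximum concurrency: 800)",
    "2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISQUERY’ successfully loaded (maximum concurrency: 800)",
    "2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISCOMMAND’ successfully loaded (maximum concurrency: 800)",
]
print(slowest_subsystem(log))  # (266.0, 'ANALYSISQUERY')
```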

     

    Also, if you collect dumps during SQL Server Agent startup, you will see a stack like the one below.

     

    ntdll!ZwWaitForSingleObject
    kernel32!WaitForSingleObjectEx
    cryptnet!CryptRetrieveObjectByUrlWithTimeout
    cryptnet!CryptRetrieveObjectByUrlW
    crypt32!ChainRetrieveObjectByUrlW
    crypt32!CCertChainEngine::RetrieveCrossCertUrl
    crypt32!CCertChainEngine::UpdateCrossCerts
    crypt32!CCertChainEngine::Resync
    crypt32!CCertChainEngine::CreateChainContextFromPathGraph
    crypt32!CCertChainEngine::GetChainContext
    crypt32!CertGetCertificateChain
    wintrust!_WalkChain
    wintrust!WintrustCertificateTrust
    wintrust!_VerifyTrust
    wintrust!WinVerifyTrust
    mscorsec!GetPublisher
    mscorwks!PEFile::CheckSecurity
    mscorwks!PEAssembly::DoLoadSignatureChecks
    mscorwks!PEAssembly::PEAssembly
    mscorwks!PEAssembly::DoOpenHMODULE
    mscorwks!PEAssembly::OpenHMODULE
    mscorwks!AppDomain::BindExplicitAssembly
    mscorwks!AppDomain::LoadExplicitAssembly
    mscorwks!ExecuteDLLForAttach
    mscorwks!ExecuteDLL
    mscorwks!CorDllMainForThunk
    mscoree!CorDllMainWorkerForThunk
    mscoree!VTableBootstrapThunkInitHelper
    mscoree!VTableBootstrapThunkInitHelperStub
    SQLAGENT!LoadSubsystem
    SQLAGENT!StartSubSystems
    SQLAGENT!DumpAndCheckServerVersion
    SQLAGENT!ServiceMain
    advapi32!ScSvcctrlThreadW
    kernel32!BaseThreadInitThunk
    ntdll!RtlUserThreadStart

     

    The ANALYSISQUERY subsystem loads an assembly that has an Authenticode signature. When the CLR loads an assembly that has an Authenticode signature, it always tries to verify that signature.

    This verification can be quite time-consuming, because it may need to reach the network several times to download up-to-date certificate revocation lists and to confirm that there is a full chain of valid certificates up to a trusted root. The stack above shows Agent startup waiting in exactly this step, from WinVerifyTrust down to CryptRetrieveObjectByUrlWithTimeout.

    If the server cannot reach the internet to verify the signature, or you want to bypass the Authenticode check, disable publisher evidence generation for sqlagent.exe. Create a sqlagent.exe.config file in the Binn directory with the following XML:
    <?xml version="1.0" encoding="utf-8"?>
    <configuration>
    <runtime>
    <generatePublisherEvidence enabled="false"/>
    </runtime>
    </configuration>

     

    Thanks

    Karthick P.K

    Posted in SQL General | 1 Comment »

    SQL Server resource fails to come online: IsAlive check fails

    Posted by Karthick P.K on January 31, 2012

    The SQL Server resource fails to come online with the error below:

    [sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = 35; message = [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible.

    Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.

     

    Resolution:

    Check the version of c:\windows\system32\sqsrvres.dll and install the matching version of SQL Server Native Client.

    Cause:

    When a higher version of SQL Server is installed on a cluster that already has a lower version installed, the SQL Server resource DLL (c:\windows\system32\sqsrvres.dll) is upgraded to the higher version, and the resource monitor process loads this higher-version DLL to monitor the lower-version instance as well.

    For example, the Denali (SQL Server 2012) SQL Server resource DLL uses SNAC 11.0 to connect to the SQL instance. Because SNAC 11.0 can also connect to Shiloh (SQL Server 2000), Yukon (SQL Server 2005) and Katmai (SQL Server 2008), this side-by-side configuration works. However, if Denali is uninstalled, the resource DLL is not downgraded to the Katmai, Yukon or Shiloh version, so take care not to uninstall SNAC 11.0; otherwise the Katmai, Yukon or Shiloh instances cannot be brought online.

    Similarly, when Yukon and Shiloh are installed together, the Yukon SQL Server resource DLL uses SNAC to connect to the SQL instance, and because SNAC can also connect to Shiloh, this configuration works. However, if Yukon is uninstalled, the resource DLL is not downgraded to the Shiloh version, so take care not to uninstall SNAC; otherwise the Shiloh instance cannot be brought online.
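The version check in the resolution can be expressed as a lookup. The C++ sketch below is an illustration only: `required_native_client` is a hypothetical helper, it assumes you read the major number of the DLL's product version (for example 11 for 11.0.x), and the Katmai (10) entry is inferred by analogy with the Denali and Yukon cases described above rather than stated in this post.

```cpp
#include <string>

// Hypothetical helper: maps the major number of sqsrvres.dll's product version
// (for example 11 for 11.0.x) to the SQL Server client provider that must stay installed.
// Returns an empty string for versions this sketch does not cover.
std::string required_native_client(int product_major_version)
{
    switch (product_major_version)
    {
    case 11: return "SQL Server Native Client 11.0";  // Denali (SQL Server 2012), per the post
    case 10: return "SQL Server Native Client 10.0";  // Katmai (SQL Server 2008 / 2008 R2), assumption
    case 9:  return "SQL Native Client";              // Yukon (SQL Server 2005), per the post
    default: return "";
    }
}
```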

     

    If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Disclaimer
    The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

    Posted in Configuration, Connectivity, SQL General, SQL Server Cluster | 8 Comments »

    How to move LOB data from one filegroup to another?

    Posted by Karthick P.K on January 17, 2012

     

    There is no direct way to move LOB data from one filegroup to another: as of the current version (SQL Server 2008), neither ALTER TABLE nor CREATE INDEX can move LOB data.

     

    The only way to move the LOB data is to:

    1. Create a new table in the new filegroup.

    2. Move the data from the existing table to the new table.

    3. Drop the existing table.

    4. Rename the new table to the old table's name.

    Management Studio provides an easy way to generate a script for all of the above tasks:

    1. In Management Studio, right-click the table -> Design, then change the filegroup in the Properties window (click View -> Properties Window if you do not see it).

    2. Click Generate Change Script.


    3. A script similar to the following is generated.

    4. Copy the script and run it in a query window.

     

    /* To prevent any potential data loss issues, you should review this script in detail before running it outside the context of the database designer.*/

    BEGIN TRANSACTION
    SET QUOTED_IDENTIFIER ON
    SET ARITHABORT ON
    SET NUMERIC_ROUNDABORT OFF
    SET CONCAT_NULL_YIELDS_NULL ON
    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    COMMIT
    BEGIN TRANSACTION
    GO
    CREATE TABLE dbo.Tmp_BLOB_TABLE
        (
        BLOBName varchar(100) NULL,
        BLOBData varbinary(MAX) NULL
        ) ON [PRIMARY]
        TEXTIMAGE_ON Lob2   -- Lob2 is the filegroup that will hold the LOB data
    GO
    ALTER TABLE dbo.Tmp_BLOB_TABLE SET (LOCK_ESCALATION = TABLE)
    GO
    IF EXISTS(SELECT * FROM dbo.BLOB_TABLE)
        EXEC('INSERT INTO dbo.Tmp_BLOB_TABLE (BLOBName, BLOBData)
            SELECT BLOBName, BLOBData FROM dbo.BLOB_TABLE WITH (HOLDLOCK TABLOCKX)')
    GO
    DROP TABLE dbo.BLOB_TABLE
    GO
    EXECUTE sp_rename N'dbo.Tmp_BLOB_TABLE', N'BLOB_TABLE', 'OBJECT'
    GO
    COMMIT

     

    Thanks

    Karthick P.K

    Posted in Space management, SQL General, SQL Server Tools | 34 Comments »

    Script to free cache

    Posted by Karthick P.K on December 6, 2010

     

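    DBCC FREESYSTEMCACHE ('ALL') WITH MARK_IN_USE_FOR_REMOVAL   -- clear all system caches; entries in use are freed once they become unused
    GO
    DBCC FREESESSIONCACHE WITH NO_INFOMSGS   -- flush the distributed query connection cache
    GO
    DBCC FREEPROCCACHE WITH NO_INFOMSGS      -- remove all cached plans
    GO
    CHECKPOINT                               -- write dirty pages of the current database to disk so they become clean
    GO
    DBCC DROPCLEANBUFFERS                    -- remove clean pages from the buffer pool
    GO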

    Posted in Performance, SQL General, SQL Query | 59 Comments »

    How to rebuild index and update statistics for all the tables in database.

    Posted by Karthick P.K on September 26, 2010

     
    EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'   -- {can be run anytime}

    EXEC sp_MSforeachtable 'DBCC DBREINDEX(''?'')'              -- {always run this during off-peak hours on any SQL Server instance}
     
    

    Thanks

    Karthick

    Posted in Optimizer, Performance, SQL General, SQL Query | 3 Comments »

    How to check if local system is connected to a network and identify the type of network connection

    Posted by Karthick P.K on July 26, 2010

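    // Checks whether the local system is connected to a network and reports the connection type.
    #include <windows.h>
    #include <Sensapi.h>   // IsNetworkAlive, NETWORK_ALIVE_* flags
    #include <iostream>
    #pragma comment(lib, "Sensapi.lib")

    int main()
    {
        DWORD flags = 0;
        SetLastError(0);                      // clear any stale error before the call
        BOOL alive = IsNetworkAlive(&flags);
        DWORD error = GetLastError();         // read immediately; later calls may overwrite it

        if (!alive)
        {
            // FALSE with a nonzero error code means the call failed;
            // FALSE with error code 0 means there is simply no connection.
            if (error != 0)
                std::cout << "IsNetworkAlive failed: " << error << std::endl;
            else
                std::cout << "Network is not connected" << std::endl;
            return 0;
        }

        std::cout << "Network is connected. Type:";
        if (flags & NETWORK_ALIVE_LAN) std::cout << " LAN";   // 1 = LAN
        if (flags & NETWORK_ALIVE_WAN) std::cout << " WAN";   // 2 = WAN
        std::cout << " (flags = " << flags << ")" << std::endl;
        return 0;
    }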

    Regards

    Karthick P.K

    Posted in Programming, SQL General | 2 Comments »

    “Value cannot be null” when I connect to SQL Server from SSMS

    Posted by Karthick P.K on May 26, 2010

    I get the error below when I connect to SQL Server using SSMS. What should I do?

    Error

    Value cannot be null.
    Parameter name: viewInfo (Microsoft.SqlServer.Management.SqlStudio.Explorer)

    Resolution

    Right-click SSMS and choose “Run as administrator” 🙂

    If “Run as administrator” doesn't resolve the problem, verify that the %TEMP% environment variable is set properly for the logged-on Windows account.

    If %TEMP% is not set properly in the environment variables, you might get this error.
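A quick way to perform the %TEMP% check from code is sketched below in C++17. It is an illustration, not part of SSMS: `temp_variable_looks_valid` is a hypothetical helper that only confirms the variable is set and points to an existing directory; it does not test write permissions.

```cpp
#include <filesystem>
#include <system_error>

// Returns true when a TEMP-style environment value is set and names an existing directory.
// Pass the result of std::getenv("TEMP") (or "TMP"); nullptr means the variable is not set.
bool temp_variable_looks_valid(const char* value)
{
    if (value == nullptr || *value == '\0')
        return false;                           // variable missing or empty
    std::error_code ec;                         // report bad paths via ec instead of throwing
    return std::filesystem::is_directory(std::filesystem::path(value), ec);
}
```

On the affected machine you would call it as temp_variable_looks_valid(std::getenv("TEMP")) (with <cstdlib> included) while logged on as the Windows account that runs SSMS.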


    Thank you,

    Karthick P.K | Technical Lead | Microsoft SQL Server Support

    Posted in SQL General, SQL Server Tools, SSMS | 14 Comments »

    How to get SQL Text and Query Plan for statements which are executing now

    Posted by Karthick P.K on February 1, 2010

     
    SELECT GETDATE() AS [RunTime], st.text AS batch,
        -- offsets are bytes of nvarchar text: /2 converts to characters, +1 because SUBSTRING is 1-based
        SUBSTRING(st.text, a.statement_start_offset / 2 + 1,
            ((CASE WHEN a.statement_end_offset = -1   -- -1 means the statement runs to the end of the batch
                   THEN DATALENGTH(st.text)
                   ELSE a.statement_end_offset END) - a.statement_start_offset) / 2 + 1) AS current_statement,
        qp.query_plan, a.*
    FROM sys.dm_exec_requests a
    CROSS APPLY sys.dm_exec_sql_text(a.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(a.plan_handle) AS qp
    ORDER BY a.cpu_time DESC
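The SUBSTRING expression in the query works because statement_start_offset and statement_end_offset are byte offsets into the nvarchar batch text (two bytes per UTF-16 code unit): dividing by 2 converts bytes to characters, adding 1 converts to SUBSTRING's 1-based positions, and an end offset of -1 means the statement runs to the end of the batch. The C++ sketch below mirrors that arithmetic on a UTF-16 string so it can be checked in isolation; `current_statement` is a hypothetical name, not a SQL Server API.

```cpp
#include <cstddef>
#include <string>

// Mirrors the SUBSTRING arithmetic of the query on UTF-16 text.
// start_offset / end_offset are byte offsets; end_offset == -1 means "to the end of the batch".
std::u16string current_statement(const std::u16string& batch, int start_offset, int end_offset)
{
    // Same role as DATALENGTH(st.text) when the statement runs to the end of the batch.
    const int end_bytes = (end_offset == -1) ? static_cast<int>(batch.size() * 2) : end_offset;
    if (start_offset < 0 || end_bytes < start_offset)
        return u"";
    const std::size_t start = static_cast<std::size_t>(start_offset / 2);   // bytes -> 0-based index
    const std::size_t length = static_cast<std::size_t>((end_bytes - start_offset) / 2 + 1);
    if (start >= batch.size())
        return u"";
    return batch.substr(start, length);   // like SUBSTRING, stops at the end of the text
}
```

For the batch "SELECT 1; SELECT 2;", a request executing the second statement would report start offset 20 (character 10 times 2 bytes), and both an explicit end offset of 36 and -1 yield "SELECT 2;".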
    

     


    Thank you,

    Karthick P.K

    Posted in SQL General, SQL Query | 7 Comments »

    Script to clear stats

    Posted by Karthick P.K on January 20, 2010

    How to reset SQL Server statistics without restarting the service:

    DBCC SQLPERF ('spinlockstats', CLEAR);
    GO
    DBCC SQLPERF ('netstats', CLEAR);
    GO
    DBCC SQLPERF ('rastats', CLEAR);
    GO
    DBCC SQLPERF ('iostats', CLEAR);
    GO
    DBCC SQLPERF ('threads', CLEAR);
    GO
    DBCC SQLPERF ('logspace', CLEAR);
    GO
    DBCC SQLPERF ('umsstats', CLEAR);
    GO
    DBCC SQLPERF ('waitstats', CLEAR);
    GO

    Posted in Performance, SQL General, SQL Query | 2 Comments »

    Monitoring Tempdb usage

    Posted by Karthick P.K on January 13, 2010

    Monitoring tempdb space usage and identifying the session and query that consume tempdb

    The total space in tempdb consists of:

    1. User objects

    2. Internal objects

    3. Version store

    4. Free space

    Use the query below to track which of these categories is consuming space in tempdb.
     

    USE tempdb;   -- on newer versions this DMV reports the current database, so run it in tempdb
    GO
    SELECT
        SUM(unallocated_extent_page_count) AS [free pages],
        (SUM(unallocated_extent_page_count)*1.0/128) AS [free space in MB],
        SUM(version_store_reserved_page_count) AS [version store pages used],
        (SUM(version_store_reserved_page_count)*1.0/128) AS [version store space in MB],
        SUM(internal_object_reserved_page_count) AS [internal object pages used],
        (SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in MB],
        SUM(user_object_reserved_page_count) AS [user object pages used],
        (SUM(user_object_reserved_page_count)*1.0/128) AS [user object space in MB]
    FROM sys.dm_db_file_space_usage;
    GO
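The `*1.0/128` in each column converts a page count to megabytes: a SQL Server page is 8 KB, so 128 pages make 1 MB, and multiplying by 1.0 avoids integer division. The C++ sketch below repeats that arithmetic; `TempdbPages`, `pages_to_mb` and `total_mb` are hypothetical names, and the total simply adds the four categories listed above.

```cpp
#include <cstdint>

// Page counts as returned by the sys.dm_db_file_space_usage query.
struct TempdbPages
{
    std::int64_t user_objects;
    std::int64_t internal_objects;
    std::int64_t version_store;
    std::int64_t free_space;
};

// A SQL Server page is 8 KB, so 128 pages = 1 MB (the T-SQL uses *1.0/128 to avoid integer division).
double pages_to_mb(std::int64_t pages)
{
    return static_cast<double>(pages) / 128.0;
}

// Size covered by the four categories, in MB.
double total_mb(const TempdbPages& p)
{
    return pages_to_mb(p.user_objects + p.internal_objects + p.version_store + p.free_space);
}
```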
    

    Once you have identified which category is consuming the space, use the query below to identify the session and query that are consuming space in tempdb.

     
    SELECT R1.session_id, R1.request_id,
        R1.Task_request_internal_objects_alloc_page_count, R1.Task_request_internal_objects_dealloc_page_count,
        R1.Task_request_user_objects_alloc_page_count, R1.Task_request_user_objects_dealloc_page_count,
        R3.Session_request_internal_objects_alloc_page_count, R3.Session_request_internal_objects_dealloc_page_count,
        R3.Session_request_user_objects_alloc_page_count, R3.Session_request_user_objects_dealloc_page_count,
        R2.sql_handle, RL2.text AS SQLText, R2.statement_start_offset, R2.statement_end_offset, R2.plan_handle
    FROM
        -- tempdb pages allocated/deallocated by the currently running tasks, per request
        (SELECT session_id, request_id,
            SUM(internal_objects_alloc_page_count) AS Task_request_internal_objects_alloc_page_count,
            SUM(internal_objects_dealloc_page_count) AS Task_request_internal_objects_dealloc_page_count,
            SUM(user_objects_alloc_page_count) AS Task_request_user_objects_alloc_page_count,
            SUM(user_objects_dealloc_page_count) AS Task_request_user_objects_dealloc_page_count
         FROM sys.dm_db_task_space_usage
         GROUP BY session_id, request_id) R1
    INNER JOIN
        -- tempdb pages allocated/deallocated by each session's completed tasks
        (SELECT session_id,
            SUM(internal_objects_alloc_page_count) AS Session_request_internal_objects_alloc_page_count,
            SUM(internal_objects_dealloc_page_count) AS Session_request_internal_objects_dealloc_page_count,
            SUM(user_objects_alloc_page_count) AS Session_request_user_objects_alloc_page_count,
            SUM(user_objects_dealloc_page_count) AS Session_request_user_objects_dealloc_page_count
         FROM sys.dm_db_session_space_usage
         GROUP BY session_id) R3 ON R1.session_id = R3.session_id
    LEFT OUTER JOIN sys.dm_exec_requests R2 ON R1.session_id = R2.session_id AND R1.request_id = R2.request_id
    OUTER APPLY sys.dm_exec_sql_text(R2.sql_handle) AS RL2
    WHERE
        Task_request_internal_objects_alloc_page_count > 0 OR
        Task_request_internal_objects_dealloc_page_count > 0 OR
        Task_request_user_objects_alloc_page_count > 0 OR
        Task_request_user_objects_dealloc_page_count > 0 OR
        Session_request_internal_objects_alloc_page_count > 0 OR
        Session_request_internal_objects_dealloc_page_count > 0 OR
        Session_request_user_objects_alloc_page_count > 0 OR
        Session_request_user_objects_dealloc_page_count > 0
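The alloc and dealloc counts are cumulative, so the space a session still holds is roughly allocated pages minus deallocated pages. The C++ sketch below ranks sessions that way; `SessionTempdbUsage`, `held_pages` and `top_tempdb_session` are hypothetical names, and it assumes you have already added the internal-object and user-object counts from the Session_request_* columns for each session.

```cpp
#include <cstdint>
#include <vector>

// One row per session, built from the Session_request_* columns of the query.
struct SessionTempdbUsage
{
    int session_id;
    std::int64_t alloc_pages;     // internal + user object pages allocated
    std::int64_t dealloc_pages;   // internal + user object pages deallocated
};

// Pages a session still holds: allocated minus deallocated (never below zero).
std::int64_t held_pages(const SessionTempdbUsage& u)
{
    const std::int64_t held = u.alloc_pages - u.dealloc_pages;
    return held > 0 ? held : 0;
}

// Returns the session_id holding the most tempdb pages, or -1 if no session holds any.
int top_tempdb_session(const std::vector<SessionTempdbUsage>& rows)
{
    int best_session = -1;
    std::int64_t best_pages = 0;
    for (const SessionTempdbUsage& row : rows)
    {
        const std::int64_t pages = held_pages(row);
        if (pages > best_pages)
        {
            best_pages = pages;
            best_session = row.session_id;
        }
    }
    return best_session;
}
```

Dividing the held page count by 128 gives the same MB figure the first query reports per category.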
    

    Known issues related to TEMPDB Shrink

    FIX: The used space in the tempdb database increases continuously when you run a query that creates internal objects in the tempdb database in SQL Server 2005
     

    Thank you,

    Karthick P.K

    Posted in Performance, SQL General, SQL Query | 4 Comments »

    Different Status bits of sysdatabases in SQLServer

    Posted by Karthick P.K on April 11, 2009

    Meaning of the sysdatabases.status and sysdatabases.status2 bits in SQL Server
    ======================================================

    <Script1>

    DECLARE @status int;
    DECLARE @status2 int;

    SET @status = 65544;         -- replace with your database's status value
    SET @status2 = 1090520064;   -- replace with your database's status2 value

    SELECT CASE (@status & 1) WHEN 1 THEN 1 ELSE 0 END AS autoclose,
    CASE (@status & 4) WHEN 4 THEN 1 ELSE 0 END AS selectintobulkcopy,
    CASE (@status & 8) WHEN 8 THEN 1 ELSE 0 END AS trunclogonchkpt,
    CASE (@status & 16) WHEN 16 THEN 1 ELSE 0 END AS tornpagedetection,
    CASE (@status & 32) WHEN 32 THEN 1 ELSE 0 END AS loading, CASE (@status & 64) WHEN 64 THEN 1 ELSE 0 END AS prerecovery, CASE (@status & 128) WHEN 128 THEN 1 ELSE 0 END AS recovering, CASE (@status & 256) WHEN 256 THEN 1 ELSE 0 END AS notrecovered, CASE (@status & 512) WHEN 512 THEN 1 ELSE 0 END AS offline, CASE (@status & 1024) WHEN 1024 THEN 1 ELSE 0 END AS readonly, CASE (@status & 2048) WHEN 2048 THEN 1 ELSE 0 END AS dbouseonly, CASE (@status & 4096) WHEN 4096 THEN 1 ELSE 0 END AS singleuser, CASE (@status & 32768) WHEN 32768 THEN 1 ELSE 0 END AS emergencymode,
    CASE (@status & 4194304) WHEN 4194304 THEN 1 ELSE 0 END AS autoshrink, CASE (@status & 1073741824) WHEN 1073741824 THEN 1 ELSE 0 END AS cleanlyshutdown,
    CASE (@status2 & 16384) WHEN 16384 THEN 1 ELSE 0 END AS ansinulldefault, CASE (@status2 & 65536) WHEN 65536 THEN 1 ELSE 0 END AS concatnullyieldsnull,
    CASE (@status2 & 131072) WHEN 131072 THEN 1 ELSE 0 END AS recursivetriggers, CASE (@status2 & 1048576) WHEN 1048576 THEN 1 ELSE 0 END AS defaulttolocalcursor,
    CASE (@status2 & 8388608) WHEN 8388608 THEN 1 ELSE 0 END AS quotedidentifier, CASE (@status2 & 33554432) WHEN 33554432 THEN 1 ELSE 0 END AS cursorcloseoncommit,
    CASE (@status2 & 67108864) WHEN 67108864 THEN 1 ELSE 0 END AS ansinulls, CASE (@status2 & 268435456) WHEN 268435456 THEN 1 ELSE 0 END AS ansiwarnings,
    CASE (@status2 & 536870912) WHEN 536870912 THEN 1 ELSE 0 END AS fulltextenabled

    </Script1>
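Script1 tests each flag with a bitwise AND. The same decoding in C++, as a sketch whose masks are copied from Script1 (the `StatusBit`, `kStatusBits`, `kStatus2Bits` and `decode_bits` names are hypothetical):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// A named bit in sysdatabases.status or sysdatabases.status2.
struct StatusBit
{
    std::uint32_t mask;
    const char* name;
};

// Masks copied from Script1.
const std::vector<StatusBit> kStatusBits = {
    {1, "autoclose"}, {4, "selectintobulkcopy"}, {8, "trunclogonchkpt"},
    {16, "tornpagedetection"}, {32, "loading"}, {64, "prerecovery"},
    {128, "recovering"}, {256, "notrecovered"}, {512, "offline"},
    {1024, "readonly"}, {2048, "dbouseonly"}, {4096, "singleuser"},
    {32768, "emergencymode"}, {4194304, "autoshrink"}, {1073741824, "cleanlyshutdown"},
};

const std::vector<StatusBit> kStatus2Bits = {
    {16384, "ansinulldefault"}, {65536, "concatnullyieldsnull"},
    {131072, "recursivetriggers"}, {1048576, "defaulttolocalcursor"},
    {8388608, "quotedidentifier"}, {33554432, "cursorcloseoncommit"},
    {67108864, "ansinulls"}, {268435456, "ansiwarnings"}, {536870912, "fulltextenabled"},
};

// Names of the bits set in value: the same test as CASE (@status & mask) WHEN mask THEN 1.
std::vector<std::string> decode_bits(std::uint32_t value, const std::vector<StatusBit>& table)
{
    std::vector<std::string> names;
    for (const StatusBit& bit : table)
        if ((value & bit.mask) == bit.mask)
            names.emplace_back(bit.name);
    return names;
}
```

With the sample values, status 65544 (65536 + 8) decodes to just trunclogonchkpt, because 65536 is not one of the bits Script1 checks, and status2 1090520064 sets none of the status2 bits listed.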

     

    =================================================================================================================

    <Script2>

    SELECT substring(name, 1, 50) as dbname, cmptlevel, filename, version, CASE (status & 1) WHEN 1 THEN 1 ELSE 0 END AS autoclose,
    CASE (status & 4) WHEN 4 THEN 1 ELSE 0 END AS selectintobulkcopy,
    CASE (status & 8) WHEN 8 THEN 1 ELSE 0 END AS trunclogonchkpt,
    CASE (status & 16) WHEN 16 THEN 1 ELSE 0 END AS tornpagedetection,
    CASE (status & 32) WHEN 32 THEN 1 ELSE 0 END AS loading, CASE (status & 64) WHEN 64 THEN 1 ELSE 0 END AS prerecovery, CASE (status & 128) WHEN 128 THEN 1 ELSE 0 END AS recovering, CASE (status & 256) WHEN 256 THEN 1 ELSE 0 END AS notrecovered, CASE (status & 512) WHEN 512 THEN 1 ELSE 0 END AS offline, CASE (status & 1024) WHEN 1024 THEN 1 ELSE 0 END AS readonly, CASE (status & 2048) WHEN 2048 THEN 1 ELSE 0 END AS dbouseonly, CASE (status & 4096) WHEN 4096 THEN 1 ELSE 0 END AS singleuser, CASE (status & 32768) WHEN 32768 THEN 1 ELSE 0 END AS emergencymode,
    CASE (status & 4194304) WHEN 4194304 THEN 1 ELSE 0 END AS autoshrink, CASE (status & 1073741824) WHEN 1073741824 THEN 1 ELSE 0 END AS cleanlyshutdown,
    CASE (status2 & 16384) WHEN 16384 THEN 1 ELSE 0 END AS ansinulldefault, CASE (status2 & 65536) WHEN 65536 THEN 1 ELSE 0 END AS concatnullyieldsnull,
    CASE (status2 & 131072) WHEN 131072 THEN 1 ELSE 0 END AS recursivetriggers, CASE (status2 & 1048576) WHEN 1048576 THEN 1 ELSE 0 END AS defaulttolocalcursor,
    CASE (status2 & 8388608) WHEN 8388608 THEN 1 ELSE 0 END AS quotedidentifier, CASE (status2 & 33554432) WHEN 33554432 THEN 1 ELSE 0 END AS cursorcloseoncommit,
    CASE (status2 & 67108864) WHEN 67108864 THEN 1 ELSE 0 END AS ansinulls, CASE (status2 & 268435456) WHEN 268435456 THEN 1 ELSE 0 END AS ansiwarnings,
    CASE (status2 & 536870912) WHEN 536870912 THEN 1 ELSE 0 END AS fulltextenabled FROM sysdatabases WHERE name = 'master'
    go
    SELECT fileid, name, filename, size, maxsize, growth, groupid
    FROM sysaltfiles WHERE dbid IN ( SELECT dbid FROM sysdatabases WHERE name = 'Replace DB Name' )
    go
    SELECT fileid, name, filename, size, maxsize, growth, groupid FROM sysfiles
    go
    SELECT groupid, groupname, CASE (status & 0x8) WHEN 0x8 THEN 1 ELSE 0 END AS readonly,
    CASE (status & 0x10) WHEN 0x10 THEN 1 ELSE 0 END AS isdefault FROM sysfilegroups WHERE groupid IN ( SELECT groupid FROM sysaltfiles a, sysdatabases b WHERE a.dbid = b.dbid AND b.name = 'Replace DB Name' )

    </Script2>

     

    Regards

    Karthick P.K

    Posted in SQL General | 7 Comments »

    Installation of SQLServer2008 fails (The registry key SYSTEM\CurrentControlSet\Services\RsFx0102\InstancesShares is missing)

    Posted by Karthick P.K on April 10, 2009

    Installation of SQL Server 2008 might fail with the error below:

    Detailed results:
    Feature:                       Database Engine Services
    Status:                        Failed: see logs for details
    MSI status:                    Passed
    Configuration status:          Failed: see details below
    Configuration error code:      0xCD263ADC@1306@30
    Configuration error description: The registry key SYSTEM\CurrentControlSet\Services\RsFx0102\InstancesShares is missing.
    Configuration log:             C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\Log\20090410_153833\Detail.txt

    To resolve this issue:

    Create a new key named InstancesShares under the registry path

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\RsFx0102\

    Then uninstall the failed Database Engine components, restart the system, and reinstall the SQL Server 2008 instance.

    Regards

    Karthick P.K

    Posted in SQL General | 2 Comments »

    Installation of SQL Server 2008 cluster fails on Windows 2008 (The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F))

    Posted by Karthick P.K on March 26, 2009

    Installation of a SQL Server 2008 cluster might fail on Windows Server 2008 with the error below:

    The cluster resource ‘SQL Server’ could not be brought online.
    Error: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

     

    Root Cause

    This problem occurs because of a security feature called the loopback check. By default, the loopback check is turned on in Windows, and the DisableLoopbackCheck registry entry is set to 0 (zero).
    http://support.microsoft.com/kb/957097/

    With this feature turned on, Windows does not allow NTLM authentication when the server is accessed from the local machine by a name other than its NetBIOS name or IP address.

    When SQL Server Agent starts, the SQL Agent resource connects to SQL Server using the SQL virtual server name, so NTLM authentication is refused. As a result, SQL Server Agent fails to start and creation of the SQL Server Agent resource also fails.

    The SQL Server resource fails to come online because the IsAlive check uses NTLM authentication: the cluster service startup account resolves as NT AUTHORITY\ANONYMOUS LOGON when it connects to SQL Server for the IsAlive check, and the connection fails.

    You will not run into this issue if the SQL Server startup account has permission to read and write SPNs, because the connections can then use Kerberos instead of NTLM.

    After the installation fails, you will see that the SQL Server resource has been created but the SQL Agent resource has not.

    There are three ways to resolve this issue.

    Option 1

    1. After the failure, create the SPNs manually using the SetSPN tool, or configure the SQL Server service to create SPNs dynamically for the SQL Server instances (refer to KB 811889).

    Example of creating SPNs manually (replace <Domain\SQLServiceAccount> with the SQL Server service account):
    SETSPN -A MSSQLSVC/VSName.XX.XX.EDU:1433 <Domain\SQLServiceAccount>
    SETSPN -A MSSQLSVC/VSName.XX.XX.EDU <Domain\SQLServiceAccount>

    2. Bring the SQL Server Resource online.

    3. Create the SQL Server Agent resource type by running the following command:

    cluster restype "SQL Server Agent" /create /DLL:sqagtres.dll

    You should see a message that the resource type 'SQL Server Agent' was created.

    4. Create the SQL Server Agent resource manually.

    Make sure the new SQL Server Agent resource has its VirtualServerName and InstanceName properties set. In Failover Cluster Management, go to the SQL Server Agent resource ==> Properties ==> Properties, check the two parameters (VirtualServerName and InstanceName), and fill in the details.

    5. Set the REG_DWORD values of all components to 1 under the following registry path:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQLX.MSSQLSERVER\ConfigurationState

    Option 2

    1. Do a complete uninstall of the failed installation (or) configure the SQL Server service to create SPNs dynamically for the SQL Server instances (refer to KB 811889) and move to step 3.

    2. Create the SPNs before running the installation.

    Example (replace <Domain\SQLServiceAccount> with the SQL Server service account):
    SETSPN -A MSSQLSVC/VSName.XX.XX.EDU:1433 <Domain\SQLServiceAccount>
    SETSPN -A MSSQLSVC/VSName.XX.XX.EDU <Domain\SQLServiceAccount>

    Note: Beginning with SQL Server 2008, the SPN format has changed, and the new format does not require a port number. Refer to: http://msdn.microsoft.com/en-us/library/ms191153.aspx

    3. Then install SQL Server on the cluster.
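The SPN examples in Options 1 and 2 follow the pattern MSSQLSvc/<FQDN>:<port>, plus a second SPN that is MSSQLSvc/<FQDN> for a default instance or MSSQLSvc/<FQDN>:<instance name> for a named instance (the SQL Server 2008 format mentioned in the note). The C++ sketch below builds the matching setspn commands. `setspn_commands` is a hypothetical helper, SPNs are not case sensitive (so MSSQLSvc and MSSQLSVC are equivalent), and the account argument is a placeholder for your SQL Server service account.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the setspn commands for a SQL Server virtual server.
// fqdn:     fully qualified virtual server name, e.g. VSName.XX.XX.EDU
// port:     TCP port the instance listens on
// instance: empty for a default instance, otherwise the named-instance name
// account:  SQL Server service account in DOMAIN\user form
std::vector<std::string> setspn_commands(const std::string& fqdn, int port,
                                         const std::string& instance, const std::string& account)
{
    const std::string base = "MSSQLSvc/" + fqdn;
    std::vector<std::string> spns;
    spns.push_back(base + ":" + std::to_string(port));               // TCP SPN with the port
    spns.push_back(instance.empty() ? base : base + ":" + instance);  // portless or instance-name SPN
    std::vector<std::string> commands;
    for (const std::string& spn : spns)
        commands.push_back("setspn -A " + spn + " " + account);
    return commands;
}
```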

    Option 3 (Recommended)

    1. Disable the authentication loopback check by setting the DisableLoopbackCheck value under the
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa registry subkey to 1.
    To set the DisableLoopbackCheck registry entry to 1, follow the steps below on all nodes of the cluster.

    a. Click Start, click Run, type regedit, and then click OK.
    b. Locate the following registry path:
    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa
    c. Right-click Lsa, select New, and then click DWORD Value.
    d. Type DisableLoopbackCheck, and then press ENTER.
    e. Right-click DisableLoopbackCheck, and then click Modify.
    f. In the Value data box, type 1, and then click OK.

    2. Restart the system.

    3. Do a complete uninstall and rerun setup, or follow the steps from step 2 in Option 1.

     

    Note:

    1. You will also encounter the error above if you are installing a named instance of SQL Server while the SQL Server Browser service is stopped.

    2. If you have installed SQL Server 2012 (Denali) and then uninstalled it on the same cluster, you might encounter the issue above. Refer to the link below for details.

    https://mssqlwiki.com/2012/01/31/sql-server-resource-fails-to-come-online-is-alive-check-fails/

    Regards

    Karthick P.K

    Posted in SQL Cluster Setup, SQL General, SQL Server Setup | 8 Comments »

    Using DMVs to find out the index usage history- SQLServer Index Usage

    Posted by Karthick P.K on March 10, 2009

    SQL Server 2005 ships with a set of DMVs that can help you identify missing indexes for your workload, analyze the effectiveness of existing ones and find index fragmentation.

    Using DMVs to find out the index usage history

    Over time, you may create a lot of indexes on your tables and modify existing ones. However, in SQL Server 2000 you could not easily measure how effective each of these indexes was. Poorly designed indexes can add performance overhead instead of improving performance.

    In SQL Server 2005, you can query the sys.dm_db_index_usage_stats DMV to find the indexes that have NEVER been used since SQL Server last started. You can use the following query:

    SELECT OBJECT_NAME(i.object_id) AS ObjectName,
        i.name AS IndexName, s.user_updates, s.user_seeks, s.user_scans,
        s.user_lookups
    FROM sys.indexes i
    LEFT JOIN sys.dm_db_index_usage_stats s
        ON s.object_id = i.object_id AND i.index_id = s.index_id
        AND s.database_id = <dbid>   -- ID of the current database, for example DB_ID()
    WHERE OBJECTPROPERTY(i.object_id, 'IsIndexable') = 1
      AND (
        -- index_usage_stats has no reference to this index (not being used)
        s.index_id IS NULL
        -- index is being updated, but not used by seeks/scans/lookups
        OR (s.user_updates > 0 AND s.user_seeks = 0
            AND s.user_scans = 0 AND s.user_lookups = 0)
      )
    ORDER BY OBJECT_NAME(i.object_id) ASC

     

    In the output, you will see ALL the indexes that have not been used by any workload on your server since SQL Server last started.

    For indexes that have NEVER been used (by either a SELECT or a DML statement), all usage columns will be NULL.

    For indexes that have NEVER been used by a SELECT but had to be updated because of DML statements, the user_updates column will be > 0 while the other columns are 0. These are the indexes that can cause significant performance overhead for your DML statements and might be worth dropping.

    Give SQL Server ample time after a restart to be exposed to the full workload before running this query.
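The two rules above can be written as a small decision function. This C++17 sketch is illustrative only: `IndexUsage`, `IndexVerdict` and `classify_index` are hypothetical names, and std::nullopt stands for an index with no row in sys.dm_db_index_usage_stats.

```cpp
#include <optional>

// User-workload counters from sys.dm_db_index_usage_stats for one index.
struct IndexUsage
{
    long long user_updates;
    long long user_seeks;
    long long user_scans;
    long long user_lookups;
};

enum class IndexVerdict { NotUsed, UpdatedButNotRead, Read };

// std::nullopt means "no row in the DMV" (all usage columns NULL after the LEFT JOIN).
IndexVerdict classify_index(const std::optional<IndexUsage>& usage)
{
    if (!usage)
        return IndexVerdict::NotUsed;
    const bool read = usage->user_seeks > 0 || usage->user_scans > 0 || usage->user_lookups > 0;
    if (read)
        return IndexVerdict::Read;
    // Maintained by DML but never read: the drop candidates described above.
    return usage->user_updates > 0 ? IndexVerdict::UpdatedButNotRead : IndexVerdict::NotUsed;
}
```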

    Using DMVs to find out missing indexes

    When the query optimizer generates a query plan, it analyzes which indexes would be best for a particular filter condition. If those indexes do not exist, the optimizer generates a less optimal plan but still stores information about the missing indexes. The missing indexes feature lets you access this information so you can decide whether to implement them.

    For more information on how to use this feature, please visit the following link

    http://msdn2.microsoft.com/en-us/library/ms345417.aspx

     

    Using DMVs to find out index fragmentation

    The sys.dm_db_index_physical_stats dynamic management function replaces the DBCC SHOWCONTIG statement.

    You can learn more about using this DMV to identify fragmentation, correcting it and possibly automating this activity for your server by visiting the following link:

    http://msdn2.microsoft.com/en-us/library/ms188917.aspx

    Regards

    Karthick P.K

    Posted in Performance, SQL General | 1 Comment »

    How to find who altered my SQL Server Login

    Posted by Karthick P.K on January 25, 2009

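    Do you know how to find who changed a SQL Server login or its password?

    Here is the way. Run the following query in the master database; it only finds changes whose log records are still in the transaction log (not yet truncated):

    SELECT [Transaction SID], SUSER_SNAME([Transaction SID]) AS [Login Name] FROM ::fn_dblog(DEFAULT, DEFAULT) WHERE [Transaction Name] = 'ALTER LOGIN'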

     

     

    Thanks

    Karthick P.K

    Posted in DBCC, Security, SQL General | 3 Comments »

    How to browse (or) view objects and their code in the mssqlsystemresource database

    Posted by Karthick P.K on January 15, 2009

    How to browse (or) view the mssqlsystemresource database:

    1. Stop the SQL Server service.
    2. Copy mssqlsystemresource.mdf and mssqlsystemresource.ldf to a new path, renaming the copies (the command below assumes resource_copy.mdf and resource_copy.ldf). In SQL Server 2005 these two files are in the same folder as the master database; starting with SQL Server 2008 they are in the Binn folder of the instance.
    3. Start the SQL Server service.
    4. Use the following command to attach the copied data and log files as a new user database:

    EXEC sp_attach_db
    'mssqlsystemresource_Copy', '<Path where you copied>\resource_copy.mdf',
    '<Path where you copied>\resource_copy.ldf'

    Now you can browse through all the system objects and their code.

    OR

    1. Start SQL Server in single-user mode.

    2. Open SSMS, connect using a dedicated administrator connection (ADMIN:), and change the database context to mssqlsystemresource:

    use mssqlsystemresource

    3. Query the Resource database objects.

    Regards

    Karthick P.K

    Posted in Configuration, SQL General, SQL Server Engine | 3 Comments »