Archive for the ‘SQL General’ Category

Running 1 Million databases on Azure SQL

Posted by Karthick P.K on October 11, 2020

Curious to know how my team manages over 1 million SQL databases across dozens of datacenters in Azure, hands-free through Spartan and DAMS, to enable the Common Data Service behind the Power Platform & Dynamics 365?

Read on for some never-before-shared details:

https://www.linkedin.com/posts/karthick-pk-1b342727_running-1m-databases-on-azure-sql-for-a-large-activity-6720400296566243328-x6_t

Thank you

Karthick P.K

Posted in SQL General | Leave a Comment »

Transactional Replication Part -2

Posted by Karthick P.K on November 22, 2013

Part 2 of the transactional replication series covers:

A demo of the data flow and of configuring the distributor, publisher, publication, subscription, and so on. After watching this video you will be able to correlate the concepts discussed in the earlier video, configure transactional replication on your own, understand the different replication agents (snapshot agent, log reader agent, and distribution agent), and monitor these agents once transactional replication is configured.

How to configure transactional replication By Gaurav Mathur

Posted in SQL General | Tagged: How to configure transactional replication, Setup transactional replication, step by step guide to transactional replication | 3 Comments »

Transactional Replication Part -1

Posted by Karthick P.K on November 22, 2013

Part 1 of the transactional replication series covers:

1. Architecture and transactional replication data flow.
2. The entities involved in transactional replication: Publisher server, Distributor server, Subscriber server, publication, publication database, subscription, subscription database, and articles.
3. The replication agents involved in one-way transactional replication (snapshot agent, log reader agent, and distribution agent) and what each one is used for.
4. The steps involved in configuring transactional replication: configuring the distributor, publisher, and subscriber, along with the publication and subscription.
5. Any DBA can watch this video to learn the transactional replication data flow, how it works, and how to configure it.

After watching the video below, you can move on to the Transactional Replication Part 2 demo video, which will help you learn the above concepts in practice and enable you to configure replication on your own servers.

Transactional Replication internals and architecture by Gaurav Mathur

Posted in Replication, SQL General | Tagged: How Transactional replication works, replication agents, replication data flow, Transactional replication architecture, Transactional Replication internals | 2 Comments »

Tempdb latch contention

Posted by Karthick P.K on September 17, 2013

You might see page latch contention in tempdb when you repeatedly create and drop tempdb objects (temp tables, table variables, etc.).

When you notice PAGELATCH_* contention on tempdb (the wait resource in sysprocesses starts with 2:), check whether the latch wait is on a PFS, GAM or SGAM page. When there is latch contention on tempdb you will see a lot of sessions waiting on PAGELATCH_*, similar to the one below.

In the output below, the session is waiting on resource 2:15:121320. Decoded, the wait resource is 2: the database ID of tempdb, 15: the file number, and 121320: the page number. 121320 is a multiple of 8088 (8088 × 15), so it is a PFS page. If the page is not a PFS page, check in the same way whether it is a GAM or SGAM page.

Wait type Wait resource

PAGELATCH_UP 2:15:121320
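A minimal sketch of how such waits could be listed on a live instance, assuming SQL Server 2005 or later, where the sys.dm_os_waiting_tasks DMV is available. The filter on database ID 2 mirrors the wait resource format described above.

```sql
-- List sessions currently waiting on page latches in tempdb (database ID 2).
-- resource_description has the form <database id>:<file id>:<page id>.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE N'PAGELATCH[_]%'      -- [_] matches a literal underscore
  AND wt.resource_description LIKE N'2:%'     -- tempdb only
ORDER BY wt.wait_duration_ms DESC;
```

Many rows pointing at the same page ID suggest contention on that page rather than a one-off wait.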

How to identify if page is PFS, GAM or SGAM?

PFS Page: A PFS page occurs once every 8088 pages. SQL Server attempts to place a PFS page on the first page of every PFS interval (8088 pages). The only time a PFS page is not the first page in its interval is in the first interval of a file, where the file header page comes first and the PFS page second (page IDs start from 0, so the first PFS page is page ID 1). If (page number)/8088 is a whole number, the page is a PFS page.

GAM Page: The first GAM page is page 2 in the data file; after that, a GAM page is placed at the start of every GAM interval of 511232 pages (page 511232, 1022464, and so on). If (page number)/511232 is a whole number, the page is a GAM page.

SGAM Page: The first SGAM page is page 3 in the data file; after that, an SGAM page immediately follows each GAM page (page 511233, 1022465, and so on). If (page number-1)/511232 is a whole number, the page is an SGAM page.
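The page arithmetic above can be sketched in T-SQL as follows. This is only an illustration: it assumes a PFS interval of 8088 pages and a GAM interval of 511232 pages, and the variable names are made up for the example.

```sql
-- Classify the page number taken from a tempdb wait resource.
-- Assumptions: PFS interval = 8088 pages, GAM interval = 511232 pages.
DECLARE @wait_resource varchar(60) = '2:15:121320';   -- sample value from this post

-- PARSENAME splits on dots, so swap the colons for dots; part 1 is the rightmost piece.
DECLARE @page_id bigint =
    CAST(PARSENAME(REPLACE(@wait_resource, ':', '.'), 1) AS bigint);

SELECT @page_id AS page_id,
       CASE
           WHEN @page_id = 0                                 THEN 'File header'
           WHEN @page_id = 1 OR @page_id % 8088 = 0          THEN 'PFS'
           WHEN @page_id = 2 OR @page_id % 511232 = 0        THEN 'GAM'
           WHEN @page_id = 3 OR (@page_id - 1) % 511232 = 0  THEN 'SGAM'
           ELSE 'Other'
       END AS page_type;
```

For the sample wait resource 2:15:121320 this returns PFS, since 121320 = 8088 × 15.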

How to resolve?

1. Increase the number of tempdb data files and size them equally. As a general rule, if the number of logical processors is less than or equal to 8, use the same number of data files as logical processors. If the number of logical processors is greater than 8, use 8 data files and, if contention continues, increase the number of data files in multiples of 4 (you may not see improvement once you reach 32 files).

2. Enable server-side trace flag 1118.

3. If you still see latch contention on the PFS page after following the two steps above, the only option is to modify your application to limit its tempdb usage.

4. If you see contention on 2:1:103: page 103 belongs to the system table sys.sysmultiobjrefs, which manages the relationships between objects created in every database. The only way to reduce contention on this page is to reduce the number of such relationships. For example, creating a lot of temp tables with a primary key can cause this contention, because the relationship between each table and its PK constraint has to be recorded in sys.sysmultiobjrefs.
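Steps 1 and 2 could look roughly like the sketch below. The file path, the new logical file name and the sizes are placeholders chosen for illustration, not recommendations; tempdev is the default logical name of the first tempdb data file.

```sql
-- Assumptions: T:\TempDB\ exists; the 4 GB size and 512 MB growth are placeholders.
-- Resize the existing data file so that all files end up the same size.
ALTER DATABASE tempdb
MODIFY FILE (NAME = N'tempdev', SIZE = 4096MB, FILEGROWTH = 512MB);

-- Add one more equally sized data file (repeat with new names for further files).
ALTER DATABASE tempdb
ADD FILE (NAME = N'tempdev2',
          FILENAME = N'T:\TempDB\tempdev2.ndf',
          SIZE = 4096MB,
          FILEGROWTH = 512MB);

-- Enable trace flag 1118 globally on the running instance.
DBCC TRACEON (1118, -1);

-- Confirm the flag is active.
DBCC TRACESTATUS (1118, -1);
```

DBCC TRACEON lasts only until the next restart; to make the flag permanent, add -T1118 as a startup parameter.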

What’s the best practice?

1. On all your SQL Server instances, create multiple equally sized tempdb data files instead of one large file.

2. Make TF1118 (uniform allocation) the default. (The extra space required by this trace flag shouldn’t really matter, as the amount is minimal and storage cost is not that high these days.)

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, Space management, SQL General, SQL Server Engine | Tagged: PAGELATCH_up, Wait resource 2:1:1, Wait resource 2:1:103, Wait resource 2:1:2, Wait resource 2:1:3 | 1 Comment »

Troubleshooting Transactional replication Latency using Agent Statistics

Posted by Prabhakar Bhaskaran on September 13, 2013

Troubleshooting latency issues in replication is a black box for many DBAs. In this post I will explain how you can leverage agent statistics to troubleshoot latency issues.

Before looking at how to decode the agent statistics, let's take a look at some of the basics that will help us troubleshoot replication performance issues more effectively.

The following MSDN diagram depicts the transactional replication architecture in a simple manner.

Transactional replication components and data flow

Troubleshooting latency issues is a multi-step approach. The first step is to identify which agent is slow:

Log reader Agent (Publisher to Distributor)
Distribution Agent (Distributor to Subscriber)

So the problem can be in either the log reader agent or the distribution agent. We can identify which one simply by inserting a tracer token.
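A tracer token can also be posted from T-SQL instead of Replication Monitor. The sketch below uses the documented procedures sp_posttracertoken and sp_helptracertokenhistory; the database and publication names are placeholders, and the 30-second wait is an arbitrary choice.

```sql
-- Run at the Publisher, in the publication database.
-- Assumption: MyPublicationDb and MyPublication are placeholder names.
USE MyPublicationDb;
GO
DECLARE @token_id int;

-- Write a tracer token into the publication's transaction log.
EXEC sys.sp_posttracertoken
     @publication     = N'MyPublication',
     @tracer_token_id = @token_id OUTPUT;

-- Give the token some time to travel through the topology.
WAITFOR DELAY '00:00:30';

-- distributor_latency covers Publisher to Distributor (log reader agent);
-- subscriber_latency covers Distributor to Subscriber (distribution agent).
EXEC sys.sp_helptracertokenhistory
     @publication = N'MyPublication',
     @tracer_id   = @token_id;
```

A NULL latency in the result means the token has not yet arrived at that hop, which by itself points to the slow agent.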

Once we have found the problematic agent, the next step is to identify which particular thread within the agent is causing the issue.

Let me introduce the important threads in these replication agents, and the work they do, in a nutshell.

Log Reader Agent

Reader Thread – It scans the publisher database transaction log using sp_replcmds

Writer Thread – Adds the queued transactions to the distribution database using sp_MSadd_repl_commands

Distribution Agent

Reader thread – It finds the watermark from the table MSreplication_subscriptions (on the subscriber) and uses this information to retrieve pending commands from the distribution database. It uses the stored procedure sp_MSget_repl_commands to do this.

Writer thread – It uses batched RPC calls to write the information to the subscriber database.

Now that we understand the threads in the replication agents, let's assume we have already identified which agent is slow by inserting a tracer token. The next step is to dig deeper at the thread level, and this is where the replication agent statistics come to the rescue.

Agent statistics entries are appended to the history tables every 5 minutes by default. They provide a historical view of how the agent has been performing and keep the last 3 days of data. You can keep more days by changing the history retention period.

MSlogreader_history

MSdistribution_history

The above two tables are located in the distribution database. The statistics are stored as an XML blob in the comments column of these tables.
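As a rough sketch, the statistics rows can be pulled out of the history table and shredded with the XML data type methods. It assumes the distribution database has the default name distribution and SQL Server 2012 or later (for TRY_CAST); the output column names are made up for the example.

```sql
-- Assumption: the distribution database uses the default name "distribution".
USE distribution;
GO
SELECT TOP (20)
       h.agent_id,
       h.[time],
       x.stats_xml.value('(/stats/@work)[1]',         'bigint') AS work_time,
       x.stats_xml.value('(/stats/@idle)[1]',         'bigint') AS idle_time,
       x.stats_xml.value('(/stats/reader/@fetch)[1]', 'bigint') AS reader_fetch,
       x.stats_xml.value('(/stats/reader/@wait)[1]',  'bigint') AS reader_wait,
       x.stats_xml.value('(/stats/writer/@write)[1]', 'bigint') AS writer_write,
       x.stats_xml.value('(/stats/writer/@wait)[1]',  'bigint') AS writer_wait
FROM dbo.MSdistribution_history AS h
-- TRY_CAST returns NULL instead of failing on comments that are not valid XML.
CROSS APPLY (SELECT TRY_CAST(h.comments AS xml)) AS x(stats_xml)
WHERE h.comments LIKE N'<stats%'     -- keep only the statistics rows
ORDER BY h.[time] DESC;
```

The same query shape should work against MSlogreader_history by changing the table name.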

Now, let's take a look at how to decipher this XML data for each agent.

Log Reader Agent statistics

– State = 1 means stats after a batch commit

– Work = cumulative time spent by the agent since restart, minus idle time

– Idle = time spent waiting to call sp_replcmds

– Reader fetch = time spent executing sp_replcmds

– Reader wait = time spent waiting on the writer to release the buffer

– Writer write = time spent writing commands into the distribution database

– Writer wait = time spent waiting on the reader to populate the buffer

Note: Each thread has its own buffer, 40 KB in size.

Here, we need to look at the wait times to understand where the bottleneck is. For example, if the wait time for the Reader thread is high, your writer thread is slow, since the reader thread is waiting for the writer to release the buffer. Similarly, if the wait time for the writer thread is high, your reader thread is slow.

The simple way to decode this is:

HIGH wait time on Reader thread = Writer thread is slow (the thread that writes the commands to the distribution database)

HIGH wait time on Writer thread = Reader thread is slow (the thread that scans the transaction log)

Distribution Agent Statistics

<stats state="1" work="154" idle="351464">
<reader fetch="144" wait="11"/>
<writer write="12" wait="338"/>
<sincelaststats elapsedtime="305" work="10" cmds="81262" cmdspersec="8041.000000"><reader fetch="0" wait="9"/><writer write="10" wait="0"/></sincelaststats></stats>
– State = 1 means stats after a batch commit

– Work = cumulative time spent by the agent since restart, minus idle time (seconds)

– Idle = time spent waiting to call sp_msget_repl_commands

– Reader fetch = time spent executing sp_msget_repl_commands

– Reader wait = time spent waiting on the writer to release the buffer

– Writer write = time spent writing commands into the subscriber database

– Writer wait = time spent waiting on the reader to populate the buffer

The wait times are decoded the same way as for the log reader agent:

HIGH wait time on Reader thread = Writer thread is slow (the thread that writes to the subscriber database using batched RPC calls)

HIGH wait time on Writer thread = Reader thread is slow (the thread that takes the pending commands from the distribution database)

Distributor Writer thread Slow Scenario

We can understand these concepts better by looking at example statistics. In the case below, I explicitly started a transaction on the subscriber table to simulate blocking on the subscriber side, making the writer thread of the distribution agent wait and build up latency.

This is how the stats looked:

<stats state="1" work="755" idle="354505">
<reader fetch="153" wait="604"/>
<writer write="613" wait="346"/>
<sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000"><reader fetch="0" wait="515"/><writer write="515" wait="0"/></sincelaststats></stats>

We can clearly see that the Reader thread wait time is high (515), which means the writer thread is slow, as expected since we simulated blocking on the subscriber side.
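The same reading of the stats can be sketched in T-SQL by loading the sample XML into a variable and comparing the two wait values from the sincelaststats element. The "larger wait wins" rule and the column names are simplifications made for this example, not part of the agent's own logic.

```sql
-- Sample statistics from the blocking scenario above.
DECLARE @stats xml = N'
<stats state="1" work="755" idle="354505">
  <reader fetch="153" wait="604"/>
  <writer write="613" wait="346"/>
  <sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000">
    <reader fetch="0" wait="515"/>
    <writer write="515" wait="0"/>
  </sincelaststats>
</stats>';

-- A high wait on one thread means the other thread is the slow one.
SELECT w.reader_wait,
       w.writer_wait,
       CASE
           WHEN w.reader_wait > w.writer_wait THEN 'writer thread'
           WHEN w.writer_wait > w.reader_wait THEN 'reader thread'
           ELSE 'no clear bottleneck'
       END AS likely_slow_thread
FROM @stats.nodes('/stats/sincelaststats') AS s(n)
CROSS APPLY (VALUES (s.n.value('(reader/@wait)[1]', 'int'),
                     s.n.value('(writer/@wait)[1]', 'int'))
            ) AS w(reader_wait, writer_wait);
```

For this sample the query returns reader_wait = 515, writer_wait = 0 and flags the writer thread.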

Similarly, we can simulate blocking on the replication tables MSrepl_commands and MSrepl_transactions, which will cause the Log Reader writer thread to be slow, and the stats will show a high Reader thread wait time.

Now that we have isolated the source of the bottleneck at the thread level, we can follow the standard performance troubleshooting approach described in this Whitepaper to troubleshoot the slowness of the replication session.

For instance, check out the video where Joe Sack talks about using Extended events to troubleshoot the Distributor writer thread slowness.

In Summary

1. Find which agent is causing slowness using tracer token.

2. Leverage the Agent statistics to narrow down problem to thread level .

3. Follow standard performance troubleshooting approach to resolve the issue.

Thanks for reading! I hope this will help you to troubleshoot the replication performance better next time.

Posted in Performance, Replication, SQL General | Tagged: Agent statistics, latency, Replication, replication latency, replication performance, Transactional replication | 2 Comments »

The connection to the primary replica is not active. The command cannot be processed

Posted by Karthick P.K on June 20, 2013

When you configure a SQL Server AlwaysOn availability group from Management Studio, it may fail with the errors below while joining the secondary replica to the availability group.

Error 1

{

Joining database on secondary replica resulted in an error. (Microsoft.SqlServer.Management.HadrTasks)

——————————

ADDITIONAL INFORMATION:

Failed to join the database ‘AG’ to the availability group ‘AG1’ on the availability replica ‘NODE2’. (Microsoft.SqlServer.Smo)

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

——————————

The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

}

Error 2

{

TITLE: Microsoft SQL Server Management Studio

——————————

Failed to join the instance ‘NODE2’ to the availability group ‘AG1’. (Microsoft.SqlServer.Management.SDK.TaskForms)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=11.0.2100.60+((SQL11_RTM).120210-1917+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

——————————

ADDITIONAL INFORMATION:

Failed to join local availability replica to availability group ‘AG1’. The operation encountered SQL Server error 41106 and has been rolled back. Check the SQL Server error log for more details. When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.2100&EvtSrc=MSSQLServer&EvtID=41158&LinkId=20476

}

You may get the error below when you join a database to the availability group using the ALTER DATABASE command shown below, or synchronization might fail with the same 35250 error.

ALTER DATABASE [AG] SET HADR AVAILABILITY GROUP = [Group name];

Error 1

Msg 35250, Level 16, State 7, Line 1

The connection to the primary replica is not active. The command cannot be processed.

To resolve above errors

1. Ensure the AlwaysOn endpoints ([Hadr_endpoint]) are not blocked by a firewall (default port 5022).

2. Make sure the startup account of the primary server is added as a login on all secondary servers, and the startup accounts of all secondary servers are added on the primary server (the startup account of each replica must be added to every other replica).

3. If the logon account of SQL Server is “NT Service\” or the Local System account, ensure the computer account (Domainname\systemname$) of each replica is added to the other replicas.

{

CREATE LOGIN [MSSQLWIKI\node2$] FROM WINDOWS

}

4. Grant CONNECT on the AlwaysOn endpoint created on each replica to the startup accounts of the other replica servers (grant CONNECT on the endpoint even if the startup accounts of the other replicas are added as sysadmins).

{

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [MSSQLWIKI\node1$]

}

5. Make sure the SQL Server name (select @@servername) matches the hostname.

6. Make sure the cluster service startup account is added to the SQL Server logins (more details in This link).
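Several of the checks above can be verified with catalog views before retrying the join. The queries below are a sketch to run on each replica; they read only documented system views and change nothing.

```sql
-- 1. Is the AlwaysOn (database mirroring) endpoint started, and on which port?
SELECT e.name, e.state_desc, e.role_desc, t.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
  ON t.endpoint_id = e.endpoint_id;

-- 2. Which logins hold permissions (such as CONNECT) on that endpoint?
SELECT pr.name AS grantee, pe.permission_name, pe.state_desc
FROM sys.server_permissions AS pe
JOIN sys.server_principals AS pr
  ON pr.principal_id = pe.grantee_principal_id
JOIN sys.endpoints AS ep
  ON ep.endpoint_id = pe.major_id
WHERE pe.class_desc = N'ENDPOINT'
  AND ep.type_desc  = N'DATABASE_MIRRORING';

-- 3. Does the registered server name match the machine and instance name?
SELECT @@SERVERNAME                   AS sql_server_name,
       SERVERPROPERTY('MachineName')  AS machine_name,
       SERVERPROPERTY('ServerName')   AS server_property_name;
```

If the first query shows a state other than STARTED, or the second one does not list the other replicas' startup or computer accounts, that is the place to start.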


Posted in Always On, Configuration, Connectivity, Security, SQL General | Tagged: Error: 35250), Failed to join local availability replica to availability group 'AG1'. The operation encountered SQL Server error 41106 and has been rolled back, Failed to join the database '' to the availability group '' on the availability replica, Joining database on secondary replica resulted in an error, The connection to the primary replica is not active. The command cannot be processed., The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, The operation encountered SQL Server error 41106 and has been rolled back | 28 Comments »

Inside sys.dm_os_ring_buffers

Posted by Karthick P.K on March 29, 2013

Sys.dm_os_ring_buffers DMV can be used to troubleshoot connectivity errors, track exceptions, monitor system health, memory pressure, Non-yielding/Deadlocked schedulers and a lot more.

You can use the scripts below to query the data from sys.dm_os_ring_buffers during troubleshooting.

USE master
go
SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
GO
PRINT 'Start Time: ' + CONVERT (varchar(30), GETDATE(), 121)
GO
PRINT ''
PRINT '==== SELECT GETDATE()'
SELECT GETDATE()
PRINT ''
PRINT ''
PRINT '==== SELECT @@version'
SELECT @@VERSION
GO
PRINT ''
PRINT '==== SQL Server name'
SELECT @@SERVERNAME
GO
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - LOGIN TIMERS'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'int') AS [TotalLoginTimeInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'int') AS [LoginTaskEnqueuedInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'int') AS [NetworkWritesInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'int') AS [NetworkReadsInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'int') AS [SslProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'int') AS [SspiProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'int') AS [LoginTriggerAndResourceGovernorProcessingInMilliseconds]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'LoginTimers'
order by a.recordtime 
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - TDS Data'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') AS [PhysicalConnectionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') AS [DisconnectDueToReadError],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') AS [NetworkErrorFoundInInputStream],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') AS [ErrorFoundBeforeLogin],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') AS [SessionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') AS [NormalDisconnect]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'Error'
order by a.recordtime
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SECURITY_ERROR'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, rbf.[timestamp] - tme.ms_ticks, GETDATE()) as [Notification_Time],
cast(record as xml).value('(//SPID)[1]', 'bigint') as SPID,
cast(record as xml).value('(//ErrorCode)[1]', 'varchar(255)') as Error_Code,
cast(record as xml).value('(//CallingAPIName)[1]', 'varchar(255)') as [CallingAPIName],
cast(record as xml).value('(//APIName)[1]', 'varchar(255)') as [APIName],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
where rbf.ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR'
ORDER BY rbf.timestamp ASC
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_EXCEPTION'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as Time_Stamp,
cast(record as xml).value('(//Exception//Error)[1]', 'varchar(255)') as [Error],
cast(record as xml).value('(//Exception/Severity)[1]', 'varchar(255)') as [Severity],
cast(record as xml).value('(//Exception/State)[1]', 'varchar(255)') as [State],
msg.description,
cast(record as xml).value('(//Exception/UserDefined)[1]', 'int') AS [isUserDefinedError],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type], 
cast(record as xml).value('(//Record/@time)[1]', 'int') AS [Record Time],
tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
cross join sys.sysmessages msg
where rbf.ring_buffer_type = 'RING_BUFFER_EXCEPTION' 
and msg.error = cast(record as xml).value('(//Exception//Error)[1]', 'varchar(500)') and msg.msglangid = 1033 
ORDER BY rbf.timestamp ASC

PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_RESOURCE_MONITOR to capture external and internal memory pressure'

SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime], 
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],  
cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],  
cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],  
cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [Effect1_type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [Effect1_state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [Effect1_reversed], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect1], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [Effect2_type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [Effect2_state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [Effect2_reversed],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect2], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [Effect3_type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [Effect3_state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [Effect3_reversed],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect3], 
  
cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],  
cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],  
cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],  
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],  
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time], 
tme.ms_ticks as [Current Time] 
FROM sys.dm_os_ring_buffers rbf 
cross join sys.dm_os_sys_info tme 
where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW' 
ORDER BY rbf.timestamp ASC


PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SCHEDULER_MONITOR to Monitor system health'

SELECT  CONVERT (varchar(30), GETDATE(), 121) as runtime, DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,    a.* , sys.ms_ticks AS [Current Time]  
FROM   (SELECT x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime) [1]', 'bigint') AS [UserModeTime],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime) [1]', 'bigint') AS [KernelModeTime],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults) [1]', 'bigint') AS [PageFaults],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta) [1]', 'bigint')/1024 AS [WorkingSetDelta],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization) [1]', 'bigint') AS [MemoryUtilization (%workingset)],   
x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]  FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers    
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a  CROSS JOIN sys.dm_os_sys_info sys ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())
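Before running the detailed queries, it can help to see which ring buffers actually hold records on the instance. A minimal sketch:

```sql
-- Count the records currently held in each ring buffer type.
SELECT rb.ring_buffer_type,
       COUNT(*) AS record_count
FROM sys.dm_os_ring_buffers AS rb
GROUP BY rb.ring_buffer_type
ORDER BY record_count DESC;
```

Ring buffers are in-memory and fixed in size, so older records are overwritten and everything is lost on restart.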


Posted in SQL General, SQL Server Engine, SQL Server memory, SQL Server Tools | Tagged: dm_os_ring_buffers, RING_BUFFER_CONNECTIVITY, RING_BUFFER_EXCEPTION, RING_BUFFER_RESOURCE_MONITOR, RING_BUFFER_SCHEDULER_MONITOR, RING_BUFFER_SECURITY_ERROR | 2 Comments »

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory, it is supposed to de-allocate it and release it back to the OS. If it fails to de-allocate the memory because of a flaw in the code, this is called a leak, and it can cause memory pressure on both the operating system and the application.

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink its memory based on the amount of available memory on the system and the Max server memory setting in SQL Server.

Many times system admins look at the memory usage of SQL Server and assume SQL Server is leaking memory because its memory usage is high.

This is incorrect. SQL Server is a server-based application, and its memory manager is designed to keep growing its memory usage as needed (exception: large pages); it will not scale down its usage unless there is a low memory notification from Windows. We can control the memory usage of SQL Server using the Max server memory setting. This setting limits the Bpool usage of SQL Server and doesn’t control its overall memory usage. There are portions of SQL Server memory that are allocated outside the Bpool (aka MTL or MTR). We do not have a way to control how much memory SQL Server can use outside the Bpool, but non-Bpool memory usage is normally low and can be easily estimated by studying the components running in SQL Server.

Ex: Suppose you want SQL Server to use only 10GB of RAM on the server. Consider how much memory SQL Server might need outside the Bpool and set the “max server memory” setting accordingly. In this case, if you estimate SQL Server will use 1.5GB outside the Bpool, set Max server memory to 8.5GB.
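For the example above, the setting could be applied with sp_configure as sketched below. The option takes a value in MB, so 8.5GB is 8.5 × 1024 = 8704 MB; the figure comes from the example and is not a recommendation.

```sql
-- 'max server memory (MB)' is an advanced option, so expose advanced options first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- 8.5 GB expressed in MB (8.5 * 1024 = 8704).
EXEC sys.sp_configure N'max server memory (MB)', 8704;
RECONFIGURE;

-- Verify the configured and running values.
EXEC sys.sp_configure N'max server memory (MB)';
```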

What can cause SQL Server Memory leak?

A leak occurs when code has logic to allocate memory but doesn’t de-allocate it. If any SQL Server component is causing a memory leak, it can be identified easily using DMVs like sys.dm_os_memory_allocation, sys.dm_os_memory_clerks and sys.dm_os_memory_objects, but most memory leaks in SQL Server are caused by 3rd party DLLs that are loaded in the SQL Server process.

Note: All memory allocations by non-SQL Server DLLs loaded in SQL Server happen in “Mem to Leave” (outside the Bpool) and are called direct Windows allocations (DWA).

When there is out of memory conditions in SQL Server and if you suspect there is a memory leak.First thing to determine is who is consuming the memory. If SQL Server is not using majority of the memory in MemToLeave and still you get Mem to leave errors probably there is a leak and it caused by some DLL’s loaded in

SQL Server. Refer Section 1 (MTL error) in https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

Below query can be used to determine actual memory consumption by SQL Server in MTL.

select sum(multi_pages_kb) from sys.dm_os_memory_clerks

If the memory consumption by SQL Server is very low and you still see SQL Server memory errors like the few below, then focus on leaks.

Ex:

SQL Server 2000

WARNING: Failed to reserve contiguous memory of Size= 65536.

WARNING: Clearing procedure cache to free contiguous memory.

Error: 17802 “Could not create server event thread.”

SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

How to identify and troubleshoot the memory leak?

There are multiple ways in Windows to identify who is leaking memory in a process. In this blog we will discuss how to identify the memory leak using 1. Windows debugger, 2. Debug diagnostics tools for Windows and 3. UMDH.

Let us create a sample DLL to load in the SQL Server process to leak memory and see how to use the tools mentioned above to troubleshoot the leak.

Download HeapLeak.dll from this link and install the Microsoft Visual C++ 2010 Redistributable Package (32-Bit or 64-Bit) to make this DLL work.

-- Create an extended stored procedure in SQL Server

exec sp_addextendedproc  'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

-- Let us execute this extended SP 30 times and leak memory.

exec HeapLeak

go 30

We will also enable the trace flags below in SQL Server to automatically generate a filter dump when there are out of memory errors and see how to identify who is leaking.

dbcc traceon (2551,-1) -- 2551 is used to enable filter dump.

dbcc traceon (8004,-1) -- 8004 is used to take a memory dump on the first occurrence of an OOM condition

-- Note: Both trace flags listed above are undocumented, so use them at your own risk; there is no guarantee that these trace flags will work in future versions of SQL Server

Once we enable the trace flags, we have to cause an out of memory error in SQL Server to generate the OOM memory dump. We have leaked around 300 MB of memory from MTL by executing the above extended SP 30 times.

Let us execute the script below, which creates XML handles. Memory for XML handles is allocated from MTL, so we will get out of memory errors very soon because the extended stored procedure which we executed has already leaked the memory.

(Do not run the XML script below directly without first executing HeapLeak. On its own the script will still cause an OOM error because of the handle created for each execution, but that memory is accounted as a SQL Server allocation, so it will not help us understand how to debug leaks caused by 3rd party DLLs.)

Note: 1. The SQL Server memory dump will be generated in the SQL Server error log folder.
2. Size of MTL is 256 MB + (Max worker threads * 0.5 MB) in 32-Bit SQL Server, so approximately 384 MB unless modified using the -g switch.

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
 
go 10000

We will receive the error below after a few executions.

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.

To analyze the dump, download and install Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

Step 1 (Load the memory dump file to debugger):

Open WinDbg. Choose the File menu –> select Open crash dump –> select the dump file (SQLDump000#.mdmp)

Note: You will find SQLDump000#.mdmp in your SQL Server error log folder when you get the exception or assertion.

Step 2 (Set the symbol path to Microsoft symbols server):

In the command window type

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from Microsoft symbols server):

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4 (check if symbols are loaded):

Verify if symbols are loaded for SQL Server by using the debugger command lmvm

0:028> lmvm sqlservr

start end module name

01000000 02ba8000 sqlservr (pdb symbols) c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

Loaded symbol image file: sqlservr.exe

Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image name: sqlservr.exe

Timestamp: Fri Oct 14 15:35:29 2005 (434F82E9)

CheckSum: 01B73B9B

ImageSize: 01BA8000

File version: 2005.90.1399.0

Product version: 9.0.1399.0

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0409.04e4

CompanyName: Microsoft Corporation

ProductName: Microsoft SQL Server

InternalName: SQLSERVR

OriginalFilename: SQLSERVR.EXE

ProductVersion: 9.00.1399.06

FileVersion: 2005.090.1399.00

FileDescription: SQL Server Windows NT

LegalTrademarks: Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

Comments: NT INTEL X86

Step 5 : (!address to display the memory information)

Use !address command to display the memory information of the process from dump.

0:028> !address -summary

——————– Usage SUMMARY ————————–

TotSize ( KB) Pct(Tots) Pct(Busy) Usage

686a7000 ( 1710748) : 81.58% 81.80% : RegionUsageIsVAD

579000 ( 5604) : 00.27% 00.00% : RegionUsageFree

4239000 ( 67812) : 03.23% 03.24% : RegionUsageImage

ea6000 ( 15000) : 00.72% 00.72% : RegionUsageStack

1e000 ( 120) : 00.01% 00.01% : RegionUsageTeb

122d0000 ( 297792) : 14.20% 14.24% : RegionUsageHeap

0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap

1000 ( 4) : 00.00% 00.00% : RegionUsagePeb

1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs

1000 ( 4) : 00.00% 00.00% : RegionUsageEnvironmentBlock

Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

——————– Type SUMMARY ————————–

TotSize ( KB) Pct(Tots) Usage

579000 ( 5604) : 00.27% : <free>

4239000 ( 67812) : 03.23% : MEM_IMAGE

5fc000 ( 6128) : 00.29% : MEM_MAPPED

7b242000 ( 2017544) : 96.21% : MEM_PRIVATE

——————– State SUMMARY ————————–

TotSize ( KB) Pct(Tots) Usage

1b7bd000 ( 450292) : 21.47% : MEM_COMMIT

579000 ( 5604) : 00.27% : MEM_FREE

642ba000 ( 1641192) : 78.26% : MEM_RESERVE

Largest free region: Base 00000000 – Size 00010000 (64 KB)

Look at RegionUsageHeap: it is around 297792 KB, and the largest free region is just 64 KB. We know SQL Server doesn’t use heaps extensively, so normally the heap allocated by SQL Server will not go beyond a few MB. In this case it is consuming around 290 MB, so other components which use MTL can easily fail.

Let us try to understand why the Heap is around 297792 KB and try to identify if there is a pattern.

Step 6: (Let us use !heap -s to display summary information about the heap)

0:028> !heap -s

LFH Key : 0x672ddb11

Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast

(k) (k) (k) (k) length blocks cont. heap

—————————————————————————–

000d0000 00000002 1024 896 896 6 1 1 0 0 L

001d0000 00008000 64 12 12 10 1 1 0 0

002c0000 00001002 1088 96 96 2 1 1 0 0 L

002e0000 00001002 64 52 52 3 2 1 0 0 L

007c0000 00001002 64 64 64 56 1 0 0 0 L

00d10000 00001002 256 24 24 8 1 1 0 0 L

340b0000 00001002 64 28 28 1 0 1 0 0 L

340c0000 00041002 256 12 12 4 1 1 0 0 L

342a0000 00000002 1024 24 24 3 1 1 0 0 L

34440000 00001002 64 48 48 40 2 1 0 0 L

61cd0000 00011002 256 12 12 4 1 1 0 0 L

61d10000 00001002 64 16 16 7 1 1 0 0 L

61d20000 00001002 64 12 12 4 1 1 0 0 L

62a90000 00001002 1024 1024 1024 1016 2 0 0 0 L

62b90000 00001002 1024 1024 1024 1016 2 0 0 0 L

62c90000 00001002 256 40 40 7 1 1 0 0 LFH

00770000 00001002 64 16 16 2 2 1 0 0 L

63820000 00001002 64 24 24 3 1 1 0 0 L

63830000 00001001 10240 10240 10240 160 21 0 0 bad

64230000 00001001 10240 10240 10240 160 21 0 0 bad

64c30000 00001001 10240 10240 10240 160 21 0 0 bad

65630000 00001001 10240 10240 10240 160 21 0 0 bad

66030000 00001001 10240 10240 10240 160 21 0 0 bad

66a30000 00001001 10240 10240 10240 160 21 0 0 bad

67430000 00001001 10240 10240 10240 160 21 0 0 bad

68130000 00001001 10240 10240 10240 160 21 0 0 bad

68b30000 00001001 10240 10240 10240 160 21 0 0 bad

69530000 00001001 10240 10240 10240 160 21 0 0 bad

69f30000 00001001 10240 10240 10240 160 21 0 0 bad

6a930000 00001001 10240 10240 10240 160 21 0 0 bad

6b330000 00001001 10240 10240 10240 160 21 0 0 bad

6bd30000 00001001 10240 10240 10240 160 21 0 0 bad

6c730000 00001001 10240 10240 10240 160 21 0 0 bad

6d130000 00001001 10240 10240 10240 160 21 0 0 bad

6db30000 00001001 10240 10240 10240 160 21 0 0 bad

6e530000 00001001 10240 10240 10240 160 21 0 0 bad

6ef30000 00001001 10240 10240 10240 160 21 0 0 bad

6f930000 00001001 10240 10240 10240 160 21 0 0 bad

70330000 00001001 10240 10240 10240 160 21 0 0 bad

70d30000 00001001 10240 10240 10240 160 21 0 0 bad

7a160000 00001001 10240 10240 10240 160 21 0 0 bad

7ab60000 00001001 10240 10240 10240 160 21 0 0 bad

7b560000 00001001 10240 10240 10240 160 21 0 0 bad

7d0d0000 00001001 10240 10240 10240 160 21 0 0 bad

7e030000 00001001 10240 10240 10240 160 21 0 0 bad

7ea30000 00001001 10240 10240 10240 160 21 0 0 bad

67f90000 00001003 256 16 16 14 1 1 0 bad

71850000 00001003 256 4 4 2 1 1 0 bad

71890000 00001003 256 4 4 2 1 1 0 bad

67fd0000 00001002 64 16 16 4 1 1 0 0 L

718d0000 00001003 256 40 40 3 1 1 0 bad

71910000 00001003 256 4 4 2 1 1 0 bad

71950000 00001003 256 4 4 2 1 1 0 bad

71990000 00001003 256 4 4 2 1 1 0 bad

67ff0000 00001002 64 16 16 4 1 1 0 0 L

719d0000 00001003 1792 1352 1352 5 2 1 0 bad

71a10000 00001003 256 4 4 2 1 1 0 bad

71a50000 00001003 256 4 4 2 1 1 0 bad

71a90000 00001002 64 16 16 1 0 1 0 0 L

—————————————————————————–

If you look at the above output you can clearly identify a pattern. There are multiple heaps created, and each of them is 10 MB. But how do we identify who actually created them?
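When the !heap -s output is long, the repeating sizes can also be spotted mechanically. The Python sketch below is only an illustration: the function name and the sample are hypothetical (the sample is a shortened copy of the rows above), and it assumes the third column is the reserved size in KB as shown in the header.

```python
from collections import Counter


def repeated_heap_sizes(heap_summary, min_count=3):
    """Count heaps per reserved size (KB) in '!heap -s' text.

    Returns only the sizes that occur at least min_count times.
    """
    sizes = Counter()
    for line in heap_summary.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank line or separator
        try:
            int(fields[0], 16)           # heap address
            int(fields[1], 16)           # heap flags
            reserved_kb = int(fields[2])
        except ValueError:
            continue  # header rows such as "Heap Flags Reserv ..."
        sizes[reserved_kb] += 1
    return {kb: n for kb, n in sizes.items() if n >= min_count}


sample = """\
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
000d0000 00000002 1024 896 896 6 1 1 0 0 L
63830000 00001001 10240 10240 10240 160 21 0 0 bad
64230000 00001001 10240 10240 10240 160 21 0 0 bad
64c30000 00001001 10240 10240 10240 160 21 0 0 bad
71a90000 00001002 64 16 16 1 0 1 0 0 L
"""
print(repeated_heap_sizes(sample))  # {10240: 3}
```

A size that shows up many times, like 10240 KB here, is a good candidate to inspect further.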

Step 7:

Let us pick one of the 10 MB heaps and display all the entries (allocations) within it using !heap with the -h parameter

The heap I have picked is 63830000.

0:028> !heap -h 63830000

Index Address Name Debugging options enabled

19: 63830000

Segment at 63830000 to 64230000 (00a00000 bytes committed)

Flags: 00001001

ForceFlags: 00000001

Granularity: 8 bytes

Segment Reserve: 00100000

Segment Commit: 00002000

DeCommit Block Thres: 00000200

DeCommit Total Thres: 00002000

Total Free Size: 00005048

Max. Allocation Size: 7ffdefff

Lock Variable at: 00000000

Next TagIndex: 0000

Maximum TagIndex: 0000

Tag Entries: 00000000

PsuedoTag Entries: 00000000

Virtual Alloc List: 63830050

UCR FreeList: 63830588

FreeList Usage: 00000000 00000000 00000000 00000000

FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0 Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

Heap entries for Segment00 in Heap 63830000

63830608: 00608 . 00040 [01] – busy (40)

63830648: 00040 . 02808 [01] – busy (2800)

641b6698: 02808 . 02808 [01] – busy (2800)

……………………………………

Step 8: (Let us pick one of the heap entries (allocations) and try to identify what is in it)

0:028> db 641b6698

641b6698 01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

641b66a8 6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20 ng the heap for

641b66b8 64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b demo at MSSQLWIK

641b66c8 49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00 I.COM………..

641b66d8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b66e8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b66f8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b6708 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

0:028> db 63830648

63830648 01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

63830658 6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20 ng the heap for

63830668 64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b demo at MSSQLWIK

63830678 49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00 I.COM………..

63830688 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

63830698 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306a8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306b8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

Similarly you can dump multiple heap allocations to identify a pattern.

Now if you look at the memory dumped, you see a string which might help you to identify the DLL which created the heap. There is a pattern in the above heaps: all the heap allocations contain the string below.

“Iam filing the heap for demo at MSSQLWIKI.COM”

Note: You can append L<size> to the db or dc commands to dump more memory, for example db 63830648 L1500
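If you save the db output to a text file, the readable strings can be pulled out with a short script instead of reading the ASCII column by eye. The following Python sketch is only an illustration: the helper names are made up, and it assumes the standard db layout of an address followed by 16 hex bytes with a dash after the eighth byte.

```python
import re


def bytes_from_db_line(line):
    """Turn one line of WinDbg 'db' output into the 16 raw bytes it shows."""
    tokens = line.split()
    # tokens[0] is the address; tokens[1:16] hold 16 bytes (one token is 'xx-yy').
    hex_part = " ".join(tokens[1:16]).replace("-", " ")
    return bytes.fromhex(hex_part)


def printable_strings(data, min_len=8):
    """Return runs of printable ASCII characters of at least min_len bytes."""
    pattern = re.compile(("[ -~]{%d,}" % min_len).encode("ascii"))
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


dump = """\
641b6698 01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ........Iam fili
641b66a8 6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20 ng the heap for
641b66b8 64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b demo at MSSQLWIK
641b66c8 49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00 I.COM...........
"""
raw = b"".join(bytes_from_db_line(l) for l in dump.splitlines())
print(printable_strings(raw))  # ['Iam filing the heap for demo at MSSQLWIKI.COM']
```

Running the same extraction over several heap entries makes a repeated string stand out quickly.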

Step 9:

Let us open the DLL which we loaded in SQL Server for testing using Notepad and see if there is a string which matches the pattern.

Yes, there is, which proves that this DLL has caused the leak. In real scenarios you may have to examine many different heap allocations to identify the pattern.

This is one way to find leaks from the memory dump after the leak has actually happened. It may not always be easy to find a pattern and identify the modules which allocated the memory. In such scenarios you may have to track the leak using tools like the Debug Diagnostic tool, UMDH etc. In my next blog I will post how to track memory leaks using the Debug Diagnostic tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | Tagged: memory leak, Memory leaks in SQL Server, MTL erros in SQL Server, sqlserver memory, tracking memory leaks in SQL Server | 46 Comments »

SQL Server Query optimization

Posted by Karthick P.K on November 6, 2012

SQL Server Query optimization (or) Tuning slow queries in SQL Server.

How to troubleshoot (or) tune slow queries in SQL Server, optimize slow queries to run faster and resolve error sql server -2147217871 Query timeout expired?

A query is considered to be slow when it executes for a longer duration than expected. Total duration of the query can be broken into compile time, CPU time and wait time.

Before you start troubleshooting a query which is running for a longer duration, identify whether the query is slow because it is long waiting (or) long running (or) long compiling.

Compile time: Time taken to compile the query. Compile time can be identified by looking at

1. CompileTime=”n” in XML plan

2. SQL Server parse and compile time when Set statistics time on is enabled.

CPU time: Time the query spends on CPU (Execution time - (compile time + wait time)). CPU time can be identified by looking at

1. CPU column in profiler.

2. CPU time under SQL Server Execution Times when statistics time on is enabled.

Execution time: Time taken by the query for complete execution (Execution time = CPU time (for compilation + execution) + wait time). Total duration of the query can be identified using

1. Duration column in profiler

2. SQL Server Execution Times, elapsed time when statistics time is on.

What is long waiting query?

A query is considered to be a long waiting query when it spends most of its time waiting for some resource.

How to identify if the query is long waiting?

A long waiting query can be identified by comparing the CPU and Duration columns in profiler (or) CPU and elapsed time when statistics time is on.

When a query is waiting for a resource (such as a lock, network I/O, page I/O etc.) it does not consume CPU. The difference between duration and CPU is wait time, so if you see duration being much higher than CPU, it indicates that the query has spent a large amount of time waiting for some resource.

Let us see an example of long waiting query. I have collected profiler trace while executing the query.

set statistics io on

set statistics time on

-- Place your query here

select top 10000 * from a

set statistics io off

set statistics time off

Look at the Duration and CPU columns in the profiler: CPU = 256 and Duration = 1920. So this query has spent the majority of its time waiting for some resource.

Look at the output of statistics time and statistics I/O in above image.

SQL Server has spent only 2 milliseconds compiling the query and 256 milliseconds on CPU, but the overall duration was 1920 milliseconds, so the query has spent most of its time waiting for some resource.
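The comparison can be expressed as a small calculation. The Python sketch below is illustrative only: the function name and the 50% thresholds are assumptions chosen for the example, not rules from SQL Server. It also clamps the wait time at zero, because for parallel plans CPU can exceed duration.

```python
def classify_query(duration_ms, cpu_ms, compile_ms=0, threshold=0.5):
    """Label a query as long waiting, long compiling or long running.

    wait time = duration - CPU (never negative). The threshold is an
    arbitrary cut-off used only for this illustration.
    """
    wait_ms = max(duration_ms - cpu_ms, 0)
    if duration_ms > 0 and wait_ms / duration_ms >= threshold:
        kind = "long waiting"
    elif cpu_ms > 0 and compile_ms / cpu_ms >= threshold:
        kind = "long compiling"
    else:
        kind = "long running"
    return kind, wait_ms


# Numbers from the example: duration 1920 ms, CPU 256 ms, compile 2 ms.
print(classify_query(1920, 256, 2))  # ('long waiting', 1664)
```

Here 1664 of the 1920 milliseconds were spent waiting, which is why the next step is to find the resource the query waited on.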

Identify the resource on which this query is waiting using one of the steps listed below.

1. Look at the wait type column of sysprocesses for the spid which is executing the query while the query is executing.

2. If there is no other activity on the server, collect sys.dm_os_wait_stats output before and after the query execution and identify the wait (this will not help in tuning queries that run for a short duration).

3. Collect XEvents to gather the wait stats of the individual query.
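For option 2, the before and after snapshots have to be subtracted from each other, since sys.dm_os_wait_stats is cumulative. A minimal Python sketch of that subtraction follows; the function name is hypothetical and the snapshot values are made-up numbers, each dictionary standing for wait type mapped to wait_time_ms.

```python
def wait_delta(before, after):
    """Return (wait_type, added_wait_ms) pairs, largest first.

    'before' and 'after' map wait type names to cumulative wait_time_ms.
    """
    delta = {w: after[w] - before.get(w, 0) for w in after}
    grew = [(w, ms) for w, ms in delta.items() if ms > 0]
    return sorted(grew, key=lambda item: item[1], reverse=True)


# Hypothetical snapshots taken just before and just after running the query.
before = {"PAGEIOLATCH_SH": 1000, "WRITELOG": 200, "CXPACKET": 50}
after = {"PAGEIOLATCH_SH": 2600, "WRITELOG": 230, "CXPACKET": 50, "LCK_M_S": 10}
print(wait_delta(before, after))
# [('PAGEIOLATCH_SH', 1600), ('WRITELOG', 30), ('LCK_M_S', 10)]
```

The wait type at the top of the list is the one to tune first.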

Once you identify the resource on which the query is waiting, tune that resource. Most of the time queries are slow waiting for one of the resources below.

PAGEIOLATCH_* or WRITELOG: This indicates an I/O resource bottleneck. Follow the detailed troubleshooting steps mentioned in This Link to fix the I/O bottleneck. If you find SQL Server spawning excessive I/O, create the necessary indexes.

a. Logical reads + physical reads in the statistics I/O output (refer above image) or Reads and Writes in profiler indicate the I/O posted by this query. If you see very high reads for the query compared with the result set returned by the query, it is an indication of missing indexes or a bad plan. Create the necessary indexes (you can use DTA for index recommendations).

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO.

a. This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should look at the wait resource of the spid. The wait_resource is the page that is being accessed (the format is dbid:file:pageno).

b. We can use DBCC PAGE to identify the object or type of the page on which we have the contention. It will also help us determine whether the contention is for allocation, data or text pages.

c. If the pages that SQL Server is most frequently waiting on are in the tempdb database, check the wait resource column for a page number in dbid 2, e.g. 2:1:1 or 2:1:2. Enable TF 1118, increase the number of tempdb data files and size them equally (you may be facing the tempdb allocation latch contention mentioned in http://support.microsoft.com/kb/328551).

d. If the page is in a user database, check whether the table has a clustered index on a monotonic key such as an identity, where all threads are contending for the same page at the end of the table. In this case we need to choose a different clustered index key to spread the work across different pages.
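Reading the wait resource by hand gets tedious when there are many sessions, so here is a Python sketch that splits a dbid:file:pageno value and flags tempdb allocation pages. The helper names are hypothetical, the input is assumed to be a plain dbid:file:pageno string, and the page intervals used (PFS on page 1 and every 8088 pages, GAM on page 2 and SGAM on page 3, both repeating every 511232 pages) are the usual SQL Server allocation page layout.

```python
def allocation_page_type(page_no):
    """Return 'PFS', 'GAM' or 'SGAM' for allocation pages, otherwise None."""
    if page_no == 1 or (page_no > 0 and page_no % 8088 == 0):
        return "PFS"
    if page_no == 2 or (page_no > 0 and page_no % 511232 == 0):
        return "GAM"
    if page_no == 3 or (page_no > 3 and page_no % 511232 == 1):
        return "SGAM"
    return None


def describe_wait_resource(wait_resource):
    """Split 'dbid:file:pageno' and flag likely tempdb allocation contention."""
    dbid, file_id, page_no = (int(p) for p in wait_resource.strip().split(":")[:3])
    page_type = allocation_page_type(page_no)
    return {
        "dbid": dbid,
        "file_id": file_id,
        "page_no": page_no,
        "page_type": page_type,
        # dbid 2 is tempdb; an allocation page there points at allocation contention
        "tempdb_allocation_contention": dbid == 2 and page_type is not None,
    }


print(describe_wait_resource("2:1:1"))
print(describe_wait_resource("5:1:4000"))
```

Waits that come back flagged for dbid 2 match case c above; anything else still needs DBCC PAGE to identify the object.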

LATCH_*: Non-buf latch waits can be caused by a variety of things. We can use the wait resource column in sysprocesses to determine the type of latch involved (KB 822101).

a. A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata. Refer to KB 929728 for more information.

b. Auto grow and auto shrink happening while the query is executed.

c. Queries going for excessive parallelism.

Blocking (LCK*): Use the query in This Link to identify the blocking. Tune the head blocker.

ASYNC_NETWORK_IO (or) network IO: Keep the result set returned by the query smaller. For detailed troubleshooting refer to This Link.

RESOURCE_SEMAPHORE waits: Make sure there is no memory pressure on the server. Follow the steps in This Link for detailed troubleshooting.

SQL Trace: Stop all the profiler traces running on the server. Identify the traces which are running on the server using the query in This Link.

CXPACKET: Set the max degree of parallelism. But remember the CXPACKET wait type is not always a problem.

a. For servers that have eight or fewer processors, use the following configuration where N equals the number of processors: max degree of parallelism = 0 to N.

b. For servers that use more than eight processors, use the following configuration: max degree of parallelism = 8. Refer to This Link.
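The two rules above amount to a simple lookup. As a sketch (the function name is made up, and it only restates the guidance in this post rather than any official sizing tool), in Python:

```python
def suggested_maxdop_range(processors):
    """Return the (low, high) range for 'max degree of parallelism'.

    Eight or fewer processors: 0 to N. More than eight processors: 8.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if processors <= 8:
        return (0, processors)
    return (8, 8)


print(suggested_maxdop_range(4))   # (0, 4)
print(suggested_maxdop_range(32))  # (8, 8)
```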

SOS_SCHEDULER_YIELD: Identify if there is a CPU bottleneck on the server. This wait means that the thread is waiting for CPU.

a. The SQL Server worker thread’s quantum target is 4 ms, which means the thread (worker) is expected to yield back to the SQL Server scheduler when it exceeds 4 ms. Before it yields, it checks whether there are any other runnable threads. If there are, the thread at the top of the runnable list is scheduled, and the current thread goes to the tail of the runnable list and gets rescheduled when the other threads already waiting in the SOS scheduler (runnable list) finish their execution or quantum. The time a thread spends in the runnable list waiting for its quantum is accounted as SOS_SCHEDULER_YIELD. You will see this wait type when multiple threads are waiting to get CPU cycles. Follow the troubleshooting steps mentioned in This Link.

Important: On SQL Server instances with more than 1 CPU it is possible for CPU to be higher than the duration, because CPU is the sum of the time spent by the query on all CPUs when a parallel plan is chosen, whereas the duration is the actual elapsed time of the query.

What is long running query?

A query is considered to be a long running query when it spends most of its time on CPU and not waiting for some resource.

How to identify if the query is long running?

A long running query can be identified by comparing the CPU and Duration columns in profiler (or) CPU and elapsed time when statistics time is on. If CPU and duration are close, then the query is considered to be long running. If the query is long running, identify where the query spends its time: it could be compiling or post compilation (executing the query). Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on; refer above image).

High Compile time:

Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on). Compile time will normally be a few milliseconds. Follow the steps below if you see high compile time.

1. Identify if you have a large token perm (TokenAndPermUserStore) cache; refer to http://support.microsoft.com/kb/927396

2. Create the necessary indexes and stats. Tune the query manually (or) in DTA and apply the recommendations.

3. Reduce the complexity of the query. A query which joins many tables (or) has a large number of IN clauses can take a while to compile.

4. You can reduce compiles by using the forced parameterization option.

High CPU time:

Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on). If the compile time is very low compared to the duration, follow the steps below.

1. Update the stats of tables and indexes used by the query (if the stats are up to date, estimated rows and actual rows will be approximately the same in the execution plan; if there is a huge difference, the stats are outdated and require an update).

2. Identify if the query has used a bad plan because of parameter sniffing (ParameterCompiledValue and ParameterRuntimeValue are different in the XML plan). Refer to THIS LINK to know more about parameter sniffing.

3. If updating the stats and fixing the parameter sniffing doesn’t resolve the issue, it is more likely that the optimizer is not able to create an efficient plan because of a lack of indexes and correct statistics. Run the query which is driving the CPU in Database Tuning Advisor and apply the recommendations. (You will find missing index details in the XML plan, but DTA is more efficient.)

4. If the query which is running longer and consuming CPU is a linked server query, try changing the security of the linked server to ensure the linked server user has db_ddladmin, db_owner (dbo) or sysadmin rights on the remote server. More details regarding the issue in THIS LINK.

5. Ensure the optimizer is not aborting early and creating a bad plan. For details refer to THIS LINK.

6. Ensure the query which is spiking the CPU doesn’t have plan guides (the XML plan will have the PlanGuideDB attribute, and sys.plan_guides will have entries) or query hints (index=, OPTION (XXX JOIN) or INNER (join hint) JOIN).

7. Ensure that SET options are not changed.


Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Performance, SQL General, SQL Query | Tagged: query optimization, query performance tuning, Query tuning, query tuning in sql server, sql performance, sql query optimizer, sql query tuning, sql server -2147217871, sql server query tuning, Tuning sql server query | 11 Comments »

SQL Server 2012 Memory

Posted by Karthick P.K on October 21, 2012

SQL Server 2012 has made many changes to the memory manager to govern SQL Server memory consumption more efficiently than earlier versions. The important changes to SQL Server 2012 memory which every DBA should be aware of are documented in this blog. If you are not familiar with the SQL Server memory architecture of earlier versions, I would recommend reading THIS ARTICLE before you continue with the changes in the Denali memory manager.

Max Server Memory

In previous versions of SQL Server, “Max Server Memory” controlled the maximum physical memory that the single page allocator (BPOOL) could consume in the SQL Server user address space.

Only the single page allocator was part of BPOOL, and Max server memory controlled only BPOOL, so the following allocations came from outside BPOOL (Max server memory):

1. Multi-page allocations from SQL Server [these are allocations which request more than 8 KB and require contiguous memory]

2. CLR allocations [these include the SQL CLR heaps and its global allocations created during startup]

3. Memory used for thread stacks within the SQL Server process (Max worker threads * thread stack size). Thread stack size is 512 KB in 32-bit SQL Server, 904 KB in WOW mode and 2 MB in 64-bit

4. Direct Windows allocations made by non-SQL Server DLLs [these include Windows heap usage and direct virtual allocations made by modules loaded into the SQL Server process. Examples: allocations from extended stored procedure DLLs, objects created using OLE Automation procedures (sp_OA calls), allocations from linked server providers loaded in the sqlserver process]
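Of the items above, the thread stack memory (item 3) is the easiest to estimate up front. The Python sketch below multiplies the worker thread count by the stack sizes quoted in the list; the function name, the platform keys and the worker thread counts in the examples are hypothetical, so substitute the max worker threads value of your own instance.

```python
# Thread stack sizes in KB, as quoted in item 3 above.
STACK_KB = {"x86": 512, "wow64": 904, "x64": 2048}


def thread_stack_mb(max_worker_threads, platform):
    """Memory in MB that thread stacks can consume: threads * stack size."""
    return max_worker_threads * STACK_KB[platform] / 1024


print(thread_stack_mb(256, "x86"))  # 128.0
print(thread_stack_mb(512, "x64"))  # 1024.0
```

This memory sits outside Max server memory, so it should be part of the estimate when deciding how much RAM to leave for everything other than the buffer pool.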

The SQL Server 2012 memory manager has now combined the single page allocator and the multi-page allocator into the any-size page allocator. As a result, the any-size page allocator now manages allocations categorized in the past as single page and multi-page allocations.

1. "max server memory" now controls and includes “Multi pages allocations”.

2. In earlier versions of SQL Server CLR allocated memory was outside BPOOL (Max server memory) . SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

The only allocations that the SQL Server 2012 "max server memory" configuration does not include are the following:

1. Memory allocations for thread stacks within the SQL Server process

2. Memory allocation requests made directly to Windows [Ex: allocations (heap, VirtualAlloc calls) from 3rd party DLLs loaded in the SQL Server process, objects created using OLE Automation procedures (sp_OA) etc.]

These changes allow DBAs to configure and control SQL Server memory more accurately, in accordance with the memory requirements and using Resource Governor.

-g startup parameter

We used the -g startup option to change the default value of a region in the SQL Server user address space known as "Memory-To-Reserve". This region was also known as "memory-to-leave" or MTL. The "Memory-To-Reserve" (or) -g configuration option is relevant only for a 32-bit instance of SQL Server.

Multi-page allocations and CLR were part of Mem-to-reserve (-g) in SQL Server versions up to SQL Server 2008 R2. From Denali they are part of BPOOL (controlled by Max server memory), so you may have to remove -g if you had set it to give space for the multi-page allocator or CLR in earlier versions and are migrating to Denali now.

AWE feature removed from SQL Server 2012

The AWE feature was used in earlier versions of 32-Bit SQL Server to address more than 4 GB of memory. This feature is now removed in Denali (refer: "AWE deprecation"). So if you need more memory, you may need to migrate to 64-Bit SQL Server.

Locked pages in memory

Trace flag 845 is no longer required to lock pages in memory. As long as the startup account of SQL Server has the “Lock pages in memory” privilege, the Datacenter, Enterprise, Standard and Business Intelligence editions will use the AWE allocator APIs for memory allocations in BPOOL, and these allocations will be locked.

Dynamic virtual address space management

In earlier versions of 32-Bit SQL Server we reserved Bpool at startup and the remaining addresses were left for MTL (Memory to reserve or Memory to leave). In Denali, virtual address space management is dynamic (we don’t reserve at startup), so it is possible for 3rd party components to use more memory than what is configured in the -g parameter.

SQLCLR loaded at startup

In earlier SQL Server versions, Common language runtime (CLR) functionality is initialized inside SQL Server process when the first SQL CLR procedure or function is invoked. SQL Server 2012 performs SQL CLR initialization at startup. The initialization is independent of the ‘clr enabled’ configuration option.

You will notice the following messages in the SQL Server error log during server startup:

2012-10-18 15:23:13.250 spid8s Starting up database ‘master’.

2012-10-18 15:23:13.930 Server CLR version v4.0.30319 loaded.

Total Physical memory and memory model used

Total physical memory available on the server and the memory model used are logged in the SQL Server error log

2012-10-18 15:23:06.690 Server Detected 131067 MB of RAM. This is an informational message; no user action is required.

2012-10-18 15:23:06.700 Server Using locked pages in the memory manager

2012-10-22 15:32:20.450 Server Detected 131067 MB of RAM. This is an informational message; no user action is required.
2012-10-22 15:32:20.450 Server Using conventional memory in the memory manager.
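
If you collect error logs from many servers, the two startup messages above can be pulled out with a small script. The following is a minimal sketch, assuming the message wording shown in the excerpts above; the function name and the regular expressions are my own and are not part of any SQL Server tool.

```python
import re

# Sample lines copied from the error log excerpts above.
LOG_LINES = [
    "2012-10-18 15:23:06.690 Server Detected 131067 MB of RAM. This is an informational message; no user action is required.",
    "2012-10-18 15:23:06.700 Server Using locked pages in the memory manager",
    "2012-10-22 15:32:20.450 Server Detected 131067 MB of RAM. This is an informational message; no user action is required.",
    "2012-10-22 15:32:20.450 Server Using conventional memory in the memory manager.",
]

RAM_RE = re.compile(r"Detected (\d+) MB of RAM")
MODEL_RE = re.compile(r"Using (.+?) in the memory manager")


def summarize_startup(lines):
    """Return one (ram_mb, memory_model) pair per startup found in the lines."""
    results = []
    ram_mb = None
    for line in lines:
        ram = RAM_RE.search(line)
        if ram:
            ram_mb = int(ram.group(1))
            continue
        model = MODEL_RE.search(line)
        if model and ram_mb is not None:
            results.append((ram_mb, model.group(1)))
            ram_mb = None
    return results


for ram_mb, model in summarize_startup(LOG_LINES):
    print(ram_mb, "MB,", model)
```

A "locked pages" result means the Lock pages in memory privilege was picked up at startup; "conventional memory" means it was not.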

DMV and Performance counter changes

In earlier versions of SQL Server, most of the DMVs used single_pages_kb and multi_pages_kb to report allocations made by SQL Server within the BPOOL and outside the BPOOL. They are now reported together as pages_kb. More details in THIS link

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Memory, SQL General, SQL Server Engine, SQL Server memory | Tagged: Changes in SQL server memory, denali memory model, mssqlwiki.com, sql server 2012 changes in memory, sql server 2012 memory architecture, sql server 2012 memory management, SQL Server 2102 memory changes, sql server maximum server memory, SQL Server memory, SQLServer memory architecture changes | 19 Comments »

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

Posted by Karthick P.K on October 16, 2012

I have received a few requests from SQL Server DBAs in the past to blog about analyzing SQL Server exceptions and assertions. After seeing a lot of DBAs getting stuck when they get an EXCEPTION_ACCESS_VIOLATION (or) assertion in SQL Server, I decided to write this blog.

This blog is published with the intention of helping DBAs analyze and resolve EXCEPTION_ACCESS_VIOLATION and SQL Server assertions before contacting Microsoft support. Exceptions and assertions are two different things. In simple terms, an exception is an event that occurs during the execution of a program and requires the execution of code outside the normal flow of control, while an assertion is a check that the programmer inserted into the code to make sure that some condition is true; if the check returns false, an assert is raised. SQL Server handles both assertions and exceptions by writing the current thread’s stack to the error log and generating a dump, so the troubleshooting steps are similar.

You will find messages similar to the one below in the SQL Server error log when you get an exception or EXCEPTION_ACCESS_VIOLATION.

{

Error

External dump process returned no errors.
Using 'dbghelp.dll' version '4.0.5'
SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
* *******************************************************************************
* BEGIN STACK DUMP:
* Exception Address = 000000007752485C Module(ntdll+000000000002285C)

* Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION

* Access Violation occurred reading address 0000041EA9AE2EF0

* Input Buffer 510 bytes –

ex_terminator – Last chance exception handling

}

You will find messages similar to one below in SQL Server error logs when you get an Assertion.

{

Error

spid323 Error: 17065, Severity: 16, State: 1.

spid323 SQL Server Assertion: File: < .cpp>, line = 2576 Failed Assertion = ‘fFalse’ This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted

SQL Server Assertion: File: < .cpp>, line=2040 Failed Assertion =

}

To analyze the dump, download and install the Windows Debugger from This Link

Step 1 (Load the memory dump file to debugger):

Open WinDbg. From the File menu, select Open Crash Dump and choose the dump file (SQLDump000#.mdmp).

Note: The name of the dump file (SQLDump000#.mdmp) is recorded in the SQL Server error log when the exception or assertion occurs.

Step 2 (Set the symbol path to Microsoft symbols server):

In the command window, type:

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from Microsoft symbols server):

Type .reload /f and hit enter. This forces the debugger to load all the symbols immediately.

Step 4 (Check if symbols are loaded):

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

0:002> lmvm sqlservr

start end module name

00000000`01000000 00000000`03679000 sqlservr T (pdb symbols) c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb

Loaded symbol image file: sqlservr.exe

Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image name: sqlservr.exe

Timestamp: Wed Oct 07 21:15:52 2009 (4ACD6778)

CheckSum: 025FEB5E

ImageSize: 02679000

File version: 2005.90.4266.0

Product version: 9.0.4266.0

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5 (Switch to exception context):

Type .ecxr

Step 6 (Get the stack of the thread which caused the exception or assertion):

Type kC 1000 //You will get the stack of thread which raised exception or assertion .

I have pasted a sample stack below, from an exception dump I worked on recently. The first thing to identify from the stack is who is raising the exception. In the stack below, look at the portion of each frame before the ! symbol: that is the module (EXE or DLL name) which raised the exception.

If the EXE/DLL name is a non-Microsoft module, then the exception is being caused by a third-party component, and you will need to work with the company that provided that component to get a solution. lmvm followed by the EXE/DLL name will give you the company name. For example: lmvm wininet

If the EXE/DLL name is sqlservr (or) any other SQL Server module, then the exception is raised by SQL Server. In that case type kC 1000 and paste the stack in the comments section of this blog (or) when you start a thread in the MSDN forums (or) in This Facebook group. If you don’t get a prompt reply from the community, you may need to open a support ticket with Microsoft.

Note: When you get an assertion, make sure you post the message line which contains SQL Server Assertion: File: <Filename.cpp>, line = 2576 Failed Assertion = ”

0:000> kC 1000

Call Site

wininet!InternetFreeThreadInfo+0x26

wininet!InternetDestroyThreadInfo+0x40

wininet!DllMain_wininet+0xb5

wininet!__DllMainCRTStartup+0xdb

ntdll!LdrShutdownThread+0x155

ntdll!RtlExitUserThread+0x38

msvcr80!_endthreadex+0x27

msvcr80!_callthreadstartex+0x1e

msvcr80!_threadstartex+0x84

kernel32!BaseThreadInitThunk+0xd

ntdll!RtlUserThreadStart+0x1d
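
When a stack is long, picking out the module names by eye gets tedious. Here is a minimal sketch that does the same thing for a pasted kC output; it only relies on the module!function+offset frame format seen above, and the function names are my own.

```python
# The sample stack from above, top frame first.
STACK = """\
wininet!InternetFreeThreadInfo+0x26
wininet!InternetDestroyThreadInfo+0x40
wininet!DllMain_wininet+0xb5
wininet!__DllMainCRTStartup+0xdb
ntdll!LdrShutdownThread+0x155
ntdll!RtlExitUserThread+0x38
msvcr80!_endthreadex+0x27
msvcr80!_callthreadstartex+0x1e
msvcr80!_threadstartex+0x84
kernel32!BaseThreadInitThunk+0xd
ntdll!RtlUserThreadStart+0x1d
"""


def frame_modules(stack_text):
    """Return the module name (the text before '!') of each frame, top frame first."""
    modules = []
    for line in stack_text.splitlines():
        line = line.strip()
        if "!" in line:
            modules.append(line.split("!", 1)[0])
    return modules


def top_module(stack_text):
    """Return the module of the top frame, or None if no frame was found."""
    modules = frame_modules(stack_text)
    return modules[0] if modules else None


print(top_module(STACK))
```

For this stack the top frame belongs to wininet, which is the module you would then inspect with lmvm wininet.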

Posted in Debugging, SQL General, SQL Server Engine, Startup failures | Tagged: access of violation at address, Access Violation occurred reading address, Access Violation occurred writing address, BEGIN STACK DUMP:, exception access violation c0000005, Exception Code = c0000005 EXCEPTION_ACCESS_VIOLATION, ex_terminator - Last chance exception handling, Faulting application name: sqlservr.exe, generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION, SQL Server Assertion, SQL Server Assertion: File:, SQL Server EXCEPTION_ACCESS_VIOLATION, SQL Server is terminating because of fatal exception, SQL Server is terminating this process, SqlDumpExceptionHandler, SQLServer EXCEPTION ACCESS VIOLATION | 254 Comments »

SQL Server Parameter sniffing

Posted by Karthick P.K on October 8, 2012

When a stored procedure, a prepared query, or a query submitted via sp_executesql is compiled for the first time, the parameter values supplied with the execution call are used for cardinality estimation, to optimize the statements and create the query plan. This is known as parameter sniffing, because the optimizer sniffs the current parameter values during compilation.

If these values are typical and the data distribution in the underlying tables is even, all calls to the stored procedure benefit from this query plan, since the plan is reused. However, parameter sniffing can cause problems if the "sniffed" parameter value is not typical of the values actually used during execution, or if the data in the underlying tables is very skewed. The plan generated for the "sniffed" parameter value may not be optimal for the parameter currently passed, and since the plan is reused, performance can degrade.

Consider the following scenario: we have a table with two columns (country and somecolumn). The table has 10001 rows: 10000 rows have USA in the country column and 1 row has BRAZIL in the country column.

The table has a NONCLUSTERED INDEX called NC on the country column.

create table data(country char(10), somecolumn char(10))
go
-- one BRAZIL row
insert into data values ('BRAZIL','somedata')
go
-- the batch below is repeated 10000 times, giving 10000 USA rows
insert into data values ('USA','somedata')
go 10000
CREATE NONCLUSTERED INDEX [NC] ON [dbo].[data]
(
[country] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
go
create proc sniffing @p1 char(10)
as
begin
select country,somecolumn from data where country=@p1
end
go

-- Let us execute the stored procedure sniffing with the parameter BRAZIL.

exec sniffing 'BRAZIL'

The optimizer picked an index seek on the nonclustered index and a RID lookup on the table.

What happens when we execute the same procedure with the parameter 'USA'? Since the plan is already created and cached for 'BRAZIL', it is reused, and the plan generated for BRAZIL is not an optimal plan for the parameter USA.

exec sniffing 'USA'

How do we identify whether the optimizer is using a plan which was compiled for the sniffed parameter value and not for the current parameter value?

Let us set statistics xml on:

set statistics xml on

exec sniffing 'USA'

Look at the XML plan for ParameterCompiledValue and ParameterRuntimeValue.

Below is an extract from the XML plan. This output proves that the plan was compiled for the parameter BRAZIL (ParameterCompiledValue) and is being used for the parameter USA (ParameterRuntimeValue).

{

<ColumnReference Column="@p1" ParameterCompiledValue="'BRAZIL    '" ParameterRuntimeValue="'USA       '" />

}
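
The same check can be scripted when you have many saved plans to go through. Below is a minimal sketch using only the Python standard library. The XML fragment is hand-written to resemble the extract above (a real showplan wraps it in many more elements and an XML namespace), and the function name is my own.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment shaped like the extract above; not a complete showplan.
PLAN_FRAGMENT = """
<ParameterList>
  <ColumnReference Column="@p1"
                   ParameterCompiledValue="'BRAZIL    '"
                   ParameterRuntimeValue="'USA       '" />
</ParameterList>
"""


def sniffed_parameters(plan_xml):
    """Return (name, compiled, runtime) for parameters whose two values differ."""
    mismatches = []
    for elem in ET.fromstring(plan_xml).iter():
        # Drop any '{namespace}' prefix so the check also works on namespaced plans.
        tag = elem.tag.rsplit("}", 1)[-1]
        compiled = elem.get("ParameterCompiledValue")
        runtime = elem.get("ParameterRuntimeValue")
        if tag == "ColumnReference" and compiled is not None and runtime is not None:
            if compiled != runtime:
                mismatches.append((elem.get("Column"), compiled, runtime))
    return mismatches


for name, compiled, runtime in sniffed_parameters(PLAN_FRAGMENT):
    print(name, "compiled for", compiled, "but run with", runtime)
```

A mismatch by itself is not a problem; it only tells you the plan was compiled for a different value, so combine it with the estimated versus actual row counts.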

We will also see a huge difference between the estimated and actual row counts if parameter sniffing is impacting the plan.

(Remember that outdated statistics can also cause the optimizer to estimate rows incorrectly, so a difference between estimated and actual rows does not necessarily mean parameter sniffing is the cause.)

What would have been the optimal plan for the parameter 'USA'?

Let us execute the same procedure with the recompile option:

exec sniffing 'USA' with recompile
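
To make the reuse behaviour concrete, here is a toy model of the scenario in Python. It is only a sketch: the row counts come from the example table, but the threshold, the plan names and the cache are hypothetical stand-ins and do not reflect how the SQL Server optimizer actually costs plans.

```python
# Row counts from the example table: 1 BRAZIL row and 10000 USA rows.
ROW_COUNTS = {"BRAZIL": 1, "USA": 10000}

# Hypothetical cut-off: above this many estimated rows the toy optimizer
# prefers a table scan over an index seek plus lookups.
SCAN_THRESHOLD = 100

plan_cache = {}


def compile_plan(param):
    """Build a plan from the estimated row count of the sniffed parameter."""
    estimated = ROW_COUNTS[param]
    shape = "table scan" if estimated > SCAN_THRESHOLD else "index seek + RID lookup"
    return {"shape": shape, "compiled_for": param, "estimated_rows": estimated}


def execute(proc, param, recompile=False):
    """Return (plan, actual_rows); reuse the cached plan unless recompile is set."""
    if recompile:
        plan = compile_plan(param)  # compiled for this call only, not cached
    else:
        if proc not in plan_cache:
            plan_cache[proc] = compile_plan(param)
        plan = plan_cache[proc]
    return plan, ROW_COUNTS[param]


first_plan, _ = execute("sniffing", "BRAZIL")
reused_plan, actual_rows = execute("sniffing", "USA")
fresh_plan, _ = execute("sniffing", "USA", recompile=True)

print(reused_plan["shape"], reused_plan["estimated_rows"], actual_rows)
print(fresh_plan["shape"])
```

The second call runs the plan compiled for BRAZIL, so the estimate (1 row) is far from the actual count (10000 rows), while the recompiled call gets a plan built for USA.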

How to fix parameter sniffing?

1. WITH RECOMPILE: specify it when you create the stored procedure, so the procedure is compiled every time it is called. This method can be used if the compile time is very small compared to the execution time of the bad plan.

Ex: create proc sniffing @p1 char(10) with recompile

2. OPTION (RECOMPILE): add it to the statement which is impacted by parameter sniffing. If the procedure has multiple statements, the recompile affects only that particular statement.

3. OPTIMIZE FOR hint: instructs the query optimizer to use a particular value for a local variable when the query is compiled and optimized (or) OPTIMIZE FOR UNKNOWN, which instructs the query optimizer to use statistical data instead of the initial values for all local variables when the query is compiled and optimized. This value is used only during query optimization; the actual values are used during execution.

{

alter proc sniffing @p1 char(10) as
begin
select country,somecolumn from data where country=@p1
option (optimize for (@p1 = 'USA'))
-- alternative: option (optimize for (@p1 unknown))
end
}

4. Assign the incoming parameter values to local variables and use the local variables in the query. This is useful if you are on SQL Server 2000, which does not have the OPTIMIZE FOR hint.

Ken Henderson has blogged about it in http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx

5. Trace flag 4136: SQL Server 2008 R2 Cumulative Update 2, SQL Server 2008 SP1 Cumulative Update 7 and SQL Server 2005 SP3 Cumulative Update 9 introduce trace flag 4136, which can be used to disable the "parameter sniffing" process. More details at http://support.microsoft.com/kb/980653

Posted in Optimizer, Performance, SQL General, SQL Server Engine | Tagged: how to troubleshoot parameter sniffing, mssqlwiki, OPTIMIZE FOR UNKNOWN, OPTION (RECOMPILE), Parameter sniffing, parameter sniffing in stored procedures, parameter sniffing performance issue, parameter sniffing sql server 2008, ParameterCompiledValue. how to troubleshoot parameter sniffing, ParameterRuntimeValue, query optimizer and parameter sniffing, reporting services parameter sniffing, sp_executesql parameter sniffing, SQL Server, sql server parameter sniffing 2008, sql server performance, ssrs parameter sniffing, what is parameter sniffing | 6 Comments »

Optimizer Timeout or Optimizer memory abort

Posted by Karthick P.K on October 7, 2012

Optimizer Timeout

When the query processor finds itself consuming a lot of time optimizing a query, it may decide to stop the optimization process abruptly and choose the best plan found so far. This ensures that the optimizer doesn’t end up optimizing forever. This is called optimizer timeout (based on the number of plans considered relative to the cost of the best plan so far).

Optimizer memory abort

As queries become more complex, the number of potential plans to consider can quickly grow into the thousands. The optimizer has a limit on the memory it is allowed to use; when it reaches that limit, optimization ends with an optimizer memory abort.

When a timeout or memory abort happens, the optimizer chooses the best plan from those generated up to that point, which may be far from the optimal plan, so query execution can take a long time and consume a lot of resources.

On SQL 2000 and earlier, the only way to detect this condition is to compile the query with trace flag 8675. If one of these conditions occurs, the output will reflect a timeout abort or memory abort, similar to the following:

End of simplification, time: 2.869 net: 2.869 total: 2.869 net: 2.869

end exploration, tasks: 200094 no total cost time: 16.17 net: 16.169 total: 19.04 net: 19.039

*** Optimizer time out abort at task 614400 ***

Msg 8623, Level 16, State 1, Line 3

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

End of simplification, time: 0.156491 elapsed: 0.156491

end exploration, tasks: 1614 no total cost time: 0.552436 elapsed: 0.708927

end search(0), cost: 1275.32 tasks: 3888 time: 0.195008 elapsed: 0.903935

end exploration, tasks: 7596 Cost = 1275.32 time: 0.548032 elapsed: 1.45197

end search(1), cost: 1263.15 tasks: 21985 time: 2.30564 elapsed: 3.75761

*** Optimizer memory usage abort ***

End of optimization, elapsed: 2.98304
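
If you have saved the trace flag 8675 output to a file, the two abort markers can be found with a short script. This is a minimal sketch that assumes the exact marker wording shown in the output above; the function name is my own.

```python
import re

# Matches the two abort markers exactly as they appear in the output above.
ABORT_RE = re.compile(
    r"\*\*\* Optimizer (time out abort at task (\d+)|memory usage abort) \*\*\*"
)

# A few lines taken from the two sample outputs above.
SAMPLE = """\
end exploration, tasks: 200094 no total cost time: 16.17 net: 16.169 total: 19.04 net: 19.039
*** Optimizer time out abort at task 614400 ***
end search(1), cost: 1263.15 tasks: 21985 time: 2.30564 elapsed: 3.75761
*** Optimizer memory usage abort ***
"""


def find_aborts(trace_output):
    """Return ('timeout', task_number) or ('memory', None) for each marker found."""
    found = []
    for match in ABORT_RE.finditer(trace_output):
        if match.group(2) is not None:
            found.append(("timeout", int(match.group(2))))
        else:
            found.append(("memory", None))
    return found


print(find_aborts(SAMPLE))
```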

From SQL Server 2005 onwards, to determine whether the query optimizer timed out or exceeded its memory limit, search for the StatementOptmEarlyAbortReason="TimeOut" (or) StatementOptmEarlyAbortReason="MemoryLimitExceeded" expression in the XML plan output.
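
Searching saved plans for this attribute is easy to script. The sketch below uses the Python standard library on a hand-written fragment; a real showplan has many more elements and an XML namespace, which this code ignores because it only reads attributes. The function name is my own.

```python
import xml.etree.ElementTree as ET

# The two reasons discussed in this post.
EARLY_ABORTS = {"TimeOut", "MemoryLimitExceeded"}

# Hand-written fragment for illustration; not a complete showplan.
PLAN_FRAGMENT = """
<Statements>
  <StmtSimple StatementId="1" StatementOptmEarlyAbortReason="TimeOut" />
  <StmtSimple StatementId="2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" />
  <StmtSimple StatementId="3" />
</Statements>
"""


def early_aborts(plan_xml):
    """Return (statement_id, reason) for statements that hit a timeout or memory abort."""
    hits = []
    for elem in ET.fromstring(plan_xml).iter():
        reason = elem.get("StatementOptmEarlyAbortReason")
        if reason in EARLY_ABORTS:
            hits.append((elem.get("StatementId"), reason))
    return hits


print(early_aborts(PLAN_FRAGMENT))
```

Statements that ended with GoodEnoughPlanFound, or that carry no early abort reason at all, are skipped.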

We can keep the optimizer from timing out and picking a bad plan by enabling trace flag 8780 (-T8780). This increases the time limit before the timeout occurs.

Note: Don’t enable this trace flag at the server level. Enable it only for the session which runs the query, and check whether the optimizer picks a better plan. If it does, the right approach is to tune the query manually or with DTA and apply the recommendations. You can use this trace flag until you apply the recommendations made by DTA.

If you experience “Optimizer memory usage abort”, use the “SQLServer:Memory Manager\Optimizer Memory (KB)” counter to see the amount of memory used for compilation.

select * from sys.dm_os_memory_clerks where type='MEMORYCLERK_SQLOPTIMIZER' will tell us the overall memory used by the optimizer.

We can also use the CompileMemory= expression in the XML plan output, starting from SQL Server 2005 SP2, which gives the compile memory used by individual plans. If you find that optimizer memory is very low, identify what is contributing to memory contention in SQL Server and tune it.

I will discuss compile memory in detail when I blog about Resource_semaphore wait types.

Note: You may also receive the error below because of a few known issues documented in KB articles 982376, 946020, 926773 and 917888. If none of those fixes resolves the issue, you may have to follow the steps documented above.

{

"Msg 8623, Level 16, State 1, Line 1

}

Posted in Performance, SQL General, SQL Query, SQL Server Engine, SQL Server memory | Tagged: "The query processor ran out of internal resources and could not produce a query plan", "The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large numbe, contact Customer Support Services for more information.", Level 16, Line 3", Msg 8623, Optimizer memory usage abort, Optimizer time out abort, State: 1., StatementOptmEarlyAbortReason="MemoryLimitExceeded", StatementOptmEarlyAbortReason="TimeOut" | 24 Comments »

SQL Server Latch & Debugging latch time out

Posted by Karthick P.K on September 7, 2012

In a multithreaded process, what happens when one thread updates a data or index page in memory while a second thread is reading the same page?

What happens when the first thread reads a data/index page in memory while the second thread is freeing the same page from memory?

Answer: We would end up with data or data structure inconsistency. To avoid inconsistency, SQL Server uses synchronization mechanisms like locks, latches and spinlocks.

We will discuss a few key points about latches, and how to debug latch timeout dumps, in this blog.

What is a latch?

Latches control concurrent access to data pages and structures by multiple threads. They provide physical consistency of data pages and synchronization for data structures. Unlike locks, latches cannot be controlled by the user.

Types of the Latch:

Buffer (BUF) Latch

Used to synchronize access to BUF structures and their associated database pages.

Buffer “IO” Latch

A subset of BUF latches used when the BUF and associated data/index page is in the middle of an IO operation (Reading page from disk or writing page to disk).

Non-Buffer (Non-BUF) Latch

Used to synchronize general in-memory data structures, typically those used by queries/tasks executed by parallel threads, auto grow operations, shrink operations etc.

Latch modes

Keep (KP) Latches

Used to ensure that the page is not released from memory while it is in use.

Shared (SH) Latches

Used for read-only access to data structures; prevents write access by other threads.

This mode allows shared access.

SH is compatible with KP, SH, and UP. It should be noted that although in general SH implies read-only access, it is not always the case. For buffer latches SH is the minimum mode required in order to read a data page.

Update (UP) Latches

Allows read access to the data structure (compatible with SH and KP), but blocks other UP and EX latches.

Used for write operations when torn page detection is off and AWE is not enabled.

Exclusive (EX) Latches

Prevents any read activity from occurring on the latched structure. EX is only compatible with KP.

Used during read IO, and during write IO when torn page detection is on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFs from the buffer pool, either by adding them to the free list or unmapping AWE buffers.

Latch compatibility

	KP	SH	UP	EX	DT
KP	Y	Y	Y	Y	N
SH	Y	Y	Y	N	N
UP	Y	Y	N	N	N
EX	Y	N	N	N	N
DT	N	N	N	N	N
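
The matrix above is easy to encode if you want to reason about which requests would have to wait. Here is a minimal sketch in Python; the data is copied from the table, while the function names are my own.

```python
MODES = ["KP", "SH", "UP", "EX", "DT"]

# One string per held mode; each character answers "is the requested mode
# compatible?" in the column order KP, SH, UP, EX, DT (copied from the table).
COMPAT = {
    "KP": "YYYYN",
    "SH": "YYYNN",
    "UP": "YYNNN",
    "EX": "YNNNN",
    "DT": "NNNNN",
}


def compatible(held, requested):
    """True if a latch in mode `requested` can coexist with one held in mode `held`."""
    return COMPAT[held][MODES.index(requested)] == "Y"


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode currently held."""
    return all(compatible(held, requested) for held in held_modes)


print(can_grant("SH", ["KP", "UP"]))  # readers can coexist with an updater
print(can_grant("EX", ["KP", "SH"]))  # a writer has to wait for the reader
```

The matrix is symmetric, so it does not matter which of the two modes is treated as the held one.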

How do you identify Latch contention?

Latch contention can be identified using below wait types in sysprocesses.

PAGEIOLATCH_*: This wait type in sysprocesses indicates that SQL Server is waiting for a physical I/O of a buffer pool page to complete.

1. PAGEIOLATCH_* waits are commonly resolved by tuning the queries which perform heavy IO (commonly by adding, changing or removing indexes (or) statistics to reduce the amount of physical IO).

2. Identify whether there is a disk bottleneck and fix it (for example, PAGEIOLATCH wait times > 30 ms).

PAGELATCH_*: This wait type in sysprocesses indicates that SQL Server is waiting for access to a database page, but the page is not undergoing physical IO.

1. This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. Look at the wait resource of the spid. The wait_resource is the page that is being accessed (the format is dbid:file:pageno).

2. We can use DBCC PAGE to identify the object or the type of page on which we have contention. It also helps us determine whether the contention is for allocation, data or text pages.

3. If the pages that SQL Server is most frequently waiting on are in the tempdb database (check the wait resource column for a page number in dbid 2), you may be facing the tempdb allocation latch contention described in http://support.microsoft.com/kb/328551

4. If the page is in a user database, check whether the table has a clustered index on a monotonic key such as an identity, where all threads contend for the same page at the end of the table. In this case we need to choose a different clustered index key to spread the work across different pages.
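
A quick way to triage a PAGELATCH wait resource is to split it into its parts and check whether the page number falls on an allocation page. The sketch below assumes the dbid:file:pageno format described above and the standard page layout (PFS on page 1 and every 8088 pages, GAM on page 2 and SGAM on page 3, each repeating every 511232 pages). The function names are my own, and DBCC PAGE remains the authoritative check.

```python
PFS_INTERVAL = 8088      # a PFS page repeats every 8088 pages
GAM_INTERVAL = 511232    # GAM and SGAM pages repeat every 511232 pages
TEMPDB_DBID = 2


def parse_wait_resource(resource):
    """Split 'dbid:file:pageno' into three integers."""
    dbid, fileid, pageno = (int(part) for part in resource.strip().split(":"))
    return dbid, fileid, pageno


def allocation_page_type(pageno):
    """Return 'PFS', 'GAM' or 'SGAM' for allocation pages, else None."""
    if pageno == 2 or (pageno > 0 and pageno % GAM_INTERVAL == 0):
        return "GAM"
    if pageno == 3 or (pageno > 1 and (pageno - 1) % GAM_INTERVAL == 0):
        return "SGAM"
    if pageno == 1 or (pageno > 0 and pageno % PFS_INTERVAL == 0):
        return "PFS"
    return None


def describe(resource):
    dbid, fileid, pageno = parse_wait_resource(resource)
    kind = allocation_page_type(pageno)
    if kind is None:
        return "not a PFS/GAM/SGAM page; inspect it with DBCC PAGE"
    where = "tempdb" if dbid == TEMPDB_DBID else "database %d" % dbid
    return "%s allocation page (%s) in file %d" % (where, kind, fileid)


print(describe("2:1:1"))
print(describe("4:1:6088"))
```

Many sessions waiting on a tempdb PFS, GAM or SGAM page is the pattern described in the KB article mentioned in point 3.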

LATCH_*: Non-BUF latch waits can be caused by a variety of things. We can use the wait resource column in sysprocesses to determine the type of latch involved (KB 822101).

1. A very common LATCH_EX wait is due to running a Profiler trace or sp_trace_getdata. Refer to KB 929728 for more information.

2. Auto grow and auto shrink.

When a thread requests a latch and the latch cannot be granted immediately, because some other thread holds an incompatible latch on the same page or data structure, the requestor must wait for the latch to become grantable. If the wait reaches 5 minutes (waittime 300), a warning message like the ones below is printed in the SQL Server error log and a mini dump with all the threads is captured. The warning message differs for buffer and non-buffer latches.

844: Time out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

846: A time-out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.

847: Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

Breakdown of the above warnings

type

The latch mode of the current latch acquire request. This is a numerical value with the following mapping: 0 – NL (not used); 1 – KP; 2 – SH; 3 – UP; 4 – EX; 5 – DT.

task

Task for which we are trying to acquire latch.

Waittime

The total time waited for this latch acquire request in seconds.

owning task

The address of the Task that owns the latch, if available.

bp (Buffer latches only)

The address of the BUF structure corresponding to this buffer latch.

page (Buffer latches only.)

The page id for the page currently contained in the BUF structure.

database id (Buffer latches only.)

The database id for the page in the BUF.

As with troubleshooting blocking issues in SQL Server, when there is latch contention or a latch timeout dump, identify the owner of the latch and troubleshoot why the owner has held it for a long time.

When there is a latch timeout dump, you will see a warning message similar to the one below. The warning message printed in the SQL Server errorlog before the dump is very important for finding the owner thread of the latch.

{

2012-01-18 00:52:03.16 spid69 A time-out occurred while waiting for buffer latch — type 4, bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, waittime 300, flags 0x19,

owning task 0x0000000006E08328. Not continuing to wait.

spid21s **Dump thread – spid = 21, PSS = 0x0000000094622B60, EC = 0x0000000094622B70

spid21s ***Stack Dump being sent to E:\Data\Disk1\MSSQL.1\MSSQL\LOG\SQLDump0009.txt

spid21s * *******************************************************************************

spid21s * BEGIN STACK DUMP:

spid21s * 02/28/12 00:32:03 spid 21

spid21s * Latch timeout

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_COUNT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_VIRTUAL_ROOT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

}

From the error message above we can easily understand that we are trying to acquire a latch on database id 4, page 1:6088 (page 6088 of the first file), and that we timed out because task 0x0000000006E08328 (the owning task in the warning message) is holding a latch on it.
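
The fields of the buffer latch warning can also be extracted with a script, which helps when there are many such messages in an errorlog. This is a minimal sketch: the regular expression is my own and assumes the wording of the sample message above (with the dash before "type" written as a plain hyphen), and the mode mapping is the one listed in the breakdown earlier.

```python
import re

# Numeric latch mode mapping from the breakdown of the warning message.
LATCH_MODES = {0: "NL", 1: "KP", 2: "SH", 3: "UP", 4: "EX", 5: "DT"}

# Sample warning from the errorlog excerpt above, joined onto one line.
MESSAGE = (
    "A time-out occurred while waiting for buffer latch - type 4, "
    "bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, "
    "allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, "
    "waittime 300, flags 0x19, owning task 0x0000000006E08328. "
    "Not continuing to wait."
)

BUFFER_LATCH_RE = re.compile(
    r"type (?P<type>\d+), bp (?P<bp>[0-9A-Fa-f]+), "
    r"page (?P<file>\d+):(?P<page>\d+), .*?database id: (?P<dbid>\d+), .*?"
    r"task (?P<task>0x[0-9A-Fa-f]+) : \d+, waittime (?P<waittime>\d+), .*?"
    r"owning task (?P<owner>0x[0-9A-Fa-f]+)"
)


def parse_buffer_latch_timeout(message):
    """Return a dict of the key fields, or None if the message does not match."""
    match = BUFFER_LATCH_RE.search(message)
    if match is None:
        return None
    return {
        "mode": LATCH_MODES.get(int(match.group("type")), "unknown"),
        "dbid": int(match.group("dbid")),
        "file": int(match.group("file")),
        "page": int(match.group("page")),
        "waittime": int(match.group("waittime")),
        "waiting_task": match.group("task"),
        "owning_task": match.group("owner"),
    }


print(parse_buffer_latch_timeout(MESSAGE))
```

The owning_task value is what you search for in the dump; the waiting_task value identifies the thread that timed out.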

Note: A task is simply a work request to be performed by a thread (such as system tasks, login tasks, the ghost cleanup task etc.). The thread which executes the task takes the required latches as needed.

Let us see how to analyze a latch timeout dump and get the owning thread of the latch using the owning task 0x0000000006E08328.

To analyze the dump, download and install the Windows Debugger from This link

Step 1:

Open WinDbg. From the File menu, select Open Crash Dump and choose the dump file (SQLDump000#.mdmp).

Step 2:

In the command window, type:
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This forces the debugger to load all the symbols immediately.

Step 4:

Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

0:002> lmvm sqlservr
start             end                 module name
00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
    CheckSum:         025FEB5E
    ImageSize:        02679000
    File version:     2005.90.4266.0
    Product version: 9.0.4266.0
    File flags:       0 (Mask 3F)
    File OS:          40000 NT Base
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use the command below to search each thread’s stack for a reference to the owning task; the thread that holds such a reference is the thread which owns the latch. Replace 0X0000000006E08328 with the owning task from your errorlog.

~*e .echo ThreadId:; ?? @$tid; r? @$t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? @$t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; s -d @$t1 @$t2 0X0000000006E08328

ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0 06e08328 00000000 00000000 00000000 (……………
00000000`091fdcb8 06e08328 00000000 091fdd70 00000000 (…….p…….
00000000`091fded0 06e08328 00000000 06e0e798 00000000 (……………
00000000`091fdf38 06e08328 00000000 00000002 00000000 (……………
00000000`091fec60 06e08328 00000000 0168883a 00000000 (…….:.h…..
00000000`091ff260 06e08328 00000000 000007d0 00000000 (……………
00000000`091ff2d0 06e08328 00000000 00000020 00000000 (……. …….
00000000`091ff5f8 06e08328 00000000 800306c0 00000000 (……………
00000000`091ff6c0 06e08328 00000000 00000000 00000000 (……………
00000000`091ff930 06e08328 00000000 00000000 00000001 (……………
00000000`091ff9b8 06e08328 00000000 00000000 00000000 (……………
00000000`091ffa38 06e08328 00000000 00000000 00000000 (……………
00000000`091ffc10 06e08328 00000000 03684080 00000000 (……..@h…..
00000000`091ffc90 06e08328 00000000 00000000 00000000 (……………
ThreadId:
unsigned int 0x9b8
ThreadId:
unsigned int 0x9bc
ThreadId:
unsigned int 0x9c0
……………
…………..

Step 6:

From the above output we see that thread 0x9b4 has references to the pointer of the owning task, so it is the thread which owns the latch. Let us switch to the thread (0x9b4) which is executing the owning task and then go through the stack to see why the thread has been holding the latch for a long time.
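
On a dump with hundreds of threads the search output gets long, so it can help to paste it into a script that reports only the threads with hits. Below is a minimal sketch that assumes the output layout shown above (a ThreadId: line, an "unsigned int" line, then zero or more memory lines containing a backtick). The sample is shortened and the function name is my own.

```python
# Shortened copy of the search output above.
OUTPUT = """\
ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0 06e08328 00000000 00000000 00000000
00000000`091fdcb8 06e08328 00000000 091fdd70 00000000
ThreadId:
unsigned int 0x9b8
"""


def threads_with_hits(output):
    """Return {thread_id: number_of_matching_memory_lines} for threads with hits."""
    hits = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("unsigned int "):
            current = line.split()[-1]       # thread id printed by '?? @$tid'
        elif "`" in line and current is not None:
            hits[current] = hits.get(current, 0) + 1
    return hits


print(threads_with_hits(OUTPUT))
```

Only thread 0x9b4 is reported for this sample, which matches what we read off the output by eye.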

Step 7:

~~[0x9b4]s ==> Switching to the thread (Replace 0x9b4 with your thread id which has reference to the pointer of the owning task)
ntdll!ZwWaitForSingleObject+0xa:
00000000`77ef047a c3 ret

Step 8:

0:002> kC ==> Print the stack
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!BPool::FlushCache
sqlservr!checkpoint2
sqlservr!alloca_probe
sqlservr!ProcessCheckpointRequest
sqlservr!CheckpointLoop
sqlservr!ckptproc
sqlservr!SOS_Task::Param::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex

From the above stack we can understand that the thread which owns the latch is executing a checkpoint and flushing the cache (dirty buffers) to disk. If flushing buffers to disk (checkpoint) is taking a long time, a disk bottleneck is the most likely cause.

Similarly, for any other latch timeout issue, first identify the owner thread of the latch, read the stack of the owner thread to understand the task it is performing, and troubleshoot the performance of that task.

If you want to see the stack of the thread which is waiting, pick up the task (task 0x0000000006E096D8) from the latch timeout warning message in the errorlog instead of the owning task (task 0x0000000006E08328) and use the command mentioned in step 5.

I hope this post helps you learn how to debug latch timeout issues.

Posted in Debugging, SQL General, SQL Server Engine, SQL Server I/O | Tagged: A time-out occurred while waiting for buffer latch, ACCESS_METHODS_HOBT_COUNT, ACCESS_METHODS_HOBT_VIRTUAL_ROOT, Latch timeout, Latch timeout sqlserver, latch timeout stack dump, Latches SQL Server, SQL Server Latch timeout, SQL Server Latches, Timeout occurred while waiting for latch | 22 Comments »

Database Mail errors in SQL Server (Troubleshooting steps)

Posted by Karthick P.K on August 25, 2012

Troubleshooting Database Mail issues in SQL Server

Use the Database Mail Configuration Wizard, change the Logging Level to Verbose and send a test mail to investigate the point of failure.

Right-click Database Mail and choose View Database Mail Log to see the errors, or run SELECT * FROM msdb.dbo.sysmail_event_log;

Check the sent_Status column in the sysmail_allitems table. The four values are sent, unsent, retrying and failed.

If the status is sent and the recipient hasn’t received the email yet, it means that the Database Mail external program successfully delivered the e-mail message to the SMTP server, but the SMTP server failed to deliver the message to the final recipient. At this point the SMTP server needs to be troubleshot (perhaps by engaging your Exchange or mail server team).

If the status is unsent or retrying, it means that the Database Mail has not yet processed the e-mail message or is in the process of retrying after a failed attempt. This could be due to network conditions, volume of messages, SMTP server issues, etc. If the problem persists, use another profile or another mail host database.

If the status is failed, it means that the Database Mail was unable to deliver the message to the SMTP server. Check the sysmail_log table and the destination address. Also be sure that there are no Network or SMTP issues.
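The status checks above can be condensed into a small lookup. This is only a minimal Python sketch and not part of Database Mail: the names NEXT_STEP and next_step are hypothetical, and the input is assumed to be a value read from the sent_status column of sysmail_allitems.

```python
# Hypothetical helper (not a Database Mail API): maps a sent_status value
# to the troubleshooting direction described in this post.
NEXT_STEP = {
    "sent": "Handed to the SMTP server; if the recipient has no mail, "
            "troubleshoot the SMTP/mail server.",
    "unsent": "Not processed yet; check network, message volume and the "
              "SMTP server, or try another profile.",
    "retrying": "Retrying after a failed attempt; check network, message "
                "volume and the SMTP server, or try another profile.",
    "failed": "Could not reach the SMTP server; check the mail log, the "
              "destination address and network/SMTP connectivity.",
}


def next_step(sent_status: str) -> str:
    """Return the suggested next step for one of the four status values."""
    key = sent_status.strip().lower()
    if key not in NEXT_STEP:
        raise ValueError(f"unexpected sent_status: {sent_status!r}")
    return NEXT_STEP[key]


if __name__ == "__main__":
    for status in ("sent", "unsent", "retrying", "failed"):
        print(f"{status:9} -> {next_step(status)}")
```

The lookup only encodes the four values named above; anything else raises an error rather than guessing.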

Send a test email outside SQL Server using the script below or another mail client, and check whether the recipients receive it. If they do not, the problem is outside SQL Server. Engage the Exchange or other mail server team to identify why emails cannot be sent from the script below, Office Outlook or other mail clients.

' Test mail sent through CDO, outside SQL Server. Replace the sender,
' recipient and SMTP host placeholders with your own values.
Set objMessage = CreateObject("CDO.Message")
objMessage.Subject = "Hello"
objMessage.From = """SENDER NAME""<e-mail ID>"
objMessage.To = "To address@mssqlwiki.com"
objMessage.HTMLBody = "<h1><font face=arial>Hello,<br>How are you?</font></h1>"
' sendusing = 2: send over the network to an SMTP server
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtphost.dns.Mailserver.com"
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
' smtpauthenticate: 0 = anonymous, 1 = basic, 2 = NTLM
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 2
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
' Connection timeout in seconds
objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 60
objMessage.Configuration.Fields.Update
objMessage.Send

If the mail from the above script reaches the recipients, the problem is within the SQL Server Database Mail configuration.
Verify the following:

1.Verify that Service Broker is enabled: select is_broker_enabled from sys.databases where name='MSDB' (0 = disabled, 1 = enabled).

To enable service broker on your database run the following query: ALTER DATABASE MSDB SET ENABLE_BROKER

Note: You will be required to have exclusive access to the database while running this statement. If you do not you will get the following error message:
Msg 5061, Level 16, State 1, Line 1. ALTER DATABASE failed because a lock could not be placed on database MSDB. Try again later.

Msg 5069, Level 16, State 1, Line 1

ALTER DATABASE statement failed.

SQL Server Agent normally holds connections to MSDB, so you will have to stop SQL Server Agent to enable Service Broker on MSDB.

2.Check if Database mail stored procedures are enabled (Surface Area Configuration >> “Surface Area Configuration for Features” >> Under MSSQLSERVER, expand Database Engine, and then click Database Mail. >> Ensure that Enable Database Mail stored procedures is selected, and then click Apply).

3.Check if the user is part of DatabaseMailUserRole.

4.Check what parameters and values are used in configuration by running

exec msdb..sysmail_help_configure_sp

A list of default values is given in BOL, topic: “sysmail_help_configure_sp (Transact-SQL)”. To modify a parameter or value, you can use the following stored procedure:

exec msdb..sysmail_configure_sp 'parameter_name', 'parameter_value'

Check whether ReadFromConfigurationFile is enabled. If it is, check that the DatabaseMail90.exe.config file exists (the default path is < drive >\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn) and has the proper parameters.

5.Verify that the Database Mail executable is located in the correct directory – e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

6.Verify that the service account for SQL Server has permission to run the executable, DatabaseMail90.exe, which requires network access to the SMTP servers specified in Database Mail accounts. Therefore, the service account for SQL Server must have permission to access the network, and the SMTP servers must allow connections from the computer that runs SQL Server.

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Configuration, Database mail, SQL General | Tagged: Database mail, SQL Server Mail | 5 Comments »

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see the errors below in the SQL Server error log along with dumps, and are you stuck?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
* 05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
* 04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP

* 01/22/09 19:11:16 spid 0

* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.
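When several of these messages pile up in the error log, it can help to pull the numbers out of each one. The following is a minimal Python sketch, assuming the message layout shown in the sample line above; the pattern and the function name parse_non_yield are illustrative and are not part of any SQL Server tool.

```python
import re

# Field layout assumed from the sample error log line in this post.
NON_YIELD_RE = re.compile(
    r"Worker (?P<worker>0x[0-9A-Fa-f]+) appears to be non-yielding on "
    r"Scheduler (?P<scheduler>\d+)\. "
    r"Thread creation time: (?P<created>\d+)\. "
    r"Approx Thread CPU Used: kernel (?P<kernel_ms>\d+) ms, "
    r"user (?P<user_ms>\d+) ms\. "
    r"Process Utilization (?P<process_pct>\d+)%\. "
    r"System Idle (?P<idle_pct>\d+)%\. "
    r"Interval: (?P<interval_ms>\d+) ms\."
)


def parse_non_yield(line):
    """Return the fields of a non-yielding scheduler message, or None."""
    match = NON_YIELD_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Keep the worker address as text; everything else is numeric.
    return {k: (v if k == "worker" else int(v)) for k, v in fields.items()}


if __name__ == "__main__":
    sample = (
        "Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be "
        "non-yielding on Scheduler 4. Thread creation time: 12010668087858. "
        "Approx Thread CPU Used: kernel 2 ms, user 60516 ms. "
        "Process Utilization 11%. System Idle 83%. Interval: 71227 ms."
    )
    print(parse_non_yield(sample))
```

Comparing user_ms with interval_ms across messages gives a quick view of how much CPU the worker consumed during the reported interval.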

Refer to “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps” for analyzing the Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor dumps.

If you are interested in just finding a quick resolution, follow the steps below to get the non-yield stack from the dump and check whether it matches any existing known issue in SQL Server.

To analyze the dump, download and install the Windows Debugger from This link

Step 1:

Open Windbg

Step 2:

Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 3:

In the command window, type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 5:

Type .cxr sqlservr!g_copiedStackInfo+0X20 for SQL Server 2005 and SQL Server 2008/2008 R2, or .cxr sqlmin!g_copiedStackInfo+0X20 for SQL Server 2012.

Type kc 100 and check whether the stack matches the stack of any of the known SQL Server issues listed below.

If kc 100 doesn’t display a stack and throws “WARNING: Frame IP not in any known module. Following frames may be wrong”, type .cxr to reset to the default scope and try .cxr sqlservr!g_copiedStackInfo+0X00c. (In 32-bit (x86) SQL Server the valid offset for the context is 0X00c. Look at This blog to see how we identified the offset.)

Note: If your stack doesn’t match any of the stacks listed below, paste the stack in the comments section of this blog or in This Facebook group, and we will try to find the cause for you. If you don’t get a prompt reply from the community, you may need to open a support ticket with Microsoft.
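If you compare dumps against this list often, the first pass can be scripted. Below is a minimal Python sketch that looks only at the top frame of pasted kc output; the table holds a handful of the top frames and KB numbers from the stacks in this post, and the names KNOWN_TOP_FRAMES, frames and match_known_issue are hypothetical. Matching on the top frame alone is a simplification, so always compare the full stack before applying a fix.

```python
# Top frame -> KB article, copied from a few of the stacks listed in this post.
KNOWN_TOP_FRAMES = {
    "sqlservr!COptExpr::DetachPointersIntoMemo": "KB 2344600",
    "sqlservr!TMatchPattern": "KB 2633357",
    "sqlservr!CItvlVal::Copy": "KB 982376",
    "sqlservr!COptExpr::AdjustParallelPlan": "KB 943060",
    "sqlservr!CXid::GetBlockingTask": "KB 956854",
}


def frames(kc_output):
    """Return the module!function frames from pasted kc output."""
    result = []
    for raw in kc_output.splitlines():
        line = raw.strip()
        if not line or line == "Call Site":
            continue
        # Drop a trailing "+0x..." offset if the stack was printed with one.
        result.append(line.split("+", 1)[0])
    return result


def match_known_issue(kc_output):
    """Return the KB for the top frame, or None when it is not in the table."""
    stack = frames(kc_output)
    if not stack:
        return None
    return KNOWN_TOP_FRAMES.get(stack[0])


if __name__ == "__main__":
    pasted = """Call Site
sqlservr!TMatchPattern
sqlservr!FMatchStrTxt
sqlservr!I8CharindexStrBhI8
"""
    print(match_known_issue(pasted))
```

A None result only means the top frame is not in this small table; it does not rule out a known issue.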

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2344600 FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands+0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties+0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005, SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2, or KB 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor’

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group to get answers for all your SQL Server related questions.

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Performance, SQL General, SQL Server Tools | Tagged: 17883, 17884, 17887, 17888, Configuration, Debugging, error 17883 non yielding, External dump process returned no errors, External dump process returned no errors.DoMiniDump () encountered error, Interval:, IO Completion Listener () Worker appears to be non-yielding on Node . Approx CPU Used: kernel ms, memory dump, non-yielding, Non-yielding IOCP Listener, non-yielding resource monitor, Non-yielding Scheduler, Process 0:0:0 ( ) Worker appears to be non-yielding on Scheduler, Process :0:0 () Worker appears to be non-yielding on Scheduler n. Thread creation time: . Approx Thread CPU Used: kernel 0 ms, SQL General, sql scheduler, SQL Server 2005, SQL Server 2008, SQL Server Engine. Non-yielding Scheduler, SQLServer dump, SQLServer mdmp, Stack Dump, user 0 ms. Process Utilization 0%. System Idle %. Interval: ms., user ms | 161 Comments »

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for Non-Yielding scheduler, Non-yielding IOCP Listener or Non-yielding Resource Monitor dumps, jump to THIS LINK. Continue reading this article if you would like to learn how to analyze Non-Yielding scheduler dumps and Non-yielding IOCP Listener dumps.

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog focuses on analyzing Non-Yielding scheduler dumps, not on explaining how Non-Yield detection works. Please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how Non-Yield detection works. Let us recollect a few key points before we get into the analysis.

1. SQL Server has its own logical schedulers to schedule the SQL Server workers.

2. The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005 and later.

3. The logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

What if the thread that owns the scheduler executes for a long time (or forever) without yielding, and so never gives a quantum to the other threads waiting on the scheduler?

Answer: The other threads would not get CPU cycles, and SQL Server performance would suffer.

What if the thread is not able to finish its work within its quantum (4 milliseconds), for example in a large for loop?

A SQL Server worker thread’s quantum target is 4 ms, which means the thread (worker) is expected to yield back to the SQL Server scheduler when it exceeds 4 ms. It is rescheduled after the other threads already waiting in the SOS Scheduler’s runnable list finish their execution or quantum.
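The idea of a worker owning the scheduler until it yields can be illustrated with a toy model. The Python sketch below is not how SQLOS is implemented; it simply uses generators as stand-ins for workers and a queue as the runnable list, and the names worker and run are made up for the illustration.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative worker: does one unit of work, then yields."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand the scheduler back so other workers can run


def run(workers):
    """Toy scheduler: run each worker until it yields, then requeue it."""
    runnable = deque(workers)
    while runnable:
        current = runnable.popleft()
        try:
            next(current)             # current worker owns the scheduler
            runnable.append(current)  # it yielded: back of the runnable list
        except StopIteration:
            pass                      # worker finished its task


if __name__ == "__main__":
    log = []
    run([worker("A", 2, log), worker("B", 2, log)])
    print(log)
```

If one of the workers looped without ever reaching its yield, run would never get control back and the other worker would never be scheduled, which is the situation the non-yield detection described next is designed to catch.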

What if the thread did not yield after 4 Milliseconds?

SQL Server has a Scheduler Monitor to track this. The Scheduler Monitor checks for the non-yield condition every 5 seconds, at which point the basic check (is the thread executing for more than 4 ms?) is done. When the basic check evaluates to true, tracking of the worker begins, and if the thread has still not yielded 10 seconds (the nonyield threshold) after tracking began, the threshold check becomes true. So there are approximately 15 seconds between the last yield on the scheduler and the time the threshold check becomes true, after which tracking continues.

A dump is taken when a specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues regardless of the mini-dump capture. When -T1262 is enabled, a mini-dump is captured when the nonyield threshold check becomes true (15 seconds) and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.
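The timings above can be put into a small worked example. This Python sketch only restates the numbers from this post (5-second check, 10-second threshold, 60-second dump); the function names are hypothetical, and it assumes that the "subsequent 60-second intervals" under -T1262 are counted from the 15-second mark, which is one reading of the text rather than a documented fact.

```python
BASIC_CHECK_INTERVAL_S = 5   # Scheduler Monitor basic check
NONYIELD_THRESHOLD_S = 10    # tracking time before the threshold check is true
DUMP_AFTER_S = 60            # total nonyield duration before the default dump


def threshold_time_s():
    """Approximate seconds from the last yield to the threshold check."""
    return BASIC_CHECK_INTERVAL_S + NONYIELD_THRESHOLD_S


def dump_times_s(nonyield_duration_s, trace_flag_1262=False):
    """Seconds since the last yield at which a mini-dump would be captured
    for a single nonyield occurrence of the given duration."""
    if not trace_flag_1262:
        # Default: one dump once the occurrence reaches 60 seconds.
        return [DUMP_AFTER_S] if nonyield_duration_s >= DUMP_AFTER_S else []
    times = []
    t = threshold_time_s()   # assumption: intervals counted from 15 s
    while t <= nonyield_duration_s:
        times.append(t)
        t += DUMP_AFTER_S
    return times


if __name__ == "__main__":
    print(threshold_time_s())
    print(dump_times_s(70))
    print(dump_times_s(140, trace_flag_1262=True))
```

The default case also ignores the rule that only the first 17883 occurrence since startup produces a dump; the sketch models one occurrence in isolation.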

When SQL Server decides to take a mini-dump for a nonyield occurrence, it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump, because by the time SQLDumper captures the dump the non-yielding thread may already have yielded. So to get the exact snapshot of the thread we need to rely on the CONTEXT saved in the global structure. We can also compare the current stack of the thread with the copied one to check whether the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of the number of yields. Scheduler Monitor takes a dump when it notices that the IOCP listener has not moved for 10 seconds. A non-yielding IOCP listener dump is analyzed the same way as a non-yielding scheduler dump.

Let us step through the analysis of a non-yielding scheduler dump that I got from SQL Server 2012.

Sample 1

When a non-yielding scheduler dump is generated, the following error message is logged in the SQL Server error log and SQLDump000n.mdmp is generated in the log folder.

{

* *******************************************************************************

* BEGIN STACK DUMP:

* 04/16/12 10:09:58 spid 6256

* Non-yielding Scheduler

* *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}
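The thread ID in parentheses (0x1cb0) is needed later to switch to the reported thread in the debugger. If you have many of these messages to go through, a small script can pull the fields out. The sketch below is only an illustration: the regular expression is written against the sample message above, and the function and field names are my own.

```python
import re

# Sample message copied from the error log above.
LINE = ("Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be "
        "non-yielding on Scheduler 0. Thread creation time: 12979065797278. "
        "Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. "
        "System Idle 97%. Interval: 70110 ms.")

# Pattern written for the message format shown above (an assumption, not an official grammar).
PATTERN = re.compile(
    r"Process \S+ \((?P<thread>0x[0-9a-fA-F]+)\) "
    r"Worker (?P<worker>0x[0-9a-fA-F]+) appears to be non-yielding on "
    r"Scheduler (?P<scheduler>\d+)\..*?"
    r"kernel (?P<kernel_ms>\d+) ms, user (?P<user_ms>\d+) ms\..*?"
    r"Interval: (?P<interval_ms>\d+) ms"
)


def parse_nonyield(line):
    """Return the fields of a non-yielding scheduler message, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("scheduler", "kernel_ms", "user_ms", "interval_ms"):
        info[key] = int(info[key])
    return info


info = parse_nonyield(LINE)
print(info["thread"], info["scheduler"], info["interval_ms"])  # 0x1cb0 0 70110
```

Note that kernel and user CPU are both 0 ms here, which suggests the thread was not burning CPU during the interval; that is consistent with a thread blocked in a call rather than spinning.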

To analyze the dump, download and install the Windows Debugger (WinDbg).

Step 1:

Open WinDbg. From the File menu, select Open Crash Dump and then select the dump file (SQLDump000#.mdmp).

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’ -> Type of the dump

Symbol search path is: *** Invalid ***

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS -> Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

In the command window, type:
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for SQL Server by using the debugger command lmvm

0:146> lmvm sqlservr

start end module name

00000000`ffad0000 00000000`ffb0e000 sqlservr T (pdb symbols) c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

Loaded symbol image file: sqlservr.exe

Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

Image name: sqlservr.exe

Timestamp: Fri Apr 06 08:19:38 2012 (4F7F098A)

CheckSum: 00036498

ImageSize: 0003E000

File version: 2011.110.2316.0

Product version: 11.0.2316.0 -> SQL Server version

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match -> Thread ID of the scheduler monitor.

* 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

0:146> ~[006]s

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3 ret

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

0:006> kP

Child-SP RetAddr Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

Step 8:

Switch to the thread that is reported as non-yielding in the SQL Server error log using the ~~[ThreadID]s command.

Recall the error from the SQL Server error log: Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. The value in parentheses (0x1cb0) is the thread ID.

0:006> ~~[0x1cb0]s

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3 ret

Step 9:

Look at the current stack of the non-yielding thread using the kc command.

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Recall what we discussed earlier in this post: when SQL Server decides to take a mini-dump for a nonyield occurrence, it copies the CONTEXT of the nonyielding thread to a global structure before initiating the dump, because the thread may already have yielded by the time SQLDumper captures the dump. To see what the thread was doing at the time of the nonyield, we rely on the CONTEXT saved in the global structure, and we can compare the thread's current stack with the copied one to check whether the thread is progressing.

Look at the stack above: this cannot be the non-yielding stack, because we see SuspendNonPreemptive and SwitchContext on the thread, which means the worker has already yielded.

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

Step 10:

Search for the copied stack structure using the x command.

0:146> X sqlmin!*copiedStack*

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>

It is sqlmin!g_copiedStackInfo in this dump because the dump is from SQL Server 2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo.

Step 11:

We know the copied CONTEXT is stored in g_copiedStackInfo, but how do we find the offset of the CONTEXT within this structure? If the CONTEXT is valid, the Rip, Rbp and Rsp registers have to be valid if the dump is from an x64 system, and Eip, Ebp and Esp have to be valid if it is from an x86 system.

Let us do dd on sqlmin!g_copiedStackInfo (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

0:146> dd sqlmin!g_copiedStackInfo

000007fe`df11bfe0 00000001 00000000 3369e2e0 00000000

000007fe`df11bff0 0000a998 00000000 00000000 00000000

000007fe`df11c000 00000000 00000000 00000000 00000000

000007fe`df11c010 00000000 00000000 00000000 00000000

000007fe`df11c020 00000000 00000000 00000000 00000000

000007fe`df11c030 0010000b 00001f80 00000033 00000000

000007fe`df11c040 002b0000 00000246 00000000 00000000

000007fe`df11c050 00000000 00000000 00000000 00000000

Step 12:

Let us cast each address to CONTEXT and verify whether the Rip, Rbp and Rsp registers are valid. This dump is from 64-bit SQL Server, so we use Rip, Rbp and Rsp. If the dump is from an x86 system, use Eip, Ebp and Esp.

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp -> Cast 000007fe`df11bfe0 to CONTEXT. Rsp, Rbp and Rip are invalid, so this is not a valid offset.

ole32!CONTEXT

+0x098 Rsp : 2

+0x0a0 Rbp : 0x5a4d

+0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp -> Cast 000007fe`df11bff0 to CONTEXT. Rip is invalid, so this is not a valid offset either.

ole32!CONTEXT

+0x098 Rsp : 0x72120000

+0x0a0 Rbp : 0x3369e3cc

+0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp -> Cast 000007fe`df11c000 to CONTEXT. Rip, Rsp and Rbp are valid, so this should be a valid context. Let us switch to this context and verify.

ole32!CONTEXT

+0x098 Rsp : 0x3369e2e0

+0x0a0 Rbp : 0x3369e498

+0x0f8 Rip : 0x76d3139a

Now we know that 000007fe`df11c000 is a valid context. Since 000007fe`df11c000 - sqlmin!g_copiedStackInfo = 0x20, we can use .cxr sqlmin!g_copiedStackInfo+0x20 directly to switch to the context of the copied stack.
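The trial-and-error search in Step 12 can be written down as a small check. The sketch below is only an illustration of the reasoning, using the register values printed by the three dt commands; the validity test (every register must point above the first 64 KB of the address space, which Windows never maps) is a rough heuristic I am assuming here, not what the debugger itself does.

```python
# Address of sqlmin!g_copiedStackInfo in this dump (from the x command output).
BASE = 0x000007FEDF11BFE0

# (candidate address, Rsp, Rbp, Rip) as printed by the dt commands in Step 12.
CANDIDATES = [
    (0x000007FEDF11BFE0, 0x2, 0x5A4D, 0x0),
    (0x000007FEDF11BFF0, 0x72120000, 0x3369E3CC, 0xF2),
    (0x000007FEDF11C000, 0x3369E2E0, 0x3369E498, 0x76D3139A),
]

# Heuristic assumption: the lowest 64 KB of user address space is never valid.
LOWEST_USER_ADDRESS = 0x10000


def looks_valid(rsp, rbp, rip):
    """Rough plausibility test for the three registers of a CONTEXT record."""
    return all(reg >= LOWEST_USER_ADDRESS for reg in (rsp, rbp, rip))


def find_context_offset(base, candidates):
    """Return the offset from base of the first plausible CONTEXT, or None."""
    for address, rsp, rbp, rip in candidates:
        if looks_valid(rsp, rbp, rip):
            return address - base
    return None


offset = find_context_offset(BASE, CANDIDATES)
print(hex(offset))  # 0x20
```

A value that passes this test is only a candidate; switching to the context with .cxr and checking that the stack unwinds sensibly, as in the next steps, is what confirms it.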

Step 13:

Switch to the context of the copied stack using .cxr 000007fe`df11c000 or .cxr sqlmin!g_copiedStackInfo+0x20

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0 r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0 nv up ei pl zr na po nc

cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3 ret

Step 14:

Dump the stack of the copied context using kP or kc (kc displays a clean stack trace: each line includes only the module name and the function name).

0:146> Kc

*** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see whether the thread has progressed after the non-yield condition. The stacks look completely different, so the non-yielding thread has progressed and completed that work; it has moved on and is now waiting for its next task. To understand why the thread was non-yielding, look at the copied stack and not the current one, unless both stacks are the same.

Current thread stack, dumped using the thread ID from the SQL Server error log (0:146> kc)	Copied thread stack, which SQL Server copied to the global structure before generating the dump (0:146> Kc 10)
ntdll!NtWaitForSingleObject	KERNELBASE!WriteFile
KERNELBASE!WaitForSingleObjectEx	kernel32!WriteFileImplementation
sqldk!SOS_Scheduler::SwitchContext	sqllang!CErrorReportingManager::WriteToErrLog
sqldk!SOS_Scheduler::SuspendNonPreemptive	sqllang!CErrorReportingManager::SendErrorToErrLog
sqldk!WorkDispatcher::DequeueTask	sqllang!CErrorReportingManager::CwchFormatAndPrint
sqldk!SOS_Scheduler::ProcessTasks	sqllang!ReportLoginFailure
sqldk!SchedulerManager::WorkerEntryPoint	sqllang!FRedoLogin
sqldk!SystemThread::RunWorker	sqllang!login
sqldk!SystemThreadDispatcher::ProcessWorker	sqllang!process_login_finish
sqldk!SchedulerManager::ThreadEntryPoint	sqllang!process_commands
kernel32!BaseThreadInitThunk	sqldk!SOS_Task::Param::Execute
ntdll!RtlUserThreadStart	sqldk!SOS_Scheduler::RunTask
	sqldk!SOS_Scheduler::ProcessTasks
	sqldk!SchedulerManager::WorkerEntryPoint
	sqldk!SystemThread::RunWorker
	sqldk!SystemThreadDispatcher::ProcessWorker
	sqldk!SchedulerManager::ThreadEntryPoint
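The same comparison can be done mechanically when the stacks are long. The sketch below is illustrative only: it takes the two kc outputs from Step 9 and Step 14 as plain text and reports whether the stacks differ and how many frames they share at the bottom. The helper names are my own.

```python
# Frames copied from the kc output in Step 9 (current) and Step 14 (copied), top frame first.
CURRENT_STACK = """
ntdll!NtWaitForSingleObject KERNELBASE!WaitForSingleObjectEx
sqldk!SOS_Scheduler::SwitchContext sqldk!SOS_Scheduler::SuspendNonPreemptive
sqldk!WorkDispatcher::DequeueTask sqldk!SOS_Scheduler::ProcessTasks
sqldk!SchedulerManager::WorkerEntryPoint sqldk!SystemThread::RunWorker
sqldk!SystemThreadDispatcher::ProcessWorker sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk ntdll!RtlUserThreadStart
""".split()

COPIED_STACK = """
ntdll!NtWriteFile KERNELBASE!WriteFile kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog
sqllang!CErrorReportingManager::SendErrorToErrLog
sqllang!CErrorReportingManager::CwchFormatAndPrint
sqllang!ReportLoginFailure sqllang!FRedoLogin sqllang!login
sqllang!process_login_finish sqllang!process_commands
sqldk!SOS_Task::Param::Execute sqldk!SOS_Scheduler::RunTask
sqldk!SOS_Scheduler::ProcessTasks sqldk!SchedulerManager::WorkerEntryPoint
sqldk!SystemThread::RunWorker sqldk!SystemThreadDispatcher::ProcessWorker
sqldk!SchedulerManager::ThreadEntryPoint
kernel32!BaseThreadInitThunk ntdll!RtlUserThreadStart
""".split()


def thread_progressed(current, copied):
    """The thread has moved on if its current stack differs from the copied one."""
    return current != copied


def common_base_frames(current, copied):
    """Count the frames both stacks share, walking up from the thread start."""
    count = 0
    for cur, cop in zip(reversed(current), reversed(copied)):
        if cur != cop:
            break
        count += 1
    return count


print(thread_progressed(CURRENT_STACK, COPIED_STACK))   # True
print(common_base_frames(CURRENT_STACK, COPIED_STACK))  # 7
```

The seven shared frames are just the worker thread start-up path up to ProcessTasks; above that the stacks diverge (DequeueTask versus RunTask), which is what tells us the thread finished the task it was stuck on.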

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

ntdll!NtWriteFile -> WriteFile function is at top of the stack and did not complete in expected time.
KERNELBASE!WriteFile
kernel32!WriteFileImplementation
sqllang!CErrorReportingManager::WriteToErrLog ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint ->format the error
sqllang!ReportLoginFailure ->Login failed
sqllang!FRedoLogin
sqllang!login ->Login task is processed

From the above stack we can see that SQL Server was writing login-failure information to the SQL Server error log synchronously, and the WriteFile call took a long time and did not return. That is why a non-yielding scheduler dump was generated.

When will a WriteFile operation take a long time?

When there is a disk bottleneck. So the obvious solution for this issue is to fix the performance of the disk.

Similarly, there could be numerous other reasons for a non-yield condition, so look at the copied stack of your non-yielding scheduler dump using the method above and work out what could have caused it.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

How to Analyze "Deadlocked Schedulers" Dumps?

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

SQL Server generated Access Violation dumps while accessing oracle linked servers.

SQL Server Latch & Debugging latch time out

If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: 17883, 17884, 17887, 17888, Debugging, error 17883 non yielding, External dump process returned no errors, External dump process returned no errors.DoMiniDump () encountered error, memory dump, non-yielding, Non-yielding IOCP Listener, Non-yielding Scheduler, Process 0:0:0 ( ) Worker appears to be non-yielding on Scheduler, sql scheduler, SQL Server 2005, SQL Server 2008, SQLServer dump, SQLServer mdmp, Stack Dump | 28 Comments »

SQL Server performance degraded in 32-Bit SQL Server after adding additional RAM.

Posted by Karthick P.K on May 18, 2012

Did you know that adding RAM can sometimes hurt the performance of SQL Server?

I am not going to write about how the optimizer can sometimes choose suboptimal plans when there is a large amount of memory on the system. Instead, we will see how the memory available to other memory clerks (aka stolen memory) can shrink when we have a large amount of physical memory and AWE enabled.

If you notice that the performance of 32-bit SQL Server degraded after you added RAM, or if you see SQL Server memory errors like the ones below after adding RAM, it could be because large BUF structures have reduced the size of the BPool.

Errors:

SQL Server 2005/2008

Buffer Pool errors:

BPool::Map: no remappable address found.

Either BPool or MemToLeave errors:

Error: 17803 “Insufficient memory available..”

Buffer Distribution: Stolen=7901 Free=0 Procedures=1 Inram=201842 Dirty=0 Kept=572…

Extract from SQL Server memory design

{

SQL Server "User address space" is broken into two regions: MemToLeave and Buffer Pool

Size of MemToLeave (MTL) and Buffer Pool (BPool) is determined by SQL Server during start up as below.

MTL (Memory to Leave) = (Stack size * max worker threads) + Additional space to load DLLs

Stack size = 512 KB per thread for 32-bit SQL Server (904 KB under WOW)

Additional space to load DLLs = 256 MB from SQL Server 2000 onwards. This space is used for COM objects, extended stored procedures and linked servers loaded in the SQL Server process.

i.e. MTL = (256 * 512 KB) + 256 MB = 384 MB

Note: Additional space to load DLLs can be modified using the -g startup parameter.

On any machine with 4 or fewer processors, max worker threads is 256 by default (unless we change the value using sp_configure).

SQL Server Buffer Pool is minimum of “Physical RAM “ or “user mode memory(2GB or 3GB) – MTL- BUF structures”

BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

}

When AWE is enabled in 32-bit SQL Server, M_pbuf (part of the BUF structures mentioned earlier) is calculated and allocated for the entire physical memory on the system, regardless of the “max server memory” setting. This is so that max server memory can be adjusted without restarting SQL Server.

SQL Server requires 8MB to create M_pbuf for every 1GB of RAM available on the server.

A machine with 64 GB of RAM consumes 64 (GB of RAM) * 8 MB (M_pbuf for each GB) = 512 MB for the BUF array alone.

So the amount of BPOOL available for SQL Server is adversely affected.

Going back to the previous formula for BPOOL. Size of Bpool for 32-Bit SQL Server with AWE enabled and 64 GB of RAM would be.

BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

BPool= Minimum (64GB, (2GB-384MB)) – BUF structures (512+ MB)

The BPool would be approximately 1 GB (2 GB - 384 MB - 512 MB = 1152 MB). Since the BPool becomes very small, we might end up with memory errors.
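The arithmetic above is easy to redo for other RAM sizes. The following is a minimal sketch of the formulas quoted in this post, with all sizes in MB; the function names and default arguments are mine, and the model ignores everything except MTL and the BUF array.

```python
# Back-of-the-envelope model of the formulas in this post (all sizes in MB).
# Function names and defaults are illustrative; they are not SQL Server settings.

def memtoleave_mb(max_worker_threads=256, stack_kb=512, dll_space_mb=256):
    """MTL = (stack size * max worker threads) + additional space to load DLLs."""
    return max_worker_threads * stack_kb // 1024 + dll_space_mb


def buf_array_mb(physical_ram_gb, mb_per_gb=8):
    """With AWE enabled, about 8 MB of BUF array is needed per GB of physical RAM."""
    return physical_ram_gb * mb_per_gb


def bpool_mb(physical_ram_gb, user_address_space_gb=2):
    """BPool = Minimum(physical memory, user address space - MTL) - BUF structures."""
    physical_mb = physical_ram_gb * 1024
    user_mb = user_address_space_gb * 1024
    return min(physical_mb, user_mb - memtoleave_mb()) - buf_array_mb(physical_ram_gb)


print(memtoleave_mb())   # 384
print(buf_array_mb(64))  # 512
print(bpool_mb(64))      # 1152
print(bpool_mb(16))      # 1536
```

With 16 GB of RAM the same server would have about 1.5 GB of BPool, which shows how adding RAM shrinks the memory left for everything that cannot live in AWE memory.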

Note: In 32-bit SQL Server only data pages and index pages can be placed in AWE memory, so the memory available for other SQL Server memory objects is still limited to the BPool and MTL.

How to resolve this issue?

Remove a few GB of RAM from the server :) if you can convince your management that removing RAM will improve performance.

(Or)

There is a startup trace flag, 836, which tells SQL Server to allocate BUFs only for the configured max server memory setting. Enable this trace flag (-T836) and reduce the “max server memory” of SQL Server.

(Or)

Enable /3GB. This increases the user address space, and therefore the SQL Server BPool, by 1 GB, relieving BPool pressure.

Note: When the physical RAM in the system exceeds 16 GB and the /3GB switch is used, the operating system will ignore the additional RAM until the /3GB switch is removed.

Trouble shooting working set trim “A significant part of SQL Server process memory has been paged out”

SQL Server lock pages in memory should I use it?

SQL Server memory leak

What is new in SQL Server 2012 Memory

How to set max server memory and min server memory


Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Memory, Performance, SQL General, SQL Server Engine | Tagged: Adding RAM SQLServer, BPool::Map: no remappable address found., Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE, LazyWriter: warning, no free buffers found, sql server performance, SQL Server performance degraded after I added additional RAM.SQL Server performance degraded in 32-Bit SQL Server after I added additional RAM., There is insufficient system memory to run this query. | 6 Comments »

Copy database wizard or replication setup might fail due to broken dependency

Posted by Karthick P.K on May 4, 2012

The Copy Database Wizard fails with the error below while creating views or user-defined functions if the dependency list of the objects is broken.

Error:

failed with the following error: “Invalid object name ‘dbo. .”. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.

helpFile= helpContext=0 idofInterfaceWithError={8BDFE893-E9D8-4D23-9739-DA807BCDC2AC}

StackTrace: at Microsoft.SqlServer.Management.Dts.DtsTransferProvider.ExecuteTransfer()

at Microsoft.SqlServer.Management.Smo.Transfer.TransferData()

at Microsoft.SqlServer.Dts.Tasks.TransferObjectsTask.TransferObjectsTask.TransferDatabasesUsingSMOTransfer()

Replication setup also fails while applying the schema scripts if the dependency list of the objects is broken.

Error:

The schema script ‘XXX_4.sch’ could not be propagated to the subscriber. (Source: MSSQL_REPL, Error number: MSSQL_REPL-2147201001)

Get help: http://help/MSSQL_REPL-2147201001

Unable to replicate a view or function because the referenced objects or columns are not present on the Subscriber. (Source: MSSQL_REPL, Error number: MSSQL_REPL20164)

Get help: http://help/MSSQL_REPL20164

Invalid object name ‘. (Source: MSSQLServer, Error number: 208)

Get help: http://help/208

--Below script will fix the broken dependencies on all the objects
----------------------------------------------------------------------------
--List of objects for which referenced objects are missing.
--ex: View created on table XYZ and table XYZ is dropped
----------------------------------------------------------------------------
SELECT OBJECT_NAME (referencing_id),referenced_database_name, referenced_schema_name, referenced_entity_name
FROM sys.sql_expression_dependencies
WHERE referenced_entity_name not in (select name from sysobjects)

create table #t_excluded_modules (module_name sysname)
go

create table #t_modules_refreshed_in_end (module_name sysname)

go

------------------------------------------------------------------------------
--
-- get the list of modules whose dependencies have to be refreshed
--
-- Comment:
-- in the list we're not considering procedures or triggers
-- because they can be created in any order, which means they can be refreshed
-- in any order
--
------------------------------------------------------------------------------
create table #t_user_views_or_tables (module_id int)
insert into #t_user_views_or_tables(module_id)
		select object_id from sys.objects where
			type in ('V', 'FN', 'IF', 'TF')
			and name not like 'MSMerge%'
			and is_ms_shipped <> 1
			and name not in (select * from #t_modules_refreshed_in_end)
			and name not in (select * from #t_excluded_modules)

insert into #t_user_views_or_tables(module_id)
		select object_id from sys.objects where
			name in (select * from #t_modules_refreshed_in_end)

----------------------------------------
--
-- get the dependency table
-- |---------------------------------|
-- |  referencing_id | referenced_id |
-- | ----------------|---------------|
-- |      XXX        |     XXX       |
-- | ----------------|---------------|
--
----------------------------------------
Declare @module int, @message varchar(1000), @str nvarchar(1000)
create table #t_dependency_table (referencing_id int, referenced_id int)
DECLARE modules_cursor CURSOR FOR SELECT module_id FROM #t_user_views_or_tables
open modules_cursor
fetch next from modules_cursor into @module

IF @@FETCH_STATUS <> 0
	PRINT '            <<None>>	No module to refresh'

while @@FETCH_STATUS = 0
	begin
		select @str = quotename(schema_name(objectproperty(@module, 'schemaid'))) + '.' + quotename(object_name(@module))
		select @message = '            trying to refresh ' + @str
		print @message
		exec sys.sp_refreshsqlmodule @str
		select @message = '            ' + @str + ' was refreshed'
		print @message

		insert into #t_dependency_table (referencing_id, referenced_id)
			select distinct object_id as referencing_id, referenced_major_id as referenced_id
				from sys.sql_dependencies
				where object_id <> referenced_major_id      -- to avoid self recursion for functions
					  and object_id = @module

		fetch next from modules_cursor into @module
	end

close modules_cursor
deallocate modules_cursor

-------------------------------------------------------------------------------------------
--
-- get the bottom of the dependency list i.e. independent modules
-- i.e.
-- get the list of referenced_ids in the dependency table which
-- don't occur in the referencing_ids column
--
-- Comment:
-- if there are circular dependencies then the few modules which form a circular dependency
-- would be ignored in the independent modules list
--
-------------------------------------------------------------------------------------------
create table #t_independent_modules (modules int)

insert into #t_independent_modules (modules)
	select #t_dependency_table.referenced_id from
			#t_dependency_table left outer join #t_dependency_table t2
			on #t_dependency_table.referenced_id = t2.referencing_id
	where
			t2.referencing_id is NULL

-------------------------------------------------------------------------------------------
--
-- build the ordered list of dependencies starting with the independent modules
-- in the beginning first few rows, ones dependent on it in the following rows and so on...
--
-- there can be tricky cases of dependencies such as
-- V1 -> V2 -> V3
--  |           ^
--  +-----------+
--
-- in the above example the refresh order would be V3, V1, V2, V1.
-- note that V1 is being refreshed twice, the last refresh of V1 after V2 is important.
--
-------------------------------------------------------------------------------------------
create table #t_final_dependency_list (id_num int IDENTITY(1,1), modules int)

while exists (select * from #t_independent_modules)
	begin
		-- append the set of independent modules into a list
		insert into #t_final_dependency_list select * from #t_independent_modules

		-- get the set of dependent modules
		select distinct #t_dependency_table.referencing_id into #temp_table
			from #t_dependency_table
			where #t_dependency_table.referenced_id in (select * from #t_independent_modules)

		-- clear up the list of independent modules
		truncate table #t_independent_modules

		-- the dependent modules now become the independent modules
		insert into #t_independent_modules select * from #temp_table

		-- delete the dependent modules list
		drop table #temp_table
	end

-----------------------------------------------------------
--
-- refresh the modules once more but in the right order now
--
-----------------------------------------------------------
declare modules_cursor_final cursor for
	select modules from #t_final_dependency_list order by #t_final_dependency_list.id_num

open modules_cursor_final
fetch next from modules_cursor_final into @module

IF @@FETCH_STATUS <> 0
	PRINT '            <<None>>	No module to refresh'

while @@FETCH_STATUS = 0
	begin
		if (select type from sys.objects where object_id = @module) in ('V', 'FN', 'IF', 'TF')
			and (select is_schema_bound from sys.sql_modules where object_id = @module) = 0
			begin
				select @str = quotename(schema_name(objectproperty(@module, 'schemaid'))) + '.' + quotename(object_name(@module))
				select @message = '            trying to finally, once more, refresh ' + @str
				print @message
				exec sys.sp_refreshsqlmodule @str
				select @message = '            ' + @str + ' was finally refreshed once again'
				print @message
			end

		fetch next from modules_cursor_final into @module
	end

close modules_cursor_final
DEALLOCATE modules_cursor_final

-----------
--
-- cleanup
--
-----------
drop table #t_excluded_modules
drop table #t_modules_refreshed_in_end
drop table #t_user_views_or_tables
drop table #t_dependency_table
drop table #t_independent_modules
drop table #t_final_dependency_list
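The ordering logic in the script (start with the independent modules, then repeatedly append whatever references the previous layer) is easier to follow outside T-SQL. The sketch below is an illustration of that idea in Python, not a translation of the script: the module names are the V1, V2, V3 from the comment in the script, and the cap on the number of rounds is my own addition so that a circular dependency cannot loop forever.

```python
def refresh_order(dependencies):
    """Return the layered refresh order for (referencing, referenced) pairs.

    Independent modules come first; each following layer holds the modules
    that reference something in the previous layer, so a module can appear
    more than once and its last refresh is the one that matters.
    """
    deps = set(dependencies)
    referencing = {a for a, _ in deps}
    modules = referencing | {b for _, b in deps}

    # Independent modules: referenced by others but referencing nothing themselves.
    layer = sorted({b for _, b in deps if b not in referencing})
    order = []

    # Assumption of this sketch: stop after len(modules) layers, which is
    # enough for any acyclic graph and guards against circular dependencies.
    for _ in range(len(modules)):
        if not layer:
            break
        order.extend(layer)
        layer = sorted({a for a, b in deps if b in layer})
    return order


# The tricky case from the comment in the script: V1 -> V2 -> V3 and V1 -> V3.
example = [("V1", "V2"), ("V2", "V3"), ("V1", "V3")]
print(refresh_order(example))  # ['V3', 'V1', 'V2', 'V1']
```

This reproduces the order given in the script's comment: V1 is refreshed twice, and the second refresh, after V2, is the one that leaves it consistent.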

Posted in Copy database wizard, Replication, SQL General | Tagged: copy database, copy database Invalid object name, Script to fix dependencies, sql server copy, The schema script could not be propagated to the subscriber | 6 Comments »

SQL Server Agent is taking long time to start

Posted by Karthick P.K on April 19, 2012

SQL Server Agent might take long time to start because of slow communications with Certificate Authorities.

If you enable verbose logging for SQL Server Agent (-v) and look at the SQL Server Agent log, you will notice that the ‘ANALYSISQUERY’ subsystem has taken a long time to load (over four minutes in the log below).

2012-02-15 15:42:42 – ? [124] Subsystem ‘QueueReader’ successfully loaded (maximum concurrency: 800)

2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISQUERY’ successfully loaded (maximum concurrency: 800)

2012-02-15 15:47:08 – ? [124] Subsystem ‘ANALYSISCOMMAND’ successfully loaded (maximum concurrency: 800)
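In a long verbose log the slow step is not always this easy to spot by eye. The sketch below is one way to find it: it reads the timestamp at the start of each line and reports the largest gap between consecutive lines. It assumes every line starts with a timestamp in the format shown above; the function name is made up, and the sample lines are the three log lines above with plain ASCII quotes.

```python
from datetime import datetime

# The three log lines shown above (quotes and dashes simplified to ASCII).
LOG_LINES = [
    "2012-02-15 15:42:42 - ? [124] Subsystem 'QueueReader' successfully loaded (maximum concurrency: 800)",
    "2012-02-15 15:47:08 - ? [124] Subsystem 'ANALYSISQUERY' successfully loaded (maximum concurrency: 800)",
    "2012-02-15 15:47:08 - ? [124] Subsystem 'ANALYSISCOMMAND' successfully loaded (maximum concurrency: 800)",
]


def slowest_step(lines):
    """Return (gap in seconds, log line) for the largest gap between consecutive lines."""
    if len(lines) < 2:
        return None
    # The first 19 characters hold the timestamp, e.g. "2012-02-15 15:42:42".
    stamps = [datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S") for line in lines]
    gaps = [
        ((later - earlier).total_seconds(), line)
        for earlier, later, line in zip(stamps, stamps[1:], lines[1:])
    ]
    return max(gaps)


seconds, line = slowest_step(LOG_LINES)
print(seconds)  # 266.0
print(line)
```

The 266-second gap lands on the line that reports ANALYSISQUERY as loaded, which means the time was spent loading that subsystem.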

Also, if you collect dumps during SQL Server Agent startup, you will notice a stack like the one below.

ntdll!ZwWaitForSingleObject

kernel32!WaitForSingleObjectEx

cryptnet!CryptRetrieveObjectByUrlWithTimeout

cryptnet!CryptRetrieveObjectByUrlW

crypt32!ChainRetrieveObjectByUrlW

crypt32!CCertChainEngine::RetrieveCrossCertUrl

crypt32!CCertChainEngine::UpdateCrossCerts

crypt32!CCertChainEngine::Resync

crypt32!CCertChainEngine::CreateChainContextFromPathGraph

crypt32!CCertChainEngine::GetChainContext

crypt32!CertGetCertificateChain

wintrust!_WalkChain

wintrust!WintrustCertificateTrust

wintrust!_VerifyTrust

wintrust!WinVerifyTrust

mscorsec!GetPublisher

mscorwks!PEFile::CheckSecurity

mscorwks!PEAssembly::DoLoadSignatureChecks

mscorwks!PEAssembly::PEAssembly

mscorwks!PEAssembly::DoOpenHMODULE

mscorwks!PEAssembly::OpenHMODULE

mscorwks!AppDomain::BindExplicitAssembly

mscorwks!AppDomain::LoadExplicitAssembly

mscorwks!ExecuteDLLForAttach

mscorwks!ExecuteDLL

mscorwks!CorDllMainForThunk

mscoree!CorDllMainWorkerForThunk

mscoree!VTableBootstrapThunkInitHelper

mscoree!VTableBootstrapThunkInitHelperStub

SQLAGENT!LoadSubsystem

SQLAGENT!StartSubSystems

SQLAGENT!DumpAndCheckServerVersion

SQLAGENT!ServiceMain

advapi32!ScSvcctrlThreadW

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

The ANALYSISQUERY subsystem loads an assembly that has an Authenticode signature. When the CLR loads an assembly that has an Authenticode signature, it always tries to verify that signature.

This verification can be quite time-intensive, since it can require hitting the network several times to download up-to-date certificate revocation lists, and also to ensure that there is a full chain of valid certificates on the way to a trusted root.

If the server cannot reach the internet to verify the signature, or you want to bypass the Authenticode check, create a sqlagent.exe.config file in the Binn directory with the following XML. This bypasses the check:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<generatePublisherEvidence enabled="false"/>
</runtime>
</configuration>

Thanks

Karthick P.K

Posted in SQL General | Tagged: Delay starting SQLAgent, SQL Server agent performance, SQL Server agent slow, SQLAgent | 1 Comment »

SQL-Server resource fails to come online IS Alive check fails

Posted by Karthick P.K on January 31, 2012

SQL-Server resource fails to come online with below Error:

[sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = 35; message = [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible.

Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.

Resolution:

Check the version of the SQL Server resource DLL (c:\windows\system32\sqsrvres.dll) and install the matching version of SQL Server Native Client.

Cause:

When a higher version of SQL Server is installed on a cluster on which a lower version of SQL Server is already installed, the SQL Server resource DLL (c:\windows\system32\sqsrvres.dll) is upgraded to the higher version, and the resource monitor process loads the higher-version resource DLL to monitor the lower-version instances as well.

For example: The Denali SQL Server Resource uses SNAC 11.0 to connect to the SQL instance and because SNAC 11.0 can be used to connect to Shiloh, Yukon and Katmai as well this side by side configuration will work. However if Denali is uninstalled, the Denali SQL Server resource DLL is not downgraded to Katmai, Yukon or Shiloh version and hence care should be taken to not uninstall SNAC 11.0 otherwise Yukon or Shiloh instance cannot be brought online.

Similarly When we install Yukon and Shiloh together, Yukon SQL Server Resource uses SNAC to connect to the SQL instance and because SNAC can be used to connect to Shiloh as well this side by side configuration will work. However if Yukon is uninstalled, the Yukon SQL Server resource DLL is not downgraded to Shiloh version and hence care should be taken to not uninstall SNAC otherwise Shiloh instance cannot be brought online.

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Configuration, Connectivity, SQL General, SQL Server Cluster | Tagged: A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible., ISAlive failure, SQLServer Cluster, SQLServer Resource not comming online, [sqsrvres] checkODBCConnectError: sqlstate = 08001; native error = 35; message = [Microsoft][SQL Server Native Client 10.0] | 8 Comments »

How to move the LOB data from one file group to other?

Posted by Karthick P.K on January 17, 2012

There is no direct way to move LOB data from one file group to another. Neither ALTER TABLE nor CREATE INDEX can move LOB data as of the current version of SQL Server (SQL Server 2008).

The only way to move the LOB data is to:

1. Create a new table in the new file group.

2. Move the data from the existing table to the new table.

3. Drop the existing table.

4. Rename the new table to the old table's name.

Management Studio has an easy way to generate a script for all of the above tasks.

1. In Management Studio, right-click the table –> Design –> change the file group in the Properties window (click View –> Properties Window if you do not see it).

2. Generate Change Script.

3. A script similar to the following is generated.

4. Copy the script and run it in a query window.

/* To prevent any potential data loss issues, you should review this script in detail before running it outside the context of the database designer.*/
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT

BEGIN TRANSACTION
-- Create the replacement table; TEXTIMAGE_ON places its LOB data on the target file group (Lob2)
CREATE TABLE dbo.Tmp_BLOB_TABLE
(
    BLOBName varchar(100) NULL,
    BLOBData varbinary(MAX) NULL
) ON [PRIMARY]
TEXTIMAGE_ON Lob2
ALTER TABLE dbo.Tmp_BLOB_TABLE SET (LOCK_ESCALATION = TABLE)
-- Copy the rows while holding an exclusive lock on the old table
IF EXISTS(SELECT * FROM dbo.BLOB_TABLE)
EXEC('INSERT INTO dbo.Tmp_BLOB_TABLE (BLOBName, BLOBData)
SELECT BLOBName, BLOBData FROM dbo.BLOB_TABLE WITH (HOLDLOCK TABLOCKX)')
-- Drop the old table and give the new table its name
DROP TABLE dbo.BLOB_TABLE
EXECUTE sp_rename N'dbo.Tmp_BLOB_TABLE', N'BLOB_TABLE', 'OBJECT'
COMMIT
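
To confirm where the LOB data ended up, a query along these lines may help. It is only a sketch: it assumes the table is named dbo.BLOB_TABLE as in the generated script, and it reads the lob_data_space_id column of sys.tables, which holds the ID of the data space used for the table's LOB data.

```sql
-- Assumption: dbo.BLOB_TABLE is the table name used in the generated script.
-- lob_filegroup is NULL when the table has no LOB data space recorded.
SELECT t.name  AS table_name,
       ds.name AS lob_filegroup
FROM sys.tables AS t
LEFT JOIN sys.data_spaces AS ds
       ON ds.data_space_id = t.lob_data_space_id
WHERE t.object_id = OBJECT_ID(N'dbo.BLOB_TABLE');
```

Running it before and after the change script should show the file group name change, for example to Lob2 in the script above.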

Thanks

Karthick P.K

Posted in Space management, SQL General, SQL Server Tools | Tagged: Moving LOB data | 34 Comments »

Script to free cache

Posted by Karthick P.K on December 6, 2010

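-- Release entries from all caches; entries currently in use are freed once they become unused
DBCC FREESYSTEMCACHE ('ALL') WITH MARK_IN_USE_FOR_REMOVAL
GO
-- Flush the distributed query connection cache
DBCC FREESESSIONCACHE WITH NO_INFOMSGS
GO
-- Remove all cached plans from the plan cache
DBCC FREEPROCCACHE WITH NO_INFOMSGS
GO
-- Remove all clean buffers from the buffer pool
DBCC DROPCLEANBUFFERS
GO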

Posted in Performance, SQL General, SQL Query | Tagged: Clear cache, clear sqlserver cahe, free sqlserver cache, Free sqlserver memory, Performance, SQL Query | 59 Comments »

How to rebuild index and update statistics for all the tables in database.

Posted by Karthick P.K on September 26, 2010

 
-- Update statistics on every table with a full scan (can be run anytime)
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'

-- Rebuild all indexes on every table (always run this during off-peak hours on any SQL Server instance)
EXEC sp_MSforeachtable 'DBCC DBREINDEX(''?'')'
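
DBCC DBREINDEX is a deprecated command, and ALTER INDEX ... REBUILD is its documented replacement. The same loop could be sketched as below. It still relies on the undocumented sp_MSforeachtable procedure used above, and setting QUOTED_IDENTIFIER inside the command string is a precaution I am assuming is needed for tables with indexed views or indexes on computed columns.

```sql
-- Sketch only: rebuild every index on every user table in the current database.
-- sp_MSforeachtable is undocumented; '?' is replaced by each table name in turn.
EXEC sp_MSforeachtable
     'SET QUOTED_IDENTIFIER ON; ALTER INDEX ALL ON ? REBUILD;';
```

As with DBCC DBREINDEX, this is best run during off-peak hours.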

Thanks

Karthick

Posted in Optimizer, Performance, SQL General, SQL Query | Tagged: Rebuild indexfor all tables, reindex all tables in database, Script to rebuild index for all tables, SQL General, SQL Query | 3 Comments »

How to check if local system is connected to a network and identify the type of network connection

Posted by Karthick P.K on July 26, 2010

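#include <windows.h>
#include <Sensapi.h>
#include <iostream>
#pragma comment(lib, "Sensapi.lib")

int main()
{
    DWORD flags = 0;

    // IsNetworkAlive returns TRUE when a connection is up and fills in the connection type
    BOOL alive = IsNetworkAlive(&flags);
    // Capture the error code immediately, before another call can overwrite it
    DWORD err = GetLastError();

    if (alive)
    {
        std::cout << "Network is connected. Type: " << flags << std::endl;  // 1 = LAN and 2 = WAN
    }
    else if (err == ERROR_SUCCESS)
    {
        // FALSE with no error code means there is simply no connection
        std::cout << "Network is not connected" << std::endl;
    }
    else
    {
        std::cout << "IsNetworkAlive failed: " << err << std::endl;
        return 1;
    }
    return 0;
}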

Regards

Karthick P.K

Posted in Programming, SQL General | 2 Comments »

“Value cannot be null” when i connect SQL Server from SSMS

Posted by Karthick P.K on May 26, 2010

I get the error below when I connect to SQL Server using SSMS. What should I do?

Error

Value cannot be null.
Parameter name: viewInfo (Microsoft.SqlServer.Management.SqlStudio.Explorer)

Resolution

Right-click SSMS and choose “Run as administrator” 🙂

If “Run as administrator” doesn’t resolve the problem, verify that the %Temp% environment variable is set properly for the logged-on Windows account. If %Temp% is not set properly, SSMS can fail with this error.

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups

Thank you,

Posted in SQL General, SQL Server Tools, SSMS | Tagged: error Value cannot be null, Error when opening SSMS, SQL Server management studio, SQL Server management studio fail to open, SSMS, SSMS fails to open, Value cannot be null ssms | 14 Comments »

How to get SQL Text and Query Plan for statements which are executing now

Posted by Karthick P.K on February 1, 2010

 
SELECT GETDATE() AS RunTime,
       st.text AS batch,
       -- Offsets are byte positions in Unicode text, so divide by 2 to get character positions
       SUBSTRING(st.text, a.statement_start_offset / 2 + 1,
                 ((CASE WHEN a.statement_end_offset = -1
                        THEN DATALENGTH(st.text)
                        ELSE a.statement_end_offset END) - a.statement_start_offset) / 2 + 1) AS current_statement,
       qp.query_plan,
       a.*
FROM sys.dm_exec_requests AS a
CROSS APPLY sys.dm_exec_sql_text(a.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(a.plan_handle) AS qp
ORDER BY a.cpu_time DESC

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K

Posted in SQL General, SQL Query | Tagged: Current queries executed by SQLserver, Queries which are executing now with plan, Query and plans which are running now, SQL Query, SQL Server CPU is 100% how to find the query | 7 Comments »

Script to clear stats

Posted by Karthick P.K on January 20, 2010

How to reset SQL Server statistics counters without restarting the instance.

DBCC SQLPERF ('spinlockstats', CLEAR);

GO

DBCC SQLPERF ('netstats', CLEAR);

GO

DBCC SQLPERF ('rastats', CLEAR);

GO

DBCC SQLPERF ('iostats', CLEAR);

GO

DBCC SQLPERF ('threads', CLEAR);

GO

DBCC SQLPERF ('logspace', CLEAR);

GO

DBCC SQLPERF ('umsstats', CLEAR);

GO

DBCC SQLPERF ('waitstats', CLEAR);

GO
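
Most of the counter names above are undocumented. As a hedged sketch, the documented CLEAR targets are the two DMV names below, and a quick query against sys.dm_os_wait_stats can be used to check that the reset took effect.

```sql
-- Documented CLEAR targets: reset the aggregated wait and latch statistics.
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);
GO
DBCC SQLPERF ('sys.dm_os_latch_stats', CLEAR);
GO
-- Optional check: cumulative wait times should restart from zero after the reset.
SELECT TOP (5) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```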

Posted in Performance, SQL General, SQL Query | Tagged: SQL Query | 2 Comments »

Monitoring Tempdb usage

Posted by Karthick P.K on January 13, 2010

Monitoring tempdb space usage and identifying the session and query that consume tempdb space

The total space used by Tempdb consists of

1. User Objects

2. Internal Objects

3. Version Store

4. Free Space.

Use the query below to track which of the above categories is consuming space in tempdb.

-- Run in the tempdb database context. Pages are 8 KB each, so 128 pages = 1 MB.
SELECT
    SUM(unallocated_extent_page_count) AS [free pages],
    (SUM(unallocated_extent_page_count) * 1.0 / 128) AS [free space in MB],
    SUM(version_store_reserved_page_count) AS [version store pages used],
    (SUM(version_store_reserved_page_count) * 1.0 / 128) AS [version store space in MB],
    SUM(internal_object_reserved_page_count) AS [internal object pages used],
    (SUM(internal_object_reserved_page_count) * 1.0 / 128) AS [internal object space in MB],
    SUM(user_object_reserved_page_count) AS [user object pages used],
    (SUM(user_object_reserved_page_count) * 1.0 / 128) AS [user object space in MB]
FROM sys.dm_db_file_space_usage;
go

Once you have identified which category is consuming the space, use the query listed below to identify the session and query responsible.

-- Use the query below to identify which query and session is consuming the space in tempdb

SELECT R1.session_id, R1.request_id,
       R1.Task_request_internal_objects_alloc_page_count,
       R1.Task_request_internal_objects_dealloc_page_count,
       R1.Task_request_user_objects_alloc_page_count,
       R1.Task_request_user_objects_dealloc_page_count,
       R3.Session_request_internal_objects_alloc_page_count,
       R3.Session_request_internal_objects_dealloc_page_count,
       R3.Session_request_user_objects_alloc_page_count,
       R3.Session_request_user_objects_dealloc_page_count,
       R2.sql_handle, RL2.text AS SQLText,
       R2.statement_start_offset, R2.statement_end_offset, R2.plan_handle
FROM (SELECT session_id, request_id,
             SUM(internal_objects_alloc_page_count) AS Task_request_internal_objects_alloc_page_count,
             SUM(internal_objects_dealloc_page_count) AS Task_request_internal_objects_dealloc_page_count,
             SUM(user_objects_alloc_page_count) AS Task_request_user_objects_alloc_page_count,
             SUM(user_objects_dealloc_page_count) AS Task_request_user_objects_dealloc_page_count
      FROM sys.dm_db_task_space_usage
      GROUP BY session_id, request_id) AS R1
INNER JOIN (SELECT session_id,
                   SUM(internal_objects_alloc_page_count) AS Session_request_internal_objects_alloc_page_count,
                   SUM(internal_objects_dealloc_page_count) AS Session_request_internal_objects_dealloc_page_count,
                   SUM(user_objects_alloc_page_count) AS Session_request_user_objects_alloc_page_count,
                   SUM(user_objects_dealloc_page_count) AS Session_request_user_objects_dealloc_page_count
            FROM sys.dm_db_session_space_usage
            GROUP BY session_id) AS R3
        ON R1.session_id = R3.session_id
LEFT OUTER JOIN sys.dm_exec_requests AS R2
        ON R1.session_id = R2.session_id AND R1.request_id = R2.request_id
OUTER APPLY sys.dm_exec_sql_text(R2.sql_handle) AS RL2
WHERE Task_request_internal_objects_alloc_page_count > 0
   OR Task_request_internal_objects_dealloc_page_count > 0
   OR Task_request_user_objects_alloc_page_count > 0
   OR Task_request_user_objects_dealloc_page_count > 0
   OR Session_request_internal_objects_alloc_page_count > 0
   OR Session_request_internal_objects_dealloc_page_count > 0
   OR Session_request_user_objects_alloc_page_count > 0
   OR Session_request_user_objects_dealloc_page_count > 0
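
The query above returns raw page counters. As a rough sketch of how to read them, the net pages a session still holds can be estimated as allocated minus deallocated pages, converted to MB by dividing by 128 (8 KB pages). The column aliases below are my own, and the session-level DMV only reflects tasks that have already completed, so a currently running request may not be counted yet.

```sql
-- Sketch: top 10 sessions by estimated net tempdb pages held, shown in MB.
-- Aliases net_user_object_mb and net_internal_object_mb are illustrative names.
SELECT TOP (10)
       session_id,
       (user_objects_alloc_page_count - user_objects_dealloc_page_count) / 128.0
           AS net_user_object_mb,
       (internal_objects_alloc_page_count - internal_objects_dealloc_page_count) / 128.0
           AS net_internal_object_mb
FROM sys.dm_db_session_space_usage
ORDER BY (user_objects_alloc_page_count - user_objects_dealloc_page_count)
       + (internal_objects_alloc_page_count - internal_objects_dealloc_page_count) DESC;
```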

Known issues related to TEMPDB Shrink

FIX: The used space in the tempdb database increases continuously when you run a query that creates internal objects in the tempdb database in SQL Server 2005

Thank you,

Karthick P.K
