MSSQLWIKI

Karthick P.K on SQL Server

Top SQL Server blogs from MSSQLWIKI

Posted by Karthick P.K on January 11, 2014

Top SQL Server blogs  from MSSQLWIKI 

Year 2013 was a wonderful year for MSQLWIKI.com and these are the top blogs that got the most views in 2013. I am summarizing the blogs based on category for easy read  

 

Debugging ( Using public symbols )

What is SQL Server Latch & Debugging latch time out

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps  

SQL Server memory leak

Debugging memory Leaks using Debug diagnostic tool.

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

SQL Server generated Access Violation dumps while accessing oracle linked servers.

 

SQL Server Memory

Troubleshooting SQL Server Memory
Basics of SQL Server Memory Architecture

SQL Server lock pages in memory

Max server memory – Do I need to configure?

A significant part of SQL Server process memory has been paged out
SQL Server memory leak

Debugging memory Leaks using Debug diagnostic tool.

SQL Server and VMware ballooning

SQL Server fails to start with error "Failed allocate pages: FAIL_PAGE_ALLOCATION 1" During startup

SQL Server -g

SQL Server 2012 Memory

False warning “A significant part of sql server process memory has been paged out”

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

What is Target Server Memory (KB)?

SQL Server performance degraded in 32-Bit SQL Server after adding additional RAM.

What does MemoryUtilization in sys.dm_os_ring_buffers and Memory_utilization_percentage in sys.dm_os_process_memory represents?

 

SQL Server operating system

SQL Server NUMA load distribution

SQL Server Operating system (SOS) Series 3

SQL Server Operating system (SOS) Series 2

SQL Server Operating system (SOS) Series 1

 

Latch contention

Tempdb latch contention

SQL Server Latch

 

SQL Server performance

SQL Server Query optimization

SQL Server Parameter sniffing

Optimizer Timeout or Optimizer memory abort

Troubleshooting SQL Server high CPU usage

I/O requests taking longer than 15 seconds to complete on file

SQL Server NUMA load distribution

Script to get all the Query and cached plans

Trivial Plan

Script to get current blocking tree with wait types

How to get SQL Text and Query Plan for statements which are executing now

Script to clear stats

Script to free cache

How to rebuild index and update statistics for all the tables in database.

SQL Server Database tuning advisor

Capture context switches from dm_os_ring_buffers

 

Replication

Transactional Replication Part-1

Transactional Replication Part-2

Troubleshooting Transactional replication Latency using AgentStatistics

I get error while I update a row in immediate updating subscription what is the cause?

How to start replication agent in command prompt.

How to create transactional replication in SQL Server

How to configure distribution for replication

Connectivity

SQL Server connectivity, Kerberos authentication and SQL Server SPN (Service Principal Name for SQL Server)

Behavior of SQL Server default instance on a NON-Default port

 

Always on

The connection to the primary replica is not active. The command cannot be processed

 

Ring buffers

Inside sys.dm_os_ring_buffers

 

Data structures

SQL Server: Ghost records

 

Database mail

Database Mail errors in SQL Server (Troubleshooting steps)

 

Copy Data

Copy database wizard or replication setup might fail due to broken dependency

How to move the LOB data from one file group to other?

Create script for all objects in database with data

SQL Server agent

SQL Server Agent is taking long time to start

 

Security

“An error occurred during decryption”

Configuring SSL for SQL Server using Microsoft Certificate Authority Server

How to find who altered my SQL Server Login

(SQLServer) Initializing the FallBack certificate failed with error code: 1, state: 1, error number: -2146893802.

How to enable Constraint delegation for SQLServer2005

Bulk insert fails with error 4861 Cannot bulk load because the file could not be opened

 

 

SQL Server cluster

SQL-Server resource fails to come online IS Alive check fails

SQLServer LooksAlive and IsAlive Check

Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID ‘ ‘) online (Error code 5018).

SQL Server cluster installation checklist

How to add node to SQL Server cluster

HOW TO CREATE CLUSTER USING HYPER-V

HOW TO INSTALL SQL Server CLUSTER IN HYPER-V

How to install cluster in windows 2008 and windows 2012

windows cluster freezes at “waiting for notification that node ‘‘ is a fully functional member of the cluster”

 

Tempdb usage

How to monitor the Session and query which Consumes Tempdb

 

SQL Server setup

Service pack ,Hotfix and CU installation for SQL Server 2005 might fail with “Unable to install Windows Installer MSI file“

Installation of SQLServer2008 fails (The registry key SYSTEM\CurrentControlSet\Services\RsFx0102\InstancesShares is missing)

Installation of SQLserver2008 cluster fails on windows2008.(The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

Installation of SQLServer2005/2008 Fails on Windows2008 Cluster.

Transaction log for the database is growing and system SPID is holding open transaction

Unable to start SQLServer agent resource on cluster after upgrading to 9.00.3186 or Higher

A failure was detected for a previous installation, patch, or repair during configuration for features [SQL_PowerShell_Engine_CNS,SQL_PowerShell_Tools_ANS]. In order to apply this patch package (KB968369), you must resolve any issues with the previous operation that failed.

SQL Server2008/SQL Server2012: Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’ encountered error 574, state 0, severity 16

SQLServer 2008 Fails to come online on cluster after upgrade

Transaction log for the database is growing and system SPID is holding open transaction

SQL Server Error while enabling Windows feature : NetFx3, Error Code : -2146498298

A SQL product other than SQL Server 2014 CTP1 is detected. You cannot install this release until the existing instances of SQL products are uninstalled

 

Backup & restore

The backup of the file or filegroup "" is not permitted because it is not online. BACKUP can be performed by using the FILEGROUP or FILE clauses to restrict the selection to include only online data.

 

Ring buffers

Inside sys.dm_os_ring_buffers

How to find SQL Server and system CPU usage history

 

SQL Agent and SQL Server startup issues

SQL Server Agent is taking long time to start

The database ‘model’ is marked RESTORING and is in a state that does not allow recovery to be run.Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive

We do not see the SQL server and SQL Agent status from management studio. When we right click the instance, start, stop and resume options is disabled.

(SQLServer) Initializing the FallBack certificate failed with error code: 1, state: 1, error number: -2146893802.

 

Profiler and SQL Server tools

My “c:\” Drive gets full when I open the profiler trace

Beyond XP_READERRORLOG (Parameters of XP_READERRORLOG)

“Value cannot be null” when i connect SQL Server from SSMS

How to find all the profiler traces running on my SQL Server

I cant not backup my database from SSMS….

I get Exception when i open SQL Server management studio

Tasks and connection icons missing after importing DTS in SQL Server management studio

Saving changes is not permitted. The changes you have made require the following tables to be dropped and re-created

Runtime error: ActiveX component can’t create object: ‘SQLDMO.SQLServer’

 

Others

System stored procedures like sp_addsrvrolemember or sp_addserver may fail because of McAfee Host Intrusion Prevention

 

Mssqlsystemresource database

How to Browse (or) view objects and there code in mssqlsystemresource Database

 

Checkdb failures

DBCC CheckDB fails with error "The database could not be checked as a database snapshot could not be created and the database or table could not be locked

 

SSIS/DTS

SSIS package fails with out of memory errors

SSIS package fails with out of memory errors

SSIS package fails when executed as job using proxy account

 

Programming

Multi Threaded OVELAPPED and Nonbuffered I/O Example

Asynchronous I/O example

How to Create process in c++….. CreateProcess function

how to Open CreateFile (Createfile example)

How to get the current state of cluster resource?

QueryMemoryResourceNotification & CreateMemoryResourceNotification (How SQL Server identifies low system memory on the system and respond to low system memory?)

AWE allocator API’s (How SQL Server AWE works)

How to check if local system is connected to a network and identify the type of network connection

How to check if my account has LPM privilege

The backup of the file or filegroup "" is not permitted because it is not online. BACKUP can be performed by using the FILEGROUP or FILE clauses to restrict the selection to include only online data.

How to retrieve information about the file system and volume associated with the specified root directory (GetVolumeInformation function)

How to add an IP Address when we Add new NIC to node where SQLServer2005 instance is running.

CryptAcquireContext and CryptReleaseContext example

Criticalsection example

CreateProcess example

CreateFileMapping or MapViewOfFileEx example

 

How to & Other miscellaneous blogs

PREEMPTIVE_OS_AUTHORIZATIONOPS waits in SQL Server

SQL Server Backup compression

Types of isolation levels in SQL Server

SQL Server database snapshot

How to create table with filestream column and Insert data

How to enable and configure Filestream in SQL SERVER 2008 / 2012

How to import/Export data in SQL Server

Steps to enable Alwayson in SQL Server 2012

How to create database mirroring

How to create log shipping in SQL Server

How to create merge replication in SQL Server

How to create shared disk using iSCSI Software Target

NON CLUSTERED COLUMNSTORE INDEX

How to create clustered and non-clustered index

How to create SQL Server Agent Jobs

How to create backups using database maintenance plan

How to set Max degree of parallelism (MAXDOP)

How to configure maximum server memory

How to stimulate a SQL Server resource failure in cluster

Error 601 : Could not continue scan with NOLOCK due to data movement.

SQL Server monitoring

Removing database mirroring

Builtin\Administrators cannot login in to SQL Server

SQL Server assert in Location: purecall.cpp:51

Trace waits in SQLServer (SQLTRACE_BUFFER_FLUSH,TRACEWRITE,SQLTRACE_WAIT_ENTRIES,SQLTRACE_LOCK)

How to view the Space used by each table in database

FILESTREAM feature is disabled

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

 

 

 

Posted in Summary | Tagged: , , | 2 Comments »

Transactional Replication Part -2

Posted by Karthick P.K on November 22, 2013

Transactional Replication Part -2 of transactional replication series covers

Demo of data flow, configuring distributor, publisher, publication, subscription etc. After watching this video you will be able to correlate the concepts we discussed in earlier video, configure transactional replication on your own , Understand different replication agents like snapshot agent, log reader agent and distribution agent and how to monitor these agents after the transactional replication is configured

 

How to configure transactional replication By Gaurav Mathur

Posted in SQL General | Tagged: , , | 2 Comments »

Transactional Replication Part -1

Posted by Karthick P.K on November 22, 2013

Transactional Replication Part -1 of transactional replication series covers about

1. Architecture and transactional replication data flow.
2. Different entities involved in transactional replication like Publisher Server, Distributor Server and Subscriber Server, publication, publication database, subscription, subscription database, articles are discussed in this video.
3. Replication agents involved in one way transactional replication and their usage of different agents like snapshot agent, log reader agent and distribution agent are also discussed
4. Steps involved in configuring transactional replication like configuring distributor, publisher and subscriber along with configuring publication and subscription are also discussed in this video.
5. Any DBA can look into this video and can learn the Transactional Replication Data flow, working and how to configure Transactional replication.

After watching the below video you can look at the Transactional replication Part 2 demo video which will help you to learn the above concepts practically and will enable you to configure replication on your servers.

Transactional Replication internals and architecture by Gaurav Mathur

Posted in Replication, SQL General | Tagged: , , , , | 1 Comment »

Tempdb latch contention

Posted by Karthick P.K on September 17, 2013

You might see Page latch contention in tempdb when you repeatedly drop and create TempDb objects (Temp tables, table variables etc.).

When you notice PAGELATCH_* contention on tempdb (Wait resource in sysprocesses starts with 2: ) check if the latch wait is on PFS,GAM or SGAM page. When there is latch contention on tempdb you will see lot of sessions waiting on Pagelatch_* similar to one below.

In the below output session is waiting on resource 2:15:121320 . If we decode the wait resource it is 2: database id of tempdb ,  15: file number , 121320 is page number. 121320 is in multiple of 8088 so it is a PFS page, similarly identify if the page we are waiting is GAM or SGAM page if it is not PFS page.

Wait type                            Wait resource

PAGELATCH_UP               2:15:121320

PAGELATCH_UP               2:15:121320

How to identify if page is PFS,GAM or IAM?

PFS Page: A PFS page occurs once in 8088 pages. SQL Server will attempt to place a PFS page on the first page of every PFS interval(8088Pages). The only time a PFS page is not the first page in its interval is in the first interval for a file. File header page is first, and the PFS page is second. (Page ID starts from 0 so the first PFS page is at Page ID 1). If (page number)/8088 is round value then the page is PFS page.

GAM Page: GAM page is page 2 in the data file, next GAM page is placed at 511230 Page after first GAM page (GAM interval). If (page number-1)/511230 is round value then the page is GAM page.

SGAM Page: SGAM page is page 3 in data file , next SGAM page is placed at 511230 Page after first SGAM page. If (page number-2)/511230 is round value then the page is GAM page.

clip_image002

How to resolve?

1. Increase the number of TEMPDB data files files and size them equally. As a general rule, if the number of logical processors is less than or equal to 8, use the same number of data files as logical processors. If the number of logical processors is greater than 8, use 8 data files and then if contention continues further increase the number of data files by multiples of 4 (You may not see improvement once you reach 32 files). 

2. Enable server side trace flag 1118.

3. If you further see latch contention on PFS  page after following above two steps then the only option is to modify your application to limit the tempdb usage.

4. If you see contention on 2:1:103 (Page 103 is for system table sys.sysmultiobjrefs. This table manages the relationship between created objects in every database). The only way to reduce contention on this page is reduce the relation. Example creating lot of temp tables with primary key can cause this contention because the relation between the table and PK constraint has to be updated in sys.sysmultiobjrefs.

What’s the best practice ?

1. Create multiple tempdb data files instead of creating 1 large file and size them equally in all your SQL Server instances.

2. Make TF1118 (Uniform allocation) as default. (Extra space required by this trace flag shouldn’t really matter as amount additional space required is minimal and storage cost is not that high these days).  

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, Space management, SQL General, SQL Server Engine | Tagged: , , , , | 1 Comment »

Troubleshooting Transactional replication Latency using Agent Statistics

Posted by Prabhakar Bhaskaran on September 13, 2013

Troubleshooting latency issues in replication is black box for many DBA’s, In this post I will explain how you can leverage the agent statistics to troubleshoot the latency issues.

Before understanding how to decode the agent statistics, lets take a look at the some of the basic things which will help us to troubleshoot the replication performance issue in better way.

The following MSDN diagram depicts the transactional replication architecture in simple manner.

Transactional replication components and data flow

Troubleshooting latency issues is multi step approach, first step is identify which agent is slow,

  • Log reader Agent (Publisher to Distributor)
  • Distribution Agent (Distributor to Subscriber)

So, the problem can be either log reader or distribution agent, we can identify this by just simply inserting the tracer token.

Once we find out the problematic agent the next step is to identify within the agent which particular thread causing the issue.

Let me introduce you to the important threads and its work on these replication agents in nutshell.

Log Reader Agent

Reader Thread – It scans the publisher database transaction log using sp_replcmds

Writer Thread – Add the queued transactions to Distribution database using sp_MSadd_repl_commands

Distribution Agent

Reader thread – It finds the watermark from the table Msreplication_subscriptions(on subscriber) and uses this information to retrieve pending commands from the Distribution database. It basically uses the stored procedure sp_MSget_replcommands to achieve it.

Writer thread – Writer thread uses the Batched RPC calls to write the information to subscriber database.

Now that we understood the threads in the replication agents.  let’s assume we already identified which agent is slow by inserting tracer token. Next is to dig deeper on thread level, this is where our replication agent statistics comes to rescue us.

Agent statistics entries appended to history tables every 5 minutes by default. It provides the historical view of how the agent has been performing and keeps the last 3 days data. You can keep for more days by changing the history retention period.

MSlogreader_history

MSdistribution_history

the above two tables are located in Distribution database. The statistics information is added as XML blob in comments column of these tables.

Now, lets take a look at how to decipher this XML Data for each agents.

Log Reader Agent statistics

<stats state=”1″ work=”948″ idle=”351940″ >
<reader fetch=”859″ wait=”0″/>
<writer write=”822″ wait=”395390″/>
<sincelaststats elapsedtime=”300″ work=”49″ cmds=”176998″ cmdspersec=”3543.000000″><reader fetch=”17″ wait=”0″/><writer write=”29″ wait=”350833″/></sincelaststats></stats>

– State = 1 means stats after batch commit

–Work = cumulative time spent by the agent since restart – idle time

–Idle = Time spent waiting to call sp_replcmds

–Reader fetch = Time to do execute sp_replcmds

Wait = Time spent waiting on writer to release buffer

–Writer write = Time spent writing commands into distribution database

Wait = Time spent waiting on reader to populate buffer

Note: Each thread will have their own buffer with 40k in size.

Here,we need to look at the wait time to understand where the bottleneck exist.For example, if you notice wait time for Reader thread is high then it essentially means your writer thread is slow since reader thread is waiting for writer to release the buffer. Similarly, if you notice high wait time for writer thread then your reader thread is performing slow.

The simple way to decode this is,

HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the commands to distribution database)

HIGH Wait time on Writer thread =  Reader thread is slow ( thread which scans the transaction log)

Distribution Agent Statistics

<stats state=”1″ work=”154″ idle=”351464″>
<reader fetch=”144″ wait=”11″/>
<writer write=”12″ wait=”338″/>
<sincelaststats elapsedtime=”305″ work=”10″ cmds=”81262″ cmdspersec=”8041.000000″><reader fetch=”0″ wait=”9″/><writer write=”10″ wait=”0″/></sincelaststats></stats>
– State =1 means stats after a batch commit

– Work = cumulative time spend by the agent since restart – idle time (seconds)

– Idle = Time spend waiting to call sp_msget_repl_commands

– Reader fetch = Time to do execute sp_msget_repl_commands

Wait = Time spent waiting on writer to release buffer.

– Writer write = Time spend writing commands into distribution database

Wait = Time spent waiting on reader to populate buffer.

Similar to log reader agent, the decoding of wait time is same way we did for log reader agent.

HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the subscriber database using batched RPC Calls)

HIGH wait time on Writer thread = Reader thread is slow ( thread which takes the pending commands from Distribution database)

Distributor Writer thread Slow Scenario

We would be able to understand this concepts better by looking at the example statistics, In this below case, I explicitly started the transaction on subscriber table to simulate blocking at the subscriber side making the writer thread of distribution agent to wait and build up latency.

This is how stats looked,

  • <stats state=”1″ work=”755″ idle=”354505″>
  • <reader fetch=”153″ wait=”604″/>
  • <writer write=”613″ wait=”346″/>
  • <sincelaststats elapsedtime=”636″ work=”515″ cmds=”45033″ cmdspersec=”87.000000″><reader fetch=”0″ wait=”515″/><writer write=”515″ wait=”0″/></sincelaststats></stats>

We can clearly see Reader thread wait time is high(515) which means writer thread is slow since we simulated the blocking on subscriber side.

Similarly,we can simulate the blocking on replication tables msrepl_commands and msrepl_transactions which will cause Log reader writer thread to be slow and stats will show Reader thread wait time as high.

Ok, now we isolated the source of bottleneck in thread level, After this we can just follow the standard performance troubleshooting approach described in this Whitepaper to troubleshoot the slowness of the replication session.

For instance, check out the video where Joe Sack talks about using Extended events to troubleshoot the Distributor writer thread slowness.

In Summary

1. Find which agent is causing slowness using tracer token.

2. Leverage the Agent statistics to narrow down problem to thread level .

3. Follow standard performance troubleshooting approach to resolve the issue.

Thanks for reading! I hope this will help you to troubleshoot the replication performance better next time.

-Prabhakar Bhaskaran

Posted in Performance, Replication, SQL General | Tagged: , , , , , | 1 Comment »

The connection to the primary replica is not active. The command cannot be processed

Posted by Karthick P.K on June 20, 2013

When you configure SQL Server always on available group from management studio it may fail with below error while joining secondary replica to the availability group.

 

Error 1

 

{

Joining database on secondary replica resulted in an error.  (Microsoft.SqlServer.Management.HadrTasks)

——————————

ADDITIONAL INFORMATION:

Failed to join the database ‘AG’ to the availability group ‘AG1′ on the availability replica ‘NODE2′. (Microsoft.SqlServer.Smo)

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

——————————

The connection to the primary replica is not active.  The command cannot be processed. (Microsoft SQL Server, Error: 35250)

}

 

Error 2

 

{

TITLE: Microsoft SQL Server Management Studio

——————————

Failed to join the instance ‘NODE2′ to the availability group ‘AG1′. (Microsoft.SqlServer.Management.SDK.TaskForms)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=11.0.2100.60+((SQL11_RTM).120210-1917+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

——————————

ADDITIONAL INFORMATION:

Failed to join local availability replica to availability group ‘AG1′.  The operation encountered SQL Server error 41106 and has been rolled back.  Check the SQL Server error log for more details.  When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.2100&EvtSrc=MSSQLServer&EvtID=41158&LinkId=20476

}

 

 

You may get below error when you configure AG availability group using  alter database command mentioned below or synchronization might fail with 35250 error mentioned below.

 

ALTER DATABASE [AG] SET HADR AVAILABILITY GROUP = [Group name];

 

Error 1

 

Msg 35250, Level 16, State 7, Line 1

The connection to the primary replica is not active.  The command cannot be processed.

 

 

To resolve  above errors

 

1. Ensure always on endpoint ([Hadr_endpoint]) are not blocked by firewall (Default port 5022).

 

2. Make sure startup account of primary server is added to all secondary server’s and Startup accounts of all secondary servers are added to primary servers.(Startup account of each replica to be added to other replica’s)

 

3. If log on account of SQL Server is “Nt service\” or local system account then ensure system account (Domainname\systemname$) of each replica is added to other replicas.

{

CREATE LOGIN [MSSQLWIKI\node2$] FROM WINDOWS

}

 

4. Grant connect on always on endpoints created on each replicas for startup account of other replica servers (Grant connect on endpoints even if startup account of other replicas are added as sysadmins).

{

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [MSSQLWIKI\node1$]

}

 

5.  Make sure SQL Server name (select @@servername) matches with hostname.

6. Make sure cluster service startup account is part of SQL Server logins (More details in This link).

 

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

 

Posted in Always On, Configuration, Connectivity, Security, SQL General | Tagged: , , , , , , | 7 Comments »

False warning “A significant part of sql server process memory has been paged out”

Posted by Karthick P.K on June 13, 2013

In A significant part of SQL Server process memory has been paged out we discussed about SQL Server working set trim warning, when they can occur and how to troubleshoot them.

I the same blog I mentioned SQL Server will log “A significant part of sql server process memory has been paged out” warning when working set reaches 50% or below of the overall committed memory by SQL Server memory manager. In this blog I will try to cover when this warning could be a false warning and how to identify it.

Let us recollect what is committed memory and different states of committed bytes in windows.

Committed: Total memory that is allocated by process (allocated bytes can be in RAM or Page file)

Committed working set: Committed memory which is currently in RAM

Committed Paged : Committed memory which is currently page file

Committed Mapped : Committed mapped to page file.

Committed untouched: committed memory which is never accessed (When a page is committed in windows it will never become working set unless accessed).

Let us understand what Committed untouched is. Download Memoryallocator exe from This link and Keep committing memory using the same exe.

You will notice that the committed memory of the Memoryallocator process increases, but the physical memory usage (RAM usage or Working set) (or) Page file usage will not increase at all. Only the committed memory of the process and committed memory of overall system increases.

Why?

When a page is committed in windows it will not become part of working set or page file unless it is accessed.

Similarly when SQL Server estimates the memory requirements of different clerks and allocates them during startup or on need. These allocated memory is part of committed memory but will not have a page in RAM or Page file unless accessed for the first time.

So during this condition SQL Server’s working set can go far below the committed bytes and once working set reaches 50% or below of overall committed bytes then ““A significant part of sql server process memory has been paged out” warning is logged in SQL Server errorlog.

How do you identify if this warnings are false warnings?

We can identify if these warnings are false using the SQL Server memory dump or using the Perfmon counters.

Let us stimulate a false warning situation using the below backup query and see how to identify if the warning is false.

Run the below query in your test system.

Note: If you do not get the warning message increase the buffer count in below query. If you get “There is insufficient system memory in resource pool” then reduce the buffer count.

WARNING: dumptrigger  and below trace flag’s are undocumented and should be used only in test environments with caution  (or) under Microsoft Support supervision. There is no guarantee that they will work in future versions of SQL Server.

DBCC TRACEON(8026,-1) --Trace flag –T8026 tells dump trigger to remove the trigger after the first dump has been triggered.
go
DBCC DUMPTRIGGER('SET',17890)
go
BACKUP DATABASE MSDB TO DISK = N'msdb.BAK' WITH NOFORMAT, INIT,NAME = N'msdb', SKIP, NOREWIND, NOUNLOAD, STATS = 1 ,BUFFERCOUNT = 10000,BLOCKSIZE = 65536 ,MAXTRANSFERSIZE=2097152

Once you run the above backup command you will see   error: 17890 “A significant part of sql server process memory has been paged out” and a mini memory dump will be created in Errrorlog  folder along with SQLDump00nn.txt

Using memory dump

Open the SQLDump00nn.txt and review the memory section in SQLDump00nn.txt. This will give you the system memory information when the error occurred.

Snippet from my SQLDump00nn.txt.

{

Memory                              

MemoryLoad = 26%                    

Total Physical = 131067 MB          

Available Physical = 96691 MB       

Total Page File = 393201 MB         

Available Page File = 334217 MB     

Total Virtual = 8388607 MB          

Available Virtual = 8166328 MB      

**Dump thread – spid = 0, EC = 0x00000020F19C2B90                                                               

***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL11.RBS\MSSQL\LOG\SQLDump0048.txt    

}

In the above output  “Available Physical  is  96,691 MB” which indicates there is no physical memory pressure when SQL Server raised  17890 warning so widows is not trimming the working set and obviously we can come to conclusion that this instance of warning is false warning.

Note: Above method may not work well in earlier versions of windows in which working set of all processes are hard trimmed when there memory pressure in the system.

Using perfmon counters

In below perfmon graph I have collected three counters

1. Process\SQLServr\Working set (highlighted Black)

2. Process\SQLServr\Private bytes (Committed memory. Green line)

3. Memory\AvailableMbytes  (Red line)

I you review your SQL Server error log you would notice 17890 warning at the same time when Private bytes (Green line) spiked.

How to conclude that the warning is printed because of “untouched committed pages”  by SQL Server.

In general when a page is committed and accessed it will be part of working set as long as there is enough available memory on the system. If you look at below graph you will notice that the private bytes (committed) is increasing but the working set is not increasing at same phase though there is adequate available memory in the system. This can happen only when the pages are committed and not accessed (If you look at below graph carefully committed memory increased and dropped with in 10 seconds so when you configure Perfmon choose sample rate every 1 second else perfmon might miss the data and you will find some thing like this happened). 

clip_image002

 

Related posts

Troubleshooting steps for all SQL Server Memory errors

Trouble shooting working set trim “A significant part of SQL Server process memory has been paged out”

SQL Server lock pages in memory should I use it?

SQL Server memory leak

What is new in SQL Server 2012 Memory

How to set max server memory and min server memory

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

 

Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: , , , , | 3 Comments »

What does MemoryUtilization in sys.dm_os_ring_buffers and Memory_utilization_percentage in sys.dm_os_process_memory represents?

Posted by Karthick P.K on June 2, 2013

Few days back someone asked me an interesting question.  Why memory_utilization_percentage (working set ) is 100 % when Virtual_address_space_committed_Kb ( committed ) is around 10 GB and Physical_memory_in_use_kb is just 1.7 GB (refer below image)?

clip_image002

Physical_memory_in_use_kb is Memory allocated by the SQL Server process which is currently in RAM. (This includes AWE and Large pages allocation).

Virtual_address_space_committed_Kb is total memory that is allocated by process (allocated bytes can be in RAM, page file, mapped or in not used state)

Memory_utilization_percentage is ratio between Physical_memory_in_use_kb and Memory allocated by SQL Server using SQL Server memory manager(derived from dm_os_memory_nodes). If the Memory_utilization_percentage  is greater than 100% then it is  capped to 100% . Memory used by external components in the SQL Server address space is not considered while SQL Server derive the memory utilization percentage.

Memory_utilization_percentage =  (Physical_memory_in_use_kb/Memory allocated by SQL Server using SQL Server memory manager) * 100

To reproduce the above behavior download VirtualallocLeak.dll from THIS link and copy to ‘C:\EXE\’ folder.

Execute the below script and then query the sys.dm_os_process_memory DMV

select * from sys.dm_os_process_memory
exec sp_addextendedproc  'VirtualallocLeak','C:\exe\VirtualallocLeak.dll' 
exec VirtualallocLeak –This allocates 1048576 bytes per execution
go 300
select * from sys.dm_os_process_memory

Review the memory_utilization_percentage , Virtual_address_space_committed_Kb and Physical_memory_in_use_Kb

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in SQL Server Engine, SQL Server memory | Tagged: , , , | 4 Comments »

SQL Server monitor

Posted by Karthick P.K on May 11, 2013

Every SQL Server DBA would have faced situations similar to SQL Server not accepting connections for few minutes, SQL Server not responding for few minute or Applications not able to connect with SQL Server for few minutes. Before DBA’s gets alerted about the situation and starts troubleshooting the issue everything becomes normal. Challenge in this situations is it becomes very difficult to understand where the underlying problem was, It could be a network connectivity, Application server problem or It might be an issue with SQL Server itself. How do we collect diagnostic data to prove that SQL Server was stable at the time of issue (or) If the issue is with SQL Server then how to collect data we need for diagnosing the issue?

 

You can use SQL Monitor to monitor SQL Server instances

SQL Server Monitoring exe monitors the SQL Server services and creates diagnostic data and memory dump if SQL Server service is down (or) If SQL Server is not accepting connections (or) If SQL Server is not responding to Queries

How it works?

SQL Monitor checks the SQL Server in 3-Phases

1. Check the status of all the SQL Server service through the windows service control manager every 60 seconds.

2. If the service is running then check if SQL Server is accepting connections every 60 seconds.

3. If SQL Server is accepting Connections then probe to perform a simple query and see if SQL Server is responding properly.

4. If any of the SQL Server is not accepting Connections then connect to SQL Server using DAC, take a filtered stack dump which will be stored in errorlog directory of the instance , executes custom diagnostic script (c:\sqlmonitor\failoveranalysis.sql) and stores the output in c:\SQLmonitor\ with name “Servername+instancename.txt” which can be used to identify if there is any issue in SQL Server.

5. Once dump is taken release the DAC connection and wait for some time before we attempt to connect again. If connection is successful during subsequent attempt SQLMonitor.exe will continue monitoring the instance but if the connection fails again a new dump is generated and new diagnostic data is collected and appended to Servername+instancename.txt file in SQLMonitor folder.

There will be a gap of X minute between each diagnostic data and stack dump collection when the issue is continuing where X is (Number of Diagnostic data/dump already collected for this instance  * Number of Diagnostic data/dump already collected for this instance)

 

How to Configure?

1. Create a folder called SQLMonitor in C:\

2. Create a Text file called serverlist.txt  and enter all the SQL Servers in your environment to be monitored in below format.

Format:

Servername [TAB] Servicename;

Ex:

Server1 MSSQLServer;

Server2 MSSQL$Prod;

3. Invoke command prompt and open SQLmonitor.EXE.

Advantage:

1. Multi-threaded each server and service is verified using its own thread so retrieving information from one server will not affect the pooling interval to other server.

2. Single exe can be scaled to monitor more than 1000 servers and 1000 services.

3. Uses few MB of memory and system resources.

Requirements:

1. This exe can be invoked from any of the client systems with SQL Server client tools and SQL native clients installed.

2. Remote DAC connection has to be enabled in SQL Servers which are monitored.

3. EXE should be invoked under credential of user who has access to all the SQL Servers which are monitored and permission to view service control manager of windows servers in which SQL Server is running.

You can Download SQLMonitor.exe from this link

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Connectivity, SQL Server Engine, SQL Server Tools | Tagged: , , | 2 Comments »

Max server memory – Do I need to configure?

Posted by Karthick P.K on April 22, 2013

Do I need to configure Max server memory and min server memory?  What is the right value for  this configuration and how to determine it?

There are many debates around this and above questions are raised by many SQL Server DBA’s frequently in many forums. If you ask me , “It depends” on various factors.

Before we choose to configure or leave this value to default it is very important to understand how SQL Server grow and shrink its memory usage based on the available memory in operating system even when Max server memory is not configured or defaulted.

How SQL Server grow and shrink its memory usage based on the available memory in operating system even when Maximum server memory is not configured or defaulted?

 SQL Server memory management is designed to dynamically adjust its memory usage based on the amount of available memory on the system. SQL Server will keep allocating memory based on its need as long as there is memory available I.e. as long as MEMPHYSICAL_HIGH (HighMemoryResourceNotification )notification is signaled in widows  and will scale down its usage when there is MEMPHYSICAL_LOW (LowMemoryResourceNotification) signaled in windows. When available memory is between the low memory and high memory SQL Server will try to maintain the memory usage stable( RESOURCE_MEM_STEADY) with some exceptions.

You can download the ResourceNotificationHighandLow.exe from This link to see memory notifications from windows.

The default level of available memory that signals a LowMemoryResourceNotification event is approximately 32 MB per 4 GB, to a maximum of 64 MB. (By default, the threshold is 64mb on most systems).

The default level that signals a high-memory-resource notification event is three times the default low-memory value (By default, the threshold is 64*3=192 MB on most systems).

Key points:

1. Once the available memory on the system goes below 192 MB HighMemoryResourceNotification (MEMPHYSICAL_HIGH) signal is revoked by windows and SQL Server will not grow its Bpool.

2. Once the available memory on the system goes below 64 MB LowMemoryResourceNotification (MEMPHYSICAL_low) is signaled by windows and SQL Server will shrink its Bpool (reduce its memory usage).

3. When the available memory in the system is between 192Mb and 64 Mb (I.e between LowMemoryThreshold and HighMemoryThreshold) SQL Server will not grow or shrink its usage (With some exceptions which we will see in a while)

Note: So unless there is an crazy application in the system that keeps allocating and releasing memory in Zigzag fashion making  windows trigger HighMemoryResourceNotification and LowMemoryResourceNotification one after the other SQL Server will not grow and shrink its memory usage in Loop continuously. If there are such application in system then even configuring max server memory may not help.

The default Low memory threshold 64MB may not be ideal for all systems.  Ex: Let as assume an application is requesting 150MB of memory suddenly when the available memory is 190 MB and the grant is successful. Available memory will now drop to 40 MB making windows signal the LowMemoryResourceNotification. SQL Server will start responding to the LowMemoryResourceNotification from windows but at the same time windows working set manager will also start trimming the working set of all the processes. Which will bring down the overall performance of the system.

We can increase the LowMemoryThreshold value by making the following registry changes If LowMemoryThreshold set to higher value  OS will notify applications such as SQL on low memory conditions much earlier and SQL Server can respond to memory pressure much early before the system starves for memory and before windows working set manger starts trimming the working set of all the processes.

In Regedit -> go to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\MemoryManagement

Right click on the right pane,

Select New -> select click DWORD Value -> enter LowMemoryThreshold

Double Click LowMemoryThreshold -> value (choose decimal) -> 512

System Reboot is required to take effect.

In the above example I have set the LowMemoryThreshold to 512 MB hence the MEMPHYSICAL_LOW notification will be signaled as soon as the available memory drops to 512MB and HighMemoryResourceNotification (MEMPHYSICAL_HIGH) will be in signaled state till the available memory is  1536MB (LowMemoryThreshold *3).

After making the above change SQL Server will grow its Bpool memory till the available memory in the system is greater than 1536 MB and as soon as the available memory drops below 1536MB HighMemoryResourceNotification signal will be revoked by windows causing SQL Server to maintain steady state and will not grow its memory usage further but that  doesn’t  mean SQL Server will wait for the LowMemoryResourceNotification notification to scale down its memory usage after the  HighMemoryResourceNotification notification is revoked. SQL Server will always try to keep the available physical memory in the system high (I.e. SQL Server will try to keep the available memory in system to  HighMemoryThreshold (LowMemoryThreshold  * 3 ).

What if I have multiple instances of SQL Server on same server and how they load balance the memory among themselves?

SQL Server will try to balance to balance its memory usage with other instances of SQL Server running on the same box .  As I mentioned earlier SQL Server will try to maintain the available memory on the system to High memory threshold. SQL Server Lazy writer checks If there is disk reads performed in last 10 seconds and if there is no reads for last 10 seconds then SQL Server will reduce its memory usage until HighMemoryResourceNotification is signaled by OS.

Let us see this with an example :

Let us assume there are 2 SQL Server instances running on server with 32 GB of RAM and Lowmemorythreshold is set to 512MB on the system (so HighMemoryThreshold is 1536 (Lowmemorythreshold *3)).

1.            When the OS starts  HighMemoryResourceNotification  is set to on because there is adequate available memory on the server.

2.            SQL Server instance 1 starts first and it will consume memory till the HighMemoryResourceNotification resource notification is revoked ( HighMemoryResourceNotification  will be revoked when available memory drops below 1536 MB).

2.            Now the 2nd SQL Server instance is started and it finds High memory resource notification is revoked so it will not increase its memory usage. 

3.            Lazy writer thread of 1st instance checks if there is any disk reads performed by 1st Instance  in last 10 seconds , If there is no disk reads then first instance will scale down its usage until HighMemoryResourceNotification is signaled by OS (HighMemoryResourceNotification  will be signaled again when available memory becomes 1536 MB).

4.            2nd Instance which is hungry for memory sees the High memory resource notification and starts growing its usage till the high memory notification is revoked.  Once the high memory notification is revoked 2nd instance will stop growing.

5.            1st  instance finds the high memory notification is revoked and will again check if there are any disk reads in last 10 seconds and if there are no reads then It will further scale down till there is high memory resource notification.

6.            Once the high memory is signaled 2nd instance will start growing again.

7.            Over a time each instance will very well balance their memory requirements among themselves. ( I.e. if there is read performed from disk with in last 10 seconds we assume there is additional memory requirement for the instance so it will not scale down while on other hand if there is no reads for more than 10 seconds and if the memory available is below the high memory threshold instance it will scale down to give memory for other instance)  

8.            Instance with higher memory requirement will be consuming more memory than the instance with low memory requirement in some time. This way both the instances will balance their memory requirements with each other.

Note:

1. Above logic may not fit well if the total Physical memory on the system is very low compared with the memory requirements of multiple SQL Server instances running on the system because if you start the second instance while the first SQL Server is running with full memory utilization but still performing lot of reads I.e. RESOURCE_MEM_STEADY and still lot of reads , second instance may take long time or may not scale up its memory usage soon. In such case you can cap the max server memory but the performance of SQL Server will be very poor because of memory contention.

2. Also be cautious when you increase the value of LowMemoryThreshold beyond 512 MB.  Increasing this threshold  increases the range of memory that is available where neither the LowMemoryResourceNotification or HighMemoryResourceNotification object is signaled ( RESOURCE_MEM_STEADY).  So when you have multiple instance , if you start the second instance while the first SQL Server is running with full memory utilization and with lot of reads  I.e. RESOURCE_MEM_STEADY + reads continuously , second instance may take time scale up its memory usage soon and chances of getting Lowmemorythreshold is low because of wider range  of  RESOURCE_MEM_STEADY

FAQ:

1. What will happen when MTL allocations increases?

Available memory in system drops when the MTL consumption increases. If the MEMPHYSICAL_HIGH is set then there will not be any effect to bPool. If MTL consumption increases drastically it might cause available memory to drop further causing windows to trigger LowMemoryResourceNotification.

If LowMemoryThreshold is siganled SQL Server will scale down its bPool usage.

2. Will windows working set manager starts trimming the working set of all processes as soon as the LowMemoryResourceNotification is signaled?

No.

3. What are the other effects of changing LowMemoryThreshold?

There might be other application and drivers which is also using memory notification from windows to grow and shrink memory usage. They will also shrink and grow when there is notification from windows.

4. Why would I need to CAP my SQL Server memory when we have a great dynamic mechanism in SQL Server to grow and shrink its memory usage?

You can leave the max server memory as default  If your operating system is Windows 2008 or above and if you have all the fixes in This link and This link and if you do not have any faulty drivers or applications which will request large amount of memory suddenly and if you are not using large pages memory model else I would suggest capping the Max server memory

If you have decided to configure the Max server memory remember it will not control the overall memory used by SQL Server. There are significant changes in memory allocations controlled by Max server Memory between SQL Server2012 and earlier versions. Let us understand what allocations it controls in SQL Server 2012 and earlier versions of SQL Server

What is controlled by SQL Server Max Server Memory (Extract from SQLServer2012 Memory) ?

SQL Server memory is internally divided in to two regions known as BPOOL and NonBPool (aka MTL or MTR) More details about BPOOL and MTL can be found in This blog.

In earlier versions of SQL Server (Till 2008 R2) “Max Server Memory” controlled the Maximum physical memory Single page allocator (BPOOL)  can consume in SQL Server user address space.

Only the single page allocator was part of BPOOL and Max server memory controlled only BPOOL, so the following allocations came outside BPOOL (Max server memory)

1.Multi-Page allocations from SQL Server [These are allocations which request more > 8 KB and required contiguous memory]

2.CLR allocations [These include the SQL CLR heaps and its global allocations created during startup]

3.Memory used for thread stacks within SQL Server process (Max worker threads * thread stack size). Thread stack size is 512K in 32 bit SQL Server, 904 K in WOW mode and 2 MB in 64-Bit 

4.Direct windows allocations made by Non-SQL Server dll’s ([These include windows heap usage and direct virtual allocations made by modules loaded into SQL Server process. Examples: allocations from extended stored procedure dll's, objects created using OLE Automation procedures (sp_OA calls), allocations from linked server providers loaded in sqlserver process)

SQL Server 2012 memory manager has now clubbed single page allocator and multipage allocator together  as any-size page allocator . As a result, the any-size page allocator now manages allocations categorized in the past as single page and Multi-Page allocations.

1. "max server memory" now controls and includes “Multi pages allocations”.

2. In earlier versions of SQL Server CLR allocated memory was outside BPOOL (Max server memory)   . SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

SQL Server 2012 "max server memory" configuration does not include only the following allocations:

1. Memory allocations for thread stacks within SQL Server process

2. Memory allocation requests made directly to Windows [Ex: Allocations (Heap, Virtualalloc calls ) from 3rd party Dll’s loaded in SQL Server process , objects created using OLE Automation procedures (sp_oa) etc]

Hope you got clarity on allocations controlled by  Max server memory , Let us see how to set it.

How to set correct value for SQL Server Max server memory?

There is no magic formula for this. Estimate the memory required by other applications running on same server, Operating system, Drivers , SQL Server Non- bPool allocations, jobs, anti virus etc.. Make sure you have acceptable available physical memory even when the system is under heavy load.

1. Consider the operating system memory requirement.

     Approximately 1 GB (Would increase if it is DC, cluster etc.)

2. Consider the memory requirements by other applications/processes running on the server.

   You have to derive it based on applications/processes/AV’s running on the system and their memory requirements. (Perfmon Process-> Private bytes and Working set can help)

3. Consider the memory requirements of the drivers/firmwares.

    You have to derive it based on memory requirements by drivers installed on the system. (RAMMAP can help)

4. Consider the NonbPool (aka MTL or MTR) memory requirements by SQL Server.

select  sum(multi_pages_kb)/1024 as multi_pages_mb from sys.dm_os_memory_clerks

(You can skip above query if your SQL Server version is 2012)

+

Max worker threads * 2MB

+

Memory for direct Windows allocations approximately 0 to 300 MB in most of the cases but you may have to increase it if there are many 3 party components loaded in SQL Server process (Including linked server dll’s, 3rd party backup dll’s etc.)

+

If you are using CLR extensively add some additional memory for CLR.   

5. Consider the memory requirement by jobs (Including replication agents, Log shipping etc. )  and packages that will run on the server.

You have to derive (May vary from few Mb’s to GB’s)

6. Consider SSAS and RS memory requirements.

You have to derive

7. Make sure there is good enough free space for operating system.

Approximately (100 MB for each GB till 4G) + (50 MB for each additional GB till 12GB) + (25 MB for each additional GB till your RAM size)

8. Other memory requirements.

If you have any other memory requirement specific to your environment.

Once you have calculated a reasonable value for all the above memory requirements  take the sum of all the above requirements and deduct it with total physical memory to derive an ideal value for your max server memory.

Max server memory=  Total physical memory  – (1+2+3+4+5+6+7+8)

If you still see LowMemoryResourceNotification  or working set below 100% frequently then use This exe which will print the memory information of all the processes and system wide memory information (Global memory status) when the operating system signals  low memory notification. Once you get the output from the exe when there is LowMemoryResourceNotification  review requirements of each process and tweak Max server Memory accordingly.

Important: Make sure you have this fix if you are on windows2003 http://support.microsoft.com/kb/938486

What about Min server memory and should I configure it?

I mentioned earlier that when LowMemoryResourceNotification comes from Windows or HighMemoryResourceNotification is revoked+No reads for 10 seconds , SQL Server scales down its memory usage.

How much it scales down?

Until “Minimum server memory”  is reached (If there is continuous memory pressure on the system).

What happens when you set Max server memory and min server memory to same value?

SQL Server will never scale down its memory usage even when there is memory pressure system wide (Lowphysicalmemory notification  set at system level). Note: This setting does not affect OS from paging.

What are the affects?

When there is LowMemoryResourceNotification  If LPM is not enabled SQL Server’s working set (Bpool + Non bPool )will be paged. If LPM is enabled system will starve for memory  and non-bpool will be paged.

If you do not want SQL Server to scale down its usage when there is LowMemoryResourceNotification  in windows configure Min server memory and Max server memory to same value (Bad choice). 

If you want to limit  “how much SQL Server wants to scale down“ you can configure this value.

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

 

Posted in Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , , | 14 Comments »

SQL Server and VMware ballooning

Posted by Karthick P.K on March 31, 2013

VMware and SQL Server performance

If you are running production SQL Server on VM-Ware double check if you have configured/disabled ballooning for the virtual machine in which SQL Server is running.

What is ballooning?  Method by which VMware host can  reclaim memory from the Virtual machines.

Is it really bad to give memory from Guest to Host ? My opinion  is yes if you are running production SQL Server on VM-Ware.

Why I think it’s bad? SQL Server is memory intensive application and requires adequate memory for smooth running. If SQL Server doesn’t have adequate memory to run you see poor response time, Resource_semaphore/ Resource_semaphore_Query_compile waits, increased I/O, OOM errors, Non-Yield condition’s etc. In addition to all this when memory is reclaimed from virtual machines available memory in windows drops triggering windows to page out the working set of all the processes and you will face all side affects discussed in A significant part of SQL Server process memory has been paged out  .

In worst case it is better not to give memory for SQL Server instead of give and take back. Remember Max server Memory is also a factor which will impact the generation of execution plan by the optimizer, So plan generated when you have X GB of max server memory may not be the right plan to use when you have Y GB actual memory after ballooning reclaimed memory from guest OS.

What if hypervisor runs low in physical memory?  It gives a hint that you  have a poor consolidation. You can pick up the other Virtual machines that are not hosting production SQL Servers  (or SQL Servers) from same Hypervisor and tweak  reservations (or) Increase the maximum memory that can be reclaimed when the hypervisor is under memory pressure.

What if I don’t disable ballooning for my production SQL Server?  Ballooning can slowly take the memory from virtual machine in which SQL Server is hosted and can cause all the problems I mentioned above.

To make things confusing when you look at task manager you may not even realize that ballooning has reclaimed memory from Guest OS, Because total physical memory shown in performance tab includes the memory taken by ballooning driver.

How to identify > Look at the driver locked memory from RAMMAP sysinternals tool. (VM memory performance counters can also be used)

Some of the RAMMAP output captured in production SQL Servers can self-explain. Driver locked value would be few MB’s in normal systems, If the value is very high in VMware virtual machines then you can assume ballooning is reclaiming the remaining memory .  

Below is output of RAMMAP from Virtual machine with 12 GB memory hosting SQL Server with max server memory capped to 8GB.  

Driver locked is around 8GB. So the system is running with less than 4Gb of RAM and how much is for SQL Smile?

clip_image002

Below is output of RAMMAP from Virtual machine with 24 GB memory hosting SQL Server with max server memory capped to 20GB.  

Driver locked is around 16 GB. So the system is running with less than 8Gb of RAM and how much is for SQL Smile?

clip_image004

How to disable ballooning ? Refer http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002586

You don’t agree ? I respect your view but my view is different Smile.

Note: I have not recommended to disable ballooning in every virtual machines. I recommend to do it for your performance sensitive SQL Server and if you find your hypervisor is running low in memory revisit your consolidation (or) Configure other non critical virtual machines running on same host in such a way that hypervisor can reclaim memory from them when it is under low memory condition.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , | 2 Comments »

Inside sys.dm_os_ring_buffers

Posted by Karthick P.K on March 29, 2013

Sys.dm_os_ring_buffers DMV can be used to troubleshoot connectivity errors, track exceptions, monitor system health, memory pressure, Non-yielding/Deadlocked schedulers and a lot more.

You can use below scripts to query the data from sys.dm_os_ring_buffers during troubleshooting.

USE master
go
SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
GO
PRINT 'Start Time: ' + CONVERT (varchar(30), GETDATE(), 121)
GO
PRINT ''
PRINT '==== SELECT GETDATE()'
SELECT GETDATE()
PRINT ''
PRINT ''
PRINT '==== SELECT @@version'
SELECT @@VERSION
GO
PRINT ''
PRINT '==== SQL Server name'
SELECT @@SERVERNAME
GO
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - LOGIN TIMERS'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'int') AS [TotalLoginTimeInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'int') AS [LoginTaskEnqueuedInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'int') AS [NetworkWritesInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'int') AS [NetworkReadsInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'int') AS [SslProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'int') AS [SspiProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'int') AS [LoginTriggerAndResourceGovernorProcessingInMilliseconds]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'LoginTimers'
order by a.recordtime 
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - TDS Data'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') AS [PhysicalConnectionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') AS [DisconnectDueToReadError],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') AS [NetworkErrorFoundInInputStream],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') AS [ErrorFoundBeforeLogin],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') AS [SessionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') AS [NormalDisconnect]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'Error'
order by a.recordtime
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SECURITY_EORROR'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, rbf.[timestamp] - tme.ms_ticks, GETDATE()) as [Notification_Time],
cast(record as xml).value('(//SPID)[1]', 'bigint') as SPID,
cast(record as xml).value('(//ErrorCode)[1]', 'varchar(255)') as Error_Code,
cast(record as xml).value('(//CallingAPIName)[1]', 'varchar(255)') as [CallingAPIName],
cast(record as xml).value('(//APIName)[1]', 'varchar(255)') as [APIName],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
where rbf.ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR'
ORDER BY rbf.timestamp ASC
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_EXCEPTION'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as Time_Stamp,
cast(record as xml).value('(//Exception//Error)[1]', 'varchar(255)') as [Error],
cast(record as xml).value('(//Exception/Severity)[1]', 'varchar(255)') as [Severity],
cast(record as xml).value('(//Exception/State)[1]', 'varchar(255)') as [State],
msg.description,
cast(record as xml).value('(//Exception/UserDefined)[1]', 'int') AS [isUserDefinedError],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type], 
cast(record as xml).value('(//Record/@time)[1]', 'int') AS [Record Time],
tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
cross join sys.sysmessages msg
where rbf.ring_buffer_type = 'RING_BUFFER_EXCEPTION' 
and msg.error = cast(record as xml).value('(//Exception//Error)[1]', 'varchar(500)') and msg.msglangid = 1033 
ORDER BY rbf.timestamp ASC

PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_RESOURCE_MONITOR to capture external and internal memory pressure'

SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime], 
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],  
cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],  
cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],  
cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [reserved], 
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [reserved],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [type],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [state],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [reserved],  
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect], 
  
cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],  
cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],  
cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],  
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],  
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time], 
tme.ms_ticks as [Current Time] 
FROM sys.dm_os_ring_buffers rbf 
cross join sys.dm_os_sys_info tme 
where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW' 
ORDER BY rbf.timestamp ASC


PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SCHEDULER_MONITOR to Monitor system health'

SELECT  CONVERT (varchar(30), GETDATE(), 121) as runtime, DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,    a.* , sys.ms_ticks AS [Current Time]  
FROM   (SELECT x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime) [1]', 'bigint') AS [UserModeTime],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime) [1]', 'bigint') AS [KernelModeTime],    
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults) [1]', 'bigint') AS [PageFaults],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta) [1]', 'bigint')/1024 AS [WorkingSetDelta],   
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization) [1]', 'bigint') AS [MemoryUtilization (%workingset)],   
x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]  FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers    
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a  CROSS JOIN sys.dm_os_sys_info sys ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())


 

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in SQL General, SQL Server Engine, SQL Server memory, SQL Server Tools | Tagged: , , , , , | 1 Comment »

SQL Server lock pages in memory

Posted by Karthick P.K on March 26, 2013

Lock pages in memory is again a black box for many SQL Server DBA’s. Configuration choice to enable lock pages in memory depends on various aspects.

Before we get in to internals we will recollect some of the basics of SQL Server lock pages in memory in 64-Bit SQL Servers.

What is Locked pages in memory in windows?

Its user right for windows account and can be enabled by using secpol.msc or gpedit.msc

Why do we need this privilege for SQL Server startup account?

There are 3 different memory models in 64-bit SQL Server. They are conventional ,locked pages and large pages memory model.

Locked pages memory model: In lock pages memory mode SQL Server uses allocateuserphysicalpages and mapuserphysicalpages function to allocate memory. Caller token of this function should have LPIM privilege else the function call would fail, hence you need LPIM for startup account of SQL Server  to use lock pages memory mode.

Large pages memory model: In large pages memory model I.e When you use TF834 in enterprise edition on systems with physical memory  >8GB  SQL Server uses large pages memory model. In this memory model SQL Server uses vitualalloc API with MEM_LARGE_PAGES allocation type. For using MEM_LARGE_PAGES in virtualalloc caller token must have LPIM privilege.

Memory allocated using AWE allocator API’s (or) Virtualalloc function with MEM_LARGE_PAGES are not part of Process working set ,hence cannot be paged out and not visible in private bytes or working set in task manger and Perfmon. process. Private bytes (or) Perfmon.process. working set.

What is the Advantage of using Lockedpages or Largepages? SQL Server working set (BPOOL) cannot be paged by windows  even when there is system wide memory pressure.

Disadvantage: Operating system will starve for memory when there is system wide memory pressure. OS has to completely rely on SQL Server to respond to low memory notification and scale down its memory usage . SQL Server may not respond fast enough to low memory condition at system level because OS is already starving for memory. LPIM prevents only the BPOOL from paging, Non-Bpool components can still be paged and we have critical portions of SQL Server memory like thread stack, SQL Server Images/DLL’s  in Non-Bpool which can still be paged by OS.

So many disadvantage…. But still why do we recommend LPIM in some places?

In earlier versions of windows 2003 (If This fix is not applied) when there is system wide memory pressure windows memory manger would trim one-quarter of working set of all the process. Imagine If SQL Server is using 200GB of RAM and there is system wide memory pressure, Windows memory manager would move 50 GB of SQL Server working set to page file and we would end with performance problems. If LPIM is enabled OS cannot trim. Imagine there is a faulty application/drivers in the server and it leaks memory  fast , It might consume all the memory in the server and windows memory manager might trim all of SQL Server working set.

Known issues in windows like the one in This and few in windows 2008 mentioned in  This  link can cause windows memory manager to trim the working set of SQL Server process suddenly. Windows has a background process which keeps  writing the contents of working set to page file, so when there is paging only the dirty pages  needs to be moved to the page file others are already backed by back ground process, So paging would be very fast and SQL Server working set would be moved to page file in seconds before SQL Server responds to low memory resource notification from OS causing negative performance. In systems with large amount of memory (Ex: 1 TB )we might get non yielding scheduler situations when allocating memory in conventional memory model. LPIM is only option is this case. LPIM can be used in servers in which it might take long time to identify the cause of the working set trim. It is always suggested to identify the cause of TRIM before choosing LPIM in first place. You can use the steps mentioned in This link to troubleshoot working set trims.

Note:

1. Local system account has LPIM privilege by default, so if you are using local system as startup account of SQL Server then SQL Server might be using lock pages memory model by default with out your knowledge.

2. In earlier versions of SQL Server (Till 2008R2) you need TF845 with fix in KB970070 in standard and BI edition to make use of lock pages memory model.

 

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: , , | 4 Comments »

SQL Server -g

Posted by Karthick P.K on March 5, 2013

I decide to write this quick blog after seeing a lot of confusion around sqlserver -g switch.  sql server -g switch is nop in 64-Bit sqlserver and it is used in 32-bit sqlserver to increase the size of MTL(AKA MemToLeave).

The default value of -g switch is 256 MB. I.e. if you do not specify value for sqlserver -g switch it is defaulted to 256 MB.

Initialization of sqlserver memory during the startup of SQL Server is as follows.

1. Calculate the size of MemtoLeave and reserve it using the algorithm below

MTL (Memory to Leave)= (Stack size * max worker threads) + Additional space (By default 256 MB and can be controlled by -g).

Stack size =512 KB per thread for 32 Bit SQL Server and 904 KB for 32Bit SQL Server running on 64-Bit systems.

I.e = (256 *512 KB) + 256MB =384MB

-g switch is used to increase the additional space from 256 to any desired value.

2. Calculate the size of BPOOL using below algorithm.

SQL Server Buffer Pool is minimum of “Physical RAM “ or “user mode memory(2GB or 3GB) – MTL- BUF structures”

BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

Buf structures are arrays maintained by sqlserver to track the status of each buffer in BPOOL . SQL Server makes maximum of 32 allocation requests to the OS to reserve bpool pages.

SQL Server maintains contiguous array to track status information associated with each buffer (8 KB page) in the BPool. In addition SQL Server maintains a second array to track the committed and reserved bitmap pages.

This bit can be 0 or 1 . 1 indicates buffer is committed and 0 indicated page is reserved.

Size of Buf structure is approximately 16 MB when AWE is not enabled and when AWE is enabled buf structures use additional 8MB per each GB of RAM in the system.

3. Release the MTL region which is reserved initially. We reserve MTL at startup and releases it after BPOOL is reserved to ensure MemToLeave region are contiguous.

More details about the SQL Server memory architecture in http://mssqlwiki.com/sqlwiki/sql-performance/basics-of-sql-server-memory-architecture/

If you are in SQL Server 2012 read http://mssqlwiki.com/2012/10/21/sql-server-2012-memory-2/

 

Related posts:

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: , , , , | 8 Comments »

SQL Server Operating system (SOS) – Series 3

Posted by Karthick P.K on February 11, 2013

Thread synchronization

When we discussed about thread I mentioned In multi-threaded applications each thread has to synchronize their activities among other threads. Sometimes a thread has to wait for other thread to complete before it can execute (Ex: SQL Server blocking)   sometimes a thread has to synchronize with other thread and continue execution (Ex: CX packet).  If we allow multiple thread access the same resource they might get corruption or inconsistency.

Windows offers different ways to synchronize multiple threads before we jump in to different synchronization techniques let us see why thread synchronization is very important using the small program below.

In the below program I have declared two global called a and b and set the values of this globals to 0 . We define the number of thread which we are going to create in global named Threadcount (64).

We create 64 thread in main function and each thread start executing a function called Submain. In Submain each thread increases the value of a and b  1000 times.

So ideally value of A and B has to be 64,000 at the end of program execution  (64 Threads *1000 increments).

Let us check what happens.

#include <windows.h>
#include <string>
#include <iostream>
#include <process.h>    /* _beginthread, _endthread */
long a=0;
long b=0;
long g_InUse = FALSE;
long g_fResourceInUse = FALSE;
int Threadcount=64;
int s=Threadcount;
bool d=FALSE;

void Submain(void *x)
{
       for (int L=0;L<1000;L++)
              {

                     a=a++;

                     while (InterlockedExchange (&g_InUse, TRUE) == TRUE)
                     {
                           Sleep(0);
                     }
       //Sleep(10); //-->How Spinlock can cause CPU Spike
       b=b++;

       InterlockedExchange (&g_InUse, FALSE);
       }

/*
s=s-1;  //Simple synchronization technique. May be useful if you like to increase the thread count WaitForMultipleObjects support value defined for MAXIMUM_WAIT_OBJECTS 64
       if(s==0)
       {
              d=TRUE;
       }
*/
_endthread();
}

void main()

{

HANDLE *hThreads;
hThreads = new HANDLE[Threadcount] ;
for (int i=0;i<Threadcount;i++)
{
hThreads[i]=  CreateThread(NULL,NULL,(LPTHREAD_START_ROUTINE  )Submain,  NULL,  0,  NULL);

              if (hThreads[i]==NULL)
              {
                     printf("\nThread creation failed for thread %d with error %d",i,GetLastError());
              }

}
SetLastError(0);

DWORD rw=WaitForMultipleObjects(Threadcount,hThreads,true,INFINITE);

//while(!d); //Simple synchronization technique

printf("Value of a is:%d\n" ,a);
printf("Value of b is:%d\n" ,b);
system("pause");
}

 
clip_image002

Why the value of a and b are different and why b is accurate while a is not. If you look at the program closely atomic access to global b is guaranteed using the InterlockedExchange function so only one thread could access global b any time while there was no synchronization for global a so the end value is incorrect.

Thread synchronization can be achieved in user mode or using kernel objects

User mode thread synchronization: Threads can be synchronized in User mode using interlocked family functions or using critical sections. User mode thread synchronization is faster than using kernal objects. In the above program we used interlocked family function to synchronize the threads to access of gloabal b.  interlocked family functions should be used with caution on multiprocessor system and should be avoided in uniprocessor machines.

Spinlock: A method by which we continuously check  if the resource is available. I the above program global a and b are resource. Since we guaranteed atomic access to global b. Only one thread can access it at any time so what about the other threads they continuously spin to check if the resource becomes available.  Look at the below portion of above program.  While loop checks the value of g_InUse. If the value is FALSE the resource was not is use and calling thread will set the value to TRUE so other threads cannot access it and continue the execution. Once it completes its task  i.e. incrementing the value of b it sets the value of g_InUse to false so others can access it.  If the value is false then some other thread is currently using the global resource b and the while loop continues to spin.

while (InterlockedExchange (&g_InUse, TRUE) == TRUE)
                     {
                           Sleep(0);
                     }
                     //Sleep(10); //-->How Spinlock can cause CPU Spike
                     b=b++;
              InterlockedExchange (&g_InUse, FALSE);
                    }

 

Incorrect use of spinlock can waste CPU and can cause extreme CPU spikes. In the above program uncomment the line “Sleep(10); //–>How Spinlock can cause CPU Spike” and build the exe and execute it. Look at your task manger and check the CPU utilization. It would be extremely high because each time a thread takes lock on  of g_InUse. It sleeps for 10 milliseconds, increments the value id b and then releases the lock. While the other threads continuously spins to check if the resources are available thus causing CPU spike. In real time a thread may not sleep after taking a lock but assume it is performing some task which takes time and other threads will keep spinning consuming CPU.

Critical section: Like interlocked family functions critical section is also used to guarantee atomic  access to a resource. Major difference between the interlocked functions and critical section is  when criticalsection is owned by other thread calling thread is immediately placed in waitstate, so thread transits from user to kernel mode  and this transition is expensive (about 1000 CPU cycles as per Jeffrey Richter)  . when the thread which owns the critical section releases the critical section one of the waiting thread is signaled and scheduled. Programmers should make wise decision on when to use interlocked family functions and critical sections.  Above program caused severe CPU spike after we uncommented line “Sleep(10); //–>How Spinlock can cause CPU Spike” let us do the same implementation using Critical section in below program.

#include <windows.h>
#include <string>
#include <iostream>
#include <process.h>    /* _beginthread, _endthread */
long a=0;
long b=0;
int Threadcount=64;
int s=Threadcount;
bool d=FALSE;
CRITICAL_SECTION  gcs;

void Submain(void *x)
{
       for (int L=0;L<1000;L++)
              {

                     a=a++;
                     EnterCriticalSection(&gcs);
                     Sleep(10); //-->How Spinlock can cause CPU Spike
                     b=b++;
                     LeaveCriticalSection(&gcs);
              }

/*
    s=s-1;  //Simple synchronization technique. May be useful if you like to increase the thread count WaitForMultipleObjects support value defined for MAXIMUM_WAIT_OBJECTS 64
       if(s==0)
       {
              d=TRUE;
       }
*/
_endthread();
}

void main()

{

HANDLE *hThreads;
hThreads = new HANDLE[Threadcount] ;
InitializeCriticalSection(&gcs);
for (int i=0;i<Threadcount;i++)
{
hThreads[i]=  CreateThread(NULL,NULL,(LPTHREAD_START_ROUTINE  )Submain,  NULL,  0,  NULL);

              if (hThreads[i]==NULL)
              {
                     printf("\nThread creation failed for thread %d with error %d",i,GetLastError());
              }

}
SetLastError(0);

DWORD rw=WaitForMultipleObjects(Threadcount,hThreads,true,INFINITE);
DeleteCriticalSection(&gcs);
//while(!d); //Simple synchronization technique

printf("Value of a is:%d\n" ,a);
printf("Value of b is:%d\n" ,b);
system("pause");
}

 

After building the above program run the executable and you will notice that it doesn’t consume high CPU. Does this mean critical section is better than interlock functions? No. It depends.  In this exe lock is held for long time so critical section was ideal.  Assume each thread would have got access to the resource after spinning once (or) twice  then definitely interlock functions would have been an ideal choice because we would have avoided transition of each thread from user mode to kernel mode which is expensive. There is also a API called InitializeCriticalSectionAndSpinCount. What is the difference between InitializeCriticalSection and InitializeCriticalSectionAndSpinCount? InitializeCriticalSectionAndSpinCount Spins to acquire resource  n mumber of time and only if all attempts fail then the thread transits to kernel mode.

Thread deadlock: Similar to SQL Server locks what happens when two threads wait to acquire critical sections owned on resource owned by other? If there is no timeout threads will attempt to wait forever and will never get scheduled. In SQL Server we have deadlock monitor to detect this condition and rollback one of the transaction but windows doesn’t offer any such facility.

Orphan or unreleased critical section: When a thread takes critical section it is expected to release it, Assume a flaw in code or exception caused a thread to abort after taking a critical section and before releasing it, Critical section taken by the terminate thread is never destroyed and all the other threads will wait indefinitely on it. 

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in SQL Server Engine, SQLServer SOS | Tagged: , , , , , , , , | 1 Comment »

SQL Server Operating system (SOS) – Series 2

Posted by Karthick P.K on January 13, 2013

Context Switching:
When a thread is yielded from CPU windows stores the CONTEXT (current state such as CPU registers, program counters Etc) information of the current thread and loads the CONTEXT of the new thread which will run in the CPU.  
Why? A thread is yielded when executing an instruction, How the thread will resume from same point when it is rescheduled in CPU. Context information is stored while yielding and loaded when thread in rerun.
Note: yielding from CPU=Coming out of CPU

If there is high Context switching then system would spend more time on doing context switching than doing meaningful work.
 
Yielding:
When a thread moves out of CPU it is called as yielding.

When a thread can yield?
Thread which is running on CPU can yield out of the CPU under following key condition

Voluntary yield
Thread decides to yield by itself because of the logic in code executed by the thread. Generally a thread will voluntarily yield by calling Sleep(0) or SwitchToThread. When a thread voluntarily yields it is placed at the end of runnable list.
Ex: SQL Server thread will yield after sorting 64K of sort records.

Preempted
Thread which is running on the CPU will be forced to yield from CPU when a thread with higher priority is ready to run. When a thread is preempted it is placed in the beginning of the runnable list. 

Quantum end
All the threads which is executed in operating system will get a time slice called as quantum. When a thread completes its quantum it is yielded and next thread is run. If there is no other thread is ready to run than thread run for another quantum.

Termination
Thread is terminated when it finishes execution and destroyed by calling TerminateThread

Thread and process priorities

windows supports thread priority level ranging from 0 (Lowest) to 31 (Highest). If all the threads have same priority they are scheduled in  round robin basis, but in reality threads running on OS will have different priorities. Among all the threads which can be run windows scheduler picks thread with highest priority to run first. Priority of a thread can be changed using WINAPI SetThreadPriority, Similarly priority of a process can be set while creating the process  (WINAPI CreateProcess  dwCreationFlags ) ,using WINAPI SetPriorityClass after the process is created and by using tools like task manager.

Let us attach the debugger to SQL Server process and see how to view the threads, thread stack and how SQL Server threads wait with out being scheduled.  

Download the windows debugger from below link

Windbg 32-bit package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

Windbg X64 package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_amd64_6.11.1.404.msi

 

1. Start SQL Server in your test system.

2. Attach the debugger to SQL Server process. Refer below image. If you have more than one instance use the process id to attach with correct SQL Server process.

 

clip_image002

3. On command window type

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

4. Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

5. Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

lmvm sqlservr

6. Type

!peb  to display the information about process from process environment block.

7.  Type ~ to display all the threads in the process. First column in the output represents thread ID.

8. To  look at the stack of a specific thread,switch to the thread using thread ordinal (or) thread ID.

Debugger command ~ displays all the thread’s of the process.

Thread ordinal: is decimal value used by debugger to identify the thread starts from 0 (First columns in below output).

Thread ID : Is the ID assigned to each thread by operating to system. You can switch to a thread using thread ID by debugger command ~~[ThreadID]s     (4th column in ~ output )

In the below image I have printed all the thread’s and stack of thread ordinal 8 which is scheduler monitor thread.

 

clip_image002[8]

9. Type  g in command prompt to resume the process.

10. Connect to SQL Server from management studio. Run select * from sysprocesses. All the sessions which has a non zero value for KPID has a valid windows thread associated with the session. When executing queries which choose parallel plan there will be more than one row for same session and each rows will have different KPID.

11. To look at the thread stack of a session which is currently executing a task or background process . Convert KPID value associated with session in sysprocess in Hexadecimal value and type below command window of debugger.

~~[Hex value of a thread]s

Type kC

12.  Let us create a small blocking scenario to understand how threads wait in SQL Server.

Session-1

create table a (A int)
go
insert into a  values (1);
go
begin transaction
update a set A =1+1

Session-2

select * from a

Session-2 will be blocked. Look at the stack of blocked thread

13. Run select * from sysprocesses where blocked<>0

Identify the KPID of the session which is blocked.

14. Convert KPID in to hexadecimal value

15. Break the debugger to execute the debugger commands (CTRL+B) or 7th Icon in menu bar.

16. Look at the stack of the thread which is waiting for lock (Blocked) by using the Hex value of KPID

~~[Hex value of KPID]s

kC

clip_image002[10]

 

17. Above thread from second session which we created is waiting for an event using WaitForsingleObject. When the first session releases the lock this thread will be signaled and resumes execution.

We will see the details about WaitForsingleObject ,Waitformultipleobjects, Event (Manual auto reset) etc. in more details in forthcoming blogs

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

Posted in SQL Server Engine, SQLServer SOS | Tagged: , , , , , , , , | 2 Comments »

SQL Server Operating system (SOS) – Series 1

Posted by Karthick P.K on January 10, 2013

Before we start studying SQLOS we will recollect some of the basic OS concepts. 

What is process?

Process is instance of service or application which is running. Each process will have address space that contains all the Executable, Dlls, data, thread stacks etc.  Operating system maintains a kernel objects for each process to manage the process. One or more threads which runs under the context of the process and execute the code in address space. Each thread can execute the code and maintains its own set of CPU registers and stack. When a process is created primary thread of the process is created and starts executing the main() or Other similar function.  Primary thread can create additional threads using create thread function and lpStartAddress (Thread entry point ) defines the function that is to be executed by the thread which is created.

 

 

Ex: Ureadfile will be the thread entry point for the new thread which is created using below code.

CreateThread(0,0,(LPTHREAD_START_ROUTINE  )Ureadfile,(LPVOID)&PSUreadfile[i],  0,  NULL);

 

 

What is thread?

Threads executes the code in process.  There can be one  (Single threaded ) or more( Multi-threaded) threads for every process. In multi-threaded applications each thread has to synchronize their activities among other threads. Ex: Allowing one thread to modify a Global while other is reading it can cause race conditions.

SQL Server uses synchronization techniques like Spinlocks, Latches, Events Etc.

   

Threads states

Threads can be in below core states (There are other states which we will discuss on need)

 

Waiting

Wait state represents thread is waiting for some resource. A thread in this state is not eligible to be scheduled from OS.

SQL Server threads can be in wait state in multiple places. A thread requesting for lock has to wait till it is available and goes for sleep unless signaled when the lock is available.

A thread can call WaitForSingleObject  and wait without competing  CPU resource

 

Ex:

LMHandle = CreateMemoryResourceNotification(LowMemoryResourceNotification);

WaitForSingleObject( LMHandle,INFINITE);

In above example Thread will wait till there is Lowmemoryresourcenotification from windows.

Running

Thread is running in CPU.

 

Ready

Thread is ready to run and waiting for its CPU slice. In SQL Server thread which are ready to run on scheduler will stay in runnable list of scheduler till they get chance to run on scheduler.

Quantum
All the threads which is executed in operating system will get a time slice to run in CPU called as quantum.Thread is yielded from scheduler after its quantum is completed.

 

Scheduling

 

Preemptive scheduling:

Operating system can interrupt the thread execution any time. OS can halt the thread execution and schedule another thread to run at any time.

 

Non-Preemptive scheduling:

Operating system cannot interrupt the thread execution any time. The worker owns the scheduler until it yields to another worker on the same CPU. If the thread which runs on CPU(Scheduler) don’t yield in time it monopolizes the CPU until it finishes. 

Windows 3.x and DOS were using Non-Preemptive scheduling. In Non-Preemptive scheduling context switching is generally reduced because the operating system does not interrupt code execution and It is easier to implement a multi-threaded application in Non-Preemptive mode because synchronization may be less of an issue.  A bad application can easily ‘hang’ the entire system if thread from application does not yield from CPU allowing other applications threads to execute.

 

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in SQL Server Engine, SQLServer SOS | Tagged: , , , , , , | 2 Comments »

SQL Server fails to start with error "Failed allocate pages: FAIL_PAGE_ALLOCATION 1" During startup

Posted by Karthick P.K on January 6, 2013

SQL Server fails to start and If you look at the SQL Server Error log you will find "Failed allocate pages: FAIL_PAGE_ALLOCATION" and SQL Server generating exception dump. Similar to the SQL Server error log below.

 

Note: This blog is applicable when you out get of memory error during startup (or) with event ID: 2019 in system event log. For general troubleshooting of SQL Server out of memory errors follow steps in Troubleshooting SQLServer Memory

{

2013-01-02 12:31:20.91 Server      Microsoft SQL Server 2008 R2 (SP1) – 10.50.2500.0 (Intel X86)

                Jun 17 2011 00:57:23

                Copyright (c) Microsoft Corporation

                Enterprise Edition on Windows NT 5.2 <X86> (Build 3790: Service Pack 2)

2013-01-02 12:31:20.91 Server      (c) Microsoft Corporation.

2013-01-02 12:31:20.91 Server      All rights reserved.

2013-01-02 12:31:20.91 Server      Server process ID is 1583.

2013-01-02 12:31:20.91 Server      Authentication mode is MIXED.

2013-01-02 12:31:20.91 Server      Logging SQL Server messages in file ‘C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\Log\ERRORLOG’.

2013-01-02 12:31:20.91 Server      This instance of SQL Server last reported using a process ID of 9240 at 1/3/2013 7:31:20 PM (local) 1/4/2013 12:31:20 AM (UTC). This is an informational message only; no user action is required.

2013-01-02 12:31:20.91 Server      Registry startup parameters:

                 -d C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\DATA\master.mdf

                -e C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\Log\ERRORLOG

                -l C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\DATA\mastlog.ldf

2013-01-02 12:31:20.92 Server      SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.

2013-01-02 12:31:20.92 Server      Detected 24 CPUs. This is an informational message; no user action is required.

2013-01-02 12:31:20.94 Server      Address Windowing Extensions is enabled. This is an informational message only; no user action is required.

2013-01-02 12:31:27.33 Server       Failed allocate pages: FAIL_PAGE_ALLOCATION 1

2013-01-02 12:31:27.33 Server     

Memory Manager                                   KB

—————————————- ———-

VM Reserved                                 1534584

VM Committed                                  51576

AWE Allocated                                     0

Reserved Memory                                1024

Reserved Memory In Use                            0

2013-01-02 12:31:27.33 Server      Error: 17311, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.

2013-01-02 12:31:27.33 Server      Using ‘dbghelp.dll’ version ’4.0.5′

2013-01-02 12:31:27.34 Server      **Dump thread – spid = 0, EC = 0×00000000

2013-01-02 12:31:27.34 Server      ***Stack Dump being sent to C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\LOG\SQLDump0008.txt

2013-01-02 12:31:27.34 Server      * *******************************************************************************

2013-01-02 12:31:27.34 Server      *

2013-01-02 12:31:27.34 Server      * BEGIN STACK DUMP:

2013-01-02 12:31:27.34 Server      *   01/03/13 19:31:27 spid 4344

2013-01-02 12:31:27.34 Server      *

2013-01-02 12:31:27.34 Server      * ex_handle_except encountered exception C0000005 – Server terminating

 

}

 

 

Why would SQL Server fail with out of memory error (FAIL_PAGE_ALLOCATION)during the startup? Only possible reason that I could think of is Paged or NonPaged pool is empty.

How to prove if my Paged / NonPaged pool is empty?  Look at the system event log for the Event ID: 2019

 

You will find error in system event log similar to one you see below.

{

Event Type:        Error

Event Source:    Srv

Event Category:                None

Event ID:              2019

Date:                     2013-01-02

Time:                     12:31:00 PM

User:                     N/A

Computer:          MSSQLWIKIServer

Description:

The server was unable to allocate from the system nonpaged pool because the pool was empty.

}

 

Above error indicates nonpaged pool is empty, When Nonpaged pool is empty every application would fail. How to identify who is consuming Nonpaged pool?

 

Use poolmon.exe from windows support tools. (Steps are documented in This KB).

If you r OS is windows 2003 or above you can simple run the exe from command prompt and identify who is consuming (Leaking J) space in Paged / NonPaged pool.

Below is sample output of poolmon.exe which I collected from my test system

 

clip_image002[4]

 

Memory consumption by each tag is printed in above output. After finding the tag which is leaking the memory (Highest bytes)identify the Driver which is using the tag by using  find command or strings utility from sysinternals (search for TAG in drivers folder %Systemroot%\System32\Drivers). Once you identify the driver, check if there are any known issue with the driver or you may have to contact the vendor of the driver to identify why the driver is consuming large amount of pooled /Non-pooled memory.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group MSSQLWIKI and post your SQL Server questions to get answered by experts.

 

 

Thank you,

Karthick P.K |

 My Facebook Page |My Site| Blog space| Twitter

 

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Posted in Configuration, Memory, SQL Server Engine, SQL Server memory, Startup failures | Tagged: , , , , | 16 Comments »

Debugging memory Leaks using Debug diagnostic tool.

Posted by Karthick P.K on December 6, 2012

In my previous post (SQL Server memory leak ) I explained how to identify the modules  which are leaking the memory using ‘!heap’ commands.  Sometimes we may not be able to find the cause by displaying the memory using ‘!d’ commands to find the patterns or using search memory commands (s).

In such scenarios  we can use  Debug Diagnostic Tool or UMDH to track memory leaks.   This blog will explain how to identify the memory leaks using Debug diagnostics tools.

 

Download and install Debug Diagnostic Tool from http://www.microsoft.com/en-us/download/details.aspx?id=26798

 

1. Go to ToolsàOptions ->Preferences àSelect Record call stacks immediately when monitoring the leaks.

 

clip_image002

 

2. Go to the rules tab and select add rule.

 

3. Choose Native (non .Net) memory leak and handle leak.

 

4. Select the SQL Server or any process which has to be tracked for memory leak.

 

5. Click next and leave the default options (you can choose auto-unload Leak track when rule is completed or deactivated).

 

6. Click next and Activate the rule now.

 

7. Leaktrack.dll would have loaded to the process for which we are tracking the allocations.

 

8.  Now you can wait for the leak to happen again.

 

{

–If you are learning how to troubleshoot SQL Server memory leak follow the steps which we followed in previous post (http://mssqlwiki.com/2012/12/04/sql-server-memory-leak/)to leak the memory.

 

–Download HeapLeak.dll from this link.

–Create  an extended stored procedure in SQL Server

sp_addextendedproc ‘HeapLeak’,‘C:\HeapLeakdll\HeapLeak.dll’

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

}

 

9. Once you suspect memory is leaked. Go to the rules and take a full user dump by right clicking the Leak rule.

 

clip_image004

 

 

10. After the dump is captured , go to the advanced analysis tab, Add data files and select the dump which we generated.

 

11. Go to ToolsàOptions ->set the symbol path for analysis. Default Microsoft symbol path is below.

 

srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;c:\Release

 

Important: Replace c:\Release with symbol path of dll’s which you have loaded in SQL Server (optional)

 

 

11. In the available analysis script select memory pressure analyzers (memory analysis.asp).

 

12. Click start analysis.

 

clip_image006

 

 

13. Analysis might take a while depending on time it takes to load the symbols. Once the analysis is completed it would generate and open a HTML report.

This HTML report is  stored in C:\Program Files\DebugDiag\Reports\ by default and can be used for later reference.

 

I have attached a sample report which I collected when leaking memory using heapleak.dll in This link. You can use it for reference.

 

 

Report generated by debug diagnostic tool memory pressure analyzer will have the analysis summary and below Table Of Contents

 

sqlservr.exe__…………dmp

   Virtual Memory Analysis Report

   Heap Analysis Report

   Leak Analysis Report

   Outstanding allocation summary

    Detailed module report (Memory)

    Detailed module report (Handles)

 

 

 

14.  Analysis summary is  good portion in the report to start with and would give the module which is leaking the memory. Look at the below report.

 

clip_image008

 

 

15. Report has clearly indicated HeapLeak.dll has 255 MB of outstanding allocations. In heapleak.dll “Sub“ is the function which allocated this memory at offset 23.

 

 

16. Look at the virtual memory summary. It gives complete picture about memory distribution in the virtual address space. In the below summary memory reserved is 1.57 GB which is normal in 32-Bit SQL Server, but native heaps is 272.94 MB which is not normal.

Look at the heap summary there are 50 heaps.

 

 

 

 

 

clip_image010

 

clip_image012

 

17. Now look at the Outstanding allocation summary. It gives top 10 modules by allocation count and allocation size. In below summary HeapLeak has 26,182 allocations with size of 255.6 MB.

Note: In this report it is HeapLeak but in real time it might be any module which is leaking the memory

.

 

clip_image014

 

18. You can  also look at detailed module report(Memory).  It gives the memory allocation from each module along with function  and source line which allocated the memory (If you set the symbols for all the modules loaded).

 

clip_image016

 

 

By now we are sure that sub function in HeapLeak.dll has allocated 255 MB in line number 14 and has not released. The report also gives you the callstack samples that show the code path when functions was doing allocations. Refer This sample HTML report file. 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Posted in Debugging, Memory, Performance, SQL Server memory | Tagged: , , , , , | 4 Comments »

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory it is supposed to de-allocate it and release it back to OS. If it misses to de-allocate the memory due to flaw in code it is called as leak and It can cause memory pressure both to the operating system and application.

 

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink its memory based on the amount of available memory on the system and Max server memory setting in SQLServer.

Many times system admins look at the memory usage of SQLServer and assume SQLServer is leaking memory if they find SQL Server memory usage is high.

This is incorrect SQL Server is server based application and its memory manager is designed in such a way that it will keep growing its memory usage on need (Exception large pages) and will not scale down its usage unless there is low memory notification from Windows. We can control the memory usage of SQL Server using Max server memory setting in SQLServer. This setting limits the Bpool usage of SQL Server and doesn’t control the overall memory usage of SQLServer. There are portions of SQLServer memory that is allocated outside BPOOL (aks: MTL or MTR) we do not have a way to control how much memory SQL Server can use outside bpool, but non bool memory usage will be normally low and can be easily estimated by studying the components running in SQL Server.

Ex: If you want to set SQLServer to use only 10GB RAM on server. Consider how much memory SQL Server might need outside Bpool and set the “max server memory” setting accordingly. In this case if you estimate SQL Server will use 1.5GB outside Bpool then set the Max server memory to 8.5GB.

What can cause SQL Server Memory leak?

SQL Server code has a logic to allocate memory but doesn’t de-allocate it. If any of the components in SQL Server is causing a memory leak in SQL Server it can be identified easily using the DMV’s like sys.dm_os_memory_allocation,sys.dm_os_memory_clerks and sys.dm_os_memory_objects etc., but most of the memory leaks in SQL Server is caused by 3rd party Dll’s which are loaded in SQL Server process.

 

Note: All the memory allocations by Non SQL server Dll’s loaded in SQL Server will happens in “Mem to Leave”(outside the Bpool) and they are called as direct windows allocations (DWA) 

 

When there is out of memory conditions in SQL Server and if you suspect there is a memory leak.First thing to determine is who is consuming the memory. If SQL Server is not using majority of the memory in MemToLeave and still you get Mem to leave errors probably there is a leak and it caused by some DLL’s loaded in

SQL Server. Refer Section 1 (MTL error) in http://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

 

Below query can be used to determine actual memory consumption by SQL Server in MTL.

select sum(multi_pages_kb)  from sys.dm_os_memory_clerks

 

If the memory consumption by SQL Server is very low and still if you see SQL Server memory errors like few below then focus on Leaks.

 

Ex:

SQL Server 2000

                WARNING: Failed to reserve contiguous memory of Size= 65536.

                WARNING: Clearing procedure cache to free contiguous memory.

                Error: 17802 “Could not create server event thread.”

                SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

                Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

 

 

 

How to identify and troubleshoot the memory leak?

 

There are multiple ways in windows to identify who is leaking memory in process. We will discuss how to identify the memory leak using  

 

1. Windows debugger 2. Debug diagnostics tools for windows and 3. UMDH in this blog.

 

Let us create a sample DLL to load in SQL server process to leak memory and see how to use the tools I mentioned above to troubleshoot the leak. 

 

Download HeapLeak.dll from This link and install Microsoft Visual C++ 2010 Redistributable Package from this links 32-Bit or 64-Bit to make this DLL work.

 

–Create  an extended stored procedure in SQL Server

exec sp_addextendedproc  'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

 

 

We will also enable below trace flags in SQL Server to automatically generate filter dump when there  is out of memory errors and see how to identify who is leaking.

 

 

dbcc traceon (2551,-1) – 2551 is used to enable filter dump.

go

dbcc traceon (8004,-1) –8004 is used to take memory dump on first occurrence of OOM condition

go

–Note: Both the trace flags listed above are un-documented, So use it at your own risk and there is no guarantee that this trace flags will work in future versions of SQL Server

 

 

Once we enable the trace flag . We have to cause out memory error in SQL Server to generate OOM memory dump. We have leaked around 300 MB of memory from MTL by executing above extended SP 30 times.

 

Let use execute below script which create XML handles. Memory for xml handles is allocated from MTL we will get out of memory errors very soon because extended stored procedure which we executed has already leaked the memory.

(Do not run below XML script directly with out executing HeapLeak  Below script will cause OOM error because of handle created for each execution, but it is accounted as SQL Server allocation so will not help us to understand the  how to debug leaks caused by 3rd party DLL’s)

 

Note: 1. SQL Server memory dump will be generated in SQL Server error log folder. 
2. Size of MTL is 256 MB + Max worker threads *.5  in 32-Bit SQL Server.  So approximately 384 MB unless modified using –g switch.

 

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
 
go 10000

We will receive below error after few executions.

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.

To analyze the dump download and Install Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

 

Step 1 (Load the memory dump file to debugger):

 

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Note : You will find SQLDump000#.mdmp in your SQL Server error log when you get the Exception or assertion.

 

Step 2 (Set the symbol path to Microsoft symbols server):

 

on command window type

 

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

 

Step 3 (Load the symbols from Microsoft symbols server):

 

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

 

 

Step 4 (check if symbols are loaded):

 

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

:028> lmvm sqlservr

start    end        module name

01000000 02ba8000   sqlservr   (pdb symbols)          c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Oct 14 15:35:29 2005 (434F82E9)

    CheckSum:         01B73B9B

    ImageSize:        01BA8000

    File version:     2005.90.1399.0

    Product version:  9.0.1399.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0409.04e4

    CompanyName:      Microsoft Corporation

    ProductName:      Microsoft SQL Server

    InternalName:     SQLSERVR

    OriginalFilename: SQLSERVR.EXE

    ProductVersion:   9.00.1399.06

    FileVersion:      2005.090.1399.00

    FileDescription:  SQL Server Windows NT

    LegalCopyright:   © Microsoft Corp. All rights reserved.

    LegalTrademarks:  Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    Comments:         NT INTEL X86

 

Step 5 : (!address to display the memory information)

 

Use !address command to display the memory information of the process from dump.

 

 

0:028> !address -summary

 

——————– Usage SUMMARY ————————–

    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage

   686a7000 ( 1710748) : 81.58%    81.80%    : RegionUsageIsVAD

     579000 (    5604) : 00.27%    00.00%    : RegionUsageFree

    4239000 (   67812) : 03.23%    03.24%    : RegionUsageImage

     ea6000 (   15000) : 00.72%    00.72%    : RegionUsageStack

      1e000 (     120) : 00.01%    00.01%    : RegionUsageTeb

   122d0000 (  297792) : 14.20%    14.24%    : RegionUsageHeap

          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap

       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb

       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs

       1000 (       4) : 00.00%    00.00%    : RegionUsageEnvironmentBlock

       Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

 

——————– Type SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

     579000 (    5604) : 00.27%   : <free>

    4239000 (   67812) : 03.23%   : MEM_IMAGE

     5fc000 (    6128) : 00.29%   : MEM_MAPPED

   7b242000 ( 2017544) : 96.21%   : MEM_PRIVATE

 

——————– State SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

   1b7bd000 (  450292) : 21.47%   : MEM_COMMIT

     579000 (    5604) : 00.27%   : MEM_FREE

   642ba000 ( 1641192) : 78.26%   : MEM_RESERVE

 

Largest free region: Base 00000000 – Size 00010000 (64 KB)

 

 

Look at the RegionUsageHeap it is around 297792 KB and largest free region is just 64KB. We know SQL Server doesn’t use Heap’s extensively so normally the heap allocated by SQL Server will not go beyond few MB. In this case it is consuming around 290 MB and so other components which use MTL can easily fail.

Let us try to understand why the Heap is around 297792 KB and try to identify if there is  a pattern.

 

Step 6: (Let us use !heap –s to display summary information about the heap)

 

 

0:028> !heap -s

LFH Key                   : 0x672ddb11

  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast

                    (k)     (k)    (k)     (k) length      blocks cont. heap

—————————————————————————–

000d0000 00000002    1024    896    896      6     1     1    0      0   L 

001d0000 00008000      64     12     12     10     1     1    0      0     

002c0000 00001002    1088     96     96      2     1     1    0      0   L 

002e0000 00001002      64     52     52      3     2     1    0      0   L 

007c0000 00001002      64     64     64     56     1     0    0      0   L 

00d10000 00001002     256     24     24      8     1     1    0      0   L 

340b0000 00001002      64     28     28      1     0     1    0      0   L 

340c0000 00041002     256     12     12      4     1     1    0      0   L 

342a0000 00000002    1024     24     24      3     1     1    0      0   L 

34440000 00001002      64     48     48     40     2     1    0      0   L 

61cd0000 00011002     256     12     12      4     1     1    0      0   L 

61d10000 00001002      64     16     16      7     1     1    0      0   L 

61d20000 00001002      64     12     12      4     1     1    0      0   L 

62a90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62b90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62c90000 00001002     256     40     40      7     1     1    0      0   LFH

00770000 00001002      64     16     16      2     2     1    0      0   L 

63820000 00001002      64     24     24      3     1     1    0      0   L 

63830000 00001001   10240  10240  10240    160    21     0    0    bad     

64230000 00001001   10240  10240  10240    160    21     0    0    bad     

64c30000 00001001   10240  10240  10240    160    21     0    0    bad     

65630000 00001001   10240  10240  10240    160    21     0    0    bad     

66030000 00001001   10240  10240  10240    160    21     0    0    bad     

66a30000 00001001   10240  10240  10240    160    21     0    0    bad     

67430000 00001001   10240  10240  10240    160    21     0    0    bad     

68130000 00001001   10240  10240  10240    160    21     0    0    bad     

68b30000 00001001   10240  10240  10240    160    21     0    0    bad     

69530000 00001001   10240  10240  10240    160    21     0    0    bad     

69f30000 00001001   10240  10240  10240    160    21     0    0    bad     

6a930000 00001001   10240  10240  10240    160    21     0    0    bad     

6b330000 00001001   10240  10240  10240    160    21     0    0    bad     

6bd30000 00001001   10240  10240  10240    160    21     0    0    bad     

6c730000 00001001   10240  10240  10240    160    21     0    0    bad     

6d130000 00001001   10240  10240  10240    160    21     0    0    bad     

6db30000 00001001   10240  10240  10240    160    21     0    0    bad     

6e530000 00001001   10240  10240  10240    160    21     0    0    bad     

6ef30000 00001001   10240  10240  10240    160    21     0    0    bad     

6f930000 00001001   10240  10240  10240    160    21     0    0    bad     

70330000 00001001   10240  10240  10240    160    21     0    0    bad     

70d30000 00001001   10240  10240  10240    160    21     0    0    bad     

7a160000 00001001   10240  10240  10240    160    21     0    0    bad     

7ab60000 00001001   10240  10240  10240    160    21     0    0    bad     

7b560000 00001001   10240  10240  10240    160    21     0    0    bad     

7d0d0000 00001001   10240  10240  10240    160    21     0    0    bad     

7e030000 00001001   10240  10240  10240    160    21     0    0    bad     

7ea30000 00001001   10240  10240  10240    160    21     0    0    bad     

67f90000 00001003     256     16     16     14     1     1    0    bad     

71850000 00001003     256      4      4      2     1     1    0    bad     

71890000 00001003     256      4      4      2     1     1    0    bad     

67fd0000 00001002      64     16     16      4     1     1    0      0   L 

718d0000 00001003     256     40     40      3     1     1    0    bad     

71910000 00001003     256      4      4      2     1     1    0    bad     

71950000 00001003     256      4      4      2     1     1    0    bad     

71990000 00001003     256      4      4      2     1     1    0    bad     

67ff0000 00001002      64     16     16      4     1     1    0      0   L 

719d0000 00001003    1792   1352   1352      5     2     1    0    bad     

71a10000 00001003     256      4      4      2     1     1    0    bad     

71a50000 00001003     256      4      4      2     1     1    0    bad     

71a90000 00001002      64     16     16      1     0     1    0      0   L 

—————————————————————————–

 

 

If you look at the above out put you can clearly identify a pattern. There are multiple created and each of them is 10 MB. But how to identify who actually created them?

 

Step 7:

 

Let us pickup one of the heap which is 10 MB and display all the entries (allocations) with in this 10 MB heap using !heap with –h parameter

 

Heap I have picked is 63830000.

 

 

0:028> !heap -h 63830000

Index   Address  Name      Debugging options enabled

19:   63830000

    Segment at 63830000 to 64230000 (00a00000 bytes committed)

    Flags:                00001001

    ForceFlags:           00000001

    Granularity:          8 bytes

    Segment Reserve:      00100000

    Segment Commit:       00002000

    DeCommit Block Thres: 00000200

    DeCommit Total Thres: 00002000

    Total Free Size:      00005048

    Max. Allocation Size: 7ffdefff

    Lock Variable at:     00000000

    Next TagIndex:        0000

    Maximum TagIndex:     0000

    Tag Entries:          00000000

    PsuedoTag Entries:    00000000

    Virtual Alloc List:   63830050

    UCR FreeList:        63830588

    FreeList Usage:      00000000 00000000 00000000 00000000

    FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0      Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

    Heap entries for Segment00 in Heap 63830000

        63830608: 00608 . 00040 [01] – busy (40)

        63830648: 00040 . 02808 [01] – busy (2800)

        641b6698: 02808 . 02808 [01] – busy (2800)

        ……………………………………

        ……………………………………

        ……………………………………

        ……………………………………

       

Step 8: (Let us pickup one of the heap entry (allocation) and try to identify what is in it)

 

 

0:028> db 641b6698

641b6698  01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

641b66a8  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

641b66b8  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

641b66c8  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

641b66d8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66e8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66f8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b6708  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

 

0:028> db 63830648

63830648  01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

63830658  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

63830668  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

63830678  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

63830688  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

63830698  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306a8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306b8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

 

 

Similarly you can dump multiple heap allocations to identify a pattern.

 

Now if you look at the memory dumped you see a string which might help you to identify the DLL which created the heap. There is a pattern in above heaps. All the heap allocations have below string

“Iam filing the heap for demo at MSSQLWIKI.COM”

 

Note : You can use L Size to dump more memory using db or dc command’s example db 63830648 L1500

 

Step 9:

Let us open the DLL which we loaded in SQL Server for testing using notepad and see if there is string which matches the pattern

 

clip_image002

 

 

Yes there is which proves that this DLL’s has caused the leak. In real time you may have to play with different heap allocations to identify the pattern.

 

This is one way to find the leaks from the memory dump after the leak has actually happened. It may not be always easy to find a pattern and identify the modules who allocated the memory, In such scenarios  you may have to track the leak using the tools like debug diagnostic tool, UMDH etc.In the my next blog I will post how to track memory leak using Debug diagnostics tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | Tagged: , , , , | 26 Comments »

SQL Server NUMA load distribution

Posted by Karthick P.K on November 22, 2012

When port affinity is not configured all the connection to SQL Server enters through single port and connections are tied to nodes in round robin basis.

 

We might end with Imbalance of Workload in NUMA systems under below conditions.

 

1. When a connection is tied (or) affinitized to a node, all the work from that connection will be completed on the same node (in which connection is directed)  if plans are serial.  We don’t consider the CPU load across the NUMA to pick up the node for serial plans, We use the same node in which connection is made for serial plan execution. Parallel query would use any NUMA node regardless of node this query came from. When all the queries execute from connections made to same node and if plans are also serial we might end up with overloading one Node while others are not fully used.

 

2. State of each nodes is internally maintained by SQL Server and updated every 2 seconds so there is  remote possibility that all parallel queries end with same node some times and cause spike in one node, while the other nodes are unused.

 

3. When there is imbalance between the number of online schedulers in each node (Ex: 16-CPU in Node1 and 4-CPU in Node2 ) and if all plans are serial (assume we have set Max DOP 1) We might end up with overloading the schedulers in node with least schedulers. while the schedulers on other node is underused, similarly when memory is shared across nodes we share it equally irrespective of number of schedulers on each node so in this case first 16 schedulers would have got  half of memory and 4 schedulers of second node would have got remaining half. So ensure you choose the CPU affinity carefully (Specially when you have installed SQL Server with limited processor license on system with larger number of CPU’s). 

 

 

Image 1:  sys.dm_os_schedulers (6 – CPU’S on node-0 and  1- CPU on node-1. Look at current task count)

 

clip_image002

 

Image 2 (Look at the current and pending task in node 0 and in node 1)

clip_image002[4]

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , | 2 Comments »

SQL Server Query optimization

Posted by Karthick P.K on November 6, 2012

SQL Server Query optimization (or) Tuning slow queries in SQL Server.

How to troubleshoot (or) tune slow queries in SQL Server, Optimize slow queries to run faster , resolve error sql server -2147217871 Query timeout expired  and make them run faster?

A query in considered to be slow when it is executing for longer duration than expected. Total duration of the query can be broken in to compile time, CPU time and Wait time.

 

Before you start troubleshooting the query which is running for longer duration, Identify if the query is slow because it is long waiting (or) Long running (or) Long compiling.

 

Compile time:Time taken to compile the query.  compile time can be identified by looking at the

 

1. CompileTime=”n”  in XML plan

2. SQL Server parse and compile time when Set statistics time on is enabled.

 

CPU time: Time taken by the query in CPU (Execution time – (compile time+ wait time). CPU time can be identified by looking at the

 

1. CPU column in profiler.

2.  CPU time under SQL Server Execution Times when statistics time on is enabled.

 

Execution time: Time taken by the query for complete execution ( Execution time = CPU time (CPU time for compilation+execution) +Wait time). Total duration of the query can be identified by using the

 

1.Duration column in profiler

2. SQL Server Execution Times, elapsed times when statistics time on is enabled.

 

What is long waiting query?

 

A query is considered to be long waiting query, when it spend most of its time waiting for some resource.

 

How to identify if the query is long waiting?

 

Long running query can be identified by comparing the CPU and duration column in profiler (or) CPU and elapsed time when statistics time on is set .

When a query is waiting for a resource (such as lock, network I/O, Page_I/O Etc) it will not  consume CPU.  So if you see duration being higher than CPU (Difference between Duration and  CPU is wait time),It indicates that the query has spent large amount of time waiting for some resource.

               

Let us see an example of long waiting query. I have collected profiler trace while executing the query.

 

set statistics io on

set statistics time on

go

–Place your query here

select top 10000 * from a

go

set statistics io off

set statistics time off

go

 

 

clip_image001[15]

Look at the Duration and CPU column in the profiler Cpu=256 and duration =1920. So this query has spent majority of time waiting for some resource.

clip_image002[16]

Look at the output of statistics time and statistics I/O in above image.

SQL Server has spent only 2 milliseconds compiling the query and 256 milliseconds on CPU, but the overall duration was 1920 milliseconds so the query has spent maximum time waiting for some resource.

 

Identify the resource in which this query is waiting on using one of the steps listed below.

1. Look at the wait type column of the sysprocesses for the spid which is executing query while the query is executing.

2. If there is no other activity on the server collect sys.dm_os_wait_stats output before and after the query execution and identify the wait (Will not help in tuning queries running for short duration)

3. Collect XEvent to gather the wait stats of individual query.

 

Once you identify the resource in which the query is waiting on tune the resource. Most of the times queries would be slow waiting for below resource.

 

PAGEIOLATCH_* or Write log: This indicates I/O resource bottleneck follow the detailed troubleshooting steps mentioned in  This Link to fix the I/O bottleneck. If you find SQL Server spawning excessive I/O Create necessary indexes.

a. Logical reads + Physical reads in statistics I/O output (Refer above image) or Reads and writes in profiler will indicate the I/O posted by this query. If you see very high reads for query compared with the result rest retuned by query it is an indication of  missing indexes or bad plan. Create necessary indexes (You can use DTA for index recommendations.).

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO. 

a.This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should Look at the wait resource of the spid The wait_resource is the page number (the format is  dbid:file:pageno) that is being accessed. 

b. We can use DBCC PAGE to identify object or type of the page in which we have the contention. Also it will help us to determine  whether contention  is for allocation, data or text.

c. If the pages that SQL Server is most frequently waiting on are in Tempdb database ,check the wait resource column for a page number in dbid 2 Ex(2:1:1 or 1:1:2). Enable TF 1118 and increase the number of TEMPDB data files and size them  equally (You may be facing tempdb   llocation latch contention mentioned in http://support.microsoft.com/kb/328551)

d. If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table.  In this case we need to choose a different clustered index key to spread the work across different pages.

LATCH_*:    Non-buf latch waits can be caused by variety of things.  We can use the wait resource column in sysprocesses to determine the type of latch involved(KB 822101). 

a. A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata Refer KB 929728 for more information.

b. Auto Grow and auto shrink while query is executed.

c. Queries going for excessive parallelism.

Blocking (LCK*): Use the query in This Link  to identify the blocking. Tune the head blocker.

Asynch_network_io (or) network IO: Keep the result set returned by the query smaller. Follow detailed troubleshooting refer  This Link

Resource_semaphore waits: Make sure there is no memory pressure on the server Follow steps in This Link  for detailed troubleshooting.

SQL Trace: Stop all the profiler traces running on the server. Identify the traces which are running on the server using the query in This Link

Cx packet: Set the Max degree of parallelism. But remember Cxpacket wait type is not always a problem.

a. For servers that have eight or less processors, use the following configuration where N equals the number of processors: max degree of parallelism = 0 to N .

b. For servers that use more than eight processors, use the following configuration: max degree of parallelism = 8.Refer This Link

SOS_SCHEDULER_YIELD : Identify if there is CPU bottleneck on the server. This waiting means that the thread is waiting for CPU.

a.  SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) Will ( is expected to) yield back to SQL Server scheduler when it exceeds 4ms and before it yields back it check if there are any other runnable threads, If there is any runnable threads then the thread which is in top of runnable list is  scheduled and current thread will go to the tail of the runnable list and will get rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum. The time thread spends in runnable list waiting for its quantum is accounted as SOS_SCHEDULER_YIELD. You will see this type when multiple threads are waiting to get CPU cycle. Follow trouble shooting the steps mentioned This Link

Important: In SQL Server instances when there more than 1 CPU it is possible that the CPU is higher than the duration. Because CPU is sum of time spend by query in all the CPU’s when choosing a parallel whereas the duration is actual duration of the query.

 

What is long running query?

A query is considered to be long running query, when it spend most of its time on CPU and not waiting for some resource.

How to identify if the query is long running ?

Long running query can be identified by comparing the CPU and duration column in profiler (or) CPU and elapsed time when statistics time on is set . If the CPU and duration is close than the query is considered to be long running. If the query is long running identify where the query spend the time ,It could be for compiling or post compilation (For executing the query). compare the duration of the query with CompileTime (XML plan compile time (or) SQL Server parse and compile time when statistics time is on refer above image).

High Compile time:

Compare the duration of the query with Compile Time (XML plan compile time (or) SQL Server parse and compile time when statistics time is on).Compile time will normally be in few millisecond . Follow the below steps if you see high compile time

1. Identify if you have large token perm refer http://support.microsoft.com/kb/927396

2. Create necessary indexes and stats. Tune the query manually (or) in DTA and apply the recommendation

3. Reduce the complexity of query. Query which joins multiple tables (or) having large number of IN clause can taking  a while to compile.

4. You can reduce the compile’s by using force parameterization option.  

High CPU time:

Compare the duration of the query with Compile Time (XML plan compile time (or) SQL Server parse and compile time when statistics time is on).  If the compile time is very low compared to the duration. Then follow the below steps.

1. Update the stats of tables and indexes used by the query (If the stats are up to date Estimated rows and estimated execution will  be approximately same in execution plan .If there is huge difference stats are out dated and requires update) .

2. Identify if the query has used bad plan because of parameter sniffing (If the ParameterCompiledValue and ParameterRuntimeValue is different in XML plan). Refer THIS LINK to know more about Parameter Sniffing

3. If updating the stats and fixing the parameter sniffing doesn’t resolve the issue it is more likely optimizer is not able to create efficient plan because of lack of indexes and correct statistics. Run the query which is driving the CPU in database tuning advisor and apply the recommendations. (You will find missing index detail in xml plan but DTA is more efficient).

4. If the query which is running longer and consuming CPU is linked server query try changing the security of linked server to ensure linked server user has ddl_admin or dba/sysadmin on the remote server. More details regarding the issue in THIS LINK.

5. Ensure optimizer is not aborting early and creating bad plan. For details refer THIS LINK

6. Ensure the query which is spiking the CPU doesn’t have plan guides (xml plan will have PlanGuideDB attribute.  Also sys.plan_guides will have entries) and query hints(index= or (option XXX join) or inner (Join Hint) join).

7.  Ensure that SET options are not changed.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

Posted in Performance, SQL General, SQL Query | Tagged: , , , , , , , , , | 7 Comments »

SQL Server 2012 Memory

Posted by Karthick P.K on October 21, 2012

SQL Server 2012 has made many changes to the memory manager to govern the SQL Server memory consumption in efficient way compared with earlier versions. Important changes to SQL Server 2012 memory which every DBA should be aware of is documented in this blog. If you are not familiar with the SQL Server memory architecture of earlier versions I would recommend reading THIS ARTICLE  before you continue with changes in Denali memory manager.

clip_image001
Max Server Memory

In previous versions of SQL Server “Max Server Memory” controlled the Maximum physical memory Single page allocator (BPOOL)  can consume in SQL Server user address space.

Only the single page allocator was part of BPOOL and Max server memory controlled only BPOOL, so the following allocations came outside BPOOL (Max server memory)

1.Multi-Page allocations from SQL Server [These are allocations which request more > 8 KB and required contiguous memory]

2.CLR allocations [These include the SQL CLR heaps and its global allocations created during startup]

3.Memory used for thread stacks within SQL Server process (Max worker threads * thread stack size). Thread stack size is 512K in 32 bit SQL Server, 904 K in WOW mode and 2 MB in 64-Bit 

4.Direct windows allocations made by Non-SQL Server dll’s ([These include windows heap usage and direct virtual allocations made by modules loaded into SQL Server process. Examples: allocations from extended stored procedure dll's, objects created using OLE Automation procedures (sp_OA calls), allocations from linked server providers loaded in sqlserver process)

 

SQL Server 2012 memory manager has now clubbed single page allocator and multipage allocator together  as any-size page allocator . As a result, the any-size page allocator now manages allocations categorized in the past as single page and Multi-Page allocations.

1. "max server memory" now controls and includes “Multi pages allocations”.

2. In earlier versions of SQL Server CLR allocated memory was outside BPOOL (Max server memory)   . SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

SQL Server 2012 "max server memory" configuration does not include only the following allocations:

1. Memory allocations for thread stacks within SQL Server process

2. Memory allocation requests made directly to Windows [Ex: Allocations (Heap,Virtualalloc calls ) from 3rd party Dll’s loaded in SQL Server process , objects created using OLE Automation procedures (sp_oa) etc]

These changes allow DBA’s to configure and control SQL Server more accurately in accordance with the memory requirements and using resource governor.

-g startup parameter

We used the -g startup option to change the default value of a region in SQL Server user address space known as "Memory-To-Reserve". This region was also known as "memory-to-leave or MTL.  The "Memory-To-Reserve" (or) -g configuration option are relevant only for a 32-bit instance of SQL Server.

Multi pages allocation and CLR was part of Mem-to-reserve (-g)  in In previous SQL Server versions until SQL Server 2008 R2 , From Denali they are part of BPOOL (Controlled by Max server memory)  So you may have to remove –g if you have set it to give space for multipage allocator or CLR in earlier versions and migrating to Denali now.

 

AWE feature removed from SQL Server 2012

AWE feature was used in earlier versions of 32-Bit SQL Server to address more than 4GB of memory . This feature is now removed in Denali  refer:"AWE deprecation".  So if you need more memory then you may need to migrate to 64-Bit SQL server.

 

Locked pages in memory

Trace flag 845 is no more required to Lock Pages in memory. As long as the startup account of SQL Server has “Lock pages in memory” privilege Datacenter, Enterprise, standard and Business intelligence edition will use AWE allocator Api’s for memory allocations in BPOOL and this allocations will be locked.

 
Dynamic virtual address space management

In earlier versions of SQL Server 32-Bit we reserved Bpool at the startup and remaining addresses are left for MTL (Memory to reserve or Memory to leave) . In Denali virtual address space management is dynamic (we  don’t reserve at startup) , So it is possible for 3rd part components to use more memory than what is  configured in –g parameter.

 
SQLCLR loaded at startup

In earlier SQL Server versions, Common language runtime (CLR) functionality is initialized inside SQL Server process when the first SQL CLR procedure or function is invoked. SQL Server 2012 performs SQL CLR initialization at startup. The initialization is independent of the ‘clr enabled’ configuration option.

You will notice the following messages in the SQL Server error log during server startup:

2012-10-18 15:23:13.250 spid8s       Starting up database ‘master’.

2012-10-18 15:23:13.930 Server       CLR version v4.0.30319 loaded.

Total Physical memory and memory model used

Total physical memory available on the server and the memory model  used is logged in SQL Server error log

2012-10-18 15:23:06.690 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.

2012-10-18 15:23:06.700 Server       Using locked pages in the memory manager

2012-10-22 15:32:20.450 Server       Detected 131067 MB of RAM. This is an informational message; no user action is required.
2012-10-22 15:32:20.450 Server       Using conventional memory in the memory manager.

 

DMV and Performance counter changes

In earlier version of SQL Server most of the DMV’s used single_pages_kb and  multi_pages_kb to refer allocations by SQL Server with in BPOOL and outside BPOOL. Now they are represented together as  pages_kb. More details in THIS link

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

 

Posted in Memory, SQL General, SQL Server Engine, SQL Server memory | Tagged: , , , , , , , , , | 15 Comments »

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

Posted by Karthick P.K on October 16, 2012

 

I have got few request’s from  SQL Server DBA’s in past to blog about analyzing SQL Server exceptions and assertions . After seeing lot of DBA’s getting stuck when they get EXCEPTION_ACCESS_VIOLATION (or) Assertion in SQL ServersI decided to write this blog.

This blog is published with intention to make DBA’s analyze and resolve EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion before contacting Microsoft support.  Exception and assertion are two different things. SQL handles both assertions and exceptions by writing the current thread’s stack to the Error log and generating a dump.  In simple An exception is an event that occurs during the execution of a program, and requires the execution of code outside the normal flow of control and assertion is the check that the programmer inserted into the code to make sure that some condition is true, If it returns false an assert is raised. SQL handles both assertions and exceptions by writing the current thread’s stack to the Error log and generating a dump, so trouble shooting steps are similar. 

 

You will find messages similar to one below in SQL Serve error logs when you get Exception or EXCEPTION_ACCESS_VIOLATION .

{

Error

External dump process returned no errors.
Using ‘dbghelp.dll’ version ’4.0.5′
SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
* *******************************************************************************
* BEGIN STACK DUMP:
*  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

*   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

*   Access Violation occurred reading address 0000041EA9AE2EF0

* Input Buffer 510 bytes –

ex_terminator – Last chance exception handling

}

You will find messages similar to one below in SQL Server error logs when you get an Assertion.

{

Error

spid323     Error: 17065, Severity: 16, State: 1.

spid323     SQL Server Assertion: File: < .cpp>, line = 2576 Failed Assertion = ‘fFalse’  This error may be timing-related. If the error persists after rerunning the statement, use DBCC CHECKDB to check the database for structural integrity, or restart the server to ensure in-memory data structures are not corrupted

SQL Server Assertion: File: <   .cpp>, line=2040 Failed Assertion =

}

To analyze the dump download and Install Windows Debugger from This Link 

 

Step 1 (Load the memory dump file to debugger):

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Note : You will find SQLDump000#.mdmp in your SQL Server error log when you get the Exception or assertion.

Step 2 (Set the symbol path to Microsoft symbols server):

on command window type

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from Microsoft symbols server):

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

 

Step 4 (check if symbols are loaded):

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

0:002> lmvm sqlservr

start             end                 module name

00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)

    CheckSum:         025FEB5E

    ImageSize:        02679000

    File version:     2005.90.4266.0

    Product version:  9.0.4266.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5 (Switch to exception context):

Type .ecxr

Step 6(Get the stack of thread which caused exception or assertion):

Type  kC  1000    //You will get the stack of thread which raised exception or assertion .

I have pasted one of the sample stack below, from the exception dump which I worked recently.  First thing to identify from stack is who is raising the exception. In the below stack look at the portion which is highlighted in red (In each frame before the ! symbol), that is the module which raised the exception (Exe or DLL name ).

If Exe/DLL name is Non Microsoft  module (Exe or DLL name ) then the exception is being caused by a third party component, you will need to work with the company that provided that component to get a solution. lmvm Exe/DLL name will give you the company name. For example: lmvm wininet

If Exe/DLL name is  SQLServr  (or) any other SQL Server modules then the exception is raised by SQL Server, In that case type kC 1000 and paste the stack in comments session of this blog (or) When you start thread in MSDN forums (or) In This face book group. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

Note: When you get Assertion make sure you post message line which contains   SQL Server Assertion: File: <Filename.cpp>, line = 2576 Failed Assertion =  ”  

 

0:000> kC 1000

Call Site

wininet!InternetFreeThreadInfo+0×26

wininet!InternetDestroyThreadInfo+0×40

wininet!DllMain_wininet+0xb5

wininet!__DllMainCRTStartup+0xdb

ntdll!LdrShutdownThread+0×155

ntdll!RtlExitUserThread+0×38

msvcr80!_endthreadex+0×27

msvcr80!_callthreadstartex+0x1e

msvcr80!_threadstartex+0×84

kernel32!BaseThreadInitThunk+0xd

ntdll!RtlUserThreadStart+0x1d

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts:

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, SQL General, SQL Server Engine, Startup failures | Tagged: , , , , , , , , , , , , , , , | 159 Comments »

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

Posted by Karthick P.K on October 12, 2012

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

Before we understand the RESOURCE_SEMAPHORE_QUERY_COMPILE let us see what is compile memory.

Compile memory:  When a query is compiled in SQL Server, the compilation process needs memory (for parsing, algeberaization and optimization) called compile memory. This memory doesn’t include the memory required to execute the query.

Challenges with Compile memory:  This memory used for Query compilations are usually expected to fit into SQL Server main-memory and to be relatively short-lived.  Like any other consumers of Bpool,  this is implemented by “stealing” pages from the Buffer Pool and hence it blocks other memory consumers from using that memory until a query compilation completes.

Even if the SQL Server has enough memory to service multiple simultaneous query compilation, allowing all of them to occur at the same time might lead to stealing a significant number of pages from the buffer pool, with consequent increased physical IO , poor performance of query execution and causing memory pressure within SQL Server. However on the other side, a throttling mechanism that is too restrictive could lead to a non-optimal usage of the system resources and decreased throughput for compile intensive workloads, So SQL Server came with more dynamic approach to solve the problem  which  is to better manage system resources, memory in particular. Such management should hence be based on and driven by the amount of memory used.

Let us see it with example:

Assume SQL  Server Max server memory is set to 1000MB and Currently data /index pages is consuming 800MB with in Max server memory (bpool) and 3 queries are reaching SQL Server for compilation, each of them requiring 300 MB for compilation.

If all three queries are compiled simultaneously  total compilation might take 900MB of memory causing all the data and index pages to be dropped from BPOOL causing  consequent increased physical IO and poor performance of query during execution(to bring data pages back to memory). On the other hand let us assume each of this 3 queries need only 2 MB of compilation memory, There is no reason for SQL Server to throttle the number of compilation.

To overcome above challenges SQL 2005+ throttles the number of concurrent compiles that can happen at  any time based on memory usage during the compile. SQL Server memory grant is controlled by a object called “Resource Semaphore” and has internal mechanism to detect how much memory has been used by each compile. There are three gateways (semaphores) called the small, medium, and big gateway. When a request is received for compilation SQL Server will start compilation. There is no limit on how many queries can be compiled simultaneously, but when memory usage for a query reaches the threshold for a given gateway it will then acquire that semaphore of next gateway before continuing. The semaphores (Queries which can be compiled concurrently) are set up to allow 4*schedulers count for the small gateway, 1*schedulers count for the medium gateway and 1 (per SQL instance) for the big gateway.

The small gateway has a fixed threshold for how much memory must be consumed before you enter it.  The medium and big gateways have dynamic thresholds that vary depending on how much memory is available, how much is stolen, etc.

If the semaphore can’t be acquired then you see this wait type (Query is waiting for memory grant to compile a query =  RESOURCE_SEMAPHORE_QUERY_COMPILE wait). This behavior lets SQL Server to allow only a few memory-intensive compilations occur at the same time. Additionally, this behavior maximizes throughput for smaller queries.

clip_image001[4]

How to identify RESOURCE_SEMAPHORE_QUERY_COMPILE waits?

 

To get an idea look at the sys.sysprocesses  table for sessions waiting on  RESOURCE_SEMAPHORE_QUERY_COMPILE

select sp.*, st.text from sys.sysprocesses sp CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) AS st

WHERE sp.lastwaittype LIKE ‘RESOURCE_SEMAPHORE_QUERY_COMPILE%’ ORDER BY sp.waittime DESC;

 

clip_image002[4]

 

 

 

There could be two possible reasons for RESOURCE_SEMAPHORE_QUERY_COMPILE waits

1.       Memory pressure within SQL Server caused by others using  lot of stolen memory or OS memory pressure

In this case you will see thresholds for medium and big gateways very low. In this situation you have to identify who is consuming most of stolen memory and  tune them to increase the available memory or add additional memory that can used by SQL server. When the available memory decrease, threshold for medium and big gateways would decrease significantly and increase the number of queries which have to enter medium /big gateways, So the number of parallel compiles will decrease increasing the overall wait time.

 

This DBCC memory status output is from system which has 48 processor and has excessive RESOURCE_SEMAPHORE_QUERY_COMPILE waits because of memory pressure.

Small Gateway (default)                  Value

—————————————- ———–

Configured Units                         192        // 190 number of units .  48  CPU’s * 4=192

Available Units                            109

Acquires                                         83

Waiters                                            0

Threshold Factor                         380000

Threshold                                       380000

(6 row(s) affected)

Medium Gateway (default)                 Value

—————————————- ———–

Configured Units                         48              //48 number of units. SO 48 CPU’s *1=48

Available Units                               0             //There is no available slots. All slots are busy.

Acquires                                           48

Waiters                                             34           //34 Queries are waiting

Threshold Factor                         12

Threshold                                    2204603    //Threshold is very low 2 MB (This value is in bytes)

(6 row(s) affected)

Big Gateway (default)                    Value

—————————————- ———–

Configured Units                         1            //1 per instance

Available Units                             0            //There is no available slots. All slots are busy.

Acquires                                          1

Waiters                                          47            // 47 Queries are waiting

Threshold Factor                         8

Threshold                                3306905     //Threshold is very low 3 MB (This value is in bytes)

 

 

2.       There is huge amount of available memory but the available units in gateway is exhausted. This situation normally occurs when we have many queries that have high compile time and use lot of memory for compilation.

Compile time and Compile memory can be captures using the show plan XML for query compile event in profiler.

Below is extract from show plan XML for query compile event in profiler. This query has used approximately 150 MB of compile memory and the threshold for Big gateway is around 143 MB . So only one query which needs more than 143 MB can compile at a time , This can cause contention when there are multiple queries waiting for compile/recompile. Also an important factor to notice in this XML plan is compile time is ~139 times the CPU. So likely this query waited for most of the time for resource semaphore.

Big Gateway (default)                    Value

—————————————- ———–

Configured Units                         1

Available Units                             0

Acquires                                         1

Waiters                                          47

Threshold Factor                         8

Threshold                                149640500

 

         <QueryPlan CachedPlanSize="312" CompileTime="139847" CompileCPU="1002" CompileMemory="152320">

 

How to fix RESOURCE_SEMAPHORE_QUERY_COMPILE waits?

1.       Add additional memory to SQL Server.

2.       Reduce the number of compile and recompiles happening in SQL Server.

3.       Threshold for the gateways are dynamic (except for small gateway) and therefore memory pressure occurs from other sources (Internal to SQL Server or system wide) reduces the  amount of memory available for compiles and Queries are throttled to    higher gateways sooner. Make sure SQL Server is not starving for memory.

4.       Identify the queries which are consuming large compile memory and tune them (CompileMemory  in show plan XML query compile can be used).

5.       RESOURCE_SEMAPHORE_QUERY_COMPILE waits are very common in 32-bit SQL Server because of virtual address space limitation, so migrate to 64-Bit SQL Server.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Memory, Performance, SQL Query, SQL Server Engine | Tagged: , , , , , | 8 Comments »

SQL Server Parameter sniffing

Posted by Karthick P.K on October 8, 2012

When a stored procedure, prepared queries and queries submitted via sp_executesql  is compiled for the first time, the values of the parameters supplied with the execution call are used for cardinality estimation, to optimize the statements within that stored procedure and create the query plan. This is known as parameter sniffing because the optimizer sniffs the current parameter value during compilation.

If these values are typical and the data distribution is even in the underlying tables, all the calls to the stored procedure will benefit from this query plan since the plan is reused. However, parameter sniffing can cause problems if the "sniffed" parameter value is not typical of the values which are actually used during a typical execution or the data in underlying tables are very skewed, because plan generated for “sniffed” parameter value may not be optimal for current parameter passed and since the plan is reused there can be performance degradation.

Consider the following scenario we have a table with two columns (country and some column ). This table has 10001 rows.  10000 rows has USA in country column and 1 row has brazil in country column.

 

This table has NONCLUSTERED INDEX called NC on country column.

 

 

create table data(country char(10),somecolumn char(10))

go

insert into data values (‘BRAZIL’,‘somedata’)

go

insert into data values (‘USA’,‘somedata’)

go 10000

 

CREATE NONCLUSTERED INDEX [NC] ON [dbo].[data]

(

[country] ASC

)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

GO

 

create proc sniffing @p1 char(10)

as

begin

select country,somecolumn from data where country=@p1

end

Go

 

 

 

–Let us execute stored procedure sniffing with the with parameter brazil.

 

exec sniffing ‘BRAZIL’

go

 

clip_image002

 

 

 

 

Optimizer picked up Index-seek in Non-clustered index and Row-ID lookup on table.

 

What happens when we execute the same procedure with parameter ‘USA’. Since the plan is already created and cached for ‘BRAZIL’ it is reused and plan which is generated for BRAZIL is Not an optimal plan for parameter USA.

exec sniffing ‘USA’ 

go

 

 

 

clip_image004

 

 

How to identify if the optimizer is using plan which compiled for sniffed parameter values  and not the current parameters value.

 

Let us enable statistics xml on

 

 

set statistics xml on

 

exec sniffing ‘USA’ 

go

 

Look at the XML plan for the ParameterCompiledValue and ParameterRuntimeValue.

 

Below is extract from XML plan and this output proves that the plan is compiled for parameter BRAZIL (ParameterCompiledValue) and it is used for parameter USA (ParameterRuntimeValue)

{

<ColumnReferenceColumn="@p1"ParameterCompiledValue="‘BRAZIL    ‘"ParameterRuntimeValue="‘USA       ‘" />

}

 

 

 

We will also see a huge difference in estimated and actual rows count if the parameter sniffing is impacting the plan

(Remember out dated stats can also cause optimizer to estimate incorrect rows so difference in estimate and actual rows doesn’t mean it is because of parameter sniffing ). 

 

What would have been the optimal plan if the parameter ‘USA’?

 

 

Let us execute the same procedure with recompile option

 

 

exec sniffing ‘USA’ with recompile

go

 

 

clip_image006

 

 

 

 

How to fix Parameter sniffing?

 

1. USE RECOMPILE: when you create the stored procedure. so the parameter is compiled every time it is called. This method can be used if the compile time is very less compared to execution time of bad plan

   Ex: create proc sniffing @p1 char(10) with recompile

 

2.  OPTION (RECOMPILE): for the statement which impacted by the parameter sniffing.  If the procedure has multiple statements recompile will impact only the particular statement.

 

3.  OPTIMIZE FOR HINT: Instructs the query optimizer to use a particular value for a local variable when the query is compiled and optimized (or)  OPTIMIZE FOR  UNKNOWN WHICH Instructs the query optimizer to use statistical data

 

instead of the initial values for all local variables when the query is compiled and optimized.  This value is used only during query optimization and actual values are used during execution.

{

alter proc sniffing @p1 char(10)  as
begin
select country,somecolumn from data where country=@p1
option (optimize for (@p1 = ‘USA’))

–option (optimize for (@p1 unknown))
end
}

 

4.   Assign the incoming parameter values to local variables and use the local variables in the query.  If you are in SQL Server2000 in which we don’t have OPTIMIZE FOR hint.

      Ken Henderson has blogged about it in http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx

 

5.   Trace Flag 4136 which is introduced in SQL Server 2008 R2 Cumulative Update 2, SQL Server 2008 SP1 Cumulative Update 7 and SQL Server 2005 SP3 Cumulative Update 9 introduce trace flag 4136 that can be used to disable  the "parameter sniffing" process  more details on http://support.microsoft.com/kb/980653

 

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Optimizer, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , | 6 Comments »

Optimizer Timeout or Optimizer memory abort

Posted by Karthick P.K on October 7, 2012

Optimizer Timeout

When the query processor finds itself consuming a lot of time optimizing a query, it may decide to stop the optimization process abruptly, and choose the best available plan. This is to ensure that the optimizer doesn’t end up optimizing forever. This is  called optimizer timeout (based on the number of plans considered relative to the cost of the best plan so far).

Optimizer memory abort

When queries become more complex number of potential plans to consider can quickly grow in thousands. Optimizer has limit for memory it is allowed to use , when the optimizer reaches the limit it ends with  optimizer memory abort.

When  timeout or memory abort happens optimizer might choose the best plan  from plans which was generated till timeout or abort and it might be far from optimal plan so the query execution can take long time and consume resource.

On SQL 2000 and earlier the only way to detect this condition is compiling the query with trace flag 8675.   If one of these conditions occur the output will reflect a timeout abort or memory abort, similar to the following:

 

End of simplification, time: 2.869 net: 2.869 total: 2.869 net: 2.869

end exploration, tasks: 200094 no total cost time: 16.17 net: 16.169 total: 19.04 net: 19.039

*** Optimizer time out abort at task 614400 ***

*** Optimizer time out abort at task 614400 ***

 

Msg 8623, Level 16, State 1, Line 3

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

End of simplification, time: 0.156491 elapsed: 0.156491

end exploration, tasks: 1614 no total cost time: 0.552436 elapsed: 0.708927

end search(0),  cost: 1275.32 tasks: 3888 time: 0.195008 elapsed: 0.903935

end exploration, tasks: 7596 Cost = 1275.32 time: 0.548032 elapsed: 1.45197

end search(1),  cost: 1263.15 tasks: 21985 time: 2.30564 elapsed: 3.75761

*** Optimizer memory usage abort ***

End of optimization,  elapsed: 2.98304

From SQL server 2005 to determine whether the query optimizer times out or MemoryLimitExceeded search for the 
StatementOptmEarlyAbortReason="TimeOut" (or) StatementOptmEarlyAbortReason="MemoryLimitExceeded" expression in the XML plan output.

 

We can avoid optimizer from timing out and picking bad plan by enabling trace flag -T8780. This increases the time limit before the timeout occurs.

Note: Don’t enable this trace flag at server level , enable it only for the session which runs the query and identify if the optimizer is picking up a better plan. If you see optimizer picking up the better plan, right approach is to tune the query manually or using DTA and apply the recommendations . You can use this trace flag till you apply the recommendations made by DTA.

 

If you experience “ Optimizer memory usage abort” use “SQLServer:Memory Manager\Optimizer Memory (KB)” counter to the amount used for compilation .

select * from sys.dm_os_memory_clerks where type=’MEMORYCLERK_SQLOPTIMIZER’ will tell us the overall memory used by optimizer.

We can also use the CompileMemory= expression in XML plan output starting from SQL server2005 SP2 which will give us the compile memory used by individual plans. If you find optimizer memory is very low then identify what is contributing to memory contention in SQL Server and tune it.

I will discuss compile memory in detail when I blog about Resource_semaphore  wait types.

 

Note: You may also receive below error because of few known issues documented in KB articles 982376, 946020,926773,917888 so if none of the fixes resolve the issue you may have to follow the same steps documented above.

{

"Msg 8623, Level 16, State 1, Line 1

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information."

}

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Performance, SQL General, SQL Query, SQL Server Engine, SQL Server memory | Tagged: , , , , , , , , , , | 8 Comments »

Troubleshooting SQL Server high CPU usage

Posted by Karthick P.K on October 4, 2012

Troubleshooting SQL Server high CPU usage

 

First thing to determine when there is High CPU on systems is, if SQL server is consuming the CPU resource or other applications/service.

 

Use query in  THIS LINK to get CPU usage history (or) Task manager (or) Perfmon counter to determine that. In Perfmon, Process %Process time can also be used. Remember this counter is not based on 100%.  It is based on number of processor.  If you see 200 for sqlservr.exe and the system has 8 CPU, CPU consumed by sqlservr.exe is 200 out of 800 (only 25%).)

 

If the CPU spike is caused by other application involve application team.

 

Next step is to determine if the CPU consumed is kernel time or user time.

 

We can use Process %Privileged  time and %user Time counters in perfmon. Task manager will show kernel times which will also help us understand

 

Kernel CPU:  In general, if kernel CPU remains below 10%, it’s normal.  But if you see sustained kernel CPU at 30% or above, you should start looking at system drivers , Antivirus etc.  some known issues which can increase Kernel CPU time are

1.       Few Anti-virus software’s can cause high kernel time.  Temporarily disable anti-virus software to rule this out

 

2.       We have seen high resolution timer in SQL 2008 or SQL 2005 SP3 caused high kernel time in Virtual Machines because of outdated BIOS .  Temporarily disabling high resolution timer by turning on trace flag 8038 (configure as startup parameter) to prove this. Check for BIOS update and do not use 8038 in long term.

 

High user CPU: Some of the most common causes for High CPU in SQL Server  are

1.       Query execution causing CPU spike (Most commonly caused by optimizer picking bad plan).

 

2.       High compiles and recompiles. (schema, Stats change, Use of Temp table, Recompile hint).

 

3.       System threads spiking CPU (Ghost cleanup, Lazy writer, Resource monitor).

 

4.       Running many traces.

 

 

1. Query execution causing CPU spike:

 

Query execution  takes long times and spikes CPU commonly because of in-correct cardinality estimates caused by outdated statistics, Lack of Index, Server configuration, Distributed queries, etc.

 

When the server is experiencing this problem run the query in below link to list all the queries which are executing in the server order by CPU time desc along with plan.

{

Get SQL Text and Query Plan for statements which are executing now

}

 It could be one query which is driving the majority CPU time or Multiple queries each driving the CPU. Look at the CPU time of the above query output.

 

If it is single query/Store procedure which is driving the majority of CPU.

 

1.        Update the stats of tables and indexes used by the query (If the stats are up to date Estimated rows and estimated execution will  be approximately

same in execution plan .If there is huge difference stats are out dated and requires update) .

 

2.       Identify if the query has used bad plan because of parameter sniffing (If the ParameterCompiledValue and ParameterRuntimeValue is different in XML plan). Refer THIS LINK to know more about Parameter Sniffing

 

3.        If updating the stats and fixing the parameter sniffing doesn’t resolve the issue it is more likely optimizer is not able to create efficient plan because of lack of indexes and correct statistics. Run the query which is driving the CPU in database tuning advisor and apply the recommendations. (You will find missing index detail in xml plan but DTA is more efficient). You can follow the steps in Tune queries using SQL Server Database tuning advisor .

 

4.       If the query which is spiking the CPU is linked server query try changing the security of linked server to ensure linked server user has ddl_admin or dba/sysadmin on the remote server. More details regarding the issue in THIS LINK.

 

5.       Ensure optimizer is not aborting early and creating bad plan. For details refer THIS LINK.

 

6.       Ensure the query which is spiking the CPU doesn’t have plan guides (xml plan will have PlanGuideDB attribute.  Also sys.plan_guides will have entries) and query hints(index= or (option XXX join) or inner (Join Hint) join).

 

7. Ensure that SET options are not changed.

 

If it is Multiple queries/Store procedure are driving the CPU together.

 

1.       Update the stats of all the tables and indexes in the database. Using the query in below link Rebuild index and update statistics for all the tables in database

 

2.       If updating stats doesn’t help and rebuilding the indexes doesn’t bring down the CPU we have to tune the queries 1 by 1.

 

3.       Ensure Large amount of RAM is not causing optimizer to choose inefficient plan http://support.microsoft.com/kb/2413549

 

4.       Ensure that we do not run many traces at same time (commonly from monitoring tools). Use query in below link to list all the active traces.

{

Find all the profiler traces running on SQL Server

}

 

2. If the system thread is consuming most of the CPU.

 

1.       If none of the SQL queries are consuming majority of CPU,  we can identify if the back ground threads is consuming the majority of CPU by looking at  sysprocesses output for background threads. select * from sys.sysprocesses where spid<51.

 

2.       Check if you are hitting any of the known issues.

{

Resource Monitor may consume high CPU: http://support.microsoft.com/kb/968722

The Ghost Cleanup task uses 100% of the CPU on an idle system in SQL Server 2008 or in SQL Server 2005: http://support.microsoft.com/?id=978430

}

 

3. High compiles and recompiles: I will blog about high compiles and recompiles shortly

                               

4. Other factors which can impact SQL Server query performance

1.       Maximum degree of parallelism. Ensure MAX DOP is set properly (you can follow the steps in How to set Max degree of parallelism (MAXDOP)

 

2.       Priority boost. (Do not enable priority boot)

 

3.       Do not enable Fiber mode.

 

4.       Tweaking affinity mask (Spikes few CPU).

 

5.       TokenAndPermUserStore. http://support.microsoft.com/kb/927396

 

6.       CPU power plan degrade the server performance http://support.microsoft.com/kb/2207548

 

7.       SQL Server that’s uses  .Net Framework can cause high CPU Refer THIS LINK

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Related blogs: 

Tuning SQL Server query

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Configuration, Optimizer, Performance, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , , , | 7 Comments »

SQL Server Latch & Debugging latch time out

Posted by Karthick P.K on September 7, 2012

  

 

In a multithreaded process what would happens when a one thread updates a data or index page in memory while second thread is reading the same page?

What will happen when 1st  thread reads a data/index page in memory while 2nd thread is freeing the same page from memory?

Answer: We would end up with data or data structure inconsistency. To avoid inconsistency SQL Server uses Synchronization Mechanisms like Locks,Latches and Spinlocks.

 

We will discuss few key points about latches and how to debug latch timeout dumps in this blog.

What is Latch ?

They control the concurrent access to data pages and structures by multiple threads.  Latches provide physical data consistency of data pages and provide synchronization for data structures. Latches are not controllable by user like locks.

 

Types of the Latch:

Buffer (BUF) Latch

Used to synchronize access to BUF structures and their associated database pages.

 

Buffer “IO” Latch

A subset of BUF latches used when the BUF and associated data/index page is in the middle of an IO operation (Reading page from disk or writing page to disk).

 

Non-Buffer (Non-BUF) Latch

These are latches that are used to synchronize general in-memory data structures generally used by queries/tasks executed by parallel threads, auto grow operations , shrink operations etc. 

 

Latch modes

 

Keep (KP) Latches

Used to ensure that the page is not released from memory while it is in use. 

Shared (SH) Latches

Used for read-only access to data structures and prevent write access by others threads.

This mode allows shared access. 

SH is compatible with KP, SH, and UP.  It should be noted that although in general SH implies read-only access, it is not always the case. For buffer latches SH is the minimum mode required in order to read a data page.

Update (UP) Latches

Allows read access to the data structure(Compatible with SH and KP), but prevents other EX-latch access. 

Used for write operations when torn page detection is off and when AWE is not enabled.

Exclusive (EX) Latches

Prevents any read activity from occurring on the latched structure. EX is only compatible with KP.

Used during read IO during write IO when torn page detection is on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFs from the buffer pool, either by adding them to the free list or unmapping AWE buffers. 

 

 

Latch compatibility

  KP SH UP EX DT
KP Y Y Y Y N
SH Y Y Y N N
UP Y Y N N N
EX Y N N N N
DT N N N N N

 

 

How do you identify Latch contention?

 

Latch contention can be identified using below wait types in sysprocesses.

 

PAGEIOLATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on a physical I/O of a buffer pool page to complete. 

                                            1. PAGEIOLATCH_* are commonly solved by tuning the queries which are performing heavy IO (Commonly by adding, changing and removing indexes (or) statistics to reduce the amount of physical IO).

                                 2. Identifying if there is disk bottleneck and fixing them (Pageiolatch wait times (ex > 30 ms))

                      

                      

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO. 

1.       This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should Look at the wait resource of the spid. The wait_resource is the page number (the format is  dbid:file:pageno)

          that is being accessed. 

2.       We can use DBCC PAGE to identify object or type of the page in which we have the contention. Also it will help us to determine  whether contention  is for allocation, data or text.

3.       If the pages that SQL Server is most frequently waiting on are in tempdb database ,check the wait resource column for a page number in dbid 2. You may be facing tempdb allocation latch contention mentioned in    http://support.microsoft.com/kb/328551

4.       If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table.  In this case we need to choose a different

          clustered index key to spread the work across different pages.

 

LATCH_*:    Non-buf latch waits can be caused by variety of things.  We can use the wait resource column in sysprocesses to determine the type of latch involved(KB 822101). 

1.       A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata Refer KB 929728 for more information.

2.       Auto Grow and auto shrink.

 

 

 

When a latch is requested by thread and If  that latch cannot be granted immediately because of some other thread holding a incompatible latch on same page or data structure then  the requestor must wait for the latch to be grantable.  Warning messages like one below is printed in SQL Server error log and a mini dump with all the threads is captures if the wait interval reaches 5 minutes (waittime 300). The warning message differs for buffer and non-buffer latches.

 

 

844: Time out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p.  Continuing to wait.

 

846: A time-out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.

 

847: Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

 

Break up of above warning

type

The latch mode of the current latch acquire request.  This is a numerical value with the following mapping:  0 – NL (not used); 1 – KP; 2 – SH; 3 – UP; 4 – EX; 5 – DT.

 

task

Task for which we are trying to acquire latch.

 

Waittime

The total time waited for this latch acquire request in seconds.

 

owning task

The address of the Task that owns the latch, if available.

 

bp (Buffer latches only)

The address of the BUF structure corresponding to this buffer latch.

 

page (Buffer latches only.)

The page id for the page currently contained in the BUF structure.

 

database id (Buffer latches only.)

The database id for the page in the BUF.

 

 

Like troubleshooting blocking issues in SQL Server when there is a latch contention or timeout dump identify the owner of latch and troubleshoot why the latch is held by the owner for long time.

 

When there is latch timeout dump you will see a warning message similar to one below. Warning error message printed in SQL server errorlog before the dump is very important to find the owner thread of latch.

 

{

 

2012-01-18 00:52:03.16 spid69      A time-out occurred while waiting for buffer latch — type 4, bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, waittime 300, flags 0×19,

owning task 0x0000000006E08328. Not continuing to wait.

spid21s     **Dump thread – spid = 21, PSS = 0x0000000094622B60, EC = 0x0000000094622B70

spid21s     ***Stack Dump being sent to E:\Data\Disk1\MSSQL.1\MSSQL\LOG\SQLDump0009.txt

spid21s     * *******************************************************************************

spid21s     * BEGIN STACK DUMP:

spid21s     *   02/28/12 00:32:03 spid 21

spid21s     * Latch timeout

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_COUNT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_VIRTUAL_ROOT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

}

From the error message above we can easily understand we are trying to acquire latch on database id: 4, page 1:6088 (6088 page of first file) and we timed out because task 0x0000000006E08328 (owning task 0x0000000006E08328 in warning message)  is holding a latch on it.

Note: Task is simply a work request to be performed by the thread. (such as system tasks, login task, Ghost cleanup task etc.). Threads which execute the task will take required latches on need.

Let us see how to analyze latch timeout dump and get the owning thread of the Latch using the  owning task 0x0000000006E08328.

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

0:002> lmvm sqlservr
start             end                 module name
00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
    CheckSum:         025FEB5E
    ImageSize:        02679000
    File version:     2005.90.4266.0
    Product version:  9.0.4266.0
    File flags:       0 (Mask 3F)
    File OS:          40000 NT Base
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use the below command to search thread stack to identify the thread which has reference to the owning task and it will be the thread which is owning the latch. Replace 0X0000000006E08328 with owning task in your errorlog

~*e .echo ThreadId:; ?? @$tid; r? @$t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? @$t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; s -d @$t1 @$t2 0X0000000006E08328 

ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0  06e08328 00000000 00000000 00000000  (……………
00000000`091fdcb8  06e08328 00000000 091fdd70 00000000  (…….p…….
00000000`091fded0  06e08328 00000000 06e0e798 00000000  (……………
00000000`091fdf38  06e08328 00000000 00000002 00000000  (……………
00000000`091fec60  06e08328 00000000 0168883a 00000000  (…….:.h…..
00000000`091ff260  06e08328 00000000 000007d0 00000000  (……………
00000000`091ff2d0  06e08328 00000000 00000020 00000000  (……. …….
00000000`091ff5f8  06e08328 00000000 800306c0 00000000  (……………
00000000`091ff6c0  06e08328 00000000 00000000 00000000  (……………
00000000`091ff930  06e08328 00000000 00000000 00000001  (……………
00000000`091ff9b8  06e08328 00000000 00000000 00000000  (……………
00000000`091ffa38  06e08328 00000000 00000000 00000000  (……………
00000000`091ffc10  06e08328 00000000 03684080 00000000  (……..@h…..
00000000`091ffc90  06e08328 00000000 00000000 00000000  (……………
ThreadId:
unsigned int 0x9b8
ThreadId:
unsigned int 0x9bc
ThreadId:
unsigned int 0x9c0
……………
…………..

 

Step 6:

From the above out put we see thread 0x9b4 has reference to the pointer of owning task and it will be the thread which is owning the latch. Let us switch to the thread(0x9b4 ) which is executing the owning task and

then go through the stack to see why the thread is owning the latch for long time.

Step 7:

   ~~[0x9b4]s      ==> Switching to the thread (Replace 0x9b4 with your thread id which has reference to the po
ntdll!ZwWaitForSingleObject+0xa:
00000000`77ef047a c3              ret

Step 8:

0:002> kC  ==>  Print the stack
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!BPool::FlushCache
sqlservr!checkpoint2
sqlservr!alloca_probe
sqlservr!ProcessCheckpointRequest
sqlservr!CheckpointLoop
sqlservr!ckptproc
sqlservr!SOS_Task::Param
::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex

From the above stack we can understand that the thread which is owning the latch is executing checkpoint and flushing cache (Dirty buffers) to disk. If flushing buffers to disk (checkpoint) is taking a long time, then obviously there is disk bottleneck.

Similarly for any other latch time out issues first identify the owner thread of latch, read the stack of owner thread to understand the task performed by owner thread and troubleshoot the performance of task performed by owner thread.

If you want to see the stack of thread which is waiting, then pickup the task (task 0x0000000006E096D8 )from latch timeout warning message in errorlog instead of owning task (task 0x0000000006E08328) and use the command mentioned in step 5.

I hope this post will help you to learn and debug the latch timeout issues.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts: Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, SQL General, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , , | 17 Comments »

SQL Server: Ghost records

Posted by Karthick P.K on August 30, 2012

 

What is Ghost record?

When a record is deleted from a clustered index data page or non-clustered index leaf page or a versioned heap page or a forwarded record is recalled, the record is logically removed by marking them as deleted but not physically removed from the page immediately.

Pages which are marked as deleted but actually not deleted physically are called Ghost Records.

 

Why?

Allowing delete or update operation run faster and to ensure faster rollback if the transaction is cancelled or roll backed.

 

Who will remove the records which are marked for deletion?

Ghostcleanuptask: SQL Server Ghostcleanuptask thread physically removes the records which are marked as deleted.

 

How Ghost cleanup task works?

1.       Ghostcleanuptask thread wakes up every 10 seconds.

2.       Sweep databases one by one starting from master.

3.       Skip the database if it is not able to take a shared lock for database (LCK_M_S) or database is not in Open read/write state.

4.       Scans the PFS pages of the current database to get the pages which has ghost records.

PFS Page: A PFS page occurs once in 8088 pages. SSQL Server will attempt to place a PFS page on the first page of every PFS interval(8088Pages). The only time a PFS page is not the first page in its interval is in the first interval for a file.

In this case, the file header page is first, and the PFS page is second. (Page ID starts from 0 so the first PFS page is at Page ID 1)

clip_image002_thumb

5.       Remove the records which are marked as deleted (ghosted) physically.

 

 

The GHOST_VERSION_RECORD is used in cases where versioned heap records, or BLOB fragments are deleted and when versioned forwarded heap rows are recalled back to their originating page. The records must be ghosted rather than deleted so the versioning chain can be maintained. The GHOST_VERSION_RECORDs cannot be expunged until all transactions that have references to the original data record have finished.

 

 

Let us see how the ghost records are marked for deletion and deleted later using below example.

 

Note: We will be using few trace flags for testing purpose. Use this trace flags and below steps only in your testing servers.

Trace flag 3604 Directs SQL Server to send output some of the command’s to client executing the command. Ex.  When we execute  DBCC PAGE or DBCC TRACESTATUS(-1) we will not see results in SSMS or any other client unless Trace flag 3604 is on

Trace flag   661 Disables ghost record cleanup task.  When this trace flag is enabled Ghost cleanup task will not run. 

Trace flag   662 along with 3605 prints the output of Ghost cleanup task to SQL Server error log ( DBCC TRACEON (662, 3605,-1))

 

 

 

Step1:

//Create a database named Ghost test

create database Ghosttest

go

use Ghosttest

go

 

Step 2:

//Create a table named ‘a’ with clustered and non-clustered index.

CREATE TABLE [dbo].[a](

      [a] [char](3000) NULL,

      [b] [char](3000) NULL,

      [c] [int] NULL

) ON [PRIMARY]

 

CREATE Clustered INDEX [CIX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

CREATE  INDEX [IX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

Go

 

Step 3:

//Insert 1000 records in to the table.

insert into  a values (‘Ghost’,‘Record’,1);

go 1000

 

Step 4:

//Enable trace flag 3604 to see the output of DBCC command from client like SSMS.

 

dbcc traceon (3604,-1)

 

go

 

Step 5:

// Disable Ghost record cleanup task to see how the ghost records are marked in data/index pages.   (Don’t enable in production)

 

dbcc traceon (661,-1)

go

 

Step 6:

//Delete the 1000 records which we inserted in table ‘a’ to generate ghost records (Records marked for deletion but not physically deleted).

 

delete from a

go

 

Step 7:

//Ghost cleanup task first scans the PFS pages in  current database to get the pages which has ghost records. Let us dump the PFS pages to see how the pages with ghost records are marked

// Watch the text which are in red or search for Has Ghost in the output of DBCC page.

DBCC PAGE(‘Ghosttest’,1,1,3)

Go

 

PAGE: (1:1)

 

 

BUFFER:

 

 

BUF @0x00000000FEC0ED80

 

bpage = 0x00000000CA024000           bhash = 0×0000000000000000           bpageno = (1:1)

bdbid = 9                            breferences = 0                      bcputicks = 0

bsampleCount = 0                     bUse1 = 62975                        bstat = 0xc0010b

blog = 0×79797979                    bnext = 0×0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000CA024000

 

m_pageId = (1:1)                     m_headerVersion = 1                  m_type = 11

m_typeFlagBits = 0×1                 m_level = 0                          m_flagBits = 0×0

m_objId (AllocUnitId.idObj) = 99     m_indexId (AllocUnitId.idInd) = 0    Metadata: AllocUnitId = 6488064

Metadata: PartitionId = 0            Metadata: IndexId = 0                Metadata: ObjectId = 99

m_prevPage = (0:0)                   m_nextPage = (0:0)                   pminlen = 0

m_slotCnt = 1                        m_freeCnt = 2                        m_freeData = 8188

m_reservedCnt = 0                    m_lsn = (50:335:27)                  m_xactReserved = 0

m_xdesId = (0:0)                     m_ghostRecCnt = 0                    m_tornBits = 412090936

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED           PFS (1:1) = 0×44 ALLOCATED 100_PCT_FULL

DIFF (1:6) = CHANGED                 ML (1:7) = NOT MIN_LOGGED           

 

PFS: Page Alloc Status  @0x000000002205A000

 

(1:0)        – (1:3)        =     ALLOCATED 100_PCT_FULL                             

(1:4)        – (1:5)        = NOT ALLOCATED   0_PCT_FULL                             

(1:6)        – (1:7)        =     ALLOCATED 100_PCT_FULL                             

(1:8)        -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:9)        -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:10)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:11)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:12)       -              =     ALLOCATED 100_PCT_FULL           IAM Page  Mixed Ext

(1:13)       – (1:14)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:15)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:16)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:17)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:18)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:19)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:20)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:21)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:22)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:23)       – (1:25)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:26)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:27)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:28)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:29)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:30)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:31)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:32)       -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:33)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:34)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:35)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:36)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:37)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:38)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:39)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:40)       – (1:41)       =     ALLOCATED   0_PCT_FULL                             

(1:42)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:43)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:44)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:45)       – (1:46)       = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:47)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:48)       – (1:50)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:51)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:52)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:53)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:54)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:55)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:56)       – (1:61)       =     ALLOCATED   0_PCT_FULL                             

(1:62)       -              = NOT ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:63)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:64)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:65)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:66)       – (1:71)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:72)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:73)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:74)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:75)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:76)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:77)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:78)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:79)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:80)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:81)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:82)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:83)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:84)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:85)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:86)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:87)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:88)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:89)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:90)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:91)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:92)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:93)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:94)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:95)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:96)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:97)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:98)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:99)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:100)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:101)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:102)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:103)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:104)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:105)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:106)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:107)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:108)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:109)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:110)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:111)      – (1:116)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:117)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:118)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:119)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:120)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:121)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:122)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:123)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:124)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:125)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:126)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:127)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:128)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:129)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:130)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:131)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:132)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:133)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:134)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:135)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:136)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:137)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:138)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:139)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:140)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:141)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:142)      – (1:143)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:144)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:145)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:146)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:147)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:148)      -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:149)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:150)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:151)      -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:152)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:153)      – (1:158)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:159)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:160)      – (1:383)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:384)      – (1:385)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:386)      – (1:391)      = NOT ALLOCATED   0_PCT_FULL                             

(1:392)      – (1:497)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:498)      – (1:499)      =     ALLOCATED   0_PCT_FULL                             

(1:500)      – (1:655)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:656)      -              =     ALLOCATED   0_PCT_FULL                             

(1:657)      – (1:663)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:664)      – (1:671)      = NOT ALLOCATED   0_PCT_FULL                             

 

Step 8:

//Dump one of the page which is marked as having ghost record in PFS page. I have picked Page 158 for example.

//Look at m_ghostRecCnt

 

dbcc page(‘Ghosttest’,1,158,3)

go

 

 

PAGE: (1:158)

 

 

BUFFER:

 

 

BUF @0x00000000FEC02CC0

 

bpage = 0x00000000C9E22000           bhash = 0×0000000000000000           bpageno = (1:158)

bdbid = 9                            breferences = 0                      bcputicks = 933

bsampleCount = 1                     bUse1 = 64101                        bstat = 0xc0010b

blog = 0x2159bb79                    bnext = 0×0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000C9E22000

 

m_pageId = (1:158)                   m_headerVersion = 1                  m_type = 1

m_typeFlagBits = 0×4                 m_level = 0                          m_flagBits = 0×0

m_objId (AllocUnitId.idObj) = 28     m_indexId (AllocUnitId.idInd) = 256 

Metadata: AllocUnitId = 72057594039762944                                

Metadata: PartitionId = 72057594038845440                                 Metadata: IndexId = 1

Metadata: ObjectId = 2105058535      m_prevPage = (1:157)                 m_nextPage = (1:160)

pminlen = 3508                       m_slotCnt = 2                        m_freeCnt = 1054

m_freeData = 7134                    m_reservedCnt = 0                    m_lsn = (37:160:37)

m_xactReserved = 0                   m_xdesId = (0:1983)                  m_ghostRecCnt = 2

m_tornBits = 602498023              

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED          

PFS (1:1) = 0×68 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED           

 

Slot 0 Offset 0×60 Length 3519

 

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Record Size = 3519                  

Memory Dump @0x000000001CEBA060

 

 

//There are 2 ghost records in page 158 (m_ghostRecCnt=2)

 

Step 8:

//Additionally we can query sys.dm_db_index_physical_stats to find the number of ghost records.

 

SELECT object_name(object_id) as Name, record_count, GHOST_RECORD_COUNT,Version_ghost_record_count,INDEX_TYPE_DESC, ALLOC_UNIT_TYPE_DESC

FROM sys.dm_db_index_physical_stats (DB_ID(N‘Ghosttest’), NULL, NULL, NULL , ‘DETAILED’) where index_level=0

go

clip_image004_thumb

 

 

Step 8:

//Let us enable trace flag 662 and 3605  to print the output of ghost cleanup task to SQL error log

  

dbcc traceon (662,3605,-1)

go

 

Step 9:

//Let us disable trace flag 661 to start the Ghost cleanup task again and let us disable 3604  which we enabled to print output of DBCC command in client (SSMS).

 

dbcc traceoff (661,3604,-1)

go

 

Purging page Dbid 9, File 1, Page 144

Purging page Dbid 9, File 1, Page 146

Purging page Dbid 9, File 1, Page 153

Purging page Dbid 9, File 1, Page 154

Purging page Dbid 9, File 1, Page 155

Purging page Dbid 9, File 1, Page 156

Purging page Dbid 9, File 1, Page 157

Purging page Dbid 9, File 1, Page 158

Purging page Dbid 9, File 1, Page 160

Purging page Dbid 9, File 1, Page 161

 

We will see the pages purged by the ghost cleanup task in the SQL Server error log.

Step 10:

//  We can also capture Ghost cleanup task in profiler by selecting SQL Transaction and filter by object name like  “GhostCleanupTask”

image

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in DBCC, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , | 11 Comments »

I/O requests taking longer than 15 seconds to complete on file

Posted by Karthick P.K on August 27, 2012

Do you see warnings like one below in your SQL Server error log?

SQL Server has encountered  x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .

The OS file handle is 0x000006A4. The offset of the latest long I/O is: 0×00000

(or)

BobMgr::GetBuf: Sort Big Output Buffer write not complete after n seconds.

This indicates SQL Server I/O Bottlenecks. SQL Server performance highly relies on the Disk performance.  SQL Server I/O Bottleneck can be identified through

1. PAGEIOLATCH_xx or WRITELOG wait types in Sys.Sysprocesses and other DMV’s

2. I/O taking longer than 15 seconds in SQL Server Error log.

{

SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [ ] in database [IOTEST (7). The OS file handle is 0x000006A4. The offset of the latest long I/O is:
0×000001

}

3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats

{

Select  wait_type,         waiting_tasks_count,         wait_time_ms  from    sys.dm_os_wait_stats where    wait_type like ‘PAGEIOLATCH%’
order by wait_type

}

4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.

{

select     database_id,     file_id,     io_stall,     io_pending_ms_ticks,     scheduler_address from    sys.dm_io_virtual_file_stats(NULL, NULL)t1,         sys.dm_io_pending_io_requests as t2
where    t1.file_handle = t2.io_handle

}

Following are common reasons for I/O Bottleneck in SQL Server.

1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.

2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem (or) Compression and  so the Disks are performing very slow and hence SQL Server is affected.

3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.

 

How to  troubleshoot?

1.  Exclude SQL Server files from antivirus scan.

2. Do not place SQL Server FILES on compressed drives.

3. Distribute SQL Server data files and transaction log files across drives.

4. If the “I/O request taking longer” warning is for tempdb , Enable trace flag 1118 and increase the tempdb data files refer:http://support.microsoft.com/kb/2154845

5. If none of the above resolves the issue collect the below perfmon counters.

 

Perfmon counters can help us in understanding “If disk is slow” or  “SQL Server is spawning more I/O then what disk could handle” or “Some other process is saturating disk with I/O”

Note:It is important to get  throughput of the disk subsystem in MB/SEC before we look at disk counters. Normally it will be more than 150 MB for SAN disk and greater 50 MB for Single disk .When you look at the perfmon counter look at Max value.

 

Avg. Disk sec/Transfer –> Time taken to perform the I/O operation

Ideal value for Disk sec/Transfer is 0.005-0.010 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck.

1. Look for Disk Bytes /sec when Avg. Disk sec/Transfer  is greater than 0.015. If it is below 200 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.

2. If the Disk Bytes /sec  is greater than  200 MB for SAN disk or greater than 50 MB for Single disk when the  Avg. Disk sec/Transfer  is greater than 0.015. Look at the Process:IO Data Bytes/Sec for the same time and identify which process is spawning I/O. If the identified process is not SQL Server involve the team which supports that process. If the  the identified process is SQL Server tune SQL Server queries which are I/O intensive by creating dropping indexes etc.

 

Disk Bytes /sec  –> Total read and write to disk per second in bytes.

Collect the values for each logical disks in which SQL Server files are placed and look at the Max value for this counter ideally it has to be greater than the throughput of the disk subsystem. If you don’t have the throughput for the disk then this value to be greater than 200MB for SAN or greater than 50 MB for single disk.

If it is below the expected value you can consider that your disks are not performing well. Involve the hardware vendor. 

Important: Value for this counter will be low when there is no I/O happening on the drives. So you have to look at the this counter during the time you see I/O warnings or When Disk sec/Transfer >0.010 for the same drive. 

 

Process:IO Data Bytes/Sec –> Total read and write to disk per second in bytes by each process.

Collect this counter for all the processes running on the server. This counter will help us understand if any other process is saturating the disk with excessive I/O.

Example: Let us consider a disk with max throughput of 250MB per second. If antivirus is spawning 200MB of I/O per second and if SQL Server data files are placed in same drive and SQL Server is spawning 150MB obviously there will be I/O waits.

 

Buffer Manager: Page Read/sec + Page Writes/sec –>Total read and write to disk per second in bytes by SQL Server process.

Note: If you are analyzing the .BLG file collected and not live perfmon focus on Maximum value for each counter don’t look at average.  

 

If (Avg. Disk sec/Transfer> ==0.015 ) and ( (Disk Bytes /sec < 150MB (For San)) or (Disk Bytes /sec < 50MB (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))

{

There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.

}

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50MB (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))

{

Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.

If ( Identified process == SQLServer.exe )

{

Identify and tune the queries which is Spawning excessive I/O.

(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text

can be used to identify the query). Use DTA to tune the query

}

If ( Identified process != SQLServer.exe )

{

Engage the owner of application which is spawning excessive I/O

}

}

Many thanks to Joseph Pilov from whom I learned many techniques like the one above.

 

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K  |My Facebook Page |My Site| Blog space| Twitter

Posted in Performance, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , | 7 Comments »

Database Mail errors in SQL Server (Troubleshooting steps)

Posted by Karthick P.K on August 25, 2012

Troubleshooting Database Mail issues in SQL Server

 

 

Use the Database Mail Configuration Wizard, change the Logging Level to Verbose and send a test mail to investigate the point of failure.

 

Right click database mail –View database mail log to see error or we can SELECT * FROM msdb.dbo.sysmail_event_log ;

 

Check the sent_Status column in the sysmail_allitems table. The four values are sent, unsent, retrying and failed.

If the status is sent and the recipients  hasn’t received the email yet, that the Database Mail external program successfully delivered the e-mail message to the SMTP server but it failed to deliver the message to the final recipient. At this point, the SMTP needs to be troubleshooted (perhaps engaged your Exchange or Mail server team)

 

If the status is unsent or retrying, it means that the Database Mail has not yet processed the e-mail message or is in the process of retrying after a failed attempt. This could be due to network conditions, volume of messages, SMTP server issues, etc. If the problem persists, use another profile or another mail host database.

 

If the status is failed, it means that the Database Mail was unable to deliver the message to the SMTP server. Check the sysmail_log table and the destination address. Also be sure that there are no Network or SMTP issues.

 

Send a test email outside SQL Server using below script or Other mail clients and check if the  recipients   are receiving mails. If they do not receive problem is outside SQL Server. Engage Exchange or other mail server teams to identify why we are not able to send emails from below script or Office outlook or Other mail clients.

Set objMessage = CreateObject("CDO.Message")
 objMessage.Subject = "Hello"
 objMessage.From = """SENDER NAME""<e-mail ID>"
 objMessage.To = "To address@mssqlwiki.com"
 objMessage.HTMLBody = "<h1><font face=arial>Hello,<br>How are you?."
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "smtphost.dns.Mailserver.com"
 objMessage.Configuration.Fields.Item _
 ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 2
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
objMessage.Configuration.Fields.Item _
("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 60
objMessage.Configuration.Fields.Update
objMessage.Send

If the mail has successfully reached  to recipients from above script problem is with in SQL Server mail configuration. 
Verify the following

 

1.Verify if  Service Broker is enabled (select is_broker_enabled from sys.databases where name=‘MSDB’ (0 – disabled, 1- enabled).

To enable service broker on your database run the following query: ALTER DATABASE MSDB SET ENABLE_BROKER

Note: You will be required to have exclusive access to the database while running this statement.  If you do not you will get the following error message: 
Msg 5061, Level 16, State 1, Line 1. ALTER DATABASE failed because a lock could not be placed on database MSDB. Try again later.

                               Msg 5069, Level 16, State 1, Line 1

ALTER DATABASE statement failed.

You will have to stop SQL Server agent to enable broker on MSDB

2.Check if Database mail stored procedures are  enabled (Surface Area Configuration >> “Surface Area Configuration for Features” >> Under MSSQLSERVER, expand Database Engine, and then click Database Mail. >> Ensure that Enable Database Mail stored procedures is selected, and then click Apply).

 

3.Check if the user is part of DatabaseMailUserRole.

 

4.Check what parameters and values are used in configuration by running

   exec msdb..sysmail_help_configure_sp

   A list of default values are given in BOL, topic: “sysmail_help_configure_sp (Transact-SQL)”. To modify a parameter or value you can use the following stored procedure

   exec msdb..sysmail_configure_sp ‘parameter_name’, ‘parameter_value’

 Check if ReadFromConfigurationFile is enabled if yes check if the DatabaseMail90.exe.config file (The default path is < drive >\Program Files\Microsoft SQL   Server\MSSQL.1\MSSQL\Binn) and has proper parameters.

 

5.Verify that the Database Mail executable is located in the correct directory – e.g. C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn

 

6.Verify that the service account for SQL Server has permission to run the executable, DatabaseMail90.exe, which requires network access to the SMTP servers specified in Database Mail accounts. Therefore, the service account for SQL Server must have permission to access the network, and the SMTP servers must allow connections from the computer that runs SQL Server.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Configuration, Database mail, SQL General | Tagged: , | 3 Comments »

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see below errors in SQL error along with dumps and stuck?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
*   05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP

*   01/22/09 19:11:16 spid 0

* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.

Refer “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps”  for analyzing the Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor Dumps.

If you are interested in just finding a quick resolution follow the below steps to get the Non-Yield stack from the dump and check if it is matching with any existing known issues in SQL Server. 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg

step 2:

Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 3:

on command window type    
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 5:

Type  .cxr sqlservr!g_copiedStackInfo+0X20   for SQL Server2005 and SQL Server2008/2008R2  (or)   .cxr sqlmin!g_copiedStackInfo+0X20  for SQL Server2012.

Type kc 100 and look at the stack to see if it matches with the stack of any of known issues in SQL Server listed below.

If kc 100 doesn’t display any stack and throws “WARNING: Frame IP not in any known module. Following frames may be wrong”  type .cxr to reset to default scope and try  .cxr sqlservr!g_copiedStackInfo+0X00c (In 32-Bit (X86)  SQL server valid offset for context is 0X00c Look at This blog to see how we identified the offset)

Note:If your stack doesn’t match with any of the stack  listed below then paste the stack in comments session of this blog (or) In This face book group.We will try to find the cause for you. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

 

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB :2344600:FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands 0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties 0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB: 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005,SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB: 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2 or 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB : 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor’

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

Stack 21

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

 

Related blogs: 

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group to get answers for all your SQL Server related questions.

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Debugging, Performance, SQL General, SQL Server Tools | Tagged: , , , , , , , , , , , , , , , , , , , , , , , , , , , | 117 Comments »

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for  Non-Yielding scheduler or Non-yielding IOCP Listener dumps or  Non-yielding  resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005

3.   Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?

Answer: Other threads would not get CPU cycles and starve the SQL Server performance.

 

What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?

SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.

 

A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture.  Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)

and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.

When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump

 

 

Let us step in to analysis of  non-yielding scheduler dump which I got in SQL Server 2012

 

Sample 1

When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.

 

{

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    *   04/16/12 10:09:58 spid 6256

    *

    * Non-yielding Scheduler

    *

     * *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}

 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  èType of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  à Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0 èSQL Server Version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  è Thread ID of scheduler monitor.

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

 

0:006> kP                        Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0×79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0×24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0×426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0×375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0×307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0×211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0×9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0×299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0×261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0×236

 

Step 8:

Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.

 

Recollect the below error in SQL error log.  Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

Step 9:

Look at the current stack of Non-yielding thread. using kc command

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

 

 

Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive  and SwitchContex  in the thread. 

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure using X commad

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo

 

Step 11:

We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.

 

 

Let us do dd on sqlmin!g_copiedStackInfo  (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     èType cast  000007fe`df11bfe0  with CONTEXT  .  RsP,Rbp and Rip is invalid so this is not valid offset.

ole32!CONTEXT

   +0×098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      è Type cast 000007fe`df11bff0  with CONTEXT  . RIP is invalid

ole32!CONTEXT

   +0×098 Rsp : 0×72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   è Type cast 000007fe`df11c000 with CONTEXT  . RIP ,RSP  and Rbp is valid. So this should be a valid context.Let us switch to this context and verify

ole32!CONTEXT

   +0×098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know 000007fe`df11c000 is valid context.  So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0×20 hence  we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.

 

Step 13:

 

Switch the context of copied stack using .cxr 000007fe`df11c000  or .cxr sqlmin!g_copiedStackInfo+0X20

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same. 

 

 

Current thread stack which we dumped using the thread ID in SQL Errorlog.

0:146> kc                                                                                    

Copied thread stack which SQL Server copied to global structure before generating the dump.

0:146> Kc 10

ntdll!NtWaitForSingleObject

KERNELBASE!WriteFile

KERNELBASE!WaitForSingleObjectEx

kernel32!WriteFileImplementation

sqldk!SOS_Scheduler::SwitchContext

sqllang!CErrorReportingManager::WriteToErrLog

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqllang!CErrorReportingManager::SendErrorToErrLog

sqldk!WorkDispatcher::DequeueTask

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqldk!SOS_Scheduler::ProcessTasks

sqllang!ReportLoginFailure

sqldk!SchedulerManager::WorkerEntryPoint

sqllang!FRedoLogin

sqldk!SystemThread::RunWorker

sqllang!login

sqldk!SystemThreadDispatcher::ProcessWorker

sqllang!process_login_finish

sqldk!SchedulerManager::ThreadEntryPoint

sqllang!process_commands

kernel32!BaseThreadInitThunk

sqldk!SOS_Task::Param::Execute

ntdll!RtlUserThreadStart

sqldk!SOS_Scheduler::RunTask

 

sqldk!SOS_Scheduler::ProcessTasks

 

sqldk!SchedulerManager::WorkerEntryPoint

 

sqldk!SystemThread::RunWorker

 

sqldk!SystemThreadDispatcher::ProcessWorker

 

sqldk!SchedulerManager::ThreadEntryPoint

 

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

 

ntdll!NtWriteFile  -> WriteFile function is at top of the stack and did not complete in expected time.          
KERNELBASE!WriteFile       
kernel32!WriteFileImplementation 
sqllang!CErrorReportingManager::WriteToErrLog  ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog  ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint  ->format the error
sqllang!ReportLoginFailure  ->Login failed
sqllang!FRedoLogin        
sqllang!login ->Login task is processed

 

 

From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.

 

When will writefile operation take long time?

When there is Disk bottleneck. So the obvious solution for this issue is to  fix the performance of the disk.

Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , | 16 Comments »

    SQL Server generated Access Violation dumps while accessing oracle linked servers.

    Posted by Karthick P.K on August 8, 2012

     

    When you run queries against Oracle linked servers from SQL Server you see errors like one below and access violation dumps are generated (SQLDump00XX.mdmp files in SQL Server error log folder).

    {

    External dump process returned no errors.
    Using ‘dbghelp.dll’ version ’4.0.5′
    SqlDumpExceptionHandler: Process 510 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.
    * *******************************************************************************
    *
    * BEGIN STACK DUMP:
    *  Exception Address = 000000007752485C Module(ntdll+000000000002285C)

    *   Exception Code    = c0000005 EXCEPTION_ACCESS_VIOLATION

    *   Access Violation occurred reading address 0000041EA9AE2EF0

    * Input Buffer 510 bytes -

    }

     

    To analyze the dump download and Install Windows Debugger from This  link

    1. Open Windbg

    2. Choose File menu –> select Open crash dump –>Select the Dump file
    (SQLDump000#.mdmp)

    3. on command window type    
    .sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

    4. Type .reload /f and hit enter. This will force debugger to immediately
    load all the symbols.

    5. Type .ecxr

    6. Type  kL    and look at the stack

     

    {

    ntdll!RtlpFreeUserBlock
    ntdll!RtlFreeHeap

    }

    If you see above frames in the top of the stack and if you are using Oracle Provider for OLE DB – Version: 11.2.0.1 and later.

    There is a known issue with Oracle Provider for OLE DB – Version: 11.2.0.1 and later   when  — STYLE COMMENTS are used in linked server queries ,SP’s Etc .

    Resolution

    1. Remove the – -style comments
    OR
    2) use /* */ for the comments instead of –

     

    If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

     

    Regards

    Karthick P.K

    Posted in Connectivity, Debugging, SQL Server Engine | Tagged: , , , , | 4 Comments »

    Service pack ,Hotfix and CU installation for SQL Server 2005 might fail with “Unable to install Windows Installer MSI file“

    Posted by Karthick P.K on July 18, 2012

    Installation of SQL Server 2005 Service pack ,Hotfix and CU  would fail fails with “Unable to install Windows Installer MSI file“ and message box “ A recently applied update failed to install”

    You will find below errors in setup logs.

    Summary.txt:
    Product Installation Status
    Product                   : Setup Support Files
    Product Version (Previous):
    Product Version (Final)   :
    Status                    : Failure
    Log File                  : C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\Redist9_Hotfix_KBxxxx_SqlSupport.msi.log
    Error Number              : 1635
    Error Description         : Unable to install Windows Installer MSI file
    ———————————————————————————-

    Hotfix.log:-

    Copy Engine: Creating MSI install log file at: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix\Redist9_Hotfix_KBXXXXX_SqlSupport.msi.log
    Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    Registry: Cannot read registry key value "Debug", error 2
    Registry: Opened registry key "Software\Policies\Microsoft\Windows\Installer"
    Registry: Cannot read registry key value "Debug", error 0
    Copy Engine: Error, unable to install MSI file: C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\Cache\SQLSupport\x64\1033\SqlSupport.msi
    The following exception occurred: Unable to install Windows Installer MSI file  Date: 05/04/2010 13:31:36.698  File: \depot\sqlvault\stable\setupmainl1\setup\sqlse\sqlsedll\copyengine.cpp  Line: 1429
    Watson: Param1 = Unknown

    SQLSupport.msi.log

    MSI (s) (24:78) [ ]: MainEngineThread is returning 1635

    This patch package could not be opened.  Verify that the patch package exists and that you can access it, or contact the application vendor to verify that this is a valid Windows Installer patch package.

    C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\Cache\SQLSupport\x86\1033\SqlSupport.msi

    MSI (c) (3C:14) [ ]: Decrementing counter to disable shutdown. If counter >= 0, shutdown will be denied.  Counter after decrement: -1

    MSI (c) (3C:14) [ ]: MainEngineThread is returning 1635

     

     

    Resolution:

    1. Uninstall SQL Server setup support files from add or remove programs (You will get a warning:26002 The following products depend on Microsoft SQL Server 2005 …..) You can ignore the warning and continue to uninstall SQL Server setup support files.

    2.  Unzip (or) Extract the installer (Service pack /Hotfix / CU) using /X parameter from command prompt (Example: d:>SQLServer2005SP3-KB955706-x86-ENU.exe /x).

    3. Invoke hotfix.exe and complete the upgrade.

     

    Hope this helps.

    Thank you,

    Karthick P.K |Technical Lead | Microsoft SQL Server Support  |My Facebook Page |My Site| Blog space| Twitter

    Posted in SQL Cluster Setup, SQL Server Setup | Tagged: , , , , , | 3 Comments »

    A significant part of SQL Server process memory has been paged out

    Posted by Karthick P.K on June 27, 2012

    When you get “A significant part of SQL Server process memory has been paged out. This may result in performance degradation.This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 2007640, committed (KB): 4594040, memory utilization: 43%.” message in SQL Server error log

    Pay attention to Working set (KB), committed (KB) and memory utilization:% (Percentage of SQL Server memory in RAM) in above warning message. Above warning message is logged in SQL Server error log when working set reaches 50% or  below of the overall committed memory by SQL Server memory manager.

     

    What is working set: Memory allocated by the process which is currently in RAM.

     

    Committed: Total memory that is allocated by process (allocated bytes can be in RAM or Page file)

     

    Working Set trimming:  Windows is moving the allocated bytes of the process from physical RAM to page file because of memory pressure. Memory pressure is most commonly caused by applications or windows components that are requesting more memory causing OS to start trimming working set of other processes to satisfy these new requests.

     

    Before we step in to troubleshooting working set trimming warnings, here are few basics about how SQL Server memory management is designed to dynamically adjust the committed memory based on the amount of available memory on the system.

    SQL Server uses CreateMemoryResourceNotification to create a memory resource notification object  and SQL Server Resource monitor threads calls QueryMemoryResourceNotification every time it runs to identify if there is any notification. If a low memory notification comes from Windows, SQL Server scales down its memory usage and when Windows sends the high memory notification, SQL Server Server can grow its memory usage target. Low memory notification is signaled by windows when the available physical memory drops approximately below 32 MB per 4 GB, to a maximum of 64 MB. The default level that signals a high-memory-resource notification event is three times the default low-memory value. As soon as the SQL Server resource monitor threads finds low-memory-resource notification it scales down SQL Server memory usage.

     

    Why do I see “A significant part of sql server process memory has been paged out. This may result in performance degradation.” By SQL Server In spite of having above mechanism to detect the system level memory pressure and scale SQL Server memory?

     

    There are couple of situations where SQL Server Process working set might be paged out by Windows despite these memory resource notification mechanism.

    1.If windows is not sending the correct notifications to all listening processes at the right moment and thresholds

    2.If SQL Server is not responding fast enough to the low memory resource notification from Windows

    3.When low physical memory notification is received by SQL Server it will scale down its memory usage by trimming internal caches. This effect is applied for 5 seconds and then paused for 1 minute. This is to avoid any hungry/faulty application to consume all the memory making SQL Server to  scale its usage continuously. If low memory conditions still exist after 1 minute pause, the effect is applied again. So if there is physical memory pressure even after SQL Server scaling its usage for 5 seconds windows will still page out SQL Server’s working set.

    4.Conditions in Windows where working sets of all processes are trimmed

    5.Windows might decide to trim a certain percentage of working set of various or specific processes

     

     

    We can also increase the LowMemoryThreshold value so the OS will notify applications such as SQL on low memory conditions much earlier and SQL Server can respond to memory pressure much early before the system is starving for memory.

     

    How to set the LowMemoryThreshold value (in MB)?

     

    In Regedit -> go to

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\MemoryManagement

    Right click on the right pane,

    Select New -> select click DWORD Value -> enter LowMemoryThreshold

    Double Click LowMemoryThreshold -> value (choose decimal) -> 512

    System Reboot is required to take effect.

     

    Default values as per MSDN:

    “The default level of available memory that signals a low-memory-resource notification event is approximately 32 MB per 4 GB, to a maximum of 64 MB. The default level that signals a high-memory-resource notification event is three times the default low-memory value.”

     

     

    We can use the below query to extract information about the condition of OS memory and SQL memory using a query like the following. Looking at this query, you will be able to easily determine the various indicators that would have triggered the Windows to page various processes including SQL Server. Use the following query to obtain the memory notification-related information from the XML data of the ring buffer

    SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
    dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],
    cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],
    cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],
    cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],
    cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],
    cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],
    cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],
    cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],
    cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],
    cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],
    cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],
    cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],
    cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],
    cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
    cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
    cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],
    tme.ms_ticks as [Current Time]
    FROM sys.dm_os_ring_buffers rbf
    cross join sys.dm_os_sys_info tme
    where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW'
    ORDER BY rbf.timestamp ASC
    

    We can use below query to to check the health of SQL Server including SQL Server working set information in past

    SELECT  CONVERT (varchar(30), GETDATE(), 121) as runtime, DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,    a.* , sys.ms_ticks AS [Current Time]
    FROM   (SELECT x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime) [1]', 'bigint') AS [UserModeTime],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime) [1]', 'bigint') AS [KernelModeTime],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults) [1]', 'bigint') AS [PageFaults],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta) [1]', 'bigint')/1024 AS [WorkingSetDelta],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization) [1]', 'bigint') AS [MemoryUtilization (%workingset)],
    x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]  FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a  CROSS JOIN sys.dm_os_sys_info sys ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())
    
    • Process Utilization shows the percentage of overall used system CPU time that was consumed by sqlservr.exe. Process Utilization is calculated as sql_process_cpu_time/total_system_cpu_time for the current time interval.
    • SystemIdle is the percentage of time that the system’s CPU’s have been idle.
    • Page Faults value is the number of hard + soft page faults that have occurred since the last snapshot.
    • Working Set is the change in the working set size  in KB, since the last snapshot.
    • Memory Utilization is working set/committed memory which is another way of saying that the number is the percentage of the process’ committed memory that is in RAM.  The farther below 100% that this number falls, the larger the percentage of SQL memory that has been trimmed and moved to the page file.

    Common Side Effects of Working set Trimming

    1.  When OS starts trimming the working set of SQL Server we would see drastic performance drop, increased I/O,non-yielding Resource Monitor / scheduler dumps etc..

    2.  IS-alive check failures resulting in SQL Server resource failure.

    3. Resource monitor thread can start  Garbage collector  if SQLCLR is enabled on this instance of SQL. When Garbage collector  is kicked off during memory pressure all other threads in the process are suspended. So if Garbage collector is taking a long time reosurce monitor thread appears stuck and hence the non-yielding errors and dumps are generated. (Refer http://support.microsoft.com/kb/2504603)

    How to troubleshoot?

    1.  Capture perfmon counters (Process: Private bytes and Working set ) to determine which applications / windows component are requesting memory and causing OS to start trimming the working set of processes including SQL Server.

    2.  Use This exe which will print the memory information of all the processes and system wide memory information (Global memory status) when the operating system signals  low memory notification.

     

    3.  Cap the SQL Server MAX Server Memory after considering the memory required by other applications, Operating system, Drivers , SQL Server Non- Bpool allocations etc. Make sure you have adequate available physical memory even when the system is under heavy load.

     

    4.  We can consider using the Lock pages in memory privilege. Remember it protects only the BPool from paging and Non-Bpool allocations can still be paged out.

    If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group  https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

     

    http://mssqlwiki.com/sqlwiki/sql-performance/basics-of-sql-server-memory-architecture/

    http://mssqlwiki.com/sqlwiki/sql-performance/windows-2008-and-windows-2008-r2-known-issues-related-to-working-set-memory/

    http://mssqlwiki.com/2012/05/18/sql-server-performance-degraded-in-32-bit-sql-server-after-i-adding-additional-ram/

    http://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

    http://mssqlwiki.com/sqlwiki/sql-performance/io-bottlenecks/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Posted in Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , , , , , , , , , , ,