MSSQLWIKI

Karthick P.K on SQL Server

Archive for the ‘Performance’ Category

Tempdb latch contention

Posted by Karthick P.K on September 17, 2013

You might see Page latch contention in tempdb when you repeatedly drop and create TempDb objects (Temp tables, table variables etc.).

When you notice PAGELATCH_* contention on tempdb (Wait resource in sysprocesses starts with 2: ) check if the latch wait is on PFS,GAM or SGAM page. When there is latch contention on tempdb you will see lot of sessions waiting on Pagelatch_* similar to one below.

In the below output session is waiting on resource 2:15:121320 . If we decode the wait resource it is 2: database id of tempdb ,  15: file number , 121320 is page number. 121320 is in multiple of 8088 so it is a PFS page, similarly identify if the page we are waiting is GAM or SGAM page if it is not PFS page.

Wait type                            Wait resource

PAGELATCH_UP               2:15:121320

PAGELATCH_UP               2:15:121320

How to identify if page is PFS,GAM or IAM?

PFS Page: A PFS page occurs once in 8088 pages. SQL Server will attempt to place a PFS page on the first page of every PFS interval(8088Pages). The only time a PFS page is not the first page in its interval is in the first interval for a file. File header page is first, and the PFS page is second. (Page ID starts from 0 so the first PFS page is at Page ID 1). If (page number)/8088 is round value then the page is PFS page.

GAM Page: GAM page is page 2 in the data file, next GAM page is placed at 511230 Page after first GAM page (GAM interval). If (page number-1)/511230 is round value then the page is GAM page.

SGAM Page: SGAM page is page 3 in data file , next SGAM page is placed at 511230 Page after first SGAM page. If (page number-2)/511230 is round value then the page is GAM page.

clip_image002

How to resolve?

1. Increase the number of TEMPDB data files files and size them equally. As a general rule, if the number of logical processors is less than or equal to 8, use the same number of data files as logical processors. If the number of logical processors is greater than 8, use 8 data files and then if contention continues further increase the number of data files by multiples of 4 (You may not see improvement once you reach 32 files). 

2. Enable server side trace flag 1118.

3. If you further see latch contention on PFS  page after following above two steps then the only option is to modify your application to limit the tempdb usage.

4. If you see contention on 2:1:103 (Page 103 is for system table sys.sysmultiobjrefs. This table manages the relationship between created objects in every database). The only way to reduce contention on this page is reduce the relation. Example creating lot of temp tables with primary key can cause this contention because the relation between the table and PK constraint has to be updated in sys.sysmultiobjrefs.

What’s the best practice ?

1. Create multiple tempdb data files instead of creating 1 large file and size them equally in all your SQL Server instances.

2. Make TF1118 (Uniform allocation) as default. (Extra space required by this trace flag shouldn’t really matter as amount additional space required is minimal and storage cost is not that high these days).  

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, Space management, SQL General, SQL Server Engine | Tagged: , , , , | 1 Comment »

Troubleshooting Transactional replication Latency using Agent Statistics

Posted by Prabhakar Bhaskaran on September 13, 2013

Troubleshooting latency issues in replication is black box for many DBA’s, In this post I will explain how you can leverage the agent statistics to troubleshoot the latency issues.

Before understanding how to decode the agent statistics, lets take a look at the some of the basic things which will help us to troubleshoot the replication performance issue in better way.

The following MSDN diagram depicts the transactional replication architecture in simple manner.

Transactional replication components and data flow

Troubleshooting latency issues is multi step approach, first step is identify which agent is slow,

  • Log reader Agent (Publisher to Distributor)
  • Distribution Agent (Distributor to Subscriber)

So, the problem can be either log reader or distribution agent, we can identify this by just simply inserting the tracer token.

Once we find out the problematic agent the next step is to identify within the agent which particular thread causing the issue.

Let me introduce you to the important threads and its work on these replication agents in nutshell.

Log Reader Agent

Reader Thread – It scans the publisher database transaction log using sp_replcmds

Writer Thread – Add the queued transactions to Distribution database using sp_MSadd_repl_commands

Distribution Agent

Reader thread – It finds the watermark from the table Msreplication_subscriptions(on subscriber) and uses this information to retrieve pending commands from the Distribution database. It basically uses the stored procedure sp_MSget_replcommands to achieve it.

Writer thread – Writer thread uses the Batched RPC calls to write the information to subscriber database.

Now that we understood the threads in the replication agents.  let’s assume we already identified which agent is slow by inserting tracer token. Next is to dig deeper on thread level, this is where our replication agent statistics comes to rescue us.

Agent statistics entries appended to history tables every 5 minutes by default. It provides the historical view of how the agent has been performing and keeps the last 3 days data. You can keep for more days by changing the history retention period.

MSlogreader_history

MSdistribution_history

the above two tables are located in Distribution database. The statistics information is added as XML blob in comments column of these tables.

Now, lets take a look at how to decipher this XML Data for each agents.

Log Reader Agent statistics

<stats state=”1″ work=”948″ idle=”351940″ >
<reader fetch=”859″ wait=”0″/>
<writer write=”822″ wait=”395390″/>
<sincelaststats elapsedtime=”300″ work=”49″ cmds=”176998″ cmdspersec=”3543.000000″><reader fetch=”17″ wait=”0″/><writer write=”29″ wait=”350833″/></sincelaststats></stats>

– State = 1 means stats after batch commit

–Work = cumulative time spent by the agent since restart – idle time

–Idle = Time spent waiting to call sp_replcmds

–Reader fetch = Time to do execute sp_replcmds

Wait = Time spent waiting on writer to release buffer

–Writer write = Time spent writing commands into distribution database

Wait = Time spent waiting on reader to populate buffer

Note: Each thread will have their own buffer with 40k in size.

Here,we need to look at the wait time to understand where the bottleneck exist.For example, if you notice wait time for Reader thread is high then it essentially means your writer thread is slow since reader thread is waiting for writer to release the buffer. Similarly, if you notice high wait time for writer thread then your reader thread is performing slow.

The simple way to decode this is,

HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the commands to distribution database)

HIGH Wait time on Writer thread =  Reader thread is slow ( thread which scans the transaction log)

Distribution Agent Statistics

<stats state=”1″ work=”154″ idle=”351464″>
<reader fetch=”144″ wait=”11″/>
<writer write=”12″ wait=”338″/>
<sincelaststats elapsedtime=”305″ work=”10″ cmds=”81262″ cmdspersec=”8041.000000″><reader fetch=”0″ wait=”9″/><writer write=”10″ wait=”0″/></sincelaststats></stats>
– State =1 means stats after a batch commit

– Work = cumulative time spend by the agent since restart – idle time (seconds)

– Idle = Time spend waiting to call sp_msget_repl_commands

– Reader fetch = Time to do execute sp_msget_repl_commands

Wait = Time spent waiting on writer to release buffer.

– Writer write = Time spend writing commands into distribution database

Wait = Time spent waiting on reader to populate buffer.

Similar to log reader agent, the decoding of wait time is same way we did for log reader agent.

HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the subscriber database using batched RPC Calls)

HIGH wait time on Writer thread = Reader thread is slow ( thread which takes the pending commands from Distribution database)

Distributor Writer thread Slow Scenario

We would be able to understand this concepts better by looking at the example statistics, In this below case, I explicitly started the transaction on subscriber table to simulate blocking at the subscriber side making the writer thread of distribution agent to wait and build up latency.

This is how stats looked,

  • <stats state=”1″ work=”755″ idle=”354505″>
  • <reader fetch=”153″ wait=”604″/>
  • <writer write=”613″ wait=”346″/>
  • <sincelaststats elapsedtime=”636″ work=”515″ cmds=”45033″ cmdspersec=”87.000000″><reader fetch=”0″ wait=”515″/><writer write=”515″ wait=”0″/></sincelaststats></stats>

We can clearly see Reader thread wait time is high(515) which means writer thread is slow since we simulated the blocking on subscriber side.

Similarly,we can simulate the blocking on replication tables msrepl_commands and msrepl_transactions which will cause Log reader writer thread to be slow and stats will show Reader thread wait time as high.

Ok, now we isolated the source of bottleneck in thread level, After this we can just follow the standard performance troubleshooting approach described in this Whitepaper to troubleshoot the slowness of the replication session.

For instance, check out the video where Joe Sack talks about using Extended events to troubleshoot the Distributor writer thread slowness.

In Summary

1. Find which agent is causing slowness using tracer token.

2. Leverage the Agent statistics to narrow down problem to thread level .

3. Follow standard performance troubleshooting approach to resolve the issue.

Thanks for reading! I hope this will help you to troubleshoot the replication performance better next time.

Posted in Performance, Replication, SQL General | Tagged: , , , , , | 2 Comments »

Max server memory – Do I need to configure?

Posted by Karthick P.K on April 22, 2013

Do I need to configure Max server memory and min server memory?  What is the right value for  this configuration and how to determine it?

There are many debates around this and above questions are raised by many SQL Server DBA’s frequently in many forums. If you ask me , “It depends” on various factors.

Before we choose to configure or leave this value to default it is very important to understand how SQL Server grow and shrink its memory usage based on the available memory in operating system even when Max server memory is not configured or defaulted.

How SQL Server grow and shrink its memory usage based on the available memory in operating system even when Maximum server memory is not configured or defaulted?

 SQL Server memory management is designed to dynamically adjust its memory usage based on the amount of available memory on the system. SQL Server will keep allocating memory based on its need as long as there is memory available I.e. as long as MEMPHYSICAL_HIGH (HighMemoryResourceNotification )notification is signaled in widows  and will scale down its usage when there is MEMPHYSICAL_LOW (LowMemoryResourceNotification) signaled in windows. When available memory is between the low memory and high memory SQL Server will try to maintain the memory usage stable( RESOURCE_MEM_STEADY) with some exceptions.

You can download the ResourceNotificationHighandLow.exe from This link to see memory notifications from windows.

The default level of available memory that signals a LowMemoryResourceNotification event is approximately 32 MB per 4 GB, to a maximum of 64 MB. (By default, the threshold is 64mb on most systems).

The default level that signals a high-memory-resource notification event is three times the default low-memory value (By default, the threshold is 64*3=192 MB on most systems).

Key points:

1. Once the available memory on the system goes below 192 MB HighMemoryResourceNotification (MEMPHYSICAL_HIGH) signal is revoked by windows and SQL Server will not grow its Bpool.

2. Once the available memory on the system goes below 64 MB LowMemoryResourceNotification (MEMPHYSICAL_low) is signaled by windows and SQL Server will shrink its Bpool (reduce its memory usage).

3. When the available memory in the system is between 192Mb and 64 Mb (I.e between LowMemoryThreshold and HighMemoryThreshold) SQL Server will not grow or shrink its usage (With some exceptions which we will see in a while)

Note: So unless there is an crazy application in the system that keeps allocating and releasing memory in Zigzag fashion making  windows trigger HighMemoryResourceNotification and LowMemoryResourceNotification one after the other SQL Server will not grow and shrink its memory usage in Loop continuously. If there are such application in system then even configuring max server memory may not help.

The default Low memory threshold 64MB may not be ideal for all systems.  Ex: Let as assume an application is requesting 150MB of memory suddenly when the available memory is 190 MB and the grant is successful. Available memory will now drop to 40 MB making windows signal the LowMemoryResourceNotification. SQL Server will start responding to the LowMemoryResourceNotification from windows but at the same time windows working set manager will also start trimming the working set of all the processes. Which will bring down the overall performance of the system.

We can increase the LowMemoryThreshold value by making the following registry changes If LowMemoryThreshold set to higher value  OS will notify applications such as SQL on low memory conditions much earlier and SQL Server can respond to memory pressure much early before the system starves for memory and before windows working set manger starts trimming the working set of all the processes.

In Regedit -> go to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\MemoryManagement

Right click on the right pane,

Select New -> select click DWORD Value -> enter LowMemoryThreshold

Double Click LowMemoryThreshold -> value (choose decimal) -> 512

System Reboot is required to take effect.

In the above example I have set the LowMemoryThreshold to 512 MB hence the MEMPHYSICAL_LOW notification will be signaled as soon as the available memory drops to 512MB and HighMemoryResourceNotification (MEMPHYSICAL_HIGH) will be in signaled state till the available memory is  1536MB (LowMemoryThreshold *3).

After making the above change SQL Server will grow its Bpool memory till the available memory in the system is greater than 1536 MB and as soon as the available memory drops below 1536MB HighMemoryResourceNotification signal will be revoked by windows causing SQL Server to maintain steady state and will not grow its memory usage further but that  doesn’t  mean SQL Server will wait for the LowMemoryResourceNotification notification to scale down its memory usage after the  HighMemoryResourceNotification notification is revoked. SQL Server will always try to keep the available physical memory in the system high (I.e. SQL Server will try to keep the available memory in system to  HighMemoryThreshold (LowMemoryThreshold  * 3 ).

What if I have multiple instances of SQL Server on same server and how they load balance the memory among themselves?

SQL Server will try to balance to balance its memory usage with other instances of SQL Server running on the same box .  As I mentioned earlier SQL Server will try to maintain the available memory on the system to High memory threshold. SQL Server Lazy writer checks If there is disk reads performed in last 10 seconds and if there is no reads for last 10 seconds then SQL Server will reduce its memory usage until HighMemoryResourceNotification is signaled by OS.

Let us see this with an example :

Let us assume there are 2 SQL Server instances running on server with 32 GB of RAM and Lowmemorythreshold is set to 512MB on the system (so HighMemoryThreshold is 1536 (Lowmemorythreshold *3)).

1.            When the OS starts  HighMemoryResourceNotification  is set to on because there is adequate available memory on the server.

2.            SQL Server instance 1 starts first and it will consume memory till the HighMemoryResourceNotification resource notification is revoked ( HighMemoryResourceNotification  will be revoked when available memory drops below 1536 MB).

2.            Now the 2nd SQL Server instance is started and it finds High memory resource notification is revoked so it will not increase its memory usage. 

3.            Lazy writer thread of 1st instance checks if there is any disk reads performed by 1st Instance  in last 10 seconds , If there is no disk reads then first instance will scale down its usage until HighMemoryResourceNotification is signaled by OS (HighMemoryResourceNotification  will be signaled again when available memory becomes 1536 MB).

4.            2nd Instance which is hungry for memory sees the High memory resource notification and starts growing its usage till the high memory notification is revoked.  Once the high memory notification is revoked 2nd instance will stop growing.

5.            1st  instance finds the high memory notification is revoked and will again check if there are any disk reads in last 10 seconds and if there are no reads then It will further scale down till there is high memory resource notification.

6.            Once the high memory is signaled 2nd instance will start growing again.

7.            Over a time each instance will very well balance their memory requirements among themselves. ( I.e. if there is read performed from disk with in last 10 seconds we assume there is additional memory requirement for the instance so it will not scale down while on other hand if there is no reads for more than 10 seconds and if the memory available is below the high memory threshold instance it will scale down to give memory for other instance)  

8.            Instance with higher memory requirement will be consuming more memory than the instance with low memory requirement in some time. This way both the instances will balance their memory requirements with each other.

Note:

1. Above logic may not fit well if the total Physical memory on the system is very low compared with the memory requirements of multiple SQL Server instances running on the system because if you start the second instance while the first SQL Server is running with full memory utilization but still performing lot of reads I.e. RESOURCE_MEM_STEADY and still lot of reads , second instance may take long time or may not scale up its memory usage soon. In such case you can cap the max server memory but the performance of SQL Server will be very poor because of memory contention.

2. Also be cautious when you increase the value of LowMemoryThreshold beyond 512 MB.  Increasing this threshold  increases the range of memory that is available where neither the LowMemoryResourceNotification or HighMemoryResourceNotification object is signaled ( RESOURCE_MEM_STEADY).  So when you have multiple instance , if you start the second instance while the first SQL Server is running with full memory utilization and with lot of reads  I.e. RESOURCE_MEM_STEADY + reads continuously , second instance may take time scale up its memory usage soon and chances of getting Lowmemorythreshold is low because of wider range  of  RESOURCE_MEM_STEADY

FAQ:

1. What will happen when MTL allocations increases?

Available memory in system drops when the MTL consumption increases. If the MEMPHYSICAL_HIGH is set then there will not be any effect to bPool. If MTL consumption increases drastically it might cause available memory to drop further causing windows to trigger LowMemoryResourceNotification.

If LowMemoryThreshold is siganled SQL Server will scale down its bPool usage.

2. Will windows working set manager starts trimming the working set of all processes as soon as the LowMemoryResourceNotification is signaled?

No.

3. What are the other effects of changing LowMemoryThreshold?

There might be other application and drivers which is also using memory notification from windows to grow and shrink memory usage. They will also shrink and grow when there is notification from windows.

4. Why would I need to CAP my SQL Server memory when we have a great dynamic mechanism in SQL Server to grow and shrink its memory usage?

You can leave the max server memory as default  If your operating system is Windows 2008 or above and if you have all the fixes in This link and This link and if you do not have any faulty drivers or applications which will request large amount of memory suddenly and if you are not using large pages memory model else I would suggest capping the Max server memory

If you have decided to configure the Max server memory remember it will not control the overall memory used by SQL Server. There are significant changes in memory allocations controlled by Max server Memory between SQL Server2012 and earlier versions. Let us understand what allocations it controls in SQL Server 2012 and earlier versions of SQL Server

What is controlled by SQL Server Max Server Memory (Extract from SQLServer2012 Memory) ?

SQL Server memory is internally divided in to two regions known as BPOOL and NonBPool (aka MTL or MTR) More details about BPOOL and MTL can be found in This blog.

In earlier versions of SQL Server (Till 2008 R2) “Max Server Memory” controlled the Maximum physical memory Single page allocator (BPOOL)  can consume in SQL Server user address space.

Only the single page allocator was part of BPOOL and Max server memory controlled only BPOOL, so the following allocations came outside BPOOL (Max server memory)

1.Multi-Page allocations from SQL Server [These are allocations which request more > 8 KB and required contiguous memory]

2.CLR allocations [These include the SQL CLR heaps and its global allocations created during startup]

3.Memory used for thread stacks within SQL Server process (Max worker threads * thread stack size). Thread stack size is 512K in 32 bit SQL Server, 904 K in WOW mode and 2 MB in 64-Bit 

4.Direct windows allocations made by Non-SQL Server dll’s ([These include windows heap usage and direct virtual allocations made by modules loaded into SQL Server process. Examples: allocations from extended stored procedure dll’s, objects created using OLE Automation procedures (sp_OA calls), allocations from linked server providers loaded in sqlserver process)

SQL Server 2012 memory manager has now clubbed single page allocator and multipage allocator together  as any-size page allocator . As a result, the any-size page allocator now manages allocations categorized in the past as single page and Multi-Page allocations.

1. "max server memory" now controls and includes “Multi pages allocations”.

2. In earlier versions of SQL Server CLR allocated memory was outside BPOOL (Max server memory)   . SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

SQL Server 2012 "max server memory" configuration does not include only the following allocations:

1. Memory allocations for thread stacks within SQL Server process

2. Memory allocation requests made directly to Windows [Ex: Allocations (Heap, Virtualalloc calls ) from 3rd party Dll’s loaded in SQL Server process , objects created using OLE Automation procedures (sp_oa) etc]

Hope you got clarity on allocations controlled by  Max server memory , Let us see how to set it.

How to set correct value for SQL Server Max server memory?

There is no magic formula for this. Estimate the memory required by other applications running on same server, Operating system, Drivers , SQL Server Non- bPool allocations, jobs, anti virus etc.. Make sure you have acceptable available physical memory even when the system is under heavy load.

1. Consider the operating system memory requirement.

     Approximately 1 GB (Would increase if it is DC, cluster etc.)

2. Consider the memory requirements by other applications/processes running on the server.

   You have to derive it based on applications/processes/AV’s running on the system and their memory requirements. (Perfmon Process-> Private bytes and Working set can help)

3. Consider the memory requirements of the drivers/firmwares.

    You have to derive it based on memory requirements by drivers installed on the system. (RAMMAP can help)

4. Consider the NonbPool (aka MTL or MTR) memory requirements by SQL Server.

select  sum(multi_pages_kb)/1024 as multi_pages_mb from sys.dm_os_memory_clerks

(You can skip above query if your SQL Server version is 2012)

+

Max worker threads * 2MB

+

Memory for direct Windows allocations approximately 0 to 300 MB in most of the cases but you may have to increase it if there are many 3 party components loaded in SQL Server process (Including linked server dll’s, 3rd party backup dll’s etc.)

+

If you are using CLR extensively add some additional memory for CLR.   

5. Consider the memory requirement by jobs (Including replication agents, Log shipping etc. )  and packages that will run on the server.

You have to derive (May vary from few Mb’s to GB’s)

6. Consider SSAS and RS memory requirements.

You have to derive

7. Make sure there is good enough free space for operating system.

Approximately (100 MB for each GB till 4G) + (50 MB for each additional GB till 12GB) + (25 MB for each additional GB till your RAM size)

8. Other memory requirements.

If you have any other memory requirement specific to your environment.

Once you have calculated a reasonable value for all the above memory requirements  take the sum of all the above requirements and deduct it with total physical memory to derive an ideal value for your max server memory.

Max server memory=  Total physical memory  – (1+2+3+4+5+6+7+8)

If you still see LowMemoryResourceNotification  or working set below 100% frequently then use This exe which will print the memory information of all the processes and system wide memory information (Global memory status) when the operating system signals  low memory notification. Once you get the output from the exe when there is LowMemoryResourceNotification  review requirements of each process and tweak Max server Memory accordingly.

Important: Make sure you have this fix if you are on windows2003 http://support.microsoft.com/kb/938486

What about Min server memory and should I configure it?

I mentioned earlier that when LowMemoryResourceNotification comes from Windows or HighMemoryResourceNotification is revoked+No reads for 10 seconds , SQL Server scales down its memory usage.

How much it scales down?

Until “Minimum server memory”  is reached (If there is continuous memory pressure on the system).

What happens when you set Max server memory and min server memory to same value?

SQL Server will never scale down its memory usage even when there is memory pressure system wide (Lowphysicalmemory notification  set at system level). Note: This setting does not affect OS from paging.

What are the affects?

When there is LowMemoryResourceNotification  If LPM is not enabled SQL Server’s working set (Bpool + Non bPool )will be paged. If LPM is enabled system will starve for memory  and non-bpool will be paged.

If you do not want SQL Server to scale down its usage when there is LowMemoryResourceNotification  in windows configure Min server memory and Max server memory to same value (Bad choice). 

If you want to limit  “how much SQL Server wants to scale down“ you can configure this value.

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

 

Posted in Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , , | 39 Comments »

SQL Server and VMware ballooning

Posted by Karthick P.K on March 31, 2013

VMware and SQL Server performance

If you are running production SQL Server on VM-Ware double check if you have configured/disabled ballooning for the virtual machine in which SQL Server is running.

What is ballooning?  Method by which VMware host can  reclaim memory from the Virtual machines.

Is it really bad to give memory from Guest to Host ? My opinion  is yes if you are running production SQL Server on VM-Ware.

Why I think it’s bad? SQL Server is memory intensive application and requires adequate memory for smooth running. If SQL Server doesn’t have adequate memory to run you see poor response time, Resource_semaphore/ Resource_semaphore_Query_compile waits, increased I/O, OOM errors, Non-Yield condition’s etc. In addition to all this when memory is reclaimed from virtual machines available memory in windows drops triggering windows to page out the working set of all the processes and you will face all side affects discussed in A significant part of SQL Server process memory has been paged out  .

In worst case it is better not to give memory for SQL Server instead of give and take back. Remember Max server Memory is also a factor which will impact the generation of execution plan by the optimizer, So plan generated when you have X GB of max server memory may not be the right plan to use when you have Y GB actual memory after ballooning reclaimed memory from guest OS.

What if hypervisor runs low in physical memory?  It gives a hint that you  have a poor consolidation. You can pick up the other Virtual machines that are not hosting production SQL Servers  (or SQL Servers) from same Hypervisor and tweak  reservations (or) Increase the maximum memory that can be reclaimed when the hypervisor is under memory pressure.

What if I don’t disable ballooning for my production SQL Server?  Ballooning can slowly take the memory from virtual machine in which SQL Server is hosted and can cause all the problems I mentioned above.

To make things confusing when you look at task manager you may not even realize that ballooning has reclaimed memory from Guest OS, Because total physical memory shown in performance tab includes the memory taken by ballooning driver.

How to identify > Look at the driver locked memory from RAMMAP sysinternals tool. (VM memory performance counters can also be used)

Some of the RAMMAP output captured in production SQL Servers can self-explain. Driver locked value would be few MB’s in normal systems, If the value is very high in VMware virtual machines then you can assume ballooning is reclaiming the remaining memory .  

Below is output of RAMMAP from Virtual machine with 12 GB memory hosting SQL Server with max server memory capped to 8GB.  

Driver locked is around 8GB. So the system is running with less than 4Gb of RAM and how much is for SQL Smile?

clip_image002

Below is output of RAMMAP from Virtual machine with 24 GB memory hosting SQL Server with max server memory capped to 20GB.  

Driver locked is around 16 GB. So the system is running with less than 8Gb of RAM and how much is for SQL Smile?

clip_image004

How to disable ballooning ? Refer http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002586

You don’t agree ? I respect your view but my view is different Smile.

Note: I have not recommended to disable ballooning in every virtual machines. I recommend to do it for your performance sensitive SQL Server and if you find your hypervisor is running low in memory revisit your consolidation (or) Configure other non critical virtual machines running on same host in such a way that hypervisor can reclaim memory from them when it is under low memory condition.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , | 4 Comments »

Debugging memory Leaks using Debug diagnostic tool.

Posted by Karthick P.K on December 6, 2012

In my previous post (SQL Server memory leak ) I explained how to identify the modules  which are leaking the memory using ‘!heap’ commands.  Sometimes we may not be able to find the cause by displaying the memory using ‘!d’ commands to find the patterns or using search memory commands (s).

In such scenarios  we can use  Debug Diagnostic Tool or UMDH to track memory leaks.   This blog will explain how to identify the memory leaks using Debug diagnostics tools.

 

Download and install Debug Diagnostic Tool from http://www.microsoft.com/en-us/download/details.aspx?id=26798

 

1. Go to ToolsàOptions ->Preferences àSelect Record call stacks immediately when monitoring the leaks.

 

clip_image002

 

2. Go to the rules tab and select add rule.

 

3. Choose Native (non .Net) memory leak and handle leak.

 

4. Select the SQL Server or any process which has to be tracked for memory leak.

 

5. Click next and leave the default options (you can choose auto-unload Leak track when rule is completed or deactivated).

 

6. Click next and Activate the rule now.

 

7. Leaktrack.dll would have loaded to the process for which we are tracking the allocations.

 

8.  Now you can wait for the leak to happen again.

 

{

–If you are learning how to troubleshoot SQL Server memory leak follow the steps which we followed in previous post (https://mssqlwiki.com/2012/12/04/sql-server-memory-leak/)to leak the memory.

 

–Download HeapLeak.dll from this link.

–Create  an extended stored procedure in SQL Server

sp_addextendedproc ‘HeapLeak’,‘C:\HeapLeakdll\HeapLeak.dll’

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

}

 

9. Once you suspect memory is leaked. Go to the rules and take a full user dump by right clicking the Leak rule.

 

clip_image004

 

 

10. After the dump is captured , go to the advanced analysis tab, Add data files and select the dump which we generated.

 

11. Go to ToolsàOptions ->set the symbol path for analysis. Default Microsoft symbol path is below.

 

srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;c:\Release

 

Important: Replace c:\Release with symbol path of dll’s which you have loaded in SQL Server (optional)

 

 

11. In the available analysis script select memory pressure analyzers (memory analysis.asp).

 

12. Click start analysis.

 

clip_image006

 

 

13. Analysis might take a while depending on time it takes to load the symbols. Once the analysis is completed it would generate and open a HTML report.

This HTML report is  stored in C:\Program Files\DebugDiag\Reports\ by default and can be used for later reference.

 

I have attached a sample report which I collected when leaking memory using heapleak.dll in This link. You can use it for reference.

 

 

Report generated by debug diagnostic tool memory pressure analyzer will have the analysis summary and below Table Of Contents

 

sqlservr.exe__…………dmp

   Virtual Memory Analysis Report

   Heap Analysis Report

   Leak Analysis Report

   Outstanding allocation summary

    Detailed module report (Memory)

    Detailed module report (Handles)

 

 

 

14.  Analysis summary is  good portion in the report to start with and would give the module which is leaking the memory. Look at the below report.

 

clip_image008

 

 

15. Report has clearly indicated HeapLeak.dll has 255 MB of outstanding allocations. In heapleak.dll “Sub“ is the function which allocated this memory at offset 23.

 

 

16. Look at the virtual memory summary. It gives complete picture about memory distribution in the virtual address space. In the below summary memory reserved is 1.57 GB which is normal in 32-Bit SQL Server, but native heaps is 272.94 MB which is not normal.

Look at the heap summary there are 50 heaps.

 

 

 

 

 

clip_image010

 

clip_image012

 

17. Now look at the Outstanding allocation summary. It gives top 10 modules by allocation count and allocation size. In below summary HeapLeak has 26,182 allocations with size of 255.6 MB.

Note: In this report it is HeapLeak but in real time it might be any module which is leaking the memory

.

 

clip_image014

 

18. You can  also look at detailed module report(Memory).  It gives the memory allocation from each module along with function  and source line which allocated the memory (If you set the symbols for all the modules loaded).

 

clip_image016

 

 

By now we are sure that sub function in HeapLeak.dll has allocated 255 MB in line number 14 and has not released. The report also gives you the callstack samples that show the code path when functions was doing allocations. Refer This sample HTML report file. 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Posted in Debugging, Memory, Performance, SQL Server memory | Tagged: , , , , , | 5 Comments »

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory it is supposed to de-allocate it and release it back to OS. If it misses to de-allocate the memory due to flaw in code it is called as leak and It can cause memory pressure both to the operating system and application.

 

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink its memory based on the amount of available memory on the system and Max server memory setting in SQLServer.

Many times system admins look at the memory usage of SQLServer and assume SQLServer is leaking memory if they find SQL Server memory usage is high.

This is incorrect SQL Server is server based application and its memory manager is designed in such a way that it will keep growing its memory usage on need (Exception large pages) and will not scale down its usage unless there is low memory notification from Windows. We can control the memory usage of SQL Server using Max server memory setting in SQLServer. This setting limits the Bpool usage of SQL Server and doesn’t control the overall memory usage of SQLServer. There are portions of SQLServer memory that is allocated outside BPOOL (aks: MTL or MTR) we do not have a way to control how much memory SQL Server can use outside bpool, but non bool memory usage will be normally low and can be easily estimated by studying the components running in SQL Server.

Ex: If you want to set SQLServer to use only 10GB RAM on server. Consider how much memory SQL Server might need outside Bpool and set the “max server memory” setting accordingly. In this case if you estimate SQL Server will use 1.5GB outside Bpool then set the Max server memory to 8.5GB.

What can cause SQL Server Memory leak?

SQL Server code has a logic to allocate memory but doesn’t de-allocate it. If any of the components in SQL Server is causing a memory leak in SQL Server it can be identified easily using the DMV’s like sys.dm_os_memory_allocation,sys.dm_os_memory_clerks and sys.dm_os_memory_objects etc., but most of the memory leaks in SQL Server is caused by 3rd party Dll’s which are loaded in SQL Server process.

 

Note: All the memory allocations by Non SQL server Dll’s loaded in SQL Server will happens in “Mem to Leave”(outside the Bpool) and they are called as direct windows allocations (DWA) 

 

When there is out of memory conditions in SQL Server and if you suspect there is a memory leak.First thing to determine is who is consuming the memory. If SQL Server is not using majority of the memory in MemToLeave and still you get Mem to leave errors probably there is a leak and it caused by some DLL’s loaded in

SQL Server. Refer Section 1 (MTL error) in https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

 

Below query can be used to determine actual memory consumption by SQL Server in MTL.

select sum(multi_pages_kb)  from sys.dm_os_memory_clerks

 

If the memory consumption by SQL Server is very low and still if you see SQL Server memory errors like few below then focus on Leaks.

 

Ex:

SQL Server 2000

                WARNING: Failed to reserve contiguous memory of Size= 65536.

                WARNING: Clearing procedure cache to free contiguous memory.

                Error: 17802 “Could not create server event thread.”

                SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

                Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

 

 

 

How to identify and troubleshoot the memory leak?

 

There are multiple ways in windows to identify who is leaking memory in process. We will discuss how to identify the memory leak using  

 

1. Windows debugger 2. Debug diagnostics tools for windows and 3. UMDH in this blog.

 

Let us create a sample DLL to load in SQL server process to leak memory and see how to use the tools I mentioned above to troubleshoot the leak. 

 

Download HeapLeak.dll from This link and install Microsoft Visual C++ 2010 Redistributable Package from this links 32-Bit or 64-Bit to make this DLL work.

 

–Create  an extended stored procedure in SQL Server

exec sp_addextendedproc  'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

–Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

 

 

We will also enable below trace flags in SQL Server to automatically generate filter dump when there  is out of memory errors and see how to identify who is leaking.

 

 

dbcc traceon (2551,-1) — 2551 is used to enable filter dump.

go

dbcc traceon (8004,-1) –8004 is used to take memory dump on first occurrence of OOM condition

go

–Note: Both the trace flags listed above are un-documented, So use it at your own risk and there is no guarantee that this trace flags will work in future versions of SQL Server

 

 

Once we enable the trace flag . We have to cause out memory error in SQL Server to generate OOM memory dump. We have leaked around 300 MB of memory from MTL by executing above extended SP 30 times.

 

Let use execute below script which create XML handles. Memory for xml handles is allocated from MTL we will get out of memory errors very soon because extended stored procedure which we executed has already leaked the memory.

(Do not run below XML script directly with out executing HeapLeak  Below script will cause OOM error because of handle created for each execution, but it is accounted as SQL Server allocation so will not help us to understand the  how to debug leaks caused by 3rd party DLL’s)

 

Note: 1. SQL Server memory dump will be generated in SQL Server error log folder. 
2. Size of MTL is 256 MB + Max worker threads *.5  in 32-Bit SQL Server.  So approximately 384 MB unless modified using –g switch.

 

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
 
go 10000

We will receive below error after few executions.

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.

To analyze the dump download and Install Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

 

Step 1 (Load the memory dump file to debugger):

 

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Note : You will find SQLDump000#.mdmp in your SQL Server error log when you get the Exception or assertion.

 

Step 2 (Set the symbol path to Microsoft symbols server):

 

on command window type

 

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

 

Step 3 (Load the symbols from Microsoft symbols server):

 

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

 

 

Step 4 (check if symbols are loaded):

 

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

:028> lmvm sqlservr

start    end        module name

01000000 02ba8000   sqlservr   (pdb symbols)          c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Oct 14 15:35:29 2005 (434F82E9)

    CheckSum:         01B73B9B

    ImageSize:        01BA8000

    File version:     2005.90.1399.0

    Product version:  9.0.1399.0

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0409.04e4

    CompanyName:      Microsoft Corporation

    ProductName:      Microsoft SQL Server

    InternalName:     SQLSERVR

    OriginalFilename: SQLSERVR.EXE

    ProductVersion:   9.00.1399.06

    FileVersion:      2005.090.1399.00

    FileDescription:  SQL Server Windows NT

    LegalCopyright:   © Microsoft Corp. All rights reserved.

    LegalTrademarks:  Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

    Comments:         NT INTEL X86

 

Step 5 : (!address to display the memory information)

 

Use !address command to display the memory information of the process from dump.

 

 

0:028> !address -summary

 

——————– Usage SUMMARY ————————–

    TotSize (      KB)   Pct(Tots) Pct(Busy)   Usage

   686a7000 ( 1710748) : 81.58%    81.80%    : RegionUsageIsVAD

     579000 (    5604) : 00.27%    00.00%    : RegionUsageFree

    4239000 (   67812) : 03.23%    03.24%    : RegionUsageImage

     ea6000 (   15000) : 00.72%    00.72%    : RegionUsageStack

      1e000 (     120) : 00.01%    00.01%    : RegionUsageTeb

   122d0000 (  297792) : 14.20%    14.24%    : RegionUsageHeap

          0 (       0) : 00.00%    00.00%    : RegionUsagePageHeap

       1000 (       4) : 00.00%    00.00%    : RegionUsagePeb

       1000 (       4) : 00.00%    00.00%    : RegionUsageProcessParametrs

       1000 (       4) : 00.00%    00.00%    : RegionUsageEnvironmentBlock

       Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

 

——————– Type SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

     579000 (    5604) : 00.27%   : <free>

    4239000 (   67812) : 03.23%   : MEM_IMAGE

     5fc000 (    6128) : 00.29%   : MEM_MAPPED

   7b242000 ( 2017544) : 96.21%   : MEM_PRIVATE

 

——————– State SUMMARY ————————–

    TotSize (      KB)   Pct(Tots)  Usage

   1b7bd000 (  450292) : 21.47%   : MEM_COMMIT

     579000 (    5604) : 00.27%   : MEM_FREE

   642ba000 ( 1641192) : 78.26%   : MEM_RESERVE

 

Largest free region: Base 00000000 – Size 00010000 (64 KB)

 

 

Look at the RegionUsageHeap it is around 297792 KB and largest free region is just 64KB. We know SQL Server doesn’t use Heap’s extensively so normally the heap allocated by SQL Server will not go beyond few MB. In this case it is consuming around 290 MB and so other components which use MTL can easily fail.

Let us try to understand why the Heap is around 297792 KB and try to identify if there is  a pattern.

 

Step 6: (Let us use !heap –s to display summary information about the heap)

 

 

0:028> !heap -s

LFH Key                   : 0x672ddb11

  Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast

                    (k)     (k)    (k)     (k) length      blocks cont. heap

—————————————————————————–

000d0000 00000002    1024    896    896      6     1     1    0      0   L 

001d0000 00008000      64     12     12     10     1     1    0      0     

002c0000 00001002    1088     96     96      2     1     1    0      0   L 

002e0000 00001002      64     52     52      3     2     1    0      0   L 

007c0000 00001002      64     64     64     56     1     0    0      0   L 

00d10000 00001002     256     24     24      8     1     1    0      0   L 

340b0000 00001002      64     28     28      1     0     1    0      0   L 

340c0000 00041002     256     12     12      4     1     1    0      0   L 

342a0000 00000002    1024     24     24      3     1     1    0      0   L 

34440000 00001002      64     48     48     40     2     1    0      0   L 

61cd0000 00011002     256     12     12      4     1     1    0      0   L 

61d10000 00001002      64     16     16      7     1     1    0      0   L 

61d20000 00001002      64     12     12      4     1     1    0      0   L 

62a90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62b90000 00001002    1024   1024   1024   1016     2     0    0      0   L 

62c90000 00001002     256     40     40      7     1     1    0      0   LFH

00770000 00001002      64     16     16      2     2     1    0      0   L 

63820000 00001002      64     24     24      3     1     1    0      0   L 

63830000 00001001   10240  10240  10240    160    21     0    0    bad     

64230000 00001001   10240  10240  10240    160    21     0    0    bad     

64c30000 00001001   10240  10240  10240    160    21     0    0    bad     

65630000 00001001   10240  10240  10240    160    21     0    0    bad     

66030000 00001001   10240  10240  10240    160    21     0    0    bad     

66a30000 00001001   10240  10240  10240    160    21     0    0    bad     

67430000 00001001   10240  10240  10240    160    21     0    0    bad     

68130000 00001001   10240  10240  10240    160    21     0    0    bad     

68b30000 00001001   10240  10240  10240    160    21     0    0    bad     

69530000 00001001   10240  10240  10240    160    21     0    0    bad     

69f30000 00001001   10240  10240  10240    160    21     0    0    bad     

6a930000 00001001   10240  10240  10240    160    21     0    0    bad     

6b330000 00001001   10240  10240  10240    160    21     0    0    bad     

6bd30000 00001001   10240  10240  10240    160    21     0    0    bad     

6c730000 00001001   10240  10240  10240    160    21     0    0    bad     

6d130000 00001001   10240  10240  10240    160    21     0    0    bad     

6db30000 00001001   10240  10240  10240    160    21     0    0    bad     

6e530000 00001001   10240  10240  10240    160    21     0    0    bad     

6ef30000 00001001   10240  10240  10240    160    21     0    0    bad     

6f930000 00001001   10240  10240  10240    160    21     0    0    bad     

70330000 00001001   10240  10240  10240    160    21     0    0    bad     

70d30000 00001001   10240  10240  10240    160    21     0    0    bad     

7a160000 00001001   10240  10240  10240    160    21     0    0    bad     

7ab60000 00001001   10240  10240  10240    160    21     0    0    bad     

7b560000 00001001   10240  10240  10240    160    21     0    0    bad     

7d0d0000 00001001   10240  10240  10240    160    21     0    0    bad     

7e030000 00001001   10240  10240  10240    160    21     0    0    bad     

7ea30000 00001001   10240  10240  10240    160    21     0    0    bad     

67f90000 00001003     256     16     16     14     1     1    0    bad     

71850000 00001003     256      4      4      2     1     1    0    bad     

71890000 00001003     256      4      4      2     1     1    0    bad     

67fd0000 00001002      64     16     16      4     1     1    0      0   L 

718d0000 00001003     256     40     40      3     1     1    0    bad     

71910000 00001003     256      4      4      2     1     1    0    bad     

71950000 00001003     256      4      4      2     1     1    0    bad     

71990000 00001003     256      4      4      2     1     1    0    bad     

67ff0000 00001002      64     16     16      4     1     1    0      0   L 

719d0000 00001003    1792   1352   1352      5     2     1    0    bad     

71a10000 00001003     256      4      4      2     1     1    0    bad     

71a50000 00001003     256      4      4      2     1     1    0    bad     

71a90000 00001002      64     16     16      1     0     1    0      0   L 

—————————————————————————–

 

 

If you look at the above out put you can clearly identify a pattern. There are multiple created and each of them is 10 MB. But how to identify who actually created them?

 

Step 7:

 

Let us pickup one of the heap which is 10 MB and display all the entries (allocations) with in this 10 MB heap using !heap with –h parameter

 

Heap I have picked is 63830000.

 

 

0:028> !heap -h 63830000

Index   Address  Name      Debugging options enabled

19:   63830000

    Segment at 63830000 to 64230000 (00a00000 bytes committed)

    Flags:                00001001

    ForceFlags:           00000001

    Granularity:          8 bytes

    Segment Reserve:      00100000

    Segment Commit:       00002000

    DeCommit Block Thres: 00000200

    DeCommit Total Thres: 00002000

    Total Free Size:      00005048

    Max. Allocation Size: 7ffdefff

    Lock Variable at:     00000000

    Next TagIndex:        0000

    Maximum TagIndex:     0000

    Tag Entries:          00000000

    PsuedoTag Entries:    00000000

    Virtual Alloc List:   63830050

    UCR FreeList:        63830588

    FreeList Usage:      00000000 00000000 00000000 00000000

    FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0      Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

    Heap entries for Segment00 in Heap 63830000

        63830608: 00608 . 00040 [01] – busy (40)

        63830648: 00040 . 02808 [01] – busy (2800)

        641b6698: 02808 . 02808 [01] – busy (2800)

        ……………………………………

        ……………………………………

        ……………………………………

        ……………………………………

       

Step 8: (Let us pickup one of the heap entry (allocation) and try to identify what is in it)

 

 

0:028> db 641b6698

641b6698  01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

641b66a8  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

641b66b8  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

641b66c8  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

641b66d8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66e8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b66f8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

641b6708  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  …………….

 

0:028> db 63830648

63830648  01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

63830658  6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20  ng the heap for

63830668  64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b  demo at MSSQLWIK

63830678  49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00  I.COM………..

63830688  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

63830698  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306a8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306b8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

 

 

Similarly you can dump multiple heap allocations to identify a pattern.

 

Now if you look at the memory dumped you see a string which might help you to identify the DLL which created the heap. There is a pattern in above heaps. All the heap allocations have below string

“Iam filing the heap for demo at MSSQLWIKI.COM”

 

Note : You can use L Size to dump more memory using db or dc command’s example db 63830648 L1500

 

Step 9:

Let us open the DLL which we loaded in SQL Server for testing using notepad and see if there is string which matches the pattern

 

clip_image002

 

 

Yes there is which proves that this DLL’s has caused the leak. In real time you may have to play with different heap allocations to identify the pattern.

 

This is one way to find the leaks from the memory dump after the leak has actually happened. It may not be always easy to find a pattern and identify the modules who allocated the memory, In such scenarios  you may have to track the leak using the tools like debug diagnostic tool, UMDH etc.In the my next blog I will post how to track memory leak using Debug diagnostics tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki , join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/ and post your SQL Server questions to get answered by experts.

Related posts:

Basics of SQL Server Memory Architecture

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | Tagged: , , , , | 46 Comments »

SQL Server NUMA load distribution

Posted by Karthick P.K on November 22, 2012

When port affinity is not configured all the connection to SQL Server enters through single port and connections are tied to nodes in round robin basis.

 

We might end with Imbalance of Workload in NUMA systems under below conditions.

 

1. When a connection is tied (or) affinitized to a node, all the work from that connection will be completed on the same node (in which connection is directed)  if plans are serial.  We don’t consider the CPU load across the NUMA to pick up the node for serial plans, We use the same node in which connection is made for serial plan execution. Parallel query would use any NUMA node regardless of node this query came from. When all the queries execute from connections made to same node and if plans are also serial we might end up with overloading one Node while others are not fully used.

 

2. State of each nodes is internally maintained by SQL Server and updated every 2 seconds so there is  remote possibility that all parallel queries end with same node some times and cause spike in one node, while the other nodes are unused.

 

3. When there is imbalance between the number of online schedulers in each node (Ex: 16-CPU in Node1 and 4-CPU in Node2 ) and if all plans are serial (assume we have set Max DOP 1) We might end up with overloading the schedulers in node with least schedulers. while the schedulers on other node is underused, similarly when memory is shared across nodes we share it equally irrespective of number of schedulers on each node so in this case first 16 schedulers would have got  half of memory and 4 schedulers of second node would have got remaining half. So ensure you choose the CPU affinity carefully (Specially when you have installed SQL Server with limited processor license on system with larger number of CPU’s). 

 

 

Image 1:  sys.dm_os_schedulers (6 – CPU’S on node-0 and  1- CPU on node-1. Look at current task count)

 

clip_image002

 

Image 2 (Look at the current and pending task in node 0 and in node 1)

clip_image002[4]

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

 

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , | 3 Comments »

SQL Server Query optimization

Posted by Karthick P.K on November 6, 2012

SQL Server Query optimization (or) Tuning slow queries in SQL Server.

How to troubleshoot (or) tune slow queries in SQL Server, Optimize slow queries to run faster , resolve error sql server -2147217871 Query timeout expired  and make them run faster?

A query in considered to be slow when it is executing for longer duration than expected. Total duration of the query can be broken in to compile time, CPU time and Wait time.

 

Before you start troubleshooting the query which is running for longer duration, Identify if the query is slow because it is long waiting (or) Long running (or) Long compiling.

 

Compile time:Time taken to compile the query.  compile time can be identified by looking at the

 

1. CompileTime=”n”  in XML plan

2. SQL Server parse and compile time when Set statistics time on is enabled.

 

CPU time: Time taken by the query in CPU (Execution time – (compile time+ wait time). CPU time can be identified by looking at the

 

1. CPU column in profiler.

2.  CPU time under SQL Server Execution Times when statistics time on is enabled.

 

Execution time: Time taken by the query for complete execution ( Execution time = CPU time (CPU time for compilation+execution) +Wait time). Total duration of the query can be identified by using the

 

1.Duration column in profiler

2. SQL Server Execution Times, elapsed times when statistics time on is enabled.

 

What is long waiting query?

 

A query is considered to be long waiting query, when it spend most of its time waiting for some resource.

 

How to identify if the query is long waiting?

 

Long running query can be identified by comparing the CPU and duration column in profiler (or) CPU and elapsed time when statistics time on is set .

When a query is waiting for a resource (such as lock, network I/O, Page_I/O Etc) it will not  consume CPU.  So if you see duration being higher than CPU (Difference between Duration and  CPU is wait time),It indicates that the query has spent large amount of time waiting for some resource.

               

Let us see an example of long waiting query. I have collected profiler trace while executing the query.

 

set statistics io on

set statistics time on

go

–Place your query here

select top 10000 * from a

go

set statistics io off

set statistics time off

go

 

 

clip_image001[15]

Look at the Duration and CPU column in the profiler Cpu=256 and duration =1920. So this query has spent majority of time waiting for some resource.

clip_image002[16]

Look at the output of statistics time and statistics I/O in above image.

SQL Server has spent only 2 milliseconds compiling the query and 256 milliseconds on CPU, but the overall duration was 1920 milliseconds so the query has spent maximum time waiting for some resource.

 

Identify the resource in which this query is waiting on using one of the steps listed below.

1. Look at the wait type column of the sysprocesses for the spid which is executing query while the query is executing.

2. If there is no other activity on the server collect sys.dm_os_wait_stats output before and after the query execution and identify the wait (Will not help in tuning queries running for short duration)

3. Collect XEvent to gather the wait stats of individual query.

 

Once you identify the resource in which the query is waiting on tune the resource. Most of the times queries would be slow waiting for below resource.

 

PAGEIOLATCH_* or Write log: This indicates I/O resource bottleneck follow the detailed troubleshooting steps mentioned in  This Link to fix the I/O bottleneck. If you find SQL Server spawning excessive I/O Create necessary indexes.

a. Logical reads + Physical reads in statistics I/O output (Refer above image) or Reads and writes in profiler will indicate the I/O posted by this query. If you see very high reads for query compared with the result rest retuned by query it is an indication of  missing indexes or bad plan. Create necessary indexes (You can use DTA for index recommendations.).

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO. 

a.This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should Look at the wait resource of the spid The wait_resource is the page number (the format is  dbid:file:pageno) that is being accessed. 

b. We can use DBCC PAGE to identify object or type of the page in which we have the contention. Also it will help us to determine  whether contention  is for allocation, data or text.

c. If the pages that SQL Server is most frequently waiting on are in Tempdb database ,check the wait resource column for a page number in dbid 2 Ex(2:1:1 or 1:1:2). Enable TF 1118 and increase the number of TEMPDB data files and size them  equally (You may be facing tempdb   llocation latch contention mentioned in http://support.microsoft.com/kb/328551)

d. If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table.  In this case we need to choose a different clustered index key to spread the work across different pages.

LATCH_*:    Non-buf latch waits can be caused by variety of things.  We can use the wait resource column in sysprocesses to determine the type of latch involved(KB 822101). 

a. A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata Refer KB 929728 for more information.

b. Auto Grow and auto shrink while query is executed.

c. Queries going for excessive parallelism.

Blocking (LCK*): Use the query in This Link  to identify the blocking. Tune the head blocker.

Asynch_network_io (or) network IO: Keep the result set returned by the query smaller. Follow detailed troubleshooting refer  This Link

Resource_semaphore waits: Make sure there is no memory pressure on the server Follow steps in This Link  for detailed troubleshooting.

SQL Trace: Stop all the profiler traces running on the server. Identify the traces which are running on the server using the query in This Link

Cx packet: Set the Max degree of parallelism. But remember Cxpacket wait type is not always a problem.

a. For servers that have eight or less processors, use the following configuration where N equals the number of processors: max degree of parallelism = 0 to N .

b. For servers that use more than eight processors, use the following configuration: max degree of parallelism = 8.Refer This Link

SOS_SCHEDULER_YIELD : Identify if there is CPU bottleneck on the server. This waiting means that the thread is waiting for CPU.

a.  SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) Will ( is expected to) yield back to SQL Server scheduler when it exceeds 4ms and before it yields back it check if there are any other runnable threads, If there is any runnable threads then the thread which is in top of runnable list is  scheduled and current thread will go to the tail of the runnable list and will get rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum. The time thread spends in runnable list waiting for its quantum is accounted as SOS_SCHEDULER_YIELD. You will see this type when multiple threads are waiting to get CPU cycle. Follow trouble shooting the steps mentioned This Link

Important: In SQL Server instances when there more than 1 CPU it is possible that the CPU is higher than the duration. Because CPU is sum of time spend by query in all the CPU’s when choosing a parallel whereas the duration is actual duration of the query.

 

What is long running query?

A query is considered to be long running query, when it spend most of its time on CPU and not waiting for some resource.

How to identify if the query is long running ?

Long running query can be identified by comparing the CPU and duration column in profiler (or) CPU and elapsed time when statistics time on is set . If the CPU and duration is close than the query is considered to be long running. If the query is long running identify where the query spend the time ,It could be for compiling or post compilation (For executing the query). compare the duration of the query with CompileTime (XML plan compile time (or) SQL Server parse and compile time when statistics time is on refer above image).

High Compile time:

Compare the duration of the query with Compile Time (XML plan compile time (or) SQL Server parse and compile time when statistics time is on).Compile time will normally be in few millisecond . Follow the below steps if you see high compile time

1. Identify if you have large token perm refer http://support.microsoft.com/kb/927396

2. Create necessary indexes and stats. Tune the query manually (or) in DTA and apply the recommendation

3. Reduce the complexity of query. Query which joins multiple tables (or) having large number of IN clause can taking  a while to compile.

4. You can reduce the compile’s by using force parameterization option.  

High CPU time:

Compare the duration of the query with Compile Time (XML plan compile time (or) SQL Server parse and compile time when statistics time is on).  If the compile time is very low compared to the duration. Then follow the below steps.

1. Update the stats of tables and indexes used by the query (If the stats are up to date Estimated rows and estimated execution will  be approximately same in execution plan .If there is huge difference stats are out dated and requires update) .

2. Identify if the query has used bad plan because of parameter sniffing (If the ParameterCompiledValue and ParameterRuntimeValue is different in XML plan). Refer THIS LINK to know more about Parameter Sniffing

3. If updating the stats and fixing the parameter sniffing doesn’t resolve the issue it is more likely optimizer is not able to create efficient plan because of lack of indexes and correct statistics. Run the query which is driving the CPU in database tuning advisor and apply the recommendations. (You will find missing index detail in xml plan but DTA is more efficient).

4. If the query which is running longer and consuming CPU is linked server query try changing the security of linked server to ensure linked server user has ddl_admin or dba/sysadmin on the remote server. More details regarding the issue in THIS LINK.

5. Ensure optimizer is not aborting early and creating bad plan. For details refer THIS LINK

6. Ensure the query which is spiking the CPU doesn’t have plan guides (xml plan will have PlanGuideDB attribute.  Also sys.plan_guides will have entries) and query hints(index= or (option XXX join) or inner (Join Hint) join).

7.  Ensure that SET options are not changed.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

 

Posted in Performance, SQL General, SQL Query | Tagged: , , , , , , , , , | 11 Comments »

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

Posted by Karthick P.K on October 12, 2012

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

Before we understand the RESOURCE_SEMAPHORE_QUERY_COMPILE let us see what is compile memory.

Compile memory:  When a query is compiled in SQL Server, the compilation process needs memory (for parsing, algeberaization and optimization) called compile memory. This memory doesn’t include the memory required to execute the query.

Challenges with Compile memory:  This memory used for Query compilations are usually expected to fit into SQL Server main-memory and to be relatively short-lived.  Like any other consumers of Bpool,  this is implemented by “stealing” pages from the Buffer Pool and hence it blocks other memory consumers from using that memory until a query compilation completes.

Even if the SQL Server has enough memory to service multiple simultaneous query compilation, allowing all of them to occur at the same time might lead to stealing a significant number of pages from the buffer pool, with consequent increased physical IO , poor performance of query execution and causing memory pressure within SQL Server. However on the other side, a throttling mechanism that is too restrictive could lead to a non-optimal usage of the system resources and decreased throughput for compile intensive workloads, So SQL Server came with more dynamic approach to solve the problem  which  is to better manage system resources, memory in particular. Such management should hence be based on and driven by the amount of memory used.

Let us see it with example:

Assume SQL  Server Max server memory is set to 1000MB and Currently data /index pages is consuming 800MB with in Max server memory (bpool) and 3 queries are reaching SQL Server for compilation, each of them requiring 300 MB for compilation.

If all three queries are compiled simultaneously  total compilation might take 900MB of memory causing all the data and index pages to be dropped from BPOOL causing  consequent increased physical IO and poor performance of query during execution(to bring data pages back to memory). On the other hand let us assume each of this 3 queries need only 2 MB of compilation memory, There is no reason for SQL Server to throttle the number of compilation.

To overcome above challenges SQL 2005+ throttles the number of concurrent compiles that can happen at  any time based on memory usage during the compile. SQL Server memory grant is controlled by a object called “Resource Semaphore” and has internal mechanism to detect how much memory has been used by each compile. There are three gateways (semaphores) called the small, medium, and big gateway. When a request is received for compilation SQL Server will start compilation. There is no limit on how many queries can be compiled simultaneously, but when memory usage for a query reaches the threshold for a given gateway it will then acquire that semaphore of next gateway before continuing. The semaphores (Queries which can be compiled concurrently) are set up to allow 4*schedulers count for the small gateway, 1*schedulers count for the medium gateway and 1 (per SQL instance) for the big gateway.

The small gateway has a fixed threshold for how much memory must be consumed before you enter it.  The medium and big gateways have dynamic thresholds that vary depending on how much memory is available, how much is stolen, etc.

If the semaphore can’t be acquired then you see this wait type (Query is waiting for memory grant to compile a query =  RESOURCE_SEMAPHORE_QUERY_COMPILE wait). This behavior lets SQL Server to allow only a few memory-intensive compilations occur at the same time. Additionally, this behavior maximizes throughput for smaller queries.

clip_image001[4]

How to identify RESOURCE_SEMAPHORE_QUERY_COMPILE waits?

 

To get an idea look at the sys.sysprocesses  table for sessions waiting on  RESOURCE_SEMAPHORE_QUERY_COMPILE

select sp.*, st.text from sys.sysprocesses sp CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) AS st

WHERE sp.lastwaittype LIKE ‘RESOURCE_SEMAPHORE_QUERY_COMPILE%’ ORDER BY sp.waittime DESC;

 

clip_image002[4]

 

 

 

There could be two possible reasons for RESOURCE_SEMAPHORE_QUERY_COMPILE waits

1.       Memory pressure within SQL Server caused by others using  lot of stolen memory or OS memory pressure

In this case you will see thresholds for medium and big gateways very low. In this situation you have to identify who is consuming most of stolen memory and  tune them to increase the available memory or add additional memory that can used by SQL server. When the available memory decrease, threshold for medium and big gateways would decrease significantly and increase the number of queries which have to enter medium /big gateways, So the number of parallel compiles will decrease increasing the overall wait time.

 

This DBCC memory status output is from system which has 48 processor and has excessive RESOURCE_SEMAPHORE_QUERY_COMPILE waits because of memory pressure.

Small Gateway (default)                  Value

—————————————- ———–

Configured Units                         192        // 190 number of units .  48  CPU’s * 4=192

Available Units                            109

Acquires                                         83

Waiters                                            0

Threshold Factor                         380000

Threshold                                       380000

(6 row(s) affected)

Medium Gateway (default)                 Value

—————————————- ———–

Configured Units                         48              //48 number of units. SO 48 CPU’s *1=48

Available Units                               0             //There is no available slots. All slots are busy.

Acquires                                           48

Waiters                                             34           //34 Queries are waiting

Threshold Factor                         12

Threshold                                    2204603    //Threshold is very low 2 MB (This value is in bytes)

(6 row(s) affected)

Big Gateway (default)                    Value

—————————————- ———–

Configured Units                         1            //1 per instance

Available Units                             0            //There is no available slots. All slots are busy.

Acquires                                          1

Waiters                                          47            // 47 Queries are waiting

Threshold Factor                         8

Threshold                                3306905     //Threshold is very low 3 MB (This value is in bytes)

 

 

2.       There is huge amount of available memory but the available units in gateway is exhausted. This situation normally occurs when we have many queries that have high compile time and use lot of memory for compilation.

Compile time and Compile memory can be captures using the show plan XML for query compile event in profiler.

Below is extract from show plan XML for query compile event in profiler. This query has used approximately 150 MB of compile memory and the threshold for Big gateway is around 143 MB . So only one query which needs more than 143 MB can compile at a time , This can cause contention when there are multiple queries waiting for compile/recompile. Also an important factor to notice in this XML plan is compile time is ~139 times the CPU. So likely this query waited for most of the time for resource semaphore.

Big Gateway (default)                    Value

—————————————- ———–

Configured Units                         1

Available Units                             0

Acquires                                         1

Waiters                                          47

Threshold Factor                         8

Threshold                                149640500

 

         <QueryPlan CachedPlanSize="312" CompileTime="139847" CompileCPU="1002" CompileMemory="152320">

 

How to fix RESOURCE_SEMAPHORE_QUERY_COMPILE waits?

1.       Add additional memory to SQL Server.

2.       Reduce the number of compile and recompiles happening in SQL Server.

3.       Threshold for the gateways are dynamic (except for small gateway) and therefore memory pressure occurs from other sources (Internal to SQL Server or system wide) reduces the  amount of memory available for compiles and Queries are throttled to    higher gateways sooner. Make sure SQL Server is not starving for memory.

4.       Identify the queries which are consuming large compile memory and tune them (CompileMemory  in show plan XML query compile can be used).

5.       RESOURCE_SEMAPHORE_QUERY_COMPILE waits are very common in 32-bit SQL Server because of virtual address space limitation, so migrate to 64-Bit SQL Server.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Memory, Performance, SQL Query, SQL Server Engine | Tagged: , , , , , | 14 Comments »

SQL Server Parameter sniffing

Posted by Karthick P.K on October 8, 2012

When a stored procedure, prepared queries and queries submitted via sp_executesql  is compiled for the first time, the values of the parameters supplied with the execution call are used for cardinality estimation, to optimize the statements within that stored procedure and create the query plan. This is known as parameter sniffing because the optimizer sniffs the current parameter value during compilation.

If these values are typical and the data distribution is even in the underlying tables, all the calls to the stored procedure will benefit from this query plan since the plan is reused. However, parameter sniffing can cause problems if the "sniffed" parameter value is not typical of the values which are actually used during a typical execution or the data in underlying tables are very skewed, because plan generated for “sniffed” parameter value may not be optimal for current parameter passed and since the plan is reused there can be performance degradation.

Consider the following scenario we have a table with two columns (country and some column ). This table has 10001 rows.  10000 rows has USA in country column and 1 row has brazil in country column.

 

This table has NONCLUSTERED INDEX called NC on country column.

 

 

create table data(country char(10),somecolumn char(10))

go

insert into data values (‘BRAZIL’,‘somedata’)

go

insert into data values (‘USA’,‘somedata’)

go 10000

 

CREATE NONCLUSTERED INDEX [NC] ON [dbo].[data]

(

[country] ASC

)WITH (STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]

GO

 

create proc sniffing @p1 char(10)

as

begin

select country,somecolumn from data where country=@p1

end

Go

 

 

 

–Let us execute stored procedure sniffing with the with parameter brazil.

 

exec sniffing ‘BRAZIL’

go

 

clip_image002

 

 

 

 

Optimizer picked up Index-seek in Non-clustered index and Row-ID lookup on table.

 

What happens when we execute the same procedure with parameter ‘USA’. Since the plan is already created and cached for ‘BRAZIL’ it is reused and plan which is generated for BRAZIL is Not an optimal plan for parameter USA.

exec sniffing ‘USA’ 

go

 

 

 

clip_image004

 

 

How to identify if the optimizer is using plan which compiled for sniffed parameter values  and not the current parameters value.

 

Let us enable statistics xml on

 

 

set statistics xml on

 

exec sniffing ‘USA’ 

go

 

Look at the XML plan for the ParameterCompiledValue and ParameterRuntimeValue.

 

Below is extract from XML plan and this output proves that the plan is compiled for parameter BRAZIL (ParameterCompiledValue) and it is used for parameter USA (ParameterRuntimeValue)

{

<ColumnReferenceColumn="@p1"ParameterCompiledValue="‘BRAZIL    ‘"ParameterRuntimeValue="‘USA       ‘" />

}

 

 

 

We will also see a huge difference in estimated and actual rows count if the parameter sniffing is impacting the plan

(Remember out dated stats can also cause optimizer to estimate incorrect rows so difference in estimate and actual rows doesn’t mean it is because of parameter sniffing ). 

 

What would have been the optimal plan if the parameter ‘USA’?

 

 

Let us execute the same procedure with recompile option

 

 

exec sniffing ‘USA’ with recompile

go

 

 

clip_image006

 

 

 

 

How to fix Parameter sniffing?

 

1. USE RECOMPILE: when you create the stored procedure. so the parameter is compiled every time it is called. This method can be used if the compile time is very less compared to execution time of bad plan

   Ex: create proc sniffing @p1 char(10) with recompile

 

2.  OPTION (RECOMPILE): for the statement which impacted by the parameter sniffing.  If the procedure has multiple statements recompile will impact only the particular statement.

 

3.  OPTIMIZE FOR HINT: Instructs the query optimizer to use a particular value for a local variable when the query is compiled and optimized (or)  OPTIMIZE FOR  UNKNOWN WHICH Instructs the query optimizer to use statistical data

 

instead of the initial values for all local variables when the query is compiled and optimized.  This value is used only during query optimization and actual values are used during execution.

{

alter proc sniffing @p1 char(10)  as
begin
select country,somecolumn from data where country=@p1
option (optimize for (@p1 = ‘USA’))

–option (optimize for (@p1 unknown))
end
}

 

4.   Assign the incoming parameter values to local variables and use the local variables in the query.  If you are in SQL Server2000 in which we don’t have OPTIMIZE FOR hint.

      Ken Henderson has blogged about it in http://blogs.msdn.com/b/khen1234/archive/2005/06/02/424228.aspx

 

5.   Trace Flag 4136 which is introduced in SQL Server 2008 R2 Cumulative Update 2, SQL Server 2008 SP1 Cumulative Update 7 and SQL Server 2005 SP3 Cumulative Update 9 introduce trace flag 4136 that can be used to disable  the "parameter sniffing" process  more details on http://support.microsoft.com/kb/980653

 

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Optimizer, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , | 6 Comments »

Optimizer Timeout or Optimizer memory abort

Posted by Karthick P.K on October 7, 2012

Optimizer Timeout

When the query processor finds itself consuming a lot of time optimizing a query, it may decide to stop the optimization process abruptly, and choose the best available plan. This is to ensure that the optimizer doesn’t end up optimizing forever. This is  called optimizer timeout (based on the number of plans considered relative to the cost of the best plan so far).

Optimizer memory abort

When queries become more complex number of potential plans to consider can quickly grow in thousands. Optimizer has limit for memory it is allowed to use , when the optimizer reaches the limit it ends with  optimizer memory abort.

When  timeout or memory abort happens optimizer might choose the best plan  from plans which was generated till timeout or abort and it might be far from optimal plan so the query execution can take long time and consume resource.

On SQL 2000 and earlier the only way to detect this condition is compiling the query with trace flag 8675.   If one of these conditions occur the output will reflect a timeout abort or memory abort, similar to the following:

 

End of simplification, time: 2.869 net: 2.869 total: 2.869 net: 2.869

end exploration, tasks: 200094 no total cost time: 16.17 net: 16.169 total: 19.04 net: 19.039

*** Optimizer time out abort at task 614400 ***

*** Optimizer time out abort at task 614400 ***

 

Msg 8623, Level 16, State 1, Line 3

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

End of simplification, time: 0.156491 elapsed: 0.156491

end exploration, tasks: 1614 no total cost time: 0.552436 elapsed: 0.708927

end search(0),  cost: 1275.32 tasks: 3888 time: 0.195008 elapsed: 0.903935

end exploration, tasks: 7596 Cost = 1275.32 time: 0.548032 elapsed: 1.45197

end search(1),  cost: 1263.15 tasks: 21985 time: 2.30564 elapsed: 3.75761

*** Optimizer memory usage abort ***

End of optimization,  elapsed: 2.98304

From SQL server 2005 to determine whether the query optimizer times out or MemoryLimitExceeded search for the 
StatementOptmEarlyAbortReason="TimeOut" (or) StatementOptmEarlyAbortReason="MemoryLimitExceeded" expression in the XML plan output.

 

We can avoid optimizer from timing out and picking bad plan by enabling trace flag –T8780. This increases the time limit before the timeout occurs.

Note: Don’t enable this trace flag at server level , enable it only for the session which runs the query and identify if the optimizer is picking up a better plan. If you see optimizer picking up the better plan, right approach is to tune the query manually or using DTA and apply the recommendations . You can use this trace flag till you apply the recommendations made by DTA.

 

If you experience “ Optimizer memory usage abort” use “SQLServer:Memory Manager\Optimizer Memory (KB)” counter to the amount used for compilation .

select * from sys.dm_os_memory_clerks where type=’MEMORYCLERK_SQLOPTIMIZER’ will tell us the overall memory used by optimizer.

We can also use the CompileMemory= expression in XML plan output starting from SQL server2005 SP2 which will give us the compile memory used by individual plans. If you find optimizer memory is very low then identify what is contributing to memory contention in SQL Server and tune it.

I will discuss compile memory in detail when I blog about Resource_semaphore  wait types.

 

Note: You may also receive below error because of few known issues documented in KB articles 982376, 946020,926773,917888 so if none of the fixes resolve the issue you may have to follow the same steps documented above.

{

"Msg 8623, Level 16, State 1, Line 1

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information."

}

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Performance, SQL General, SQL Query, SQL Server Engine, SQL Server memory | Tagged: , , , , , , , , , , | 24 Comments »

Troubleshooting SQL Server high CPU usage

Posted by Karthick P.K on October 4, 2012

Troubleshooting SQL Server high CPU usage

 

First thing to determine when there is High CPU on systems is, if SQL server is consuming the CPU resource or other applications/service.

 

Use query in  THIS LINK to get CPU usage history (or) Task manager (or) Perfmon counter to determine that. In Perfmon, Process %Process time can also be used. Remember this counter is not based on 100%.  It is based on number of processor.  If you see 200 for sqlservr.exe and the system has 8 CPU, CPU consumed by sqlservr.exe is 200 out of 800 (only 25%).)

 

If the CPU spike is caused by other application involve application team.

 

Next step is to determine if the CPU consumed is kernel time or user time.

 

We can use Process %Privileged  time and %user Time counters in perfmon. Task manager will show kernel times which will also help us understand

 

Kernel CPU:  In general, if kernel CPU remains below 10%, it’s normal.  But if you see sustained kernel CPU at 30% or above, you should start looking at system drivers , Antivirus etc.  some known issues which can increase Kernel CPU time are

1.       Few Anti-virus software’s can cause high kernel time.  Temporarily disable anti-virus software to rule this out

 

2.       We have seen high resolution timer in SQL 2008 or SQL 2005 SP3 caused high kernel time in Virtual Machines because of outdated BIOS .  Temporarily disabling high resolution timer by turning on trace flag 8038 (configure as startup parameter) to prove this. Check for BIOS update and do not use 8038 in long term.

 

High user CPU: Some of the most common causes for High CPU in SQL Server  are

1.       Query execution causing CPU spike (Most commonly caused by optimizer picking bad plan).

 

2.       High compiles and recompiles. (schema, Stats change, Use of Temp table, Recompile hint).

 

3.       System threads spiking CPU (Ghost cleanup, Lazy writer, Resource monitor).

 

4.       Running many traces.

 

 

1. Query execution causing CPU spike:

 

Query execution  takes long times and spikes CPU commonly because of in-correct cardinality estimates caused by outdated statistics, Lack of Index, Server configuration, Distributed queries, etc.

 

When the server is experiencing this problem run the query in below link to list all the queries which are executing in the server order by CPU time desc along with plan.

{

Get SQL Text and Query Plan for statements which are executing now

}

 It could be one query which is driving the majority CPU time or Multiple queries each driving the CPU. Look at the CPU time of the above query output.

 

If it is single query/Store procedure which is driving the majority of CPU.

 

1.        Update the stats of tables and indexes used by the query (If the stats are up to date Estimated rows and estimated execution will  be approximately

same in execution plan .If there is huge difference stats are out dated and requires update) .

 

2.       Identify if the query has used bad plan because of parameter sniffing (If the ParameterCompiledValue and ParameterRuntimeValue is different in XML plan). Refer THIS LINK to know more about Parameter Sniffing

 

3.        If updating the stats and fixing the parameter sniffing doesn’t resolve the issue it is more likely optimizer is not able to create efficient plan because of lack of indexes and correct statistics. Run the query which is driving the CPU in database tuning advisor and apply the recommendations. (You will find missing index detail in xml plan but DTA is more efficient). You can follow the steps in Tune queries using SQL Server Database tuning advisor .

 

4.       If the query which is spiking the CPU is linked server query try changing the security of linked server to ensure linked server user has ddl_admin or dba/sysadmin on the remote server. More details regarding the issue in THIS LINK.

 

5.       Ensure optimizer is not aborting early and creating bad plan. For details refer THIS LINK.

 

6.       Ensure the query which is spiking the CPU doesn’t have plan guides (xml plan will have PlanGuideDB attribute.  Also sys.plan_guides will have entries) and query hints(index= or (option XXX join) or inner (Join Hint) join).

 

7. Ensure that SET options are not changed.

 

If it is Multiple queries/Store procedure are driving the CPU together.

 

1.       Update the stats of all the tables and indexes in the database. Using the query in below link Rebuild index and update statistics for all the tables in database

 

2.       If updating stats doesn’t help and rebuilding the indexes doesn’t bring down the CPU we have to tune the queries 1 by 1.

 

3.       Ensure Large amount of RAM is not causing optimizer to choose inefficient plan http://support.microsoft.com/kb/2413549

 

4.       Ensure that we do not run many traces at same time (commonly from monitoring tools). Use query in below link to list all the active traces.

{

Find all the profiler traces running on SQL Server

}

 

2. If the system thread is consuming most of the CPU.

 

1.       If none of the SQL queries are consuming majority of CPU,  we can identify if the back ground threads is consuming the majority of CPU by looking at  sysprocesses output for background threads. select * from sys.sysprocesses where spid<51.

 

2.       Check if you are hitting any of the known issues.

{

Resource Monitor may consume high CPU: http://support.microsoft.com/kb/968722

The Ghost Cleanup task uses 100% of the CPU on an idle system in SQL Server 2008 or in SQL Server 2005: http://support.microsoft.com/?id=978430

}

 

3. High compiles and recompiles: I will blog about high compiles and recompiles shortly

                               

4. Other factors which can impact SQL Server query performance

1.       Maximum degree of parallelism. Ensure MAX DOP is set properly (you can follow the steps in How to set Max degree of parallelism (MAXDOP)

 

2.       Priority boost. (Do not enable priority boot)

 

3.       Do not enable Fiber mode.

 

4.       Tweaking affinity mask (Spikes few CPU).

 

5.       TokenAndPermUserStore. http://support.microsoft.com/kb/927396

 

6.       CPU power plan degrade the server performance http://support.microsoft.com/kb/2207548

 

7.       SQL Server that’s uses  .Net Framework can cause high CPU Refer THIS LINK

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Related blogs: 

Tuning SQL Server query

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Configuration, Optimizer, Performance, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , , , | 28 Comments »

I/O requests taking longer than 15 seconds to complete on file

Posted by Karthick P.K on August 27, 2012

Do you see warnings like one below in your SQL Server error log?

SQL Server has encountered  x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .

The OS file handle is 0x000006A4. The offset of the latest long I/O is: 0x00000

(or)

BobMgr::GetBuf: Sort Big Output Buffer write not complete after n seconds.

This indicates SQL Server I/O Bottlenecks. SQL Server performance highly relies on the Disk performance.  SQL Server I/O Bottleneck can be identified through

1. PAGEIOLATCH_xx or WRITELOG wait types in Sys.Sysprocesses and other DMV’s

2. I/O taking longer than 15 seconds in SQL Server Error log.

{

SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [ ] in database [IOTEST (7). The OS file handle is 0x000006A4. The offset of the latest long I/O is:
0x000001

}

3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats

{

Select  wait_type,         waiting_tasks_count,         wait_time_ms  from    sys.dm_os_wait_stats where    wait_type like ‘PAGEIOLATCH%’
order by wait_type

}

4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.

{

select     database_id,     file_id,     io_stall,     io_pending_ms_ticks,     scheduler_address from    sys.dm_io_virtual_file_stats(NULL, NULL)t1,         sys.dm_io_pending_io_requests as t2
where    t1.file_handle = t2.io_handle

}

Following are common reasons for I/O Bottleneck in SQL Server.

1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.

2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem (or) Compression and  so the Disks are performing very slow and hence SQL Server is affected.

3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.

 

How to  troubleshoot?

1.  Exclude SQL Server files from antivirus scan.

2. Do not place SQL Server FILES on compressed drives.

3. Distribute SQL Server data files and transaction log files across drives.

4. If the “I/O request taking longer” warning is for tempdb , Enable trace flag 1118 and increase the tempdb data files refer:http://support.microsoft.com/kb/2154845

5. If none of the above resolves the issue collect the below perfmon counters.

 

Perfmon counters can help us in understanding “If disk is slow” or  “SQL Server is spawning more I/O then what disk could handle” or “Some other process is saturating disk with I/O”

Note:It is important to get  throughput of the disk subsystem in MB/SEC before we look at disk counters. Normally it will be more than 150 MB for SAN disk and greater 50 MB for Single disk .When you look at the perfmon counter look at Max value.

 

Avg. Disk sec/Transfer –> Time taken to perform the I/O operation

Ideal value for Disk sec/Transfer is 0.005-0.010 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck.

1. Look for Disk Bytes /sec when Avg. Disk sec/Transfer  is greater than 0.015. If it is below 200 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.

2. If the Disk Bytes /sec  is greater than  200 MB for SAN disk or greater than 50 MB for Single disk when the  Avg. Disk sec/Transfer  is greater than 0.015. Look at the Process:IO Data Bytes/Sec for the same time and identify which process is spawning I/O. If the identified process is not SQL Server involve the team which supports that process. If the  the identified process is SQL Server tune SQL Server queries which are I/O intensive by creating dropping indexes etc.

 

Disk Bytes /sec  –> Total read and write to disk per second in bytes.

Collect the values for each logical disks in which SQL Server files are placed and look at the Max value for this counter ideally it has to be greater than the throughput of the disk subsystem. If you don’t have the throughput for the disk then this value to be greater than 200MB for SAN or greater than 50 MB for single disk.

If it is below the expected value you can consider that your disks are not performing well. Involve the hardware vendor. 

Important: Value for this counter will be low when there is no I/O happening on the drives. So you have to look at the this counter during the time you see I/O warnings or When Disk sec/Transfer >0.010 for the same drive. 

 

Process:IO Data Bytes/Sec –> Total read and write to disk per second in bytes by each process.

Collect this counter for all the processes running on the server. This counter will help us understand if any other process is saturating the disk with excessive I/O.

Example: Let us consider a disk with max throughput of 250MB per second. If antivirus is spawning 200MB of I/O per second and if SQL Server data files are placed in same drive and SQL Server is spawning 150MB obviously there will be I/O waits.

 

Buffer Manager: Page Read/sec + Page Writes/sec –>Total read and write to disk per second in bytes by SQL Server process.

Note: If you are analyzing the .BLG file collected and not live perfmon focus on Maximum value for each counter don’t look at average.  

 

If (Avg. Disk sec/Transfer> ==0.015 ) and ( (Disk Bytes /sec < 150MB (For San)) or (Disk Bytes /sec < 50MB (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))

{

There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.

}

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50MB (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))

{

Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.

If ( Identified process == SQLServer.exe )

{

Identify and tune the queries which is Spawning excessive I/O.

(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text

can be used to identify the query). Use DTA to tune the query

}

If ( Identified process != SQLServer.exe )

{

Engage the owner of application which is spawning excessive I/O

}

}

Many thanks to Joseph Pilov from whom I learned many techniques like the one above.

 

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K  |My Facebook Page |My Site| Blog space| Twitter

Posted in Performance, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , | 22 Comments »

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

Posted by Karthick P.K on August 21, 2012

Do you see below errors in SQL error along with dumps and stuck?

Non-yielding IOCP Listener

* BEGIN STACK DUMP:
*   05/06/12 03:54:59 spid 0
* Non-yielding IOCP Listener

Non-yielding Scheduler
* BEGIN STACK DUMP:
*   04/16/12 10:09:58 spid 6256
* Non-yielding Scheduler

Non-yielding Resource Monitor

* BEGIN STACK DUMP

*   01/22/09 19:11:16 spid 0

* Non-yielding Resource Monitor

External dump process returned no errors.
Date Time Server Process 0:0:0 (0x31e8) Worker 0x000000016F41d140 appears to be non-yielding on Scheduler 4. Thread creation time: 12010668087858. Approx Thread CPU Used: kernel 2 ms, user 60516 ms. Process Utilization 11%. System Idle 83%. Interval: 71227 ms.

Refer “How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps”  for analyzing the Non-yielding Scheduler, Non-yielding IOCP Listener and Non-yielding Resource Monitor Dumps.

If you are interested in just finding a quick resolution follow the below steps to get the Non-Yield stack from the dump and check if it is matching with any existing known issues in SQL Server. 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg

step 2:

Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 3:

on command window type    
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 4:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 5:

Type  .cxr sqlservr!g_copiedStackInfo+0X20   for SQL Server2005 and SQL Server2008/2008R2  (or)   .cxr sqlmin!g_copiedStackInfo+0X20  for SQL Server2012.

Type kc 100 and look at the stack to see if it matches with the stack of any of known issues in SQL Server listed below.

If kc 100 doesn’t display any stack and throws “WARNING: Frame IP not in any known module. Following frames may be wrong”  type .cxr to reset to default scope and try  .cxr sqlservr!g_copiedStackInfo+0X00c (In 32-Bit (X86)  SQL server valid offset for context is 0X00c Look at This blog to see how we identified the offset)

Note:If your stack doesn’t match with any of the stack  listed below then paste the stack in comments session of this blog (or) In This face book group.We will try to find the cause for you. If you don’t get any prompt reply from the community, you may need to open a support ticket with Microsoft.

 

Stack 1

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptExpr::DetachPointersIntoMemo

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!ExecuteSql

sqlservr!CSpecProc::ExecuteSpecial

sqlservr!CXProc::Execute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB :2344600:FIX: "Non-yielding Scheduler" error may occur when you use the CONTAINSTABLE function together with many OR and AND predicates in SQL Server 2008 or in SQL Server 2008 R2

Stack 2

sqlservr!TMatchPattern

sqlservr!FMatchStrTxt

sqlservr!I8CharindexStrBhI8

sqlservr!CEs::GeneralEval4

sqlservr!CXStmtCond::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands 0x12a

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

2633357 FIX: "Non-yielding Scheduler" error might occur when you run a query that uses the CHARINDEX function in SQL Server 2008 R2

Stack 3

sqlservr!CItvlVal::Copy

sqlservr!CConstraintItvl::PcnstrItvlUnion

sqlservr!CConstraintProp::FBuildItvlFromOr

sqlservr!CConstraintProp::FBuildItvlFromPexpr

sqlservr!CConstraintProp::FAndItvlConstraint

sqlservr!CConstraintProp::AndNewConstraint

sqlservr!CConstraintProp::PcnstrDeriveSelect

sqlservr!CLogOp_Select::PcnstrDerive

sqlservr!CLogOpArg::PcnstrDeriveHandler

sqlservr!CLogOpArg::DeriveGroupProperties

sqlservr!COpArg::DeriveNormalizedGroupProperties

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!COptExpr::DeriveGroupProperties 0xc6

sqlservr!COptExpr::DeriveGroupProperties

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtDML::InitNormal

sqlservr!CStmtDML::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

KB: 982376 FIX: A non-yielding scheduler error or an error 8623 occurs when you run a query that contains a large IN clause in SQL Server 2005,SQL Server 2008, or SQL Server 2008 R2

Stack 4

sqlservr!COptExpr::AdjustParallelPlan

sqlservr!COptContext::PcxteOptimizeQuery

sqlservr!CQuery::Optimize

sqlservr!CQuery::PqoBuild

sqlservr!CStmtQuery::InitQuery

sqlservr!CStmtSelect::Init

sqlservr!CCompPlan::FCompileStep

sqlservr!CSQLSource::FCompile

sqlservr!CSQLSource::FCompWrapper

sqlservr!CSQLSource::Transform

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB: 943060 FIX: A query that has many outer joins takes a long time to compile in SQL Server 2005

Stack 5

sqlservr!CXid::GetBlockingTask

sqlservr!SNode::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchForDeadlock

sqlservr!DeadlockMonitor::SearchAndResolve

sqlservr!DeadlockMonitor::SearchTaskAndResolve

sqlservr!DeadlockMonitor::WorkLoop

sqlservr!lockMonitor

sqlservr!lockMonitorThread

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

956854 Cumulative update package 10 for SQL Server 2005 Service Pack 2

Stack 6

ntdll!ZwQueryAttributesFile

ntdll!RtlDoesFileExists_UstrEx

ntdll!LdrpSearchPath

ntdll!LdrpCheckForLoadedDll

ntdll!LdrpLoadDll

ntdll!LdrLoadDll

kernel32!LoadLibraryExW

mswsock!SockLoadHelperDll

mswsock!SockGetTdiName

mswsock!SockSocket

mswsock!WSPSocket

ws2_32!WSASocketW

ws2_32!WSASocketA

sqlservr!CreateSocket

sqlservr!AcceptObject::AsyncAccept

sqlservr!Tcp::AcceptDone

sqlservr!SNIAcceptDoneWithReturnCode

sqlservr!SNIAcceptDoneWrapper

sqlservr!SNIAcceptDoneRouter

sqlservr!SOS_Node::ListenOnIOCompletionPort

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 2711549 FIX: An error message is logged when you start SQL Server 2008 R2 or when a client sends a request to SQL Server 2008 R2

Stack 7

ntdll!ZwOpenKey

advapi32!LocalBaseRegOpenKey

advapi32!RegOpenKeyExW

sqlservr!COledbConnect::GetProviderOptions

sqlservr!COledbConnect::SetClsidFromProvider

sqlservr!COledbConnect::Init

sqlservr!CStmtExecProc::XretRemoteExec

sqlservr!CRemoteProcExecLevel::Execute

sqlservr!CStmtExecProc::XretWrapRemoteExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExec::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

KB2468047 FIX: Error code 17883 or "Non-yielding Scheduler" error may occur when you use the OPENQUERY function on SQL Server 2005

Stack 8

ntdll!ZwQueryVirtualMemory

psapi!QueryWorkingSetEx

sqlservr!BPool::Shrink

sqlservr!BPool::ReleaseAwayBufs

sqlservr!BPool::LazyWriter

sqlservr!lazywriter

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

967908 Cumulative update package 13 for SQL Server 2005 Service Pack 2 or 970279 Cumulative update package 4 for SQL Server 2005 Service Pack 3

Stack 9

sqlservr!LatchBase::ReleaseInternal

sqlservr!XVB::GetRecord

sqlservr!RowsetVersionScan::GetData

sqlservr!CQScanRowsetNew::GetRowWithPrefetch

sqlservr!CQScanRowsetNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRow

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::InitForExecute

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CExecuteStatement::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecStr::XretExecStrExecute

sqlservr!CXStmtExecStr::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB : 949595 FIX: Error message when you run a query that uses a join condition in SQL Server 2005: "Non-yielding Scheduler"

Stack 10

sqlservr!SQLServerLogIter::LookupScanCache

sqlservr!SQLServerLogIterForward::GetNextBlock

sqlservr!SQLServerLogIterForward::GetNext

sqlservr!LsMgr::GetEndOfLog

sqlservr!LsMgr::ProcessInternalRollForward

sqlservr!LsWorkRequest::Execute

sqlservr!LsWorker::ThreadRoutine

sqlservr!LsWorker::ThreadRoutine

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadStart

KB 970044 FIX: Error message when you try to break database mirroring between two servers that are running SQL Server 2008: "Non-yielding Scheduler"

Stack 11

sqlservr!CLinkedMap

sqlservr!CCheckReadersAndWriters::Release

sqlservr!CMainIlb::~CMainIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 12

ntdll!ZwFreeVirtualMemory

KERNELBASE!VirtualFree

sqlservr!MemoryNode::VirtualFree

sqlservr!ReservedMemBlock::FreeMemory

sqlservr!MultiPageAllocator::FreePagesInternal

sqlservr!MultiPageAllocator::FreePages

sqlservr!MemoryNode::FreePagesInternal

sqlservr!MemoryClerkInternal::FreePagesInline

sqlservr!CVarPageMgr::Release

sqlservr!CMemObj::Free

sqlservr!CMemThread<CMemObj>::Free

sqlservr!LockBytesSS::~LockBytesSS

sqlservr!LockBytesHolder::`scalar deleting destructor’

sqlservr!LockBytesHolder::DestroyCallback

sqlservr!CacheLbss

sqlservr!LockBytesSS::Release

sqlservr!CQueryIlb::~CQueryIlb

sqlservr!CBlobHandleFactoryMain::ReleaseILockBytes

sqlservr!CMainIlb::Release

sqlservr!CTraceRpcBinaryStream::~CTraceRpcBinaryStream

sqlservr!CTraceTvpData::~CTraceTvpData

sqlservr!CRpcTraceHelper::CleanUpTraceTvpData

sqlservr!CRpcTraceHelper::TracePostExec

sqlservr!CRPCExecEnv::OnExecFinish

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!endthreadex

msvcr80!endthreadex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2520808 FIX: Non-yielding scheduler error when you run a query that uses a TVP in SQL Server 2008 or in SQL Server 2008 R2 if SQL Profiler or SQL Server Extended Events is used

Stack 13

sqlservr!CompareStringWEnglishNoCase

sqlservr!CTypeInfo::ICompW

sqlservr!CDefaultCollation::ICompW

sqlservr!CDependElem::ICompare

sqlservr!CDependList::Find

sqlservr!CDependList::Insert

sqlservr!CDependList::Concat

sqlservr!CDependList::CollectDependencies

sqlservr!FillSysdepends

sqlservr!CProchdr::CreateProc

sqlservr!CSQLSource::PerformPphFakeExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

KB 2306162 FIX: Poor performance and some occasional non-yielding scheduler errors occur when you create a complex view that references a large amount of nested views or tables in SQL Server 2008 or in SQL Server 2008 R2

Stack 14

sqlservr!BaseSharedHoBt::GetHoBtId

sqlservr!HoBtFactory::GetDeferredDropCacheHobt

sqlservr!DropDeferredWorkTables

sqlservr!GhostRecordCleanupTask

sqlservr!CGhostCleanupTask::ProcessTskPkt

sqlservr!TaskReqPktTimer::ExecuteTask

sqlservr!OnDemandTaskContext::ProcessTskPkt

sqlservr!SystemTaskContext::ExecuteFunc

sqlservr!SystemTaskEntryPoint

sqlservr!OnDemandTaskContext::FuncEntryPoint

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SchedulerManager::FiberEntryPoint

kernel32!BaseFiberStart

kernel32!RtlCompareMemoryStub

KB 2505256 FIX: Poor performance when worktables that are marked for deferred drop are cleaned up in SQL Server 2008 R2

Stack 15

ntdll!ZwReadFile

kernel32!ReadFile

sqlservr!DiskReadAsync

sqlservr!FCB::AsyncRead

sqlservr!BackupIoRequest::StartDatabaseRead

sqlservr!BackupCopyMachine::CopyFileToBackupSet0

sqlservr!BackupCopyMachine::CopyFileToBackupSet

KB 960543 FIX: SQL Server 2005 or SQL Server 2008 may stop responding when you are performing a backup

Stack 16

sqlservr!Worker::ProfilingCPUTicks::ProfilingCpuTicksCallback

sqlservr!SOS_Scheduler::TaskTransition

sqlservr!SOS_Scheduler::Switch

sqlservr!SOS_Scheduler::SuspendNonPreemptive

sqlservr!SOS_Scheduler::Suspend

sqlservr!SOS_Task::Sleep

sqlservr!BTreeMgr::Seek

sqlservr!BTreeMgr::GetHPageIdWithKey

sqlservr!IndexPageManager::GetPageWithKey

sqlservr!GetRowForKeyValue

sqlservr!IndexRowScanner::EstablishInitialKeyOrderPosition

sqlservr!IndexDataSetSession::GetNextRowValuesInternal

sqlservr!RowsetNewSS::GetNextRows

sqlservr!CMEDScan::FGetRow

sqlservr!CMEDCatalogOwner::GetOwnerAliasIdFromSid

sqlservr!CMEDCatalogOwner::LookupPrimaryIdInCatalog

sqlservr!CMEDCacheEntryFactory::GetProxiedCacheEntryByAltKey

sqlservr!CMEDCatalogOwner::GetProxyOwnerBySID

sqlservr!CMEDProxyDatabase::GetOwnerBySID

sqlservr!GetDefaultSchemaIdCrossDb

sqlservr!GetCtxtSchemaId

sqlservr!CMEDAccess::GetMultiNameObject

sqlservr!CRangeObject::CImplName::FSameObject

sqlservr!CRangeObject::FCheckImplNames

sqlservr!CRangeObject::XretPostSchemaChecks

sqlservr!CRangeObject::XretSchemaChanged

sqlservr!CRangeTable::XretSchemaChanged

sqlservr!CEnvCollection::XretSchemaChanged

sqlservr!CXStmtQuery::XretSchemaChanged

sqlservr!CXStmtSelect::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtExecProc::XretLocalExec

sqlservr!CStmtExecProc::XretExecExecute

sqlservr!CXStmtExecProc::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

KB 2699013 FIX: SQL Server 2008 R2 or SQL Server 2008 stops responding and a "Non-yielding Scheduler" error is logged

Stack 17

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQScanNLJoinNew::GetRowHelper

sqlservr!CQueryScan::GetRow

sqlservr!CXStmtQuery::ErsqExecuteQuery

sqlservr!CXStmtCondWithQuery::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<1,0>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CXStmtDML::FExecTrigger

sqlservr!CXStmtDML::FExecAllTriggers

sqlservr!CXStmtDML::XretDMLExecute

sqlservr!CXStmtDML::XretExecute

sqlservr!CExecStmtLoopVars::ExecuteXStmtAndSetXretReturn

sqlservr!CMsqlExecContext::ExecuteStmts<0,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!CStmtPrepQuery::XretExecute

sqlservr!CMsqlExecContext::ExecuteStmts<1,1>

sqlservr!CMsqlExecContext::FExecute

sqlservr!CSQLSource::Execute

sqlservr!process_request

sqlservr!process_commands

sqlservr!SOS_Task::Param::Execute

sqlservr!SOS_Scheduler::RunTask

sqlservr!SOS_Scheduler::ProcessTasks

sqlservr!SchedulerManager::WorkerEntryPoint

sqlservr!SystemThread::RunWorker

sqlservr!SystemThreadDispatcher::ProcessWorker

sqlservr!SchedulerManager::ThreadEntryPoint

msvcr80!_callthreadstartex

msvcr80!_threadstartex

kernel32!BaseThreadInitThunk

KB 967169 FIX: When you run an UPDATE statement against a table that has a FOR UPDATE trigger that joins the DELETED and INSERTED tables, the query takes a long time to finish

Stack 18

msvcr80!memcpy

BackupString::vswcatf

BackupString::swcatf

BackupHistory::GenerateBackupDetails

sqlservr!BackupHistory::GenerateBackupSet

KB 917971 FIX: You may receive more than 100,000 page faults when you try to back up a SQL Server 2005 database that contains hundreds of files and file groups.

Stack 20

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

Stack 21

mswsock!SockCloseSocket
mswsock!WSPCloseSocket
ws2_32!closesocket
sqlservr!Tcp::FCloseRefHandle
sqlservr!Tcp::Close
sqlservr!Smux::InternalClose
sqlservr!Smux::ReadDone

"Non-yielding Scheduler" error and SQL Server 2008 or SQL Server 2008 R2 stops responding intermittently in Windows Server 2008 or in Windows Server 2008 R2

 

Related blogs: 

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

SQL Server Latch & Debugging latch time out

How to Analyze "Deadlocked Schedulers" Dumps

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group to get answers for all your SQL Server related questions.

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Debugging, Performance, SQL General, SQL Server Tools | Tagged: , , , , , , , , , , , , , , , , , , , , , , , , , , , | 161 Comments »

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps ……

Posted by Karthick P.K on August 17, 2012

Note: If you are interested only in finding a quick resolution for  Non-Yielding scheduler or Non-yielding IOCP Listener dumps or  Non-yielding  resource monitor Jump to THIS LINK. Continue reading this article if you like to learn How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps

How to analyze Non-Yielding scheduler dumps and Non-yielding IOCP listener dumps?

This blog is targeted towards analyzing Non-Yielding scheduler dumps and not targeted on explaining how Non-Yield Detection works please read http://technet.microsoft.com/en-us/library/cc917684.aspx to understand how the Non-Yield Detection works but let us recollect few key points before we get in to analysis.

 

1.   SQL Server has its own logical schedulers to schedule the SQL Server workers.

2.   The scheduler is called the User Mode Scheduler (UMS) in SQL Server 2000 and the SQL Server Operating System (SOS) Scheduler in SQL Server 2005

3.   Logical scheduler makes the worker non-preemptive to the database engine. The worker owns the scheduler until it yields to another worker on the same scheduler.

 

What if the threads which owns the scheduler executes for long time without yielding (or) forever and does not yield to give quantum for the other threads waiting in the scheduler?

Answer: Other threads would not get CPU cycles and starve the SQL Server performance.

 

What if the thread is not able to finish its work with in quantum –(4 Milliseconds) for example large for loop?

SQL Server worker thread’s Quantum target is 4ms which means the thread(worker) is expected to yield back to SQL Server scheduler when it exceeds 4ms and rescheduled when the other threads which are already waiting in SOS Scheduler (runnable list) finishes its execution or quantum.

 

What if the thread did not yield after 4 Milliseconds?

SQL Server has its scheduler monitor to track this. SchedulerMonitor algorithm is to check non-Yield condition every 5 seconds during which the basic check (Check if the thread is executing for >4Ms) is done . When the basic check evaluates to true, tracking of the worker begins and if the thread doesn’t yield beyond 10seconds (Nonyield threshold) after the tracking begins then threshold check becomes true. So there is approximately 15 seconds between the time of the last yield on the scheduler and the time that the threshold check becomes true and tracking continues.

 

A dump is taken when an specific nonyield situation has reached 60 seconds in total duration. Once a 17883 mini-dump is captured, no further 17883 mini-dumps are captured until trace flag -T1262 is enabled or the SQL Server process is restarted. However, 17883 error message reporting continues, regardless of the mini-dump capture.  Also when –T1262 is enabled mini-dump is captured when the Non-Yield threshold check becomes true (15 seconds)

and at subsequent 60-second intervals for the same nonyield occurrence. A new nonyielding occurrence causes dump captures to occur again.

When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

Non-yielding IOCP Listener

An identical algorithm is used to detect non-yielding I/O completion routines, counting completed I/O completion routines instead of number of yields. Scheduler Monitor takes a dump when it notices the IOCP has not moved for 10 seconds. Analyzing Non-yielding IOCP Listener is also same as analyzing non-yielding scheduler dump

 

 

Let us step in to analysis of  non-yielding scheduler dump which I got in SQL Server 2012

 

Sample 1

When a non-yielding scheduler dump is generated following error message is logged in SQL Error log and SQLDump000n.mdmp is generated in log folder.

 

{

    * *******************************************************************************

    *

    * BEGIN STACK DUMP:

    *   04/16/12 10:09:58 spid 6256

    *

    * Non-yielding Scheduler

    *

     * *******************************************************************************

Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0. Thread creation time: 12979065797278. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 0%. System Idle 97%. Interval: 70110 ms.

}

 

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

 

Microsoft (R) Windows Debugger Version 6.11.0001.404 X86

Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [C:\Users\karthick \Desktop\Karthick\SQLDump0009.mdmp]

User Mini Dump File: Only registers, stack and portions of memory are available

 

Comment: ‘Stack Trace’

Comment: ‘Non-yielding Scheduler’  èType of the dump

Symbol search path is: *** Invalid ***

 

 

Executable search path is:

Windows 7 Version 7601 (Service Pack 1) MP (24 procs) Free x64

Product: Server, suite: Enterprise TerminalServer SingleUserTS  à Windows version and system information

Machine Name:

Debug session time: Mon Apr 16 09:09:59.000 2012 (GMT-7)

System Uptime: 9 days 15:57:03.155

Process Uptime: 0 days 0:06:48.000

……………………………………………………….

……………………………..

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

 

0:146> lmvm sqlservr                                               

start             end                 module name

00000000`ffad0000 00000000`ffb0e000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21553ADC31784A4D933974A386EE2E052\sqlservr.pdb

    Loaded symbol image file: sqlservr.exe

    Image path: C:\Program Files\Microsoft SQL Server\MSSQL11.S1\MSSQL\Binn\sqlservr.exe

    Image name: sqlservr.exe

    Timestamp:        Fri Apr 06 08:19:38 2012 (4F7F098A)

    CheckSum:         00036498

    ImageSize:        0003E000

    File version:     2011.110.2316.0                                                         

    Product version:  11.0.2316.0 èSQL Server Version

    File flags:       0 (Mask 3F)

    File OS:          40000 NT Base

    File type:        1.0 App

    File date:        00000000.00000000

    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Step 5:

Use !findstack command to find scheduler monitor thread (sqlservr!SQL_SOSNonYieldSchedulerCallback )

 

 

0:146> !findstack sqlservr!SQL_SOSNonYieldSchedulerCallback

Thread 006, 1 frame(s) match  è Thread ID of scheduler monitor.

        * 07 00000000336be420 000007fee36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

 

Step 6:

Switch to scheduler monitor thread using ~[threadID]s command

 

0:146> ~[006]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

 

Step 7:

Use kC or kP command to look at the stack on scheduler monitor thread.

 

0:006> kP                        Child-SP          RetAddr           Call Site

00000000`3369c218 000007fe`fcd210ac ntdll!NtWaitForSingleObject+0xa

00000000`3369c220 00000000`ffaeecce KERNELBASE!WaitForSingleObjectEx+0x79

00000000`3369c2c0 00000000`ffaef1a4 sqlservr!CDmpDump::DumpInternal+0x20e

00000000`3369c360 000007fe`dbe50794 sqlservr!CDmpDump::Dump+0x24

00000000`3369c3a0 000007fe`dbe511e6 sqllang!SQLDumperLibraryInvoke+0x2e4

00000000`3369c640 000007fe`dbe16ddb sqllang!CImageHelper::DoMiniDump+0x426

00000000`3369c830 00000000`ffae307f sqllang!stackTrace+0xbdb

00000000`3369e270 000007fe`e36e0955 sqlservr!SQL_SOSNonYieldSchedulerCallback+0x47f

00000000`336be430 000007fe`e36866da sqldk!SOS_Scheduler::ExecuteNonYieldSchedulerCallbacks+0x375

00000000`336bebf0 000007fe`e364b53f sqldk!SchedulerMonitor::CheckScheduler+0x307

00000000`336bed60 000007fe`e364aa8f sqldk!SchedulerMonitor::CheckSchedulers+0x211

00000000`336bf1f0 000007fe`e371c779 sqldk!SchedulerMonitor::Run+0xfb

00000000`336bf320 000007fe`e3642f10 sqldk!SchedulerMonitor::EntryPoint+0x9

00000000`336bf350 000007fe`e3642d04 sqldk!SOS_Task::Param::Execute+0x21e

00000000`336bf950 000007fe`e36429e6 sqldk!SOS_Scheduler::RunTask+0xa8

00000000`336bf9c0 000007fe`e365a29f sqldk!SOS_Scheduler::ProcessTasks+0x299

00000000`336bfa40 000007fe`e365a3b0 sqldk!SchedulerManager::WorkerEntryPoint+0x261

00000000`336bfae0 000007fe`e3659fcf sqldk!SystemThread::RunWorker+0x8f

00000000`336bfb10 000007fe`e365aaf8 sqldk!SystemThreadDispatcher::ProcessWorker+0x3c8

00000000`336bfbc0 00000000`76ad652d sqldk!SchedulerManager::ThreadEntryPoint+0x236

 

Step 8:

Switch to the thread which is reported as Non-Yielding in SQL Server error log using ~~[ThreadID]s command.

 

Recollect the below error in SQL error log.  Process 0:0:0 (0x1cb0) Worker 0x0000003054F62160 appears to be non-yielding on Scheduler 0.

 

0:006> ~~[0x1cb0]s 

ntdll!NtWaitForSingleObject+0xa:

00000000`76d3135a c3              ret

 

Step 9:

Look at the current stack of Non-yielding thread. using kc command

 

0:146> kc 10

Call Site

ntdll!NtWaitForSingleObject

KERNELBASE!WaitForSingleObjectEx

sqldk!SOS_Scheduler::SwitchContext

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqldk!WorkDispatcher::DequeueTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

 

 

 

 

Recollect the information which we discussed earlier in this blog. When the SQL Server decides to take the minidump on nonyield occurrence it copies the CONTEXT of the nonyielding thread to a global structure and then initiates the dump because Sometimes it is possible that by the time SQLDumper gets the dump, the non-yielding thread has already yielded. So to get the exact snapshot of the thread we need to trust on CONTEXT saved in global structure  also we can compare the current stack of the thread with the one which is copied  and check if the thread is progressing.

 

 

Look at the above stack it cannot be Non-Yield thread because we see SwitchNonPreemptive  and SwitchContex  in the thread. 

{

SwitchPreemptive or SuspendNonPreemptive forces another worker to become owner of the scheduler. It does this by making the head of the runnable list the new owner and removing the current worker from logical scheduler control. The worker transitions ownership and is removed from SQL scheduler control until the external activity is complete. When the external activity is complete, the worker returns to the end of the runnable list by calling SwitchNonPreemptive.

}

 

Step 10:

Search for the copied stack structure using X commad

 

0:146> X sqlmin!*copiedStack*    

000007fe`df11bfe0 sqlmin!g_copiedStackInfo = <no type information>               

 

It is sqlmin!g_copiedStackInfo in this dump because this is dump is from SQL Server2012. In earlier versions of SQL Server it is sqlservr!g_copiedStackInfo

 

Step 11:

We know copied CONTEXT is stored in g_CopiedStackInfo how to find the valid offset in this structure ? If the CONTEXT is valid Rip,Rbp and RSP registers has to be valid if the dump is from X64 system and Eip,Ebp and Esp has to be valid if it is X86 system.

 

 

Let us do dd on sqlmin!g_copiedStackInfo  (remember it is sqlservr!g_copiedStackInfo in SQL2008/2005/2000)

 

0:146> dd sqlmin!g_copiedStackInfo     

000007fe`df11bfe0  00000001 00000000 3369e2e0 00000000

000007fe`df11bff0  0000a998 00000000 00000000 00000000

000007fe`df11c000  00000000 00000000 00000000 00000000

000007fe`df11c010  00000000 00000000 00000000 00000000

000007fe`df11c020  00000000 00000000 00000000 00000000

000007fe`df11c030  0010000b 00001f80 00000033 00000000

000007fe`df11c040  002b0000 00000246 00000000 00000000

000007fe`df11c050  00000000 00000000 00000000 00000000

 

Step 12:

Let us dump each address with context and verify if Rip,Rbp and RSP registers are valid. This dump is from 64-bit SQL Server so we are using Rip,Rbp and RSP registers. If the dump is from x86 system use Eip,Ebp and Esp .

 

0:146> dt 000007fe`df11bfe0 CONTEXT Rip Rsp Rbp     èType cast  000007fe`df11bfe0  with CONTEXT  .  RsP,Rbp and Rip is invalid so this is not valid offset.

ole32!CONTEXT

   +0x098 Rsp : 2

   +0x0a0 Rbp : 0x5a4d

   +0x0f8 Rip : 0

0:146> dt 000007fe`df11bff0 CONTEXT Rip Rsp Rbp      è Type cast 000007fe`df11bff0  with CONTEXT  . RIP is invalid

ole32!CONTEXT

   +0x098 Rsp : 0x72120000

   +0x0a0 Rbp : 0x3369e3cc

   +0x0f8 Rip : 0xf2

0:146> dt 000007fe`df11c000 CONTEXT Rip Rsp Rbp   è Type cast 000007fe`df11c000 with CONTEXT  . RIP ,RSP  and Rbp is valid. So this should be a valid context.Let us switch to this context and verify

ole32!CONTEXT

   +0x098 Rsp : 0x3369e2e0

   +0x0a0 Rbp : 0x3369e498

   +0x0f8 Rip : 0x76d3139a

 

Now we know 000007fe`df11c000 is valid context.  So 000007fe`df11c000 -sqlmin!g_copiedStackInfo =0x20 hence  we can use .cxr sqlmin!g_copiedStackInfo+0X20 directly to switch to the context of copied stack.

 

Step 13:

 

Switch the context of copied stack using .cxr 000007fe`df11c000  or .cxr sqlmin!g_copiedStackInfo+0X20

 

0:146> .cxr 000007fe`df11c000

rax=0000000000000002 rbx=000000003369e3cc rcx=0000000000005a4d

rdx=0000000072120000 rsi=000000000000026c rdi=0000000000000000

rip=0000000076d3139a rsp=000000003369e2e0 rbp=000000003369e498

r8=00000000000000b0  r9=0000000084a85310 r10=0000000000000000

r11=0000000000000000 r12=0000000000000000 r13=0000000000000004

r14=00000000000000f2 r15=0000000000000001

iopl=0         nv up ei pl zr na po nc

cs=0033  ss=002b  ds=0000  es=0000  fs=0000  gs=0000             efl=00000246

ntdll!NtWriteFile+0xa:

00000000`76d3139a c3              ret

 

Step 14:

 

Dump the stack of copied context using kP or kc (kc displays clean stack trace. each display line includes only the module name and the function name)

 

 

0:146> Kc

  *** Stack trace for last set context – .thread/.cxr resets it

Call Site

ntdll!NtWriteFile

KERNELBASE!WriteFile

kernel32!WriteFileImplementation

sqllang!CErrorReportingManager::WriteToErrLog

sqllang!CErrorReportingManager::SendErrorToErrLog

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqllang!ReportLoginFailure

sqllang!FRedoLogin

sqllang!login

sqllang!process_login_finish

sqllang!process_commands

sqldk!SOS_Task::Param::Execute

sqldk!SOS_Scheduler::RunTask

sqldk!SOS_Scheduler::ProcessTasks

sqldk!SchedulerManager::WorkerEntryPoint

sqldk!SystemThread::RunWorker

sqldk!SystemThreadDispatcher::ProcessWorker

sqldk!SchedulerManager::ThreadEntryPoint

kernel32!BaseThreadInitThunk

ntdll!RtlUserThreadStart

Now compare the current stack and the copied stack to see if the thread has progressed after No-Yield condition. Stack look completely different So the Non-Yield thread has progressed and completed .It is doing new work now. Also to understand why the thread was Non-Yielding look at the copied stack and not the current unless both the stacks are same. 

 

 

Current thread stack which we dumped using the thread ID in SQL Errorlog.

0:146> kc                                                                                    

Copied thread stack which SQL Server copied to global structure before generating the dump.

0:146> Kc 10

ntdll!NtWaitForSingleObject

KERNELBASE!WriteFile

KERNELBASE!WaitForSingleObjectEx

kernel32!WriteFileImplementation

sqldk!SOS_Scheduler::SwitchContext

sqllang!CErrorReportingManager::WriteToErrLog

sqldk!SOS_Scheduler::SuspendNonPreemptive

sqllang!CErrorReportingManager::SendErrorToErrLog

sqldk!WorkDispatcher::DequeueTask

sqllang!CErrorReportingManager::CwchFormatAndPrint

sqldk!SOS_Scheduler::ProcessTasks

sqllang!ReportLoginFailure

sqldk!SchedulerManager::WorkerEntryPoint

sqllang!FRedoLogin

sqldk!SystemThread::RunWorker

sqllang!login

sqldk!SystemThreadDispatcher::ProcessWorker

sqllang!process_login_finish

sqldk!SchedulerManager::ThreadEntryPoint

sqllang!process_commands

kernel32!BaseThreadInitThunk

sqldk!SOS_Task::Param::Execute

ntdll!RtlUserThreadStart

sqldk!SOS_Scheduler::RunTask

 

sqldk!SOS_Scheduler::ProcessTasks

 

sqldk!SchedulerManager::WorkerEntryPoint

 

sqldk!SystemThread::RunWorker

 

sqldk!SystemThreadDispatcher::ProcessWorker

 

sqldk!SchedulerManager::ThreadEntryPoint

 

Now let us read the copied stack and understand what would have caused a Non-Yield condition (read from bottom to top)

 

ntdll!NtWriteFile  -> WriteFile function is at top of the stack and did not complete in expected time.          
KERNELBASE!WriteFile       
kernel32!WriteFileImplementation 
sqllang!CErrorReportingManager::WriteToErrLog  ->Write the error to errorlog
sqllang!CErrorReportingManager::SendErrorToErrLog  ->Send the error to SQL Server errorlog
sqllang!CErrorReportingManager::CwchFormatAndPrint  ->format the error
sqllang!ReportLoginFailure  ->Login failed
sqllang!FRedoLogin        
sqllang!login ->Login task is processed

 

 

From the above stack we are able to understand SQL Server is writing login failed information to SQL Error log (Synchronously) and the writefile function has taken long time and did not return.So there is Non-Yield scheduler dump.

 

When will writefile operation take long time?

When there is Disk bottleneck. So the obvious solution for this issue is to  fix the performance of the disk.

Similarly there could be numerous other reasons for Non-Yield condition so look at the stack of your Non-Yield scheduler dump using the method above and make out what could have caused the Non-Yield condition.

Also refer THIS LINK to check if your stack matches with any of the known issues in SQL Server.

To Be continued…………………………

Related posts:

  • How to Analyze "Deadlocked Schedulers" Dumps?
  • Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes
  • SQL Server generated Access Violation dumps while accessing oracle linked servers.
  • SQL Server Latch & Debugging latch time out

    If you liked this post, do like us on Face Book at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

  • Posted in Configuration, Debugging, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , , , , , , , , , , , , | 28 Comments »

    A significant part of SQL Server process memory has been paged out

    Posted by Karthick P.K on June 27, 2012

    When you get “A significant part of SQL Server process memory has been paged out. This may result in performance degradation.This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 2007640, committed (KB): 4594040, memory utilization: 43%.” message in SQL Server error log

    Pay attention to Working set (KB), committed (KB) and memory utilization:% (Percentage of SQL Server memory in RAM) in above warning message. Above warning message is logged in SQL Server error log when working set reaches 50% or  below of the overall committed memory by SQL Server memory manager.

     

    What is working set: Memory allocated by the process which is currently in RAM.

     

    Committed: Total memory that is allocated by process (allocated bytes can be in RAM or Page file)

     

    Working Set trimming:  Windows is moving the allocated bytes of the process from physical RAM to page file because of memory pressure. Memory pressure is most commonly caused by applications or windows components that are requesting more memory causing OS to start trimming working set of other processes to satisfy these new requests.

     

    Before we step in to troubleshooting working set trimming warnings, here are few basics about how SQL Server memory management is designed to dynamically adjust the committed memory based on the amount of available memory on the system.

    SQL Server uses CreateMemoryResourceNotification to create a memory resource notification object  and SQL Server Resource monitor threads calls QueryMemoryResourceNotification every time it runs to identify if there is any notification. If a low memory notification comes from Windows, SQL Server scales down its memory usage and when Windows sends the high memory notification, SQL Server Server can grow its memory usage target. Low memory notification is signaled by windows when the available physical memory drops approximately below 32 MB per 4 GB, to a maximum of 64 MB. The default level that signals a high-memory-resource notification event is three times the default low-memory value. As soon as the SQL Server resource monitor threads finds low-memory-resource notification it scales down SQL Server memory usage.

     

    Why do I see “A significant part of sql server process memory has been paged out. This may result in performance degradation.” By SQL Server In spite of having above mechanism to detect the system level memory pressure and scale SQL Server memory?

     

    There are couple of situations where SQL Server Process working set might be paged out by Windows despite these memory resource notification mechanism.

    1.If windows is not sending the correct notifications to all listening processes at the right moment and thresholds

    2.If SQL Server is not responding fast enough to the low memory resource notification from Windows

    3.When low physical memory notification is received by SQL Server it will scale down its memory usage by trimming internal caches. This effect is applied for 5 seconds and then paused for 1 minute. This is to avoid any hungry/faulty application to consume all the memory making SQL Server to  scale its usage continuously. If low memory conditions still exist after 1 minute pause, the effect is applied again. So if there is physical memory pressure even after SQL Server scaling its usage for 5 seconds windows will still page out SQL Server’s working set.

    4.Conditions in Windows where working sets of all processes are trimmed

    5.Windows might decide to trim a certain percentage of working set of various or specific processes

     

     

    We can also increase the LowMemoryThreshold value so the OS will notify applications such as SQL on low memory conditions much earlier and SQL Server can respond to memory pressure much early before the system is starving for memory.

     

    How to set the LowMemoryThreshold value (in MB)?

     

    In Regedit -> go to

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SessionManager\MemoryManagement

    Right click on the right pane,

    Select New -> select click DWORD Value -> enter LowMemoryThreshold

    Double Click LowMemoryThreshold -> value (choose decimal) -> 512

    System Reboot is required to take effect.

     

    Default values as per MSDN:

    “The default level of available memory that signals a low-memory-resource notification event is approximately 32 MB per 4 GB, to a maximum of 64 MB. The default level that signals a high-memory-resource notification event is three times the default low-memory value.”

     

     

    We can use the below query to extract information about the condition of OS memory and SQL memory using a query like the following. Looking at this query, you will be able to easily determine the various indicators that would have triggered the Windows to page various processes including SQL Server. Use the following query to obtain the memory notification-related information from the XML data of the ring buffer

    SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
    dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],
    cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],
    cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],
    cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],
    cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],
    cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [type],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [state],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [reserved],
    cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect],
    
    cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],
    cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],
    cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],
    cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],
    cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],
    cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],
    cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],
    cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],
    cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],
    cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
    cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
    cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],
    tme.ms_ticks as [Current Time]
    FROM sys.dm_os_ring_buffers rbf
    cross join sys.dm_os_sys_info tme
    where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW'
    ORDER BY rbf.timestamp ASC
    

    We can use below query to to check the health of SQL Server including SQL Server working set information in past

    SELECT  CONVERT (varchar(30), GETDATE(), 121) as runtime, DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,    a.* , sys.ms_ticks AS [Current Time]
    FROM   (SELECT x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime) [1]', 'bigint') AS [UserModeTime],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime) [1]', 'bigint') AS [KernelModeTime],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults) [1]', 'bigint') AS [PageFaults],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta) [1]', 'bigint')/1024 AS [WorkingSetDelta],
    x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization) [1]', 'bigint') AS [MemoryUtilization (%workingset)],
    x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]  FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers
    WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a  CROSS JOIN sys.dm_os_sys_info sys ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())
    
    • Process Utilization shows the percentage of overall used system CPU time that was consumed by sqlservr.exe. Process Utilization is calculated as sql_process_cpu_time/total_system_cpu_time for the current time interval.
    • SystemIdle is the percentage of time that the system’s CPU’s have been idle.
    • Page Faults value is the number of hard + soft page faults that have occurred since the last snapshot.
    • Working Set is the change in the working set size  in KB, since the last snapshot.
    • Memory Utilization is working set/committed memory which is another way of saying that the number is the percentage of the process’ committed memory that is in RAM.  The farther below 100% that this number falls, the larger the percentage of SQL memory that has been trimmed and moved to the page file.

    Common Side Effects of Working set Trimming

    1.  When OS starts trimming the working set of SQL Server we would see drastic performance drop, increased I/O,non-yielding Resource Monitor / scheduler dumps etc..

    2.  IS-alive check failures resulting in SQL Server resource failure.

    3. Resource monitor thread can start  Garbage collector  if SQLCLR is enabled on this instance of SQL. When Garbage collector  is kicked off during memory pressure all other threads in the process are suspended. So if Garbage collector is taking a long time reosurce monitor thread appears stuck and hence the non-yielding errors and dumps are generated. (Refer http://support.microsoft.com/kb/2504603)

    How to troubleshoot?

    1.  Capture perfmon counters (Process: Private bytes and Working set ) to determine which applications / windows component are requesting memory and causing OS to start trimming the working set of processes including SQL Server.

    2.  Use This exe which will print the memory information of all the processes and system wide memory information (Global memory status) when the operating system signals  low memory notification.

     

    3.  Cap the SQL Server MAX Server Memory after considering the memory required by other applications, Operating system, Drivers , SQL Server Non- Bpool allocations etc. Make sure you have adequate available physical memory even when the system is under heavy load.

     

    4.  We can consider using the Lock pages in memory privilege. Remember it protects only the BPool from paging and Non-Bpool allocations can still be paged out.

    If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group  https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

     

    https://mssqlwiki.com/sqlwiki/sql-performance/basics-of-sql-server-memory-architecture/

    https://mssqlwiki.com/sqlwiki/sql-performance/windows-2008-and-windows-2008-r2-known-issues-related-to-working-set-memory/

    https://mssqlwiki.com/2012/05/18/sql-server-performance-degraded-in-32-bit-sql-server-after-i-adding-additional-ram/

    https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

    https://mssqlwiki.com/sqlwiki/sql-performance/io-bottlenecks/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Posted in Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: , , , , , , , , , , , , , , , | 29 Comments »

    What is Target Server Memory (KB)?

    Posted by Karthick P.K on May 27, 2012

    What is Target Server Memory (KB)?

     

    To  super simplify  in conventional memory model  SQL Server calculates something like target1 and target2 pages using below formula

     

                   

    Target1 = Current committed pages of SQL Server + ( Available Physical Memory – min (Total Physical Memory Pages / 20, Available Physical Memory Pages / 2))

    ullAvailPageFile: The maximum amount of memory the current process can commit, in bytes. This value is equal to or smaller than the system-wide available commit value. To calculate the system-wide available commit value, call GetPerformanceInfo and subtract the value of CommitTotal from the value of CommitLimit.

     

    If (Max Server Memory < ullAvailPageFile)

    {

    Target2= Max Server Memory

    }

    Else

    {

    Target2=Total Physical Memory

    }

     

    Target Server Memory (KB) =Minimum (Target1,Target2)

     

    So if AvailablePhysicalMemory is very high (or) when MaxServermemory is low then Target Server Memory (KB) would give you the MaxServerMemory  else value derived from above formula.

     

     

     

    Thanks

    Karthick P.K

    Posted in Performance, SQL Server Engine, SQL Server memory | Tagged: , , , | 3 Comments »

    SQL Server performance degraded in 32-Bit SQL Server after adding additional RAM.

    Posted by Karthick P.K on May 18, 2012

     

    Do you know that adding additional RAM can affect the performance of SQL Server Sometimes?

    I am not going to write how Optimizer can some times choose suboptimal plans when we have large amount of memory on the system but We will see how the memory which can be used by other memory clerks (aks: stolen memory) can shrink when we have large physical memory and AWE enabled.

     

    If you notice  performance of 32-Bit SQL Server degraded after you added additional RAM or if you see SQL Server memory errors like ones below after adding RAM then it could be because of Large BUF structures which reduced the size of Bpool.

     

    Errors:

    SQL Server 2005/2008

     

    Buffer Pool errors:

        BPool::Map: no remappable address found.

     

    Either BPool or MemToLeave errors:

        Error: 17803 “Insufficient memory available..”

        Buffer Distribution:  Stolen=7901 Free=0 Procedures=1 Inram=201842 Dirty=0 Kept=572…

     

    Extract from SQL Server memory design

    {

    SQL Server "User address space" is broken into two regions: MemToLeave and Buffer Pool

     

    Size of MemToLeave (MTL) and Buffer Pool (BPool) is determined by SQL Server during start up as below.

     

    MTL (Memory to Leave)= (Stack size * max worker threads) + Additional space to load Dll’s.

     

    Stack size =512 KB per thread for 32 Bit SQL Server (904K under WOW)

     

    I.e. = (256 *512 KB) + 256MB =384MB

     

    Additional space to load Dll’s= 256 MB from SQLServer2000. This space is used to store COM objects, Extended stored procedure, Linked server in SQL Server process

     

    Note: Additional space to load Dll’s can be modified using -g startup parameter.

     

    on any machine with less than 4 processors the Maximum worker Thread’s is always 256 by default (unless we change the value using SP_configure)

     

    SQL Server Buffer Pool is minimum of “Physical RAM “ or “user mode memory(2GB or 3GB) – MTL-  BUF structures”

     

    BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

     

    }

     

     

    When AWE is enabled in 32-Bit SQL Server M_pbuf (part of BUF structures) which is mentioned earlier is calculated and allocated for entire physical memory on the system . Regardless of “MAX Server Memory”   This is to adjust Max server memory without restarting SQL Server.

     

    SQL Server requires 8MB to create M_pbuf for every 1GB of RAM available on the server.

     

    Machine with 64 GB RAM can consume 64 (RAM) *8MB (M_pbuf for each GB) =512 MB just for the BUF array alone.

     

    So the amount of BPOOL available for SQL Server is adversely affected.

     

    Going back to the previous formula for BPOOL. Size of Bpool for 32-Bit SQL Server with AWE enabled and 64 GB of RAM would be.

     

    BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

     

    BPool= Minimum (64GB, (2GB-384MB)) – BUF structures (512+ MB)

     

    Bpool would approximately become 1GB.  Since size BPOOL become very small we might end up with memory errors.

     

    Note:  In 32-Bit SQL Server Only data pages an index pages can be placed in AWE memory. So the memory available for other SQL Server memory objects is still limited to BPOOL and MTL.  

     

    How to resolve this issue?

    Remove few GB of RAM from server J if you can convince your management that removing RAM will improve performance.

    (Or)

    There is a startup trace flag TF 836 which you can use to indicate that BUF’s need to be allocated only for the configured max server memory setting. Enable this Trace Flag (836) and Reduce the “MAX Server Memory” of SQL Server.

    (Or)

    Enable /3GB. This will increase the Size of SQL Server BPOOL by 1GB providing relief to SQL Server BPOOL pressure.

    Note: When the physical RAM in the system exceeds 16 GB and the /3GB switch is used, the operating system will ignore the additional RAM until the /3GB switch is removed.

     

     

     

    Related posts

    Troubleshooting steps for all SQL Server Memory errors

    Trouble shooting working set trim “A significant part of SQL Server process memory has been paged out”

    SQL Server lock pages in memory should I use it?

    SQL Server memory leak

    What is new in SQL Server 2012 Memory

    How to set max server memory and min server memory

     

    If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Disclaimer
    The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

    Posted in Memory, Performance, SQL General, SQL Server Engine | Tagged: , , , , , , , | 6 Comments »

    Script to free cache

    Posted by Karthick P.K on December 6, 2010

     

    DBCC FREESYSTEMCACHE ( 'ALL' ) WITH MARK_IN_USE_FOR_REMOVAL

    GO

    DBCC FREESESSIONCACHE WITH NO_INFOMSGS

    GO

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    GO

    DBCC DROPCLEANBUFFERS

    GO

    Posted in Performance, SQL General, SQL Query | Tagged: , , , , , | 59 Comments »

    How to find SQL Server and system CPU usage history :

    Posted by Karthick P.K on November 30, 2010

    SQL Server and system CPU usage history can be obtained from sys.dm_os_ring_buffers  using below query

    Note: For troubleshooting  high CPU usage in SQL Server follow https://mssqlwiki.com/2012/10/04/troubleshooting-sql-server-high-cpu-usage/

    SQL Server 2005

    DECLARE @ts_now bigint

    SELECT @ts_now = cpu_ticks / CONVERT (float, cpu_ticks_in_ms) FROM sys.dm_os_sys_info

    SELECT top 20 record_id, EventTime, 

      CASE WHEN system_cpu_utilization_post_sp2 IS NOT NULL THEN system_cpu_utilization_post_sp2 ELSE system_cpu_utilization_pre_sp2 END AS system_cpu_utilization, 

      CASE WHEN sql_cpu_utilization_post_sp2 IS NOT NULL THEN sql_cpu_utilization_post_sp2 ELSE sql_cpu_utilization_pre_sp2 END AS sql_cpu_utilization

    FROM 

    (

      SELECT 

        record.value('(Record/@id)[1]', 'int') AS record_id,

        DATEADD (ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS EventTime,

        100-record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_cpu_utilization_post_sp2,

        record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization_post_sp2 , 

        100-record.value('(Record/SchedluerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_cpu_utilization_pre_sp2,

        record.value('(Record/SchedluerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization_pre_sp2

      FROM (

        SELECT timestamp, CONVERT (xml, record) AS record 

        FROM sys.dm_os_ring_buffers 

        WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'

          AND record LIKE '%<SystemHealth>%') AS t

    ) AS t

    ORDER BY record_id desc

     

    SQL Server 2008

    DECLARE @ts_now bigint

    SELECT @ts_now = cpu_ticks / (cpu_ticks/ms_ticks)  FROM sys.dm_os_sys_info

    SELECT top 20 record_id, EventTime, 

      CASE WHEN system_cpu_utilization_post_sp2 IS NOT NULL THEN system_cpu_utilization_post_sp2 ELSE system_cpu_utilization_pre_sp2 END AS system_cpu_utilization, 

      CASE WHEN sql_cpu_utilization_post_sp2 IS NOT NULL THEN sql_cpu_utilization_post_sp2 ELSE sql_cpu_utilization_pre_sp2 END AS sql_cpu_utilization

    FROM 

    (

      SELECT 

        record.value('(Record/@id)[1]', 'int') AS record_id,

        DATEADD (ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS EventTime,

        100-record.value('(Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_cpu_utilization_post_sp2,

        record.value('(Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization_post_sp2 , 

        100-record.value('(Record/SchedluerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS system_cpu_utilization_pre_sp2,

        record.value('(Record/SchedluerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_utilization_pre_sp2

      FROM (

        SELECT timestamp, CONVERT (xml, record) AS record 

        FROM sys.dm_os_ring_buffers 

        WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR'

          AND record LIKE '%<SystemHealth>%') AS t

    ) AS t

    ORDER BY record_id desc

    Posted in Performance, SQL Query | Tagged: , , , , , | 16 Comments »

    Script to get current blocking tree with wait types

    Posted by Karthick P.K on November 24, 2010

     

     

    SET NOCOUNT ON; 

    SET CONCAT_NULL_YIELDS_NULL OFF 

    GO 

    WITH BLOCKERS (SPID, BLOCKED, LEVEL, BATCH,waittype,lastwaittype) 

    AS 

    ( 

       SELECT 

       SPID, 

       BLOCKED, 

       CAST (REPLICATE ('0', 4-LEN (CAST (SPID AS VARCHAR))) + CAST (SPID AS VARCHAR) AS VARCHAR (1000)) AS LEVEL, 

       REPLACE (REPLACE (T.TEXT, CHAR(10), ' '), CHAR (13), ' ' ) AS BATCH, 

       R.waittype, 

       R.lastwaittype 

       FROM sys.sysprocesses R with (nolock) 

       CROSS APPLY SYS.DM_EXEC_SQL_TEXT(R.SQL_HANDLE) T 

       WHERE (BLOCKED = 0 OR BLOCKED = SPID) 

       AND EXISTS    (SELECT SPID,BLOCKED,CAST (REPLICATE ('0', 4-LEN (CAST (SPID AS VARCHAR))) + CAST (SPID AS VARCHAR) AS VARCHAR (1000)) AS LEVEL, 

       BLOCKED, REPLACE (REPLACE (T.TEXT, CHAR(10), ' '), CHAR (13), ' ' ) AS BATCH,R.waittype,R.lastwaittype FROM sys.sysprocesses R2 with (nolock) 

       CROSS APPLY SYS.DM_EXEC_SQL_TEXT(R.SQL_HANDLE) T 

    WHERE R2.BLOCKED = R.SPID AND R2.BLOCKED <> R2.SPID) 

     

    UNION ALL 

     

    SELECT 

        R.SPID, 

        R.BLOCKED, 

        CAST (BLOCKERS.LEVEL + RIGHT (CAST ((1000 + R.SPID) AS VARCHAR (100)), 4) AS VARCHAR (1000)) AS LEVEL, 

        REPLACE (REPLACE (T.TEXT, CHAR(10), ' '), CHAR (13), ' ' ) AS BATCH, 

        R.waittype, 

        R.lastwaittype 

        FROM sys.sysprocesses AS R with (nolock) 

        CROSS APPLY SYS.DM_EXEC_SQL_TEXT(R.SQL_HANDLE) T 

        INNER JOIN BLOCKERS ON R.BLOCKED = BLOCKERS.SPID WHERE R.BLOCKED > 0 AND R.BLOCKED <> R.SPID 

    ) 

     

    SELECT N'       ' + REPLICATE (N'|      ', LEN (LEVEL)/4 - 2) + CASE WHEN (LEN (LEVEL)/4 - 1) = 0 THEN 'HEAD - ' ELSE '|------ ' END + CAST (SPID AS VARCHAR (10)) + ' '  + BATCH AS BLOCKING_TREE ,  waittype ,lastwaittype,  GETDATE() as Time FROM BLOCKERS with (nolock) ORDER BY LEVEL ASC 

    go

     

    –By Ajith Krishnan

    Posted in Performance, SQL Query | Tagged: , , , , , , , , , | 6 Comments »

    How to rebuild index and update statistics for all the tables in database.

    Posted by Karthick P.K on September 26, 2010

     
    EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'   --  {can be run anytime}
    
    Exec sp_MSforeachtable "dbcc dbreindex('?')"      --- {Always run this on a off-peak hour on any SQL Server instance}
     
    

    Thanks

    Karthick

    Posted in Optimizer, Performance, SQL General, SQL Query | Tagged: , , , , | 3 Comments »

    How to Analyze Deadlocked Schedulers Dumps?

    Posted by Karthick P.K on June 15, 2010

    How to Analyze "Deadlocked Schedulers" Dumps?

    Do you see "Deadlocked Schedulers" errors similar to one below and stuck?

    From SQL Server Errorlog

    **Dump thread – spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000

    ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\SQLDump0001.txt

    * BEGIN STACK DUMP:

    * Deadlocked Schedulers

    * Short Stack Dump

    Stack Signature for the dump is 0x00000000000003D0

    New queries assigned to process on Node 0 have not been picked  up by a worker thread in the last 60 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time.  Use the "max worker threads" configuration option to increase number  of allowable threads, or optimize current running queries.  SQL Process Utilization: 0%. System Idle: 69%.

    New queries assigned to process on Node 3 have not been picked up by a worker thread in the last 300 seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time. Use the "max worker threads" configuration option to increase number of allowable threads, or optimize current running queries. SQL Process Utilization: 6%. System Idle: 90%.

    Cause

    We get Deadlocked Schedulers error (New queries assigned to process on Node n have not been picked up by a worker thread in the last 300 seconds) and dump when Scheduler Monitor detects threads(workers) are not Progressing on schedulers of one (or) all nodes for 60 seconds (300 seconds in Denali) and there are new work request (tasks) but there is no idle threads to pick up the the work request and SQL Server could not spawn new thread to process the new work request (Max worker threads reached).

    Some of common causes are most of the tasks are waiting on a single resource because of resource bottleneck, excessive blocking, excessive parallelism, very long running Queries executed by all workers, all threads waiting on trace write waits etc..

    SQL Server will not accept new connections or cannot process new requests  when there is deadlocked scheduler condition (Exception: deadlocked condition on single node). 

    Note: In systems with multiple nodes (NUMA)  If all the threads which belong to schedulers of single node is exhausted (or) Schedulers not progressing on single node can cause deadlocked scheduler condition.

    Detailed explanation about dead lock schedulers can be found in This blog from my mentor J.P

    Steps to analyze "Deadlocked Schedulers" Dumps.

    To analyze the dump download and Install Windows Debugger from This  link

    Step 1:

    Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

    Step 2:

    on command window type
    .sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

    Step 3:

    Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

    Step 4:

    Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

    0:002> lmvm sqlservr
    start             end                 module name
    00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
        Loaded symbol image file: sqlservr.exe
        Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
        Image name: sqlservr.exe
        Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
        CheckSum:         025FEB5E
        ImageSize:        02679000
        File version:     2005.90.4266.0
        Product version:  9.0.4266.0
        File flags:       0 (Mask 3F)
        File OS:          40000 NT Base
        File type:        1.0 App
        File date:        00000000.00000000
        Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

    Step 5:

    Type  ~*kL 20   and look at the stack of all the threads  to find what majority of threads are doing.

    Note: If you find most of your threads are waiting on stack which is not listed below please paste the stack in comments session of this blog (or) In MSSQLWIKI  face book group we will try to answer you. If you don’t get prompt reply from the community, you may need to open a support ticket with Microsoft.

    1. If it is blocking issue and If most of the threads are  waiting to acquire a lock you will find the most of the stack similar to one below. (We try to acquire lock and go to wait, since someone is holding a lock)

    ntdll!ZwSignalAndWaitForSingleObject

    kernel32!SignalObjectAndWait

    sqlservr!SOS_Scheduler::SwitchContext

    sqlservr!SOS_Scheduler::Suspend

    sqlservr!SOS_Event::Wait

    sqlservr!LockOwner::Sleep

    sqlservr!lck_lockInternal

    sqlservr!GetLock

    2. If most of threads are stuck while trying to write profiler events to the destination you might find stack similar to one below

    ntdll!ZwSignalAndWaitForSingleObject

    kernel32!SignalObjectAndWait

    sqlservr!SOS_Scheduler::SwitchContext

    sqlservr!SOS_Task::Sleep

    sqlservr!CTraceRowsetIoProvider::GetFreeBuffers

    sqlservr!CTraceWriteRequest::InitForRowsetTrace

    sqlservr!CTraceRowsetIoProvider::InitializeWriteRequest

    sqlservr!CTrace::WriteRecord

    sqlservr!CTraceController::ProduceRecord

    sqlservr!CTraceData::TracePreBatchEvent

    sqlservr!CSQLSource::Execute

    sqlservr!process_request

    sqlservr!process_commands

    sqlservr!SOS_Task::Param::Execute

    sqlservr!SOS_Scheduler::RunTask

    sqlservr!SOS_Scheduler::ProcessTasks

    3. If your stack’s  are like one below refer http://support.microsoft.com/default.aspx?scid=kb;EN-US;974205

    sqlservr!SpinlockBase::Sleep
    sqlservr!SpinlockBase::SpinToAcquire
    sqlservr!TSyncHashTable_EntryAccessorsqlservr!CQSIndexStatsMgr::AddNewMissingIndex
    sqlservr!CIdxSuggestion::Register
    sqlservr!COptExpr::PqteConvert
    sqlservr!CPhyOp_Top::PqteConvert
    sqlservr!COptExpr::PqteConvert
    sqlservr!COptExpr::PqteConvertTree
    sqlservr!COptContext::PcxteOptimizeQuery
    sqlservr!CQuery::Optimize
    sqlservr!CQuery::PqoBuild
    sqlservr!CStmtQuery::InitQuery
    sqlservr!CStmtSelect::Init

    4. If you see many stacks like the one below it could be BPOOL memory pressure (or) Lazy writer  waiting on I/O

    sqlservr!BPool::Steal

    sqlservr!SQLSinglePageAllocator::AllocatePages

    sqlservr!MemoryNode::AllocatePagesInternal

    sqlservr!MemoryClerkInternal::AllocatePages

    sqlservr!IMemObj::PbGetNewPages

    sqlservr!CSlotPageMgr::PbAllocate

    5. If you see many stacks like the one below it should be because of excessive parallelism

    sqlservr!CQScanXProducerNew::Open

    sqlservr!FnProducerOpen

    sqlservr!FnProducerThread

    sqlservr!SubprocEntrypoint

    6. If you see many stacks like the one below (Many threads waiting to flush log) it should be because of disk bottleneck’s. Check if you see "I/O requests taking longer than 15 seconds" messages in Errorlog before Deadlocked Schedulers Dumps. Refer Troubleshooting sql server I/O for troubleshooting I/O issues.

    sqlservr!SOS_Event::Wait

    sqlservr!SQLServerLogMgr::WaitLCFlush

    sqlservr!SQLServerLogMgr::LogFlush

    sqlservr!SQLServerLogMgr::WaitLogFlush

    sqlservr!XdesRMFull::Commit

     

    If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI to post your SQL Server questions to SQL Server experts

    Related posts:

     

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Disclaimer

    The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

    Posted in Debugging, Performance, SQL Server Engine | Tagged: , , , , , , , , , , | 85 Comments »

    Script to clear stats

    Posted by Karthick P.K on January 20, 2010

    How to reset SQL Server stats with out restarting.

    DBCC SQLPERF ('spinlockstats', CLEAR);

     

    GO

     

    DBCC SQLPERF ('netstats', CLEAR);

     

    GO

     

    DBCC SQLPERF ('rastats', CLEAR);

     

    GO

     

    DBCC SQLPERF ('iostats', CLEAR);

     

    GO

     

    DBCC SQLPERF ('threads', CLEAR);

     

    GO

     

    DBCC SQLPERF ('logspace', CLEAR);

     

    GO

     

    DBCC SQLPERF ('umsstats', CLEAR);

     

    GO

     

    DBCC SQLPERF ('waitstats', CLEAR);

     

    GO

    Posted in Performance, SQL General, SQL Query | Tagged: | 2 Comments »

    Monitoring Tempdb usage

    Posted by Karthick P.K on January 13, 2010

    Monitoring Tempdb space usage and identifying the session and query which Consumes Tempdb

    The total space used by Tempdb consists of 

    1. User Objects 

    2.  Internal Objects

    3. Version Store

    4. Free Space.

    Use  Below Query to Track which objects (above) is consuming pace in TempDb. 
     

    SELECT
    
    SUM(unallocated_extent_page_count) AS [free pages], (SUM(unallocated_extent_page_count)*1.0/128) AS [free space in MB],SUM(version_store_reserved_page_count) AS [version store pages used],
    
    (SUM(version_store_reserved_page_count)*1.0/128) AS [version store space in MB],SUM(internal_object_reserved_page_count) AS [internal object pages used],
    
    (SUM(internal_object_reserved_page_count)*1.0/128) AS [internal object space in [MB],SUM(user_object_reserved_page_count) AS [user object pages used],
    
    (SUM(user_object_reserved_page_count)*1.0/128) AS [user object space in MB]
    
    FROM sys.dm_db_file_space_usage;
    
    go
    

    Once you have identified the objects identify the query and session which is consuming tempdb using the query listed below

    –Use below query to identify which Query and Session is consuming the space in TempDB

     
    SELECT R1.session_id, R1.request_id, R1.Task_request_internal_objects_alloc_page_count, R1.Task_request_internal_objects_dealloc_page_count,
    
    R1.Task_request_user_objects_alloc_page_count,R1.Task_request_user_objects_dealloc_page_count,R3.Session_request_internal_objects_alloc_page_count ,
    
    R3.Session_request_internal_objects_dealloc_page_count,R3.Session_request_user_objects_alloc_page_count,R3.Session_request_user_objects_dealloc_page_count,
    
    R2.sql_handle, RL2.text as SQLText, R2.statement_start_offset, R2.statement_end_offset, R2.plan_handle FROM (SELECT session_id, request_id, 
    
    SUM(internal_objects_alloc_page_count) AS Task_request_internal_objects_alloc_page_count, SUM(internal_objects_dealloc_page_count)AS 
    
    Task_request_internal_objects_dealloc_page_count,SUM(user_objects_alloc_page_count) AS Task_request_user_objects_alloc_page_count,
    
    SUM(user_objects_dealloc_page_count)AS Task_request_user_objects_dealloc_page_count FROM sys.dm_db_task_space_usage 
    
    GROUP BY session_id, request_id) R1 INNER JOIN (SELECT session_id, SUM(internal_objects_alloc_page_count) AS Session_request_internal_objects_alloc_page_count,
    
    SUM(internal_objects_dealloc_page_count)AS Session_request_internal_objects_dealloc_page_count,SUM(user_objects_alloc_page_count) AS Session_request_user_objects_alloc_page_count,
    
    SUM(user_objects_dealloc_page_count)AS Session_request_user_objects_dealloc_page_count FROM sys.dm_db_Session_space_usage 
    
    GROUP BY session_id) R3 on R1.session_id = R3.session_id 
    
    left outer JOIN sys.dm_exec_requests R2 ON R1.session_id = R2.session_id and R1.request_id = R2.request_id
    
    OUTER APPLY sys.dm_exec_sql_text(R2.sql_handle) AS RL2
    Where 
    Task_request_internal_objects_alloc_page_count >0 or  
    Task_request_internal_objects_dealloc_page_count>0 or 
    Task_request_user_objects_alloc_page_count >0 or 
    Task_request_user_objects_dealloc_page_count >0 or 
    Session_request_internal_objects_alloc_page_count >0 or 
    Session_request_internal_objects_dealloc_page_count >0 or 
    Session_request_user_objects_alloc_page_count >0 or 
    Session_request_user_objects_dealloc_page_count >0 
    

    Known issues related to TEMPDB Shrink

    FIX: The used space in the tempdb database increases continuously when you run a query that creates internal objects in the tempdb database in SQL Server 2005
     

    Thank you,

    Karthick P.K | My Facebook Page |My Site| Blog space| Twitter

    Posted in Performance, SQL General, SQL Query | Tagged: , , , , , , , , , , | 4 Comments »

    Trivial Plan

    Posted by Karthick P.K on May 29, 2009

    What is Trivial Plan?

    A trivial plan results when the query optimizer determines that given tables referenced in the query and the indexes existing on them, only one plan is possible.
    Obviously, a recompilation would be futile in such a case. A query that has generated a trivial plan may not always generate a trivial plan, of course.
    For example, new indexes might be created on the underlying tables, and so multiple access paths become available to the query optimizer. Additions of such indexes
    would be detected and a correctness-related recompilation might replace the trivial plan with a non-trivial one.
    SQL Server doesn’t  recompile trivial plans if we have update stats on the underlying table and  will not fire Auto updatestats for the query with Trivial Plan.

    Let us see the below example query

    create database AutoS

    use autos

    create table tab(col1 int)
    declare @i int

    set @i = 0

    while @i < 1000

    begin

    insert into tab(col1) values (@i) –We are inserting 1000 Rows with value 1 to 999
    set @i = @i + 1

    end

    set @i = 0

    while @i < 1000

    begin

    insert into tab(col1) values (9999) –We are inserting 1000 Rows with value 9999

    set @i = @i + 1

    end

    create statistics t_col1 on tab(col1) with fullscan

    dbcc show_statistics (‘dbo.tab’,’t_col1′)

    –So we have created  a table with 2000 rows. 1000 of them have the values 9999 and 1000 of them have values 0 to 999.
    –Create a Simple store Procedure

    create procedure test @a int
    as
    begin
    select * from tab where col1 = @a
    end
    set statistics profile on
    SET STATISTICS XML ON

    exec test 9999  — Check the XML Plan you will see it is trivial plan(StatementOptmLevel=”TRIVIAL”)
    set statistics profile off

    delete from tab –Now delete all the rows from the table

    Execute the store Procedure again

    exec test 9999

    If Optimizer has not choosed the Trival Plan which is already cached, auto updatestats would have been fired for the table(Trace flag 8721 will dump information when
    AutoStat has been run)and would have caused Recompile of SP.

    Regards

    Karthick P.K

    Posted in Performance | Tagged: , , | 15 Comments »

    Using DMVs to find out the index usage history- SQLServer Index Usage

    Posted by Karthick P.K on March 10, 2009

    SQLserver 2005 ships with a set of DMVs that can help you identify the missing indexes for your workload, Analyze the effectiveness of the existing ones and help find out index fragmentation.

    Using DMVs to find out the index usage history

    Over a period of time, you could create a lot of indexes on your tables and modify existing ones. However, in SQL 2000, you couldn’t estimate how effective were each of these indexes. Poorly design indexes could lead to performance overhead instead of enhancing performance.

    In SQL 2005, you can query the sys.dm_db_index_usage_stats DMV to find out the indexes that have NEVER been used since the last start of SQL Server. You can use the following query to find that out:

     1: select object_name(i.object_id) as ObjectName,

     

     2: i.name as IndexName, s.user_updates, s.user_seeks, s.user_scans,

     

     3: s.user_lookups

     

     4: from sys.indexes i

     

     5: left join sys.dm_db_index_usage_stats s

     

     6: on s.object_id = i.object_id and i.index_id = s.index_id and s.database_id =

     

     7: <dbid>

     

     8: where objectproperty(i.object_id, 'IsIndexable') = 1 and

     

     9: -- index_usage_stats has no reference to this index (not being used)

     

     10: s.index_id is null or

     

     11: -- index is being updated, but not used by seeks/scans/lookups

     

     12: (s.user_updates > 0 and s.user_seeks = 0

     

     13: and s.user_scans = 0 and s.user_lookups = 0)

     

     14: order by object_name(i.object_id) asc

     

    In the output, you will ALL the indexes that have never been used by any sort of workload on your server, since the last start of SQL Server.

    For indexes that have NEVER been used (either for a SELECT or a DML statement), all columns will be NULL

    For indexes that have NEVER been used (for a SELECT), but had to be updated due to a DML statement, the user_updates column will be >0, while other columns will be 0. It is these indexes that could cause severe performance overhead for your DML statements and might be worth dropping.

    Give ample time for SQL Server to get exposed to all the workload after a restart, before running this query.

    Using DMVs to find out missing indexes

    When the query optimizer generates a query plan, it analyzes what are the best indexes for a particular filter condition. If the best indexes do not exist, the query optimizer generates a suboptimal query plan, but still stores information about these indexes. The missing indexes feature enables you to access information about these indexes so you can decide whether they should be implemented.

    For more information on how to use this feature, please visit the following link

    http://msdn2.microsoft.com/en-us/library/ms345417.aspx

     

    Using DMVs to find out index fragmentation

    The sys.dm_db_index_physical_stats dynamic management function replaces the DBCC SHOWCONTIG statement.

    You can learn more about using this DMV to identify fragmentation, correcting it and possibly automating this activity for your server by visiting the following link:

    http://msdn2.microsoft.com/en-us/library/ms188917.aspx

    Regards

    Karthick P.K

    Posted in Performance, SQL General | Tagged: | 1 Comment »

    SQL Server: Table Variables (VS) Temp Tables

    Posted by Karthick P.K on January 15, 2009

    Table Variables (VS) Temp Tables

    SQLServer2005 caches temp tables and temp variables only under some conditions.
    Scenarios where temp table/variable are not cached (see below) may cause performance degradation as compared to SQLServer2000.

    Following are scenarios where temp table/variable are not cached:
    1. select into #t
    2. alter table #t
    3. create index on #t
    4. Global temp tables (##t)
    5. Local temp tables on adhoc level (nest level 0)
    6. table variables are also not cached for dynamic SQL.

    What are some of the drawbacks of table variables?

    These are some of the drawbacks as compared to temporary tables:

    Table variables do not maintain statistics like temporary tables can. Statistics cannot be created on table variables through automatic creation or by using the CREATE STATISTICS statement.

    Therefore, for complex queries on large tables, the lack of statistics may deter the optimizer to determine the best plan for a query, thus affecting the performance of that query.

    The table definition cannot be changed after the initial DECLARE statement.
    Tables variables cannot be used in a INSERT EXEC or SELECT INTO statement.
    CHECK constraints, DEFAULT values, and computed columns in the table type declaration cannot call user-defined functions.
    You cannot use the EXEC statement or the sp_executesql stored procedure to run a dynamic SQL Server query that refers a table variable, if the table variable was
    created outside the EXEC statement or the sp_executesql stored procedure because table variables can be referenced in their local scope only, an EXEC statement and
    a sp_executesql stored procedure would be outside the scope of the table variable.

    However, you can create the table variable and perform all processing inside the EXEC statement or the sp_executesql stored procedure because then the table
    variables local scope is in the EXEC statement or the sp_executesql stored procedure.

    Are table variables memory-only structures that are assured better performance as compared to temporary or permanent tables, because temporary or permanent tables
    are maintained in a database that resides on the physical disk and also logged?

    A table variable is not a memory-only structure. Because a table variable might hold more data than can fit in memory, it has to have a place on disk to store
    data. Table variables are created in the tempdb database similar to temporary tables. If memory is available, both table variables and temporary tables are
    created and processed while in memory (data cache).

    Do I have to use table variables instead of temporary tables?

    The answer depends on these three factors:
    The number of rows that are inserted to the table and are they joined with other tables.
    The number of recompilations the query is saved from.
    The type of queries and their dependency on indexes and statistics for performance.

     

    Option recompile can help optimizer to estimate the number of rows table variable  refer http://blogs.msdn.com/b/psssql/archive/2010/08/24/query-performance-and-table-variables.aspx 

     

    If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

    Thank you,

    Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

    Posted in Memory, Performance, SQL Query, SQL Server Engine | Tagged: , , , | 8 Comments »