MSSQLWIKI

Running 1 Million databases on Azure SQL

Posted by Karthick P.K on October 11, 2020

Curious to know how my team manages over 1 million SQL databases across dozens of datacenters in Azure, hands-free through Spartan and DAMS, to enable the Common Data Service behind the Power Platform & Dynamics 365?

Read on for some never-before-shared details:

https://www.linkedin.com/posts/karthick-pk-1b342727_running-1m-databases-on-azure-sql-for-a-large-activity-6720400296566243328-x6_t

Thank you

Karthick P.K

Posted in SQL General | 1 Comment »

Top SQL Server blogs from MSSQLWIKI

Posted by Karthick P.K on January 11, 2014

2013 was a wonderful year for MSSQLWIKI.com, and these are the blogs that got the most views in 2013. I have grouped them by category for easy reading.

Debugging ( Using public symbols )

What is SQL Server Latch & Debugging latch time out

How to analyze Non-Yielding scheduler or Non-yielding IOCP Listener dumps

SQL Server memory leak

Debugging memory Leaks using Debug diagnostic tool.

Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

SQL Server Exception , EXCEPTION_ACCESS_VIOLATION and SQL Server Assertion

SQL Server generated Access Violation dumps while accessing oracle linked servers.

SQL Server Memory

Troubleshooting SQL Server Memory
Basics of SQL Server Memory Architecture

SQL Server lock pages in memory

Max server memory – Do I need to configure?

A significant part of SQL Server process memory has been paged out
SQL Server memory leak

Debugging memory Leaks using Debug diagnostic tool.

SQL Server and VMware ballooning

SQL Server fails to start with error "Failed allocate pages: FAIL_PAGE_ALLOCATION 1" During startup

SQL Server -g

SQL Server 2012 Memory

False warning “A significant part of sql server process memory has been paged out”

What is RESOURCE_SEMAPHORE_QUERY_COMPILE?

What is Target Server Memory (KB)?

SQL Server performance degraded in 32-Bit SQL Server after adding additional RAM.

What does MemoryUtilization in sys.dm_os_ring_buffers and Memory_utilization_percentage in sys.dm_os_process_memory represents?

SQL Server operating system

SQL Server NUMA load distribution

SQL Server Operating system (SOS) Series 3

SQL Server Operating system (SOS) Series 2

SQL Server Operating system (SOS) Series 1

Latch contention

Tempdb latch contention

SQL Server Latch

SQL Server performance

SQL Server Query optimization

SQL Server Parameter sniffing

Optimizer Timeout or Optimizer memory abort

Troubleshooting SQL Server high CPU usage

I/O requests taking longer than 15 seconds to complete on file

SQL Server NUMA load distribution

Script to get all the Query and cached plans

Trivial Plan

Script to get current blocking tree with wait types

How to get SQL Text and Query Plan for statements which are executing now

Script to clear stats

Script to free cache

How to rebuild index and update statistics for all the tables in database.

SQL Server Database tuning advisor

Capture context switches from dm_os_ring_buffers

Replication

Transactional Replication Part-1

Transactional Replication Part-2

Troubleshooting Transactional replication Latency using AgentStatistics

I get error while I update a row in immediate updating subscription what is the cause?

How to start replication agent in command prompt.

How to create transactional replication in SQL Server

How to configure distribution for replication

Connectivity

SQL Server connectivity, Kerberos authentication and SQL Server SPN (Service Principal Name for SQL Server)

Behavior of SQL Server default instance on a NON-Default port

Always on

The connection to the primary replica is not active. The command cannot be processed

Ring buffers

Inside sys.dm_os_ring_buffers

Data structures

SQL Server: Ghost records

Database mail

Database Mail errors in SQL Server (Troubleshooting steps)

Copy Data

Copy database wizard or replication setup might fail due to broken dependency

How to move the LOB data from one file group to other?

Create script for all objects in database with data

SQL Server agent

SQL Server Agent is taking long time to start

Security

“An error occurred during decryption”

Configuring SSL for SQL Server using Microsoft Certificate Authority Server

How to find who altered my SQL Server Login

(SQLServer) Initializing the FallBack certificate failed with error code: 1, state: 1, error number: -2146893802.

How to enable Constraint delegation for SQLServer2005

Bulk insert fails with error 4861 Cannot bulk load because the file could not be opened

SQL Server cluster

SQL-Server resource fails to come online IS Alive check fails

SQLServer LooksAlive and IsAlive Check

Cannot bring the Windows Server Failover Clustering (WSFC) resource (ID ‘ ‘) online (Error code 5018).

SQL Server cluster installation checklist

How to add node to SQL Server cluster

HOW TO CREATE CLUSTER USING HYPER-V

HOW TO INSTALL SQL Server CLUSTER IN HYPER-V

How to install cluster in windows 2008 and windows 2012

windows cluster freezes at “waiting for notification that node ‘‘ is a fully functional member of the cluster”

Tempdb usage

How to monitor the Session and query which Consumes Tempdb

SQL Server setup

Service pack ,Hotfix and CU installation for SQL Server 2005 might fail with “Unable to install Windows Installer MSI file“

Installation of SQLServer2008 fails (The registry key SYSTEM\CurrentControlSet\Services\RsFx0102\InstancesShares is missing)

Installation of SQLserver2008 cluster fails on windows2008.(The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

Installation of SQLServer2005/2008 Fails on Windows2008 Cluster.

Transaction log for the database is growing and system SPID is holding open transaction

Unable to start SQLServer agent resource on cluster after upgrading to 9.00.3186 or Higher

A failure was detected for a previous installation, patch, or repair during configuration for features [SQL_PowerShell_Engine_CNS,SQL_PowerShell_Tools_ANS]. In order to apply this patch package (KB968369), you must resolve any issues with the previous operation that failed.

SQL Server2008/SQL Server2012: Script level upgrade for database ‘master’ failed because upgrade step ‘sqlagent100_msdb_upgrade.sql’ encountered error 574, state 0, severity 16

SQLServer 2008 Fails to come online on cluster after upgrade

Transaction log for the database is growing and system SPID is holding open transaction

SQL Server Error while enabling Windows feature : NetFx3, Error Code : -2146498298

A SQL product other than SQL Server 2014 CTP1 is detected. You cannot install this release until the existing instances of SQL products are uninstalled

Backup & restore

The backup of the file or filegroup "" is not permitted because it is not online. BACKUP can be performed by using the FILEGROUP or FILE clauses to restrict the selection to include only online data.

Ring buffers

Inside sys.dm_os_ring_buffers

How to find SQL Server and system CPU usage history

SQL Agent and SQL Server startup issues

SQL Server Agent is taking long time to start

The database ‘model’ is marked RESTORING and is in a state that does not allow recovery to be run.Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive

We do not see the SQL server and SQL Agent status from management studio. When we right click the instance, start, stop and resume options is disabled.

(SQLServer) Initializing the FallBack certificate failed with error code: 1, state: 1, error number: -2146893802.

Profiler and SQL Server tools

My “c:\” Drive gets full when I open the profiler trace

Beyond XP_READERRORLOG (Parameters of XP_READERRORLOG)

“Value cannot be null” when i connect SQL Server from SSMS

How to find all the profiler traces running on my SQL Server

I cant not backup my database from SSMS….

I get Exception when i open SQL Server management studio

Tasks and connection icons missing after importing DTS in SQL Server management studio

Saving changes is not permitted. The changes you have made require the following tables to be dropped and re-created

Runtime error: ActiveX component can’t create object: ‘SQLDMO.SQLServer’

Others

System stored procedures like sp_addsrvrolemember or sp_addserver may fail because of McAfee Host Intrusion Prevention

Mssqlsystemresource database

How to Browse (or) view objects and there code in mssqlsystemresource Database

Checkdb failures

DBCC CheckDB fails with error "The database could not be checked as a database snapshot could not be created and the database or table could not be locked

SSIS/DTS

SSIS package fails with out of memory errors

SSIS package fails when executed as job using proxy account

Programming

Multi Threaded OVELAPPED and Nonbuffered I/O Example

Asynchronous I/O example

How to Create process in c++….. CreateProcess function

how to Open CreateFile (Createfile example)

How to get the current state of cluster resource?

QueryMemoryResourceNotification & CreateMemoryResourceNotification (How SQL Server identifies low system memory on the system and respond to low system memory?)

AWE allocator API’s (How SQL Server AWE works)

How to check if local system is connected to a network and identify the type of network connection

How to check if my account has LPM privilege

The backup of the file or filegroup "" is not permitted because it is not online. BACKUP can be performed by using the FILEGROUP or FILE clauses to restrict the selection to include only online data.

How to retrieve information about the file system and volume associated with the specified root directory (GetVolumeInformation function)

How to add an IP Address when we Add new NIC to node where SQLServer2005 instance is running.

CryptAcquireContext and CryptReleaseContext example

Criticalsection example

CreateProcess example

CreateFileMapping or MapViewOfFileEx example

How to & Other miscellaneous blogs

PREEMPTIVE_OS_AUTHORIZATIONOPS waits in SQL Server

SQL Server Backup compression

Types of isolation levels in SQL Server

SQL Server database snapshot

How to create table with filestream column and Insert data

How to enable and configure Filestream in SQL SERVER 2008 / 2012

How to import/Export data in SQL Server

Steps to enable Alwayson in SQL Server 2012

How to create database mirroring

How to create log shipping in SQL Server

How to create merge replication in SQL Server

How to create shared disk using iSCSI Software Target

NON CLUSTERED COLUMNSTORE INDEX

How to create clustered and non-clustered index

How to create SQL Server Agent Jobs

How to create backups using database maintenance plan

How to set Max degree of parallelism (MAXDOP)

How to configure maximum server memory

How to stimulate a SQL Server resource failure in cluster

Error 601 : Could not continue scan with NOLOCK due to data movement.

SQL Server monitoring

Removing database mirroring

Builtin\Administrators cannot login in to SQL Server

SQL Server assert in Location: purecall.cpp:51

Trace waits in SQLServer (SQLTRACE_BUFFER_FLUSH,TRACEWRITE,SQLTRACE_WAIT_ENTRIES,SQLTRACE_LOCK)

How to view the Space used by each table in database

FILESTREAM feature is disabled

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company or anyone else. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in Summary | Tagged: sqlserver blogs list, Top blogs in sqlserver from mssqlwiki, Top sqlserver blogs from mssqlwiki | 8 Comments »

SQL Server connectivity, Kerberos authentication and SQL Server SPN (Service Principal Name for SQL Server)

Posted by Karthick P.K on December 9, 2013

Most of you are already aware of Kerberos authentication in SQL Server (http://technet.microsoft.com/en-us/library/cc280744%28v=sql.105%29.aspx). It is mandatory for delegation and is a highly secure method of client-server authentication.

Connection failures caused by Kerberos authentication issues drive the majority of questions in MSDN and other SQL Server forums. Some of the common errors you get when Kerberos authentication fails are:

{

Cannot generate SSPI context

Login failed for user ‘(null)’
Login failed for user ‘’
Login failed. The login is from an untrusted domain and cannot be used with Windows authentication.

Linked server connections failing

SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security; the connection has been closed
SSPI handshake failed with error code 0x80090304 while establishing a connection with integrated security; the connection has been closed

Note: For the last two errors, the error codes translate to:

Error -2146893039 (0x80090311): No authority could be contacted for authentication
Error -2146893052 (0x80090304): The Local Security Authority cannot be contacted

So if you get either of the last two errors, a secure session could not be established with your domain controller. You can use nltest /SC_QUERY:YourDomainName to check the domain connection status.

You will also see the event below from the NETLOGON source in the System event log when your SQL Server connection fails with the last two errors in the above list.

Log Name: System
Source: NETLOGON
Event ID: 5719
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: client.Contoso.com
Description: This computer was not able to set up a secure session with a domain controller in domain CONTOSO due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

}
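The decimal and hexadecimal forms of the two SSPI error codes above are the same 32-bit value, read once as a signed and once as an unsigned integer. A minimal Python sketch of that conversion follows; the function names are illustrative and not part of any SQL Server or Windows tooling.

```python
def hresult_to_signed(code):
    """Read a 32-bit error code (for example 0x80090311) as a signed integer."""
    code &= 0xFFFFFFFF
    # Values with the top bit set are negative in two's complement.
    return code - 0x100000000 if code & 0x80000000 else code


def signed_to_hex(value):
    """Render a signed 32-bit error number in the 0x........ form used in the error log."""
    return "0x{:08X}".format(value & 0xFFFFFFFF)


if __name__ == "__main__":
    for code in (0x80090311, 0x80090304):
        print(signed_to_hex(code), "=", hresult_to_signed(code))
```

This is handy when one tool reports the negative decimal number and another reports the hexadecimal code for the same failure.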

Before we jump into troubleshooting connection failures caused by Kerberos authentication, let us see how to force connections to SQL Server to use the Named Pipes protocol when you get the above errors, so that you can work around the problem until you fix Kerberos authentication over TCP/IP. To force the Named Pipes (NP) protocol, use any one of the methods below.

1. Prefix the SQL Server instance name with np: For example, if your server name is Mssqlwiki\Instance1, change the server name in the connection string to np:Mssqlwiki\Instance1

2. Change the order of client protocols so that Named Pipes comes before TCP/IP (SQL Server Configuration Manager -> SQL Server Native Client Configuration -> Client Protocols -> Order -> move Named Pipes above TCP/IP)

Note: You have to make this change in both the 32-bit and the 64-bit SQL Server Native Client configuration on your client systems.

3. Create a Named Pipes alias.

When you get Kerberos authentication errors, or if you notice that SQL Server is falling back to NTLM authentication, you can follow the steps below to troubleshoot Kerberos failures.

1. How to check if SQL Server is using Kerberos authentication?

SELECT net_transport, auth_scheme FROM sys.dm_exec_connections WHERE session_id = @@spid

If auth_scheme shows KERBEROS, the connection is using Kerberos authentication; NTLM means the connection fell back to NTLM.

For Kerberos authentication to work in SQL Server, an SPN (Service Principal Name) has to be registered for the SQL Server service. SQL Server registers the SPN automatically, using its startup account, when it starts and deregisters it when it stops. Kerberos authentication fails when the SPN is not registered, when duplicate SPNs are registered in Active Directory, when the client system is not able to get the Kerberos ticket, or when DNS is not configured properly.

2. How to check if the SPNs are successfully registered in Active Directory?

When the startup account of SQL Server registers the SPNs in Active Directory during SQL Server startup, messages similar to the ones below are logged in the SQL Server error log.

2013-12-05 22:21:47.030 Server The SQL Server Network Interface library successfully registered the Service Principal Name (SPN) [ MSSQLSvc/node2.mssqlwiki.com ] for the SQL Server service.

2013-12-05 22:21:47.030 Server The SQL Server Network Interface library successfully registered the Service Principal Name (SPN) [ MSSQLSvc/node2.mssqlwiki.com:1433 ] for the SQL Server service.

When SQL Server cannot register the SPNs during startup, the error messages below are logged in the SQL Server error log:

Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) [ MSSQLSvc/node2.mssqlwiki.com ] for the SQL Server service. Windows return code: 0xffffffff, state: 53. Failure to register a SPN might cause integrated authentication to use NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies and if the SPN has not been manually registered.

Server The SQL Server Network Interface library could not register the Service Principal Name (SPN) [ MSSQLSvc/node2.mssqlwiki.com:1433 ] for the SQL Server service. Windows return code: 0xffffffff, state: 53. Failure to register a SPN might cause integrated authentication to use NTLM instead of Kerberos. This is an informational message. Further action is only required if Kerberos authentication is required by authentication policies and if the SPN has not been manually registered.

3. I see the "could not register the SPN" error message in the SQL Server error log. How do I make SQL Server register the SPNs automatically?

If your domain controller is Windows Server 2008 R2 or lower, grant the Read servicePrincipalName and Write servicePrincipalName permissions to the startup account of SQL Server using the ADSIEDIT.msc tool.

Launch ADSI Edit -> Domain -> DC=DCNAME,DC=com -> CN=Users -> CN=SQLServer_ServiceAccount -> Properties -> Security tab -> Advanced -> Add SELF -> Edit -> in Permissions, click Properties -> grant Read servicePrincipalName and Write servicePrincipalName

If your domain controller is Windows Server 2012, grant the Validated write to service principal name permission to the startup account of SQL Server using the Active Directory Users and Computers snap-in.

4. From the SQL Server error log I see that the SPNs are registered successfully, but Kerberos authentication is still failing. What is next?

Check if there are duplicate SPNs registered in Active Directory using the LDIFDE tool. The command below fetches all the SQL Server SPNs from Active Directory and writes them to c:\temp\spnlist.txt.

Ldifde -f c:\temp\spnlist.txt -s YourDomainName -t 3268 -d "" -r "(serviceprincipalname=MSSQLSvc/*)"

Search for duplicate SPNs in the output file (spnlist.txt). In our case the SPN is MSSQLSvc/node2.mssqlwiki.com:1433, so if there is more than one entry for MSSQLSvc/node2.mssqlwiki.com:1433 in the output file, there is a duplicate SPN that has to be deleted.

5. How do I identify which SPN is the duplicate?

In the LDIFDE output you will find the sAMAccountName that registered the SPN just above the servicePrincipalName (refer to the sample below). If that account is not the startup account of SQL Server, then it is a duplicate SPN.

{

sAMAccountName: NODE2$

sAMAccountType: 805306369

dNSHostName: NODE2.mssqlwiki.com

servicePrincipalName: MSSQLSvc/node2.mssqlwiki.com

servicePrincipalName: MSSQLSvc/node2.mssqlwiki.com:1433

}
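Scanning a large spnlist.txt by eye is error-prone, so the search can be scripted. The following is a minimal Python sketch, assuming the output keeps the attribute layout shown in the sample above, with each sAMAccountName line appearing before the servicePrincipalName lines it owns. The function name and the sqlsvc account in the usage example are hypothetical.

```python
from collections import defaultdict


def find_duplicate_spns(ldif_text):
    """Return {spn: [accounts]} for every SPN registered under more than one account."""
    owners = defaultdict(set)
    current_account = None
    for raw_line in ldif_text.splitlines():
        key, sep, value = raw_line.strip().partition(":")
        if not sep:
            continue  # blank line or a line without an attribute name
        key, value = key.strip().lower(), value.strip()
        if key == "dn":
            current_account = None  # a new directory entry starts here
        elif key == "samaccountname":
            current_account = value
        elif key == "serviceprincipalname" and current_account:
            # SPNs are compared case-insensitively.
            owners[value.lower()].add(current_account)
    return {spn: sorted(accounts) for spn, accounts in owners.items() if len(accounts) > 1}


if __name__ == "__main__":
    sample = """
    sAMAccountName: NODE2$
    servicePrincipalName: MSSQLSvc/node2.mssqlwiki.com
    servicePrincipalName: MSSQLSvc/node2.mssqlwiki.com:1433
    sAMAccountName: sqlsvc
    servicePrincipalName: MSSQLSvc/node2.mssqlwiki.com:1433
    """
    for spn, accounts in find_duplicate_spns(sample).items():
        print(spn, "is registered by", ", ".join(accounts))
```

Any SPN reported here still needs a human decision: keep the entry that belongs to the SQL Server startup account and delete the other one.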

6. There is a duplicate SPN in Active Directory. How do I delete it?

Use the setspn tool.

Syntax: Setspn -D "MSSQLSvc/FQDN:port" "SAMAccount name which has duplicate SPN"

Setspn -D "MSSQLSvc/node2.mssqlwiki.com:1433" "DOMAIN\Accountname"

7. The SPNs are registered properly and there is no duplicate SPN, but Kerberos authentication is still not working?

Run klist.exe from the client and check if it is able to get the ticket.

Example:

Klist get MSSQLSvc/node2.mssqlwiki.com:1433

If the client is able to get the ticket, you should see output similar to the one below.

{

c:\Windows\System32>Klist get MSSQLSvc/node2.mssqlwiki.com:1433

Current LogonId is 0:0x2de9f6

A ticket to MSSQLSvc/node2.mssqlwiki.com:1433 has been retrieved successfully.

Cached Tickets: (10)

}

If the client is unable to get the ticket then you should see an error similar to one below.

{

c:\Windows\System32>Klist get MSSQLSvc/node2.mssqlwiki.com:1433

Current LogonId is 0:0x2de9f6

Error calling API LsaCallAuthenticationPackage (GetTicket substatus): 0x6fb

klist failed with 0xc000018b/-1073741429: The SAM database on the Windows Server

does not have a computer account for this workstation trust relationship.

}

If the client is unable to get the ticket, check whether it fails to retrieve only the ticket for SQL Server or fails to get any ticket at all. You can use the commands below.

Klist get Host/FQDN of DC where SQLServer is installed

Klist get Host/FQDN of SQLServer Machine name

If all the ticket requests fail, the issue is most probably with the DNS or network settings. You can troubleshoot further based on the error you receive from klist, or collect Netmon traces.

8. What if the client is able to get the ticket and Kerberos authentication still fails?

Ping the SQL Server name, and ping its IP address with -a, and check whether it resolves to the fully qualified DNS name. If it does not resolve to the FQDN of SQL Server, fix the DNS settings.

9. How to Collect Netmon traces and identify Kerberos authentication failure?

Wait for my next blog

Posted in Connectivity, Security | Tagged: Cannot generate SSPI context, Error: 18456), Failure to register a SPN might cause integrated authentication to use NTLM instead of Kerberos, Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’. (Microsoft SQL Server, login failed for user NT Authority Anonymous, SSPI handshake failed with error code 0x80090304 while establishing a connection with integrated security the connection has been closed, SSPI handshake failed with error code 0x80090311 while establishing a connection with integrated security the connection has been closed, The SQL Server Network Interface library could not register the Service Principal Name (SPN) | 40 Comments »

Transactional Replication Part -2

Posted by Karthick P.K on November 22, 2013

Part 2 of the transactional replication series covers a demo of the data flow and of configuring the distributor, publisher, publication and subscription. After watching this video you will be able to relate the demo to the concepts discussed in the earlier video, configure transactional replication on your own, understand the different replication agents (snapshot agent, log reader agent and distribution agent) and monitor these agents once transactional replication is configured.

How to configure transactional replication By Gaurav Mathur

Posted in SQL General | Tagged: How to configure transactional replication, Setup transactional replication, step by step guide to transactional replication | 3 Comments »

Transactional Replication Part -1

Posted by Karthick P.K on November 22, 2013

Part 1 of the transactional replication series covers:

1. Architecture and transactional replication data flow.
2. The entities involved in transactional replication: Publisher server, Distributor server, Subscriber server, publication, publication database, subscription, subscription database and articles.
3. The replication agents involved in one-way transactional replication (snapshot agent, log reader agent and distribution agent) and what each one is used for.
4. The steps involved in configuring transactional replication: configuring the distributor, publisher and subscriber, along with the publication and subscription.

Any DBA can watch this video to learn the transactional replication data flow, how it works and how to configure it.

After watching the video below, you can look at the Transactional Replication Part 2 demo video, which will help you learn the above concepts in practice and enable you to configure replication on your servers.

Transactional Replication internals and architecture by Gaurav Mathur

Posted in Replication, SQL General | Tagged: How Transactional replication works, replication agents, replication data flow, Transactional replication architecture, Transactional Replication internals | 2 Comments »

Tempdb latch contention

Posted by Karthick P.K on September 17, 2013

You might see page latch contention in tempdb when you repeatedly create and drop tempdb objects (temp tables, table variables, etc.).

When you notice PAGELATCH_* contention on tempdb (the wait resource in sysprocesses starts with 2:), check if the latch wait is on a PFS, GAM or SGAM page. When there is latch contention on tempdb, you will see a lot of sessions waiting on PAGELATCH_*, similar to the one below.

In the output below, the session is waiting on resource 2:15:121320. Decoding the wait resource: 2 is the database ID of tempdb, 15 is the file number and 121320 is the page number. 121320 is a multiple of 8088 (121320 = 15 x 8088), so it is a PFS page. If the page is not a PFS page, check in the same way whether it is a GAM or SGAM page.

Wait type Wait resource

PAGELATCH_UP 2:15:121320

How to identify if a page is PFS, GAM or SGAM?

PFS page: A PFS page occurs once every 8088 pages. SQL Server attempts to place a PFS page on the first page of every PFS interval (8088 pages). The only time a PFS page is not the first page in its interval is in the first interval of a file: the file header page is first and the PFS page is second (page IDs start from 0, so the first PFS page is at page ID 1). If the page number is 1 or a multiple of 8088, the page is a PFS page.

GAM page: The first GAM page is page 2 in the data file. The next GAM page is placed 511230 pages after the first one, at page 511232, and GAM pages then repeat every 511232 pages (the GAM interval). If the page number is 2 or a multiple of 511232, the page is a GAM page.

SGAM page: The first SGAM page is page 3 in the data file. The next SGAM page is placed 511230 pages after the first one, at page 511233, and SGAM pages then repeat every 511232 pages. If the page number is 3, or (page number - 1) is a multiple of 511232, the page is an SGAM page.
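The page checks above are simple arithmetic, so they are easy to script when you have many wait resources to classify. A minimal Python sketch follows. It assumes the commonly documented intervals: a PFS page every 8088 pages, and GAM and SGAM pages at pages 2 and 3 and then at every multiple of 511232 and the page right after it. The function names are illustrative only.

```python
PFS_INTERVAL = 8088
GAM_INTERVAL = 511232  # assumption: commonly documented GAM/SGAM interval in pages


def parse_wait_resource(resource):
    """Split a page wait resource such as '2:15:121320' into (db_id, file_id, page_id)."""
    db_id, file_id, page_id = (int(part) for part in resource.strip().split(":")[:3])
    return db_id, file_id, page_id


def classify_page(page_id):
    """Return 'PFS', 'GAM', 'SGAM' or 'other' for a page number within a data file."""
    if page_id == 1 or (page_id > 0 and page_id % PFS_INTERVAL == 0):
        return "PFS"
    if page_id == 2 or (page_id > 0 and page_id % GAM_INTERVAL == 0):
        return "GAM"
    if page_id == 3 or (page_id > 1 and (page_id - 1) % GAM_INTERVAL == 0):
        return "SGAM"
    return "other"


if __name__ == "__main__":
    db_id, file_id, page_id = parse_wait_resource("2:15:121320")
    print("database", db_id, "file", file_id, "page", page_id, "is", classify_page(page_id))
```

A wait resource that classifies as "other" is not allocation-page contention and needs a different line of investigation, for example the page 103 case in the resolution steps.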

How to resolve?

1. Increase the number of tempdb data files and size them equally. As a general rule, if the number of logical processors is less than or equal to 8, use the same number of data files as logical processors. If the number of logical processors is greater than 8, use 8 data files and then, if contention continues, increase the number of data files in multiples of 4 (you may not see improvement once you reach 32 files).

2. Enable server side trace flag 1118.

3. If you still see latch contention on PFS pages after following the above two steps, the only option is to modify your application to limit tempdb usage.

4. If you see contention on 2:1:103: page 103 belongs to the system table sys.sysmultiobjrefs, which manages the relationships between created objects in every database. The only way to reduce contention on this page is to reduce the number of such relationships. For example, creating a lot of temp tables with primary keys can cause this contention, because the relationship between each table and its PK constraint has to be recorded in sys.sysmultiobjrefs.
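The file-count rule from step 1 can be written down as a small helper. This is only a sketch of that rule of thumb in Python: the function names are made up for illustration, and treating 32 files as a ceiling is an assumption based on the remark that improvement is unlikely beyond that point.

```python
def initial_tempdb_data_files(logical_processors):
    """Starting number of tempdb data files: one per logical processor, at most 8."""
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return min(logical_processors, 8)


def next_tempdb_data_files(current_files, step=4, ceiling=32):
    """Next file count to try if contention continues: add 4, stopping at the ceiling."""
    return min(current_files + step, ceiling)


if __name__ == "__main__":
    for cpus in (4, 8, 16, 64):
        print(cpus, "logical processors ->", initial_tempdb_data_files(cpus), "data files")
```

Whatever count you settle on, the files still have to be sized equally for the allocation load to spread evenly across them.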

What’s the best practice?

1. Create multiple, equally sized tempdb data files instead of one large file on all your SQL Server instances.

2. Make TF1118 (uniform allocation) the default. (The extra space required by this trace flag shouldn’t really matter, as the additional space is minimal and storage cost is not that high these days.)

Posted in Performance, Space management, SQL General, SQL Server Engine | Tagged: PAGELATCH_up, Wait resource 2:1:1, Wait resource 2:1:103, Wait resource 2:1:2, Wait resource 2:1:3 | 1 Comment »

Troubleshooting Transactional replication Latency using Agent Statistics

Posted by Prabhakar Bhaskaran on September 13, 2013

Troubleshooting latency issues in replication is a black box for many DBAs. In this post I will explain how you can leverage the agent statistics to troubleshoot latency issues.

Before understanding how to decode the agent statistics, let’s take a look at some basics that will help us troubleshoot replication performance issues better.

The following MSDN diagram depicts the transactional replication architecture in a simple manner.

Transactional replication components and data flow

Troubleshooting latency issues is a multi-step approach. The first step is to identify which agent is slow:

Log reader Agent (Publisher to Distributor)
Distribution Agent (Distributor to Subscriber)

So the problem can be in either the log reader agent or the distribution agent. We can identify which one simply by inserting a tracer token.

Once we find the problematic agent, the next step is to identify which particular thread within the agent is causing the issue.

Let me introduce you, in a nutshell, to the important threads in these replication agents and the work they do.

Log Reader Agent

Reader thread – Scans the publisher database transaction log using sp_replcmds.

Writer thread – Adds the queued transactions to the distribution database using sp_MSadd_replcmds.

Distribution Agent

Reader thread – Finds the watermark from the table MSreplication_subscriptions (on the subscriber) and uses this information to retrieve pending commands from the distribution database, using the stored procedure sp_MSget_repl_commands.

Writer thread – Uses batched RPC calls to write the information to the subscriber database.

Now that we understand the threads in the replication agents, let’s assume we have already identified which agent is slow by inserting a tracer token. The next step is to dig deeper at the thread level, and this is where the replication agent statistics come to our rescue.

Agent statistics entries are appended to the history tables every 5 minutes by default. They provide a historical view of how the agent has been performing, and the last 3 days of data are kept. You can keep more days by changing the history retention period.

MSlogreader_history

MSdistribution_history

The above two tables are located in the distribution database. The statistics are added as an XML blob in the comments column of these tables.

Now, let’s take a look at how to decipher this XML data for each agent.

Log Reader Agent statistics

– State = 1 means stats after a batch commit

– Work = Cumulative time spent by the agent since restart, excluding idle time

– Idle = Time spent waiting to call sp_replcmds

– Reader fetch = Time spent executing sp_replcmds

– Reader wait = Time spent waiting on the writer to release the buffer

– Writer write = Time spent writing commands into the distribution database

– Writer wait = Time spent waiting on the reader to populate the buffer

Note: Each thread has its own buffer, 40 KB in size.

Here, we need to look at the wait times to understand where the bottleneck is. For example, if the wait time for the Reader thread is high, your writer thread is slow, since the reader thread is waiting for the writer to release the buffer. Similarly, if you notice a high wait time for the writer thread, your reader thread is slow.

The simple way to decode this is,

HIGH wait time on Reader thread = Writer thread is slow ( thread which writes the commands to distribution database)

HIGH Wait time on Writer thread = Reader thread is slow ( thread which scans the transaction log)

Distribution Agent Statistics

<stats state="1" work="154" idle="351464">
<reader fetch="144" wait="11"/>
<writer write="12" wait="338"/>
<sincelaststats elapsedtime="305" work="10" cmds="81262" cmdspersec="8041.000000"><reader fetch="0" wait="9"/><writer write="10" wait="0"/></sincelaststats></stats>
– State = 1 means stats after a batch commit

– Work = cumulative time spent by the agent since restart, excluding idle time (seconds)

– Idle = time spent waiting to call sp_MSget_repl_commands

– Reader fetch = time spent executing sp_MSget_repl_commands

– Reader wait = time spent waiting on the writer to release the buffer

– Writer write = time spent writing commands into the subscriber database

– Writer wait = time spent waiting on the reader to populate the buffer

The wait times are decoded the same way as for the log reader agent.

HIGH wait time on Reader thread = Writer thread is slow (the thread which writes to the subscriber database using batched RPC calls)

HIGH wait time on Writer thread = Reader thread is slow (the thread which retrieves the pending commands from the Distribution database)

Distributor Writer thread Slow Scenario

We can understand these concepts better by looking at example statistics. In the case below, I explicitly started a transaction on the subscriber table to simulate blocking on the subscriber side, making the writer thread of the distribution agent wait and build up latency.

This is how the stats looked:

<stats state="1" work="755" idle="354505">
<reader fetch="153" wait="604"/>
<writer write="613" wait="346"/>
<sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000"><reader fetch="0" wait="515"/><writer write="515" wait="0"/></sincelaststats></stats>

We can clearly see that the Reader thread wait time is high (515), which means the writer thread is slow, as expected since we simulated blocking on the subscriber side.
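The decoding rule can also be applied mechanically. Here is a minimal Python sketch that parses one statistics entry and reports which thread looks slow. The function name `diagnose` and the `min_share` cutoff (a wait counts as high when it covers at least half of the elapsed interval) are assumptions made for illustration; the agents themselves do not define such a cutoff.

```python
import xml.etree.ElementTree as ET

# The two distribution agent samples from this post (healthy, then blocked subscriber).
HEALTHY = (
    '<stats state="1" work="154" idle="351464">'
    '<reader fetch="144" wait="11"/><writer write="12" wait="338"/>'
    '<sincelaststats elapsedtime="305" work="10" cmds="81262" cmdspersec="8041.000000">'
    '<reader fetch="0" wait="9"/><writer write="10" wait="0"/>'
    '</sincelaststats></stats>'
)
BLOCKED = (
    '<stats state="1" work="755" idle="354505">'
    '<reader fetch="153" wait="604"/><writer write="613" wait="346"/>'
    '<sincelaststats elapsedtime="636" work="515" cmds="45033" cmdspersec="87.000000">'
    '<reader fetch="0" wait="515"/><writer write="515" wait="0"/>'
    '</sincelaststats></stats>'
)


def diagnose(stats_xml, min_share=0.5):
    """Return which thread looks slow, based on the <sincelaststats> waits.

    min_share is a hypothetical cutoff: a wait is treated as high when it
    is at least this fraction of the elapsed time of the interval.
    """
    root = ET.fromstring(stats_xml)
    recent = root.find("sincelaststats")
    if recent is None:
        raise ValueError("no <sincelaststats> element in this entry")
    elapsed = int(recent.get("elapsedtime"))
    reader_wait = int(recent.find("reader").get("wait"))
    writer_wait = int(recent.find("writer").get("wait"))
    # High reader wait: the reader is waiting for the writer to free the buffer.
    if elapsed > 0 and reader_wait / elapsed >= min_share:
        return "writer thread is slow"
    # High writer wait: the writer is waiting for the reader to fill the buffer.
    if elapsed > 0 and writer_wait / elapsed >= min_share:
        return "reader thread is slow"
    return "no clear bottleneck"


print(diagnose(HEALTHY))
print(diagnose(BLOCKED))
```

In practice the XML would come from the comments column of MSdistribution_history or MSlogreader_history rather than from string constants.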

Similarly, we can simulate blocking on the replication tables MSrepl_commands and MSrepl_transactions, which causes the Log Reader writer thread to be slow, and the stats will show a high Reader thread wait time.

Now that we have isolated the source of the bottleneck at the thread level, we can follow the standard performance troubleshooting approach described in this Whitepaper to troubleshoot the slowness of the replication session.

For instance, check out the video where Joe Sack talks about using Extended events to troubleshoot the Distributor writer thread slowness.

In Summary

1. Find which agent is causing slowness using tracer token.

2. Leverage the agent statistics to narrow the problem down to the thread level.

3. Follow standard performance troubleshooting approach to resolve the issue.

Thanks for reading! I hope this will help you to troubleshoot the replication performance better next time.

Posted in Performance, Replication, SQL General | Tagged: Agent statistics, latency, Replication, replication latency, replication performance, Transactional replication | 2 Comments »

The connection to the primary replica is not active. The command cannot be processed

Posted by Karthick P.K on June 20, 2013

When you configure a SQL Server AlwaysOn availability group from Management Studio, it may fail with the errors below while joining the secondary replica to the availability group.

Error 1

{

Joining database on secondary replica resulted in an error. (Microsoft.SqlServer.Management.HadrTasks)

——————————

ADDITIONAL INFORMATION:

Failed to join the database ‘AG’ to the availability group ‘AG1’ on the availability replica ‘NODE2’. (Microsoft.SqlServer.Smo)

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

——————————

The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, Error: 35250)

}

Error 2

{

TITLE: Microsoft SQL Server Management Studio

——————————

Failed to join the instance ‘NODE2’ to the availability group ‘AG1’. (Microsoft.SqlServer.Management.SDK.TaskForms)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=11.0.2100.60+((SQL11_RTM).120210-1917+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&LinkId=20476

——————————

ADDITIONAL INFORMATION:

Failed to join local availability replica to availability group ‘AG1’. The operation encountered SQL Server error 41106 and has been rolled back. Check the SQL Server error log for more details. When the cause of the error has been resolved, retry the ALTER AVAILABILITY GROUP JOIN command. (Microsoft SQL Server, Error: 41158)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.2100&EvtSrc=MSSQLServer&EvtID=41158&LinkId=20476

}

You may get the error below when you join the database to the availability group using the ALTER DATABASE command mentioned below, or synchronization might fail with the same 35250 error.

ALTER DATABASE [AG] SET HADR AVAILABILITY GROUP = [Group name];

Error 1

Msg 35250, Level 16, State 7, Line 1

The connection to the primary replica is not active. The command cannot be processed.

To resolve above errors

1. Ensure the AlwaysOn endpoints ([Hadr_endpoint]) are not blocked by a firewall (default port 5022).

2. Make sure the startup account of the primary server is added to all secondary servers and the startup accounts of all secondary servers are added to the primary server (the startup account of each replica must be added to the other replicas).

3. If the log on account of SQL Server is “Nt service\” or the local system account, ensure the system account (Domainname\systemname$) of each replica is added to the other replicas.

{

CREATE LOGIN [MSSQLWIKI\node2$] FROM WINDOWS

}

4. Grant connect on the AlwaysOn endpoints created on each replica to the startup accounts of the other replica servers (grant connect on the endpoints even if the startup accounts of the other replicas are added as sysadmins).

{

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [MSSQLWIKI\node1$]

}

5. Make sure the SQL Server name (select @@servername) matches the hostname.

6. Make sure the cluster service startup account is part of the SQL Server logins (More details in This link).
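Step 1 can be checked quickly from each replica before looking at logins and permissions. The following is a minimal Python sketch that only tests whether a TCP connection to the endpoint port can be opened; the function name `endpoint_reachable` and the 3 second timeout are assumptions for illustration, and a successful connection says nothing about the CONNECT permission on the endpoint.

```python
import socket


def endpoint_reachable(host, port=5022, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    5022 is the default AlwaysOn endpoint port mentioned in step 1;
    use the port your [Hadr_endpoint] was actually created with.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example (run from NODE1 against the other replica):
#     endpoint_reachable("NODE2")
```

If this returns False from one replica to another, fix the firewall or the endpoint state first; the remaining steps will not help until the port is reachable.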

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

Posted in Always On, Configuration, Connectivity, Security, SQL General | Tagged: Error: 35250), Failed to join local availability replica to availability group 'AG1'. The operation encountered SQL Server error 41106 and has been rolled back, Failed to join the database '' to the availability group '' on the availability replica, Joining database on secondary replica resulted in an error, The connection to the primary replica is not active. The command cannot be processed., The connection to the primary replica is not active. The command cannot be processed. (Microsoft SQL Server, The operation encountered SQL Server error 41106 and has been rolled back | 28 Comments »

False warning “A significant part of sql server process memory has been paged out”

Posted by Karthick P.K on June 13, 2013

In “A significant part of SQL Server process memory has been paged out” we discussed the SQL Server working set trim warning, when it can occur and how to troubleshoot it.

In the same blog I mentioned that SQL Server logs the “A significant part of sql server process memory has been paged out” warning when the working set reaches 50% or less of the overall memory committed by the SQL Server memory manager. In this blog I will try to cover when this warning could be a false warning and how to identify it.

Let us recollect what committed memory is and the different states of committed bytes in Windows.

Committed: Total memory allocated by the process (allocated bytes can be in RAM or in the page file)

Committed working set: Committed memory that is currently in RAM

Committed paged: Committed memory that is currently in the page file

Committed mapped: Committed memory that is mapped to the page file.

Committed untouched: Committed memory that has never been accessed (when a page is committed in Windows it does not become part of the working set until it is accessed).

Let us understand what committed untouched memory is. Download the Memoryallocator exe from This link and keep committing memory using it.

You will notice that the committed memory of the Memoryallocator process increases, but neither the physical memory usage (RAM usage or working set) nor the page file usage increases at all. Only the committed memory of the process and the committed memory of the overall system increase.

Why?

When a page is committed in Windows it does not become part of the working set or the page file until it is accessed.

Similarly, SQL Server estimates the memory requirements of different clerks and allocates memory for them during startup or on demand. This allocated memory is part of committed memory but does not have a page in RAM or in the page file until it is accessed for the first time.

In this condition SQL Server’s working set can drop far below the committed bytes, and once the working set reaches 50% or less of the overall committed bytes, the “A significant part of sql server process memory has been paged out” warning is logged in the SQL Server errorlog.

How do you identify whether these warnings are false warnings?

We can identify whether these warnings are false using the SQL Server memory dump or using Perfmon counters.

Let us simulate a false warning situation using the backup query below and see how to identify that the warning is false.

Run the below query in your test system.

Note: If you do not get the warning message, increase the buffer count in the query below. If you get “There is insufficient system memory in resource pool”, reduce the buffer count.

WARNING: dumptrigger and the trace flags below are undocumented and should be used only in test environments with caution, or under Microsoft Support supervision. There is no guarantee that they will work in future versions of SQL Server.

DBCC TRACEON(8026,-1) -- Trace flag -T8026 tells dump trigger to remove the trigger after the first dump has been triggered.
go
DBCC DUMPTRIGGER('SET',17890)
go
BACKUP DATABASE MSDB TO DISK = N'msdb.BAK' WITH NOFORMAT, INIT,NAME = N'msdb', SKIP, NOREWIND, NOUNLOAD, STATS = 1 ,BUFFERCOUNT = 10000,BLOCKSIZE = 65536 ,MAXTRANSFERSIZE=2097152

Once you run the above backup command you will see error 17890 “A significant part of sql server process memory has been paged out”, and a mini memory dump will be created in the Errorlog folder along with SQLDump00nn.txt.

Using memory dump

Open the SQLDump00nn.txt and review the memory section in SQLDump00nn.txt. This will give you the system memory information when the error occurred.

Snippet from my SQLDump00nn.txt.

{

Memory

MemoryLoad = 26%

Total Physical = 131067 MB

Available Physical = 96691 MB

Total Page File = 393201 MB

Available Page File = 334217 MB

Total Virtual = 8388607 MB

Available Virtual = 8166328 MB

**Dump thread – spid = 0, EC = 0x00000020F19C2B90

***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL11.RBS\MSSQL\LOG\SQLDump0048.txt

}

In the above output, Available Physical is 96,691 MB, which indicates there was no physical memory pressure when SQL Server raised the 17890 warning. Windows was therefore not trimming the working set, and we can conclude that this instance of the warning is a false warning.

Note: The above method may not work well in earlier versions of Windows, in which the working sets of all processes are hard trimmed when there is memory pressure on the system.
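The same check can be scripted when there are many dump files to review. Below is a minimal Python sketch that reads the Memory section of a SQLDump00nn.txt and computes how much physical memory was free. The helper names and the idea of reducing the judgment to a single fraction are assumptions for illustration; how much free memory counts as "no pressure" is left to the reader.

```python
import re

# Memory section copied from the dump snippet above.
DUMP_MEMORY = """\
MemoryLoad = 26%
Total Physical = 131067 MB
Available Physical = 96691 MB
Total Page File = 393201 MB
Available Page File = 334217 MB
Total Virtual = 8388607 MB
Available Virtual = 8166328 MB
"""


def parse_memory_section(text):
    """Collect every 'Name = <number> MB' line into a dict of integers."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"\s*(.+?)\s*=\s*(\d+)\s*MB\s*$", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def free_physical_fraction(mem):
    """Fraction of physical memory that was available when the dump was taken."""
    return mem["Available Physical"] / mem["Total Physical"]


def meets_17890_condition(working_set_mb, committed_mb):
    """The logging condition described in this post: working set at or below 50% of committed."""
    return working_set_mb <= 0.5 * committed_mb


mem = parse_memory_section(DUMP_MEMORY)
print(mem["Available Physical"], round(free_physical_fraction(mem), 2))
```

With roughly three quarters of physical memory free, the warning in this dump was raised by untouched committed pages rather than by an actual trim.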

Using perfmon counters

In below perfmon graph I have collected three counters

1. Process\SQLServr\Working set (highlighted Black)

2. Process\SQLServr\Private bytes (Committed memory. Green line)

3. Memory\AvailableMbytes (Red line)

If you review your SQL Server error log you will notice the 17890 warning at the same time that Private bytes (green line) spiked.

How do we conclude that the warning was printed because of “untouched committed pages” in SQL Server?

In general, when a page is committed and accessed it stays part of the working set as long as there is enough available memory on the system. If you look at the graph below you will notice that private bytes (committed) increases but the working set does not increase at the same pace, even though there is adequate available memory on the system. This can happen only when pages are committed and not accessed. (If you look at the graph carefully, committed memory increased and dropped within 10 seconds, so when you configure Perfmon choose a sample interval of 1 second; otherwise Perfmon might miss the data and you will not see that something like this happened.)

Trouble shooting working set trim “A significant part of SQL Server process memory has been paged out”

SQL Server lock pages in memory should I use it?

SQL Server memory leak

What is new in SQL Server 2012 Memory

How to set max server memory and min server memory

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group


Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: A significant part of sql server process memory has been paged out. This may result in a performance degradation, False working set trim warning, significant part of sql server process memory has been paged, SQL Server is paged, SQLserver working set | 6 Comments »

What do MemoryUtilization in sys.dm_os_ring_buffers and Memory_utilization_percentage in sys.dm_os_process_memory represent?

Posted by Karthick P.K on June 2, 2013

A few days back someone asked me an interesting question: why is memory_utilization_percentage (working set) 100% when Virtual_address_space_committed_Kb (committed) is around 10 GB and Physical_memory_in_use_kb is just 1.7 GB (refer to the image below)?

Physical_memory_in_use_kb is the memory allocated by the SQL Server process that is currently in RAM. (This includes AWE and large page allocations.)

Virtual_address_space_committed_Kb is the total memory allocated by the process (allocated bytes can be in RAM, in the page file, mapped, or in a not-used state).

Memory_utilization_percentage is the ratio between Physical_memory_in_use_kb and the memory allocated by SQL Server using the SQL Server memory manager (derived from dm_os_memory_nodes). If Memory_utilization_percentage is greater than 100%, it is capped at 100%. Memory used by external components in the SQL Server address space is not considered when SQL Server derives the memory utilization percentage.

Memory_utilization_percentage = (Physical_memory_in_use_kb/Memory allocated by SQL Server using SQL Server memory manager) * 100
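To make the cap concrete, here is a minimal Python sketch of the formula. The function name and the sample figure of 1.5 GB for memory-manager allocations are assumptions for illustration (the original question only gave the 1.7 GB and 10 GB values), and integer rounding is a simplification.

```python
def memory_utilization_percentage(physical_memory_in_use_kb, memory_manager_allocated_kb):
    """Ratio of memory in RAM to memory allocated through the SQL Server
    memory manager, capped at 100 as described above."""
    if memory_manager_allocated_kb <= 0:
        raise ValueError("memory_manager_allocated_kb must be positive")
    percentage = physical_memory_in_use_kb * 100 // memory_manager_allocated_kb
    return min(percentage, 100)


# 1.7 GB in RAM against a hypothetical 1.5 GB allocated by the memory manager.
# The 10 GB committed by an external DLL does not enter the calculation at all.
print(memory_utilization_percentage(1_700_000, 1_500_000))
```

This is why a leak through VirtualAlloc in an external DLL can push committed memory to 10 GB while the percentage still reads 100.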

To reproduce the above behavior download VirtualallocLeak.dll from THIS link and copy to ‘C:\EXE\’ folder.

Execute the below script and then query the sys.dm_os_process_memory DMV

select * from sys.dm_os_process_memory
go
exec sp_addextendedproc  'VirtualallocLeak','C:\exe\VirtualallocLeak.dll'
go
exec VirtualallocLeak -- This allocates 1048576 bytes per execution
go 300
select * from sys.dm_os_process_memory

Review the memory_utilization_percentage , Virtual_address_space_committed_Kb and Physical_memory_in_use_Kb


Posted in SQL Server Engine, SQL Server memory | Tagged: dm_os_process_memory, Memory utilization percentage, MemoryUtilization SQLServer, SQL Server working set percentage | 8 Comments »

SQL Server monitor

Posted by Karthick P.K on May 11, 2013

Every SQL Server DBA has faced situations such as SQL Server not accepting connections for a few minutes, SQL Server not responding for a few minutes, or applications not being able to connect to SQL Server for a few minutes. Before the DBA is alerted and starts troubleshooting, everything is back to normal. The challenge in these situations is that it becomes very difficult to understand where the underlying problem was: it could be network connectivity, an application server problem, or an issue with SQL Server itself. How do we collect diagnostic data to prove that SQL Server was stable at the time of the issue, or, if the issue is with SQL Server, how do we collect the data we need to diagnose it?

You can use SQL Monitor to monitor SQL Server instances.

The SQL Server monitoring exe monitors the SQL Server services and creates diagnostic data and a memory dump if the SQL Server service is down, if SQL Server is not accepting connections, or if SQL Server is not responding to queries.

How does it work?

SQL Monitor checks SQL Server in 3 phases:

1. Check the status of all the SQL Server services through the Windows service control manager every 60 seconds.

2. If the service is running, check whether SQL Server is accepting connections every 60 seconds.

3. If SQL Server is accepting connections, run a simple query to see whether SQL Server is responding properly.

4. If any SQL Server is not accepting connections, connect to it using the DAC, take a filtered stack dump (stored in the errorlog directory of the instance), execute the custom diagnostic script (c:\sqlmonitor\failoveranalysis.sql) and store the output in c:\SQLmonitor\ as “Servername+instancename.txt”, which can be used to identify whether there is an issue in SQL Server.

5. Once the dump is taken, release the DAC connection and wait for some time before attempting to connect again. If the connection succeeds on a subsequent attempt, SQLMonitor.exe continues monitoring the instance; if the connection fails again, a new dump is generated and new diagnostic data is collected and appended to the Servername+instancename.txt file in the SQLMonitor folder.

While the issue continues, there is a gap of X minutes between each diagnostic data and stack dump collection, where X = (number of diagnostic data/dump collections already taken for this instance) * (number of diagnostic data/dump collections already taken for this instance). For example, after 3 collections the next gap is 9 minutes.
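The phase order and the growing gap between collections can be sketched in a few lines of Python. This is not the SQLMonitor.exe source: the three probes are passed in as plain callables, and the names `check_instance` and `gap_minutes` are hypothetical.

```python
def check_instance(service_running, accepts_connections, responds_to_query):
    """Run the three phases in order and stop at the first one that fails.

    Each argument is a zero-argument callable returning True or False,
    standing in for the real service, connection and query probes.
    """
    if not service_running():
        return "service down"
    if not accepts_connections():
        return "not accepting connections"
    if not responds_to_query():
        return "not responding to queries"
    return "healthy"


def gap_minutes(collections_so_far):
    """Minutes to wait before the next dump: the square of the number of
    collections already taken for this instance."""
    return collections_so_far * collections_so_far


print(check_instance(lambda: True, lambda: False, lambda: True))
print([gap_minutes(n) for n in range(1, 5)])
```

The quadratic gap keeps the first retries close together while preventing a long outage from filling the errorlog directory with dumps.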

How to Configure?

1. Create a folder called SQLMonitor in C:\

2. Create a Text file called serverlist.txt and enter all the SQL Servers in your environment to be monitored in below format.

Format:

Servername [TAB] Servicename;

Ex:

Server1 MSSQLServer;

Server2 MSSQL$Prod;

3. Open a command prompt and run SQLMonitor.exe.
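Before starting the exe it can be worth validating serverlist.txt, since a malformed line is easy to miss. The Python sketch below reads the format from step 2; `parse_serverlist` is a hypothetical helper, and unlike the real tool (which expects a TAB) it accepts any whitespace between the two fields.

```python
def parse_serverlist(text):
    """Parse 'Servername<TAB>Servicename;' lines into (server, service) tuples."""
    entries = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        line = line.rstrip(";").strip()
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {number}: expected 'Servername<TAB>Servicename;'")
        entries.append((parts[0], parts[1].strip()))
    return entries


sample = "Server1\tMSSQLServer;\nServer2\tMSSQL$Prod;\n"
print(parse_serverlist(sample))
```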

Advantages:

1. Multi-threaded: each server and service is verified using its own thread, so retrieving information from one server does not affect the polling interval of other servers.

2. A single exe can be scaled to monitor more than 1000 servers and 1000 services.

3. Uses only a few MB of memory and few system resources.

Requirements:

1. This exe can be invoked from any client system with SQL Server client tools and SQL Server Native Client installed.

2. Remote DAC connections have to be enabled on the SQL Servers which are monitored.

3. The exe should be invoked under the credentials of a user who has access to all the monitored SQL Servers and permission to view the service control manager of the Windows servers on which SQL Server is running.

You can Download SQLMonitor.exe from this link


Posted in Connectivity, SQL Server Engine, SQL Server Tools | Tagged: Monitoring SQL Server, sql server monitoring, SQLServer connectivity failure | 4 Comments »

Max server memory – Do I need to configure?

Posted by Karthick P.K on April 22, 2013

Do I need to configure Max server memory and min server memory? What is the right value for this configuration and how to determine it?

There are many debates around this, and the above questions are raised frequently by SQL Server DBAs in many forums. If you ask me, “It depends” on various factors.

Before we choose to configure this value or leave it at the default, it is very important to understand how SQL Server grows and shrinks its memory usage based on the available memory in the operating system, even when Max server memory is not configured or is left at the default.

How does SQL Server grow and shrink its memory usage based on the available memory in the operating system even when Max server memory is not configured or is left at the default?

SQL Server memory management is designed to dynamically adjust its memory usage based on the amount of available memory on the system. SQL Server keeps allocating memory based on its need as long as there is memory available, i.e. as long as the MEMPHYSICAL_HIGH (HighMemoryResourceNotification) notification is signaled in Windows, and scales down its usage when MEMPHYSICAL_LOW (LowMemoryResourceNotification) is signaled in Windows. When available memory is between the low and high memory thresholds, SQL Server tries to keep its memory usage stable (RESOURCE_MEM_STEADY), with some exceptions.

You can download the ResourceNotificationHighandLow.exe from This link to see memory notifications from windows.

The default level of available memory that signals a LowMemoryResourceNotification event is approximately 32 MB per 4 GB of RAM, up to a maximum of 64 MB. (By default, the threshold is 64 MB on most systems.)

The default level that signals a high-memory-resource notification event is three times the default low-memory value. (By default, the threshold is 64 * 3 = 192 MB on most systems.)

Key points:

1. Once the available memory on the system goes below 192 MB, the HighMemoryResourceNotification (MEMPHYSICAL_HIGH) signal is revoked by Windows and SQL Server will not grow its BPool.

2. Once the available memory on the system goes below 64 MB, LowMemoryResourceNotification (MEMPHYSICAL_LOW) is signaled by Windows and SQL Server will shrink its BPool (reduce its memory usage).

3. When the available memory on the system is between 64 MB and 192 MB (i.e. between LowMemoryThreshold and HighMemoryThreshold), SQL Server will not grow or shrink its usage (with some exceptions, which we will see in a while).
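The threshold arithmetic and the three states can be written down as a small Python sketch. It treats the "approximately 32 MB per 4 GB" rule as exact and ignores the exceptions mentioned in point 3, so it is a simplified model; the function names are hypothetical.

```python
def default_thresholds_mb(total_ram_gb):
    """Default (low, high) thresholds in MB: about 32 MB per 4 GB of RAM,
    capped at 64 MB, with the high threshold three times the low one."""
    low = min(64.0, 32.0 * total_ram_gb / 4.0)
    return low, 3 * low


def memory_state(available_mb, low_mb):
    """Classify available memory against the low threshold and 3x the low threshold."""
    high_mb = 3 * low_mb
    if available_mb < low_mb:
        return "shrink"   # LowMemoryResourceNotification signaled
    if available_mb < high_mb:
        return "steady"   # neither notification signaled
    return "grow"         # HighMemoryResourceNotification signaled


print(default_thresholds_mb(32))
print(memory_state(40, 64), memory_state(150, 64), memory_state(500, 64))
```

Passing a different `low_mb` shows the effect of a custom LowMemoryThreshold: with 512 MB the steady band becomes 512 MB to 1536 MB.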

Note: Unless there is a badly behaved application on the system that keeps allocating and releasing memory in a zigzag fashion, making Windows trigger HighMemoryResourceNotification and LowMemoryResourceNotification one after the other, SQL Server will not grow and shrink its memory usage in a continuous loop. If there is such an application on the system, even configuring max server memory may not help.

The default low memory threshold of 64 MB may not be ideal for all systems. Example: let us assume an application suddenly requests 150 MB of memory when the available memory is 190 MB, and the grant is successful. Available memory now drops to 40 MB, making Windows signal LowMemoryResourceNotification. SQL Server starts responding to the LowMemoryResourceNotification from Windows, but at the same time the Windows working set manager also starts trimming the working sets of all processes, which brings down the overall performance of the system.

We can increase the LowMemoryThreshold value by making the following registry change. If LowMemoryThreshold is set to a higher value, the OS notifies applications such as SQL Server of low memory conditions much earlier, and SQL Server can respond to memory pressure before the system starves for memory and before the Windows working set manager starts trimming the working sets of all processes.

In Regedit -> go to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management

Right click on the right pane,

Select New -> click DWORD Value -> enter LowMemoryThreshold

Double click LowMemoryThreshold -> value (choose decimal) -> 512

A system reboot is required for the change to take effect.

In the above example I have set LowMemoryThreshold to 512 MB, so the MEMPHYSICAL_LOW notification is signaled as soon as the available memory drops to 512 MB, and HighMemoryResourceNotification (MEMPHYSICAL_HIGH) stays in the signaled state as long as the available memory is at least 1536 MB (LowMemoryThreshold * 3).

After making the above change, SQL Server grows its BPool as long as the available memory on the system is greater than 1536 MB. As soon as the available memory drops below 1536 MB, the HighMemoryResourceNotification signal is revoked by Windows, so SQL Server maintains a steady state and does not grow its memory usage further. That doesn’t mean SQL Server waits for LowMemoryResourceNotification before scaling down its memory usage once HighMemoryResourceNotification is revoked. SQL Server always tries to keep the available physical memory on the system high, i.e. it tries to keep the available memory at HighMemoryThreshold (LowMemoryThreshold * 3).

What if I have multiple instances of SQL Server on same server and how they load balance the memory among themselves?

SQL Server tries to balance its memory usage with other instances of SQL Server running on the same box. As I mentioned earlier, SQL Server tries to keep the available memory on the system at the high memory threshold. The SQL Server lazy writer checks whether any disk reads were performed in the last 10 seconds, and if there were none, SQL Server reduces its memory usage until HighMemoryResourceNotification is signaled by the OS.

Let us see this with an example:

Let us assume there are 2 SQL Server instances running on a server with 32 GB of RAM and LowMemoryThreshold is set to 512 MB on the system (so HighMemoryThreshold is 1536 MB (LowMemoryThreshold * 3)).

1. When the OS starts, HighMemoryResourceNotification is signaled because there is adequate available memory on the server.

2. SQL Server instance 1 starts first and consumes memory until HighMemoryResourceNotification is revoked (it is revoked when available memory drops below 1536 MB).

3. Now the 2nd SQL Server instance is started. It finds that the high memory resource notification is revoked, so it does not increase its memory usage.

4. The lazy writer thread of the 1st instance checks whether the 1st instance performed any disk reads in the last 10 seconds. If there were no disk reads, the 1st instance scales down its usage until HighMemoryResourceNotification is signaled by the OS (it is signaled again when available memory reaches 1536 MB).

5. The 2nd instance, which is hungry for memory, sees the high memory resource notification and starts growing its usage until the notification is revoked. Once the high memory notification is revoked, the 2nd instance stops growing.

6. The 1st instance finds that the high memory notification is revoked and again checks whether there were any disk reads in the last 10 seconds. If there were none, it scales down further until the high memory resource notification is signaled.

7. Once the high memory notification is signaled, the 2nd instance starts growing again.

8. Over time the instances balance their memory requirements among themselves. (If an instance performed reads from disk within the last 10 seconds, we assume it needs additional memory, so it does not scale down. If it performed no reads for more than 10 seconds and the available memory is below the high memory threshold, it scales down to give memory to the other instance.)

9. After some time, the instance with the higher memory requirement will be consuming more memory than the instance with the lower memory requirement. This way both instances balance their memory requirements with each other.
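The hand-off in the steps above can be illustrated with a toy model in Python. It is a deliberate simplification, not how SQL Server is implemented: memory moves in fixed 256 MB chunks, each instance acts once per tick, and the `Instance` fields and starting numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    usage_mb: int
    recent_disk_reads: bool  # any disk reads in the last 10 seconds?
    wants_memory: bool       # would grow if the OS signals high memory


def tick(instances, available_mb, high_mb, chunk_mb=256):
    """Advance the toy model by one step and return the new available memory."""
    for inst in instances:
        high_signaled = available_mb >= high_mb
        if high_signaled and inst.wants_memory:
            # High memory notification is set: a hungry instance grows.
            inst.usage_mb += chunk_mb
            available_mb -= chunk_mb
        elif not high_signaled and not inst.recent_disk_reads:
            # Notification revoked and no recent reads: give memory back.
            released = min(chunk_mb, inst.usage_mb)
            inst.usage_mb -= released
            available_mb += released
    return available_mb


first = Instance("instance 1", usage_mb=30000, recent_disk_reads=False, wants_memory=False)
second = Instance("instance 2", usage_mb=500, recent_disk_reads=True, wants_memory=True)
available = 1400  # just below the 1536 MB high threshold
for _ in range(10):
    available = tick([first, second], available, high_mb=1536)
print(first.usage_mb, second.usage_mb, available)
```

In each tick the idle first instance releases a chunk, which lifts available memory above the high threshold, and the busy second instance immediately takes it. Memory drifts toward the instance that is doing reads while available memory hovers around the threshold.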

Note:

1. The above logic may not fit well if the total physical memory on the system is very low compared with the memory requirements of the multiple SQL Server instances running on it. If you start the second instance while the first instance is running with full memory utilization and still performing a lot of reads (i.e. RESOURCE_MEM_STEADY with continuous reads), the second instance may take a long time to scale up its memory usage, or may not scale up at all. In such a case you can cap max server memory, but the performance of SQL Server will be very poor because of memory contention.

2. Also be cautious when you increase the value of LowMemoryThreshold beyond 512 MB. Increasing this threshold widens the range of available memory in which neither LowMemoryResourceNotification nor HighMemoryResourceNotification is signaled (RESOURCE_MEM_STEADY). So when you have multiple instances, if you start the second instance while the first instance is running with full memory utilization and a lot of continuous reads (i.e. RESOURCE_MEM_STEADY plus reads), the second instance may take a long time to scale up its memory usage, and the chances of LowMemoryResourceNotification being signaled are low because of the wider RESOURCE_MEM_STEADY range.

FAQ:

1. What will happen when MTL allocations increase?

Available memory on the system drops when MTL consumption increases. If MEMPHYSICAL_HIGH is signaled, there is no effect on the BPool. If MTL consumption increases drastically, it might cause available memory to drop further, causing Windows to trigger LowMemoryResourceNotification.

If LowMemoryResourceNotification is signaled, SQL Server scales down its BPool usage.

2. Will the Windows working set manager start trimming the working sets of all processes as soon as LowMemoryResourceNotification is signaled?

No.

3. What are the other effects of changing LowMemoryThreshold?

There might be other applications and drivers that also use memory notifications from Windows to grow and shrink their memory usage. They will also shrink and grow when there is a notification from Windows.

4. Why would I need to cap my SQL Server memory when we have a great dynamic mechanism in SQL Server to grow and shrink its memory usage?

You can leave max server memory at the default if all of the following are true: your operating system is Windows 2008 or above, you have all the fixes in This link and This link, you do not have any faulty drivers or applications that request large amounts of memory suddenly, and you are not using the large pages memory model. Otherwise I would suggest capping Max server memory.

If you have decided to configure Max server memory, remember that it does not control the overall memory used by SQL Server. There are significant changes in the memory allocations controlled by Max server memory between SQL Server 2012 and earlier versions. Let us understand which allocations it controls in SQL Server 2012 and in earlier versions of SQL Server.

What is controlled by SQL Server Max Server Memory (Extract from SQLServer2012 Memory)?

SQL Server memory is internally divided into two regions known as BPOOL and NonBPool (aka MTL or MTR). More details about BPOOL and MTL can be found in This blog.

In earlier versions of SQL Server (up to 2008 R2), “Max Server Memory” controlled the maximum physical memory that the single page allocator (BPOOL) could consume in the SQL Server user address space.

Only the single page allocator was part of BPOOL and Max server memory controlled only BPOOL, so the following allocations fell outside BPOOL (Max server memory):

1. Multi-page allocations from SQL Server [These are allocations that request more than 8 KB and require contiguous memory]

2. CLR allocations [These include the SQL CLR heaps and its global allocations created during startup]

3. Memory used for thread stacks within the SQL Server process (Max worker threads * thread stack size). Thread stack size is 512 KB in 32-bit SQL Server, 904 KB in WOW mode and 2 MB in 64-bit

4. Direct Windows allocations made by non-SQL Server DLLs [These include Windows heap usage and direct virtual allocations made by modules loaded into the SQL Server process. Examples: allocations from extended stored procedure DLLs, objects created using OLE Automation procedures (sp_OA calls), allocations from linked server providers loaded in the SQL Server process]
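Item 3 is the easiest of these to estimate up front. A minimal Python sketch of the multiplication, using the stack sizes quoted above; the platform keys and the worker thread counts in the example are hypothetical inputs, so substitute the max worker threads value of your own instance.

```python
# Thread stack sizes in KB as quoted in item 3 above.
STACK_KB = {"x86": 512, "wow64": 904, "x64": 2048}


def thread_stack_memory_mb(max_worker_threads, platform):
    """Upper bound on thread stack memory: max worker threads * stack size."""
    return max_worker_threads * STACK_KB[platform] / 1024


# Example with a hypothetical 512 worker threads on 64-bit SQL Server.
print(thread_stack_memory_mb(512, "x64"))
```

This is an upper bound, reached only if every worker thread is actually created.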

The SQL Server 2012 memory manager has clubbed the single page allocator and the multi-page allocator together as the any-size page allocator. As a result, the any-size page allocator now manages allocations categorized in the past as single page and multi-page allocations.

1. "max server memory" now controls and includes multi-page allocations.

2. In earlier versions of SQL Server, CLR allocated memory was outside BPOOL (max server memory). SQL Server 2012 includes SQL CLR allocated memory in "max server memory".

The only allocations that the SQL Server 2012 "max server memory" configuration does not include are the following:

1. Memory allocations for thread stacks within the SQL Server process

2. Memory allocation requests made directly to Windows [Ex: allocations (heap, VirtualAlloc calls) from third-party DLLs loaded in the SQL Server process, objects created using OLE Automation procedures (sp_OA) etc.]
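The two lists above can be condensed into a small lookup. This is only a sketch of the rules as described in this post; the enum and function names are made up for illustration and do not correspond to anything inside SQL Server.

```cpp
// Hypothetical names; the rules are the ones listed in the text above.
enum class Allocation { SinglePage, MultiPage, Clr, ThreadStack, DirectWindows };

// Returns true when the allocation type counts against "max server memory".
// isSql2012 = false means SQL Server 2008 R2 or earlier.
constexpr bool underMaxServerMemory(Allocation kind, bool isSql2012) {
    switch (kind) {
        case Allocation::SinglePage:
            return true;            // BPOOL was always covered
        case Allocation::MultiPage:
        case Allocation::Clr:
            return isSql2012;       // covered only from SQL Server 2012
        case Allocation::ThreadStack:
        case Allocation::DirectWindows:
            return false;           // never covered
    }
    return false;
}
```

For example, `underMaxServerMemory(Allocation::Clr, false)` is false, which is why CLR memory has to be budgeted separately on versions before 2012.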

Now that the allocations controlled by max server memory are clear, let us see how to set it.

How to set correct value for SQL Server Max server memory?

There is no magic formula for this. Estimate the memory required by the operating system, drivers, other applications running on the same server, SQL Server non-BPool allocations, jobs, antivirus and so on. Make sure you have acceptable available physical memory even when the system is under heavy load.

1. Consider the operating system memory requirement.

Approximately 1 GB (more if the server is a DC, a cluster node, etc.)

2. Consider the memory requirements of other applications/processes running on the server.

You have to derive it based on the applications/processes/antivirus running on the system and their memory requirements. (Perfmon Process -> Private Bytes and Working Set can help)

3. Consider the memory requirements of the drivers/firmware.

You have to derive it based on the memory requirements of the drivers installed on the system. (RAMMAP can help)

4. Consider the NonBPool (aka MTL or MTR) memory requirements of SQL Server.

select sum(multi_pages_kb)/1024 as multi_pages_mb from sys.dm_os_memory_clerks

(You can skip the above query if your SQL Server version is 2012, because multi-page allocations are then already controlled by max server memory)

Thread stacks: max worker threads * 2 MB

Memory for direct Windows allocations: approximately 0 to 300 MB in most cases, but you may have to increase it if there are many third-party components loaded in the SQL Server process (including linked server DLLs, third-party backup DLLs etc.)

If you are using CLR extensively, add some additional memory for CLR.

5. Consider the memory requirements of jobs (including replication agents, log shipping etc.) and packages that will run on the server.

You have to derive it (it may vary from a few MB to several GB)

6. Consider SSAS and RS memory requirements.

You have to derive it

7. Make sure there is enough free memory left for the operating system.

Approximately (100 MB for each GB up to 4 GB) + (50 MB for each additional GB up to 12 GB) + (25 MB for each additional GB up to your RAM size)

8. Other memory requirements.

Any other memory requirement specific to your environment.

Once you have calculated a reasonable value for each of the above memory requirements, add them up and subtract the sum from total physical memory to derive an ideal value for your max server memory.

Max server memory= Total physical memory – (1+2+3+4+5+6+7+8)
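To make the arithmetic concrete, here is a rough sketch that adds up the eight estimates and subtracts them from physical RAM. The struct, the function names and every sample figure used in the note below (a 64 GB server, 576 worker threads, 300 MB of direct Windows allocations and so on) are assumptions made up for this example, not measurements or recommendations.

```cpp
#include <cstdint>

// All sizes are in MB. Every name here is hypothetical.

// Item 7: free memory to leave for the OS:
// 100 MB per GB up to 4 GB, 50 MB per additional GB up to 12 GB,
// 25 MB per additional GB up to the RAM size.
constexpr std::int64_t osFreeReserveMb(std::int64_t ramGb) {
    std::int64_t reserve = 0;
    for (std::int64_t gb = 1; gb <= ramGb; ++gb) {
        reserve += (gb <= 4) ? 100 : (gb <= 12) ? 50 : 25;
    }
    return reserve;
}

// Item 4 on 64-bit: thread stacks (2 MB each) plus direct Windows
// allocations. Pass multi-page and CLR estimates only before SQL Server 2012.
constexpr std::int64_t nonBpoolMb(std::int64_t maxWorkerThreads,
                                  std::int64_t directWindowsMb,
                                  std::int64_t multiPageAndClrMb) {
    return maxWorkerThreads * 2 + directWindowsMb + multiPageAndClrMb;
}

// One field per numbered item in the list above.
struct Estimates {
    std::int64_t os;             // 1
    std::int64_t otherApps;      // 2
    std::int64_t drivers;        // 3
    std::int64_t nonBpool;       // 4
    std::int64_t jobs;           // 5
    std::int64_t ssasAndRs;      // 6
    std::int64_t osFreeReserve;  // 7
    std::int64_t other;          // 8
};

// Max server memory = total physical memory - (1+2+3+4+5+6+7+8)
constexpr std::int64_t maxServerMemoryMb(std::int64_t totalPhysicalMb,
                                         const Estimates& e) {
    return totalPhysicalMb - (e.os + e.otherApps + e.drivers + e.nonBpool +
                              e.jobs + e.ssasAndRs + e.osFreeReserve + e.other);
}
```

With the sample figures, `osFreeReserveMb(64)` is 2100 MB and `nonBpoolMb(576, 300, 0)` is 1452 MB; together with 1024 MB for the OS, 2048 MB for other applications, 512 MB for drivers and 1024 MB for jobs, a 65536 MB server ends up with a max server memory of 57376 MB. Treat the result as a starting point and adjust it after watching the server under load.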

If you still see LowMemoryResourceNotification, or the working set dropping below 100%, frequently, then use This exe, which prints the memory information of all processes and the system-wide memory information (global memory status) when the operating system signals a low memory notification. Once you have the output from the exe for a LowMemoryResourceNotification, review the requirements of each process and tweak max server memory accordingly.

Important: Make sure you have this fix if you are on Windows 2003 http://support.microsoft.com/kb/938486

What about Min server memory and should I configure it?

I mentioned earlier that SQL Server scales down its memory usage when LowMemoryResourceNotification comes from Windows, or when HighMemoryResourceNotification is revoked and there have been no reads for 10 seconds.

How much does it scale down?

Until “min server memory” is reached (if there is continuous memory pressure on the system).

What happens when you set max server memory and min server memory to the same value?

SQL Server will never scale down its memory usage, even when there is system-wide memory pressure (low physical memory notification set at system level). Note: this setting does not stop the OS from paging.

What are the effects?

When there is a LowMemoryResourceNotification and LPM is not enabled, SQL Server’s working set (BPool + NonBPool) will be paged. If LPM is enabled, the system will starve for memory and the non-BPool will be paged.

If you do not want SQL Server to scale down its usage when there is a LowMemoryResourceNotification in Windows, configure min server memory and max server memory to the same value (a bad choice).

If you want to limit how far SQL Server scales down, you can configure min server memory.

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer:

Posted in Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: how to set max server memory, How to set min server memory, max server memory, Minimum server memory, sql server maximum server memory, sql server memory configuration | 39 Comments »

SQL Server and VMware ballooning

Posted by Karthick P.K on March 31, 2013

VMware and SQL Server performance

If you are running production SQL Server on VMware, double check whether you have configured/disabled ballooning for the virtual machine in which SQL Server is running.

What is ballooning? It is the method by which a VMware host can reclaim memory from its virtual machines.

Is it really bad to give memory back from guest to host? My opinion is yes, if you are running production SQL Server on VMware.

Why do I think it’s bad? SQL Server is a memory-intensive application and requires adequate memory to run smoothly. If SQL Server doesn’t have adequate memory you see poor response time, Resource_semaphore/Resource_semaphore_Query_compile waits, increased I/O, OOM errors, non-yielding conditions etc. In addition, when memory is reclaimed from a virtual machine, available memory in Windows drops, triggering Windows to page out the working set of all the processes, and you will face all the side effects discussed in A significant part of SQL Server process memory has been paged out.

In the worst case, it is better not to give the memory to SQL Server at all than to give it and take it back. Remember that max server memory is also a factor in the execution plan generated by the optimizer, so a plan generated when you have X GB of max server memory may not be the right plan to use when you have Y GB of actual memory after ballooning has reclaimed memory from the guest OS.

What if the hypervisor runs low on physical memory? That is a hint that you have poor consolidation. You can pick the other virtual machines on the same hypervisor that are not hosting production SQL Servers (or SQL Servers) and tweak their reservations, or increase the maximum memory that can be reclaimed from them when the hypervisor is under memory pressure.

What if I don’t disable ballooning for my production SQL Server? Ballooning can slowly take memory from the virtual machine in which SQL Server is hosted and can cause all the problems I mentioned above.

To make things confusing, when you look at Task Manager you may not even realize that ballooning has reclaimed memory from the guest OS, because the total physical memory shown in the Performance tab includes the memory taken by the ballooning driver.

How to identify it? Look at the driver locked memory in the RAMMAP Sysinternals tool. (VM memory performance counters can also be used)

RAMMAP output captured on production SQL Servers speaks for itself. Driver locked would be a few MB on normal systems. If the value is very high on a VMware virtual machine, you can assume ballooning has reclaimed that memory and the guest is running on what remains.

RAMMAP output from a virtual machine with 12 GB of memory hosting SQL Server with max server memory capped to 8 GB:

Driver locked is around 8 GB. So the system is running with less than 4 GB of RAM, and how much of that is left for SQL Server? :)

RAMMAP output from a virtual machine with 24 GB of memory hosting SQL Server with max server memory capped to 20 GB:

Driver locked is around 16 GB. So the system is running with less than 8 GB of RAM, and how much of that is left for SQL Server? :)
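The check behind those two observations is simple subtraction. A minimal sketch, with hypothetical function names, assuming the driver locked figure is read manually from RAMMAP:

```cpp
#include <cstdint>

// Memory the guest OS can really use once the balloon driver
// has locked part of the configured RAM (all values in GB).
constexpr std::int64_t usableGuestMemoryGb(std::int64_t configuredGb,
                                           std::int64_t driverLockedGb) {
    return configuredGb - driverLockedGb;
}

// True when max server memory is larger than what is left for the
// whole guest, i.e. SQL Server can no longer get what it was promised.
constexpr bool maxServerMemoryOvercommitted(std::int64_t configuredGb,
                                            std::int64_t driverLockedGb,
                                            std::int64_t maxServerMemoryGb) {
    return maxServerMemoryGb > usableGuestMemoryGb(configuredGb, driverLockedGb);
}
```

For the first machine, `usableGuestMemoryGb(12, 8)` is 4 against a max server memory of 8; for the second, `usableGuestMemoryGb(24, 16)` is 8 against 20. Both are overcommitted.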

How to disable ballooning? Refer to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002586

You don’t agree? I respect your view, but my view is different :)

Note: I have not recommended disabling ballooning on every virtual machine. I recommend doing it for your performance-sensitive SQL Servers. If you find your hypervisor is running low on memory, revisit your consolidation, or configure the other non-critical virtual machines running on the same host in such a way that the hypervisor can reclaim memory from them when it is under a low memory condition.

Posted in Performance, SQL Server Engine, SQL Server memory | Tagged: Driver locked is high, SQLServer and VMware ballooning, SQLServer on VMware, SQLServer slow in VMware, VMware driver locked | 4 Comments »

Inside sys.dm_os_ring_buffers

Posted by Karthick P.K on March 29, 2013

The sys.dm_os_ring_buffers DMV can be used to troubleshoot connectivity errors, track exceptions, and monitor system health, memory pressure, non-yielding/deadlocked schedulers and a lot more.

You can use the scripts below to query the data from sys.dm_os_ring_buffers during troubleshooting.

USE master
go
SET NOCOUNT ON
SET QUOTED_IDENTIFIER ON
GO
PRINT 'Start Time: ' + CONVERT (varchar(30), GETDATE(), 121)
GO
PRINT ''
PRINT '==== SELECT GETDATE()'
SELECT GETDATE()
PRINT ''
PRINT ''
PRINT '==== SELECT @@version'
SELECT @@VERSION
GO
PRINT ''
PRINT '==== SQL Server name'
SELECT @@SERVERNAME
GO
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - LOGIN TIMERS'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/TotalLoginTimeInMilliseconds)[1]', 'int') AS [TotalLoginTimeInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTaskEnqueuedInMilliseconds)[1]', 'int') AS [LoginTaskEnqueuedInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkWritesInMilliseconds)[1]', 'int') AS [NetworkWritesInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/NetworkReadsInMilliseconds)[1]', 'int') AS [NetworkReadsInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SslProcessingInMilliseconds)[1]', 'int') AS [SslProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/SspiProcessingInMilliseconds)[1]', 'int') AS [SspiProcessingInMilliseconds],
x.value('(//Record/ConnectivityTraceRecord/LoginTimers/LoginTriggerAndResourceGovernorProcessingInMilliseconds)[1]', 'int') AS [LoginTriggerAndResourceGovernorProcessingInMilliseconds]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'LoginTimers'
order by a.recordtime 
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_CONNECTIVITY - TDS Data'
 
SELECT a.* FROM
(SELECT 
x.value('(//Record/ConnectivityTraceRecord/RecordType)[1]', 'varchar(30)') AS [RecordType], 
x.value('(//Record/ConnectivityTraceRecord/RecordSource)[1]', 'varchar(30)') AS [RecordSource], 
x.value('(//Record/ConnectivityTraceRecord/Spid)[1]', 'int') AS [Spid], 
x.value('(//Record/ConnectivityTraceRecord/OSError)[1]', 'int') AS [OSError], 
x.value('(//Record/ConnectivityTraceRecord/SniConsumerError)[1]', 'int') AS [SniConsumerError], 
x.value('(//Record/ConnectivityTraceRecord/State)[1]', 'int') AS [State], 
x.value('(//Record/ConnectivityTraceRecord/RecordTime)[1]', 'nvarchar(30)') AS [RecordTime],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferError)[1]', 'int') AS [TdsInputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsOutputBufferError)[1]', 'int') AS [TdsOutputBufferError],
x.value('(//Record/ConnectivityTraceRecord/TdsBuffersInformation/TdsInputBufferBytes)[1]', 'int') AS [TdsInputBufferBytes],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/PhysicalConnectionIsKilled)[1]', 'int') AS [PhysicalConnectionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/DisconnectDueToReadError)[1]', 'int') AS [DisconnectDueToReadError],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NetworkErrorFoundInInputStream)[1]', 'int') AS [NetworkErrorFoundInInputStream],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/ErrorFoundBeforeLogin)[1]', 'int') AS [ErrorFoundBeforeLogin],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/SessionIsKilled)[1]', 'int') AS [SessionIsKilled],
x.value('(//Record/ConnectivityTraceRecord/TdsDisconnectFlags/NormalDisconnect)[1]', 'int') AS [NormalDisconnect]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers 
WHERE ring_buffer_type = 'RING_BUFFER_CONNECTIVITY') AS R(x)) a
where a.RecordType = 'Error'
order by a.recordtime
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SECURITY_ERROR'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, rbf.[timestamp] - tme.ms_ticks, GETDATE()) as [Notification_Time],
cast(record as xml).value('(//SPID)[1]', 'bigint') as SPID,
cast(record as xml).value('(//ErrorCode)[1]', 'varchar(255)') as Error_Code,
cast(record as xml).value('(//CallingAPIName)[1]', 'varchar(255)') as [CallingAPIName],
cast(record as xml).value('(//APIName)[1]', 'varchar(255)') as [APIName],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
where rbf.ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR'
ORDER BY rbf.timestamp ASC
 
PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_EXCEPTION'
 
SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime],
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as Time_Stamp,
cast(record as xml).value('(//Exception//Error)[1]', 'varchar(255)') as [Error],
cast(record as xml).value('(//Exception/Severity)[1]', 'varchar(255)') as [Severity],
cast(record as xml).value('(//Exception/State)[1]', 'varchar(255)') as [State],
msg.description,
cast(record as xml).value('(//Exception/UserDefined)[1]', 'int') AS [isUserDefinedError],
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type], 
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time],
tme.ms_ticks as [Current Time]
from sys.dm_os_ring_buffers rbf
cross join sys.dm_os_sys_info tme
cross join sys.sysmessages msg
where rbf.ring_buffer_type = 'RING_BUFFER_EXCEPTION' 
and msg.error = cast(record as xml).value('(//Exception//Error)[1]', 'varchar(500)') and msg.msglangid = 1033 
ORDER BY rbf.timestamp ASC

PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_RESOURCE_MONITOR to capture external and internal memory pressure'

SELECT CONVERT (varchar(30), GETDATE(), 121) as [RunTime], 
dateadd (ms, (rbf.[timestamp] - tme.ms_ticks), GETDATE()) as [Notification_Time],  
cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') AS [Notification_type],  
cast(record as xml).value('(//Record/MemoryRecord/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization %],  
cast(record as xml).value('(//Record/MemoryNode/@id)[1]', 'bigint') AS [Node Id],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsProcess)[1]', 'int') AS [Process_Indicator],  
cast(record as xml).value('(//Record/ResourceMonitor/IndicatorsSystem)[1]', 'int') AS [System_Indicator], 
-- One column group per Effect element; aliases are numbered so that column names are unique
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@type)[1]', 'varchar(30)') AS [Effect1_type],
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@state)[1]', 'varchar(30)') AS [Effect1_state],
cast(record as xml).value('(//Record/ResourceMonitor/Effect/@reversed)[1]', 'int') AS [Effect1_reversed],
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[1]', 'bigint') AS [Effect1],

cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@type)[1]', 'varchar(30)') AS [Effect2_type],
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@state)[1]', 'varchar(30)') AS [Effect2_state],
cast(record as xml).value('(//Record/ResourceMonitor/Effect[2]/@reversed)[1]', 'int') AS [Effect2_reversed],
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[2]', 'bigint') AS [Effect2],

cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@type)[1]', 'varchar(30)') AS [Effect3_type],
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@state)[1]', 'varchar(30)') AS [Effect3_state],
cast(record as xml).value('(//Record/ResourceMonitor/Effect[3]/@reversed)[1]', 'int') AS [Effect3_reversed],
cast(record as xml).value('(//Record/ResourceMonitor/Effect)[3]', 'bigint') AS [Effect3],
  
cast(record as xml).value('(//Record/MemoryNode/ReservedMemory)[1]', 'bigint') AS [SQL_ReservedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/CommittedMemory)[1]', 'bigint') AS [SQL_CommittedMemory_KB],  
cast(record as xml).value('(//Record/MemoryNode/AWEMemory)[1]', 'bigint') AS [SQL_AWEMemory],  
cast(record as xml).value('(//Record/MemoryNode/SinglePagesMemory)[1]', 'bigint') AS [SinglePagesMemory],  
cast(record as xml).value('(//Record/MemoryNode/MultiplePagesMemory)[1]', 'bigint') AS [MultiplePagesMemory],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPhysicalMemory)[1]', 'bigint') AS [TotalPhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') AS [AvailablePhysicalMemory_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalPageFile)[1]', 'bigint') AS [TotalPageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailablePageFile)[1]', 'bigint') AS [AvailablePageFile_KB],  
cast(record as xml).value('(//Record/MemoryRecord/TotalVirtualAddressSpace)[1]', 'bigint') AS [TotalVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/MemoryRecord/AvailableVirtualAddressSpace)[1]', 'bigint') AS [AvailableVirtualAddressSpace_KB],  
cast(record as xml).value('(//Record/@id)[1]', 'bigint') AS [Record Id],  
cast(record as xml).value('(//Record/@type)[1]', 'varchar(30)') AS [Type],  
cast(record as xml).value('(//Record/@time)[1]', 'bigint') AS [Record Time], 
tme.ms_ticks as [Current Time] 
FROM sys.dm_os_ring_buffers rbf 
cross join sys.dm_os_sys_info tme 
where rbf.ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR' --and cast(record as xml).value('(//Record/ResourceMonitor/Notification)[1]', 'varchar(30)') = 'RESOURCE_MEMPHYSICAL_LOW' 
ORDER BY rbf.timestamp ASC


PRINT ''
PRINT ''
PRINT '==== RING_BUFFER_SCHEDULER_MONITOR to Monitor system health'

SELECT CONVERT (varchar(30), GETDATE(), 121) as runtime,
DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE()) AS Notification_time,
a.*,
sys.ms_ticks AS [Current Time]
FROM (SELECT
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS [ProcessUtilization],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int') AS [SystemIdle %],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/UserModeTime)[1]', 'bigint') AS [UserModeTime],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/KernelModeTime)[1]', 'bigint') AS [KernelModeTime],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/PageFaults)[1]', 'bigint') AS [PageFaults],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/WorkingSetDelta)[1]', 'bigint')/1024 AS [WorkingSetDelta],
x.value('(//Record/SchedulerMonitorEvent/SystemHealth/MemoryUtilization)[1]', 'bigint') AS [MemoryUtilization (%workingset)],
x.value('(//Record/@time)[1]', 'bigint') AS [Record Time]
FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers
WHERE ring_buffer_type = 'RING_BUFFER_SCHEDULER_MONITOR') AS R(x)) a
CROSS JOIN sys.dm_os_sys_info sys
ORDER BY DATEADD (ms, a.[Record Time] - sys.ms_ticks, GETDATE())
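
Every script above turns a ring buffer timestamp into a wall-clock time with the same expression: DATEADD(ms, timestamp - ms_ticks, GETDATE()). The sketch below spells out that arithmetic outside T-SQL. The function names are hypothetical. The last helper reflects the fact that DATEADD takes an int for its number argument, so on a server where the offset exceeds roughly 24.8 days in milliseconds the expression can fail with an overflow.

```cpp
#include <cstdint>
#include <limits>

// Offset of a ring buffer record from "now", in milliseconds.
// recordTimestamp and currentMsTicks come from the same tick counter
// (sys.dm_os_ring_buffers.timestamp and sys.dm_os_sys_info.ms_ticks),
// so the result is negative for records in the past.
constexpr std::int64_t recordOffsetMs(std::int64_t recordTimestamp,
                                      std::int64_t currentMsTicks) {
    return recordTimestamp - currentMsTicks;
}

// Wall-clock time of the record, given the current wall-clock time
// in milliseconds since some epoch (what GETDATE() provides in T-SQL).
constexpr std::int64_t recordWallClockMs(std::int64_t nowEpochMs,
                                         std::int64_t recordTimestamp,
                                         std::int64_t currentMsTicks) {
    return nowEpochMs + recordOffsetMs(recordTimestamp, currentMsTicks);
}

// True when the offset fits in a 32-bit int, as DATEADD requires.
constexpr bool fitsInDateAddInt(std::int64_t offsetMs) {
    return offsetMs >= std::numeric_limits<std::int32_t>::min() &&
           offsetMs <= std::numeric_limits<std::int32_t>::max();
}
```

A record stamped 1000 when ms_ticks reads 6000 happened 5 seconds ago, so `recordOffsetMs(1000, 6000)` is -5000.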

Disclaimer:

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights

Posted in SQL General, SQL Server Engine, SQL Server memory, SQL Server Tools | Tagged: dm_os_ring_buffers, RING_BUFFER_CONNECTIVITY, RING_BUFFER_EXCEPTION, RING_BUFFER_RESOURCE_MONITOR, RING_BUFFER_SCHEDULER_MONITOR, RING_BUFFER_SECURITY_ERROR | 2 Comments »

SQL Server lock pages in memory

Posted by Karthick P.K on March 26, 2013

Lock pages in memory is again a black box for many SQL Server DBAs. The choice of whether to enable lock pages in memory depends on various aspects.

Before we get into the internals, let us recollect some basics of lock pages in memory in 64-bit SQL Server.

What is Lock pages in memory in Windows?

It is a user right for a Windows account and can be enabled using secpol.msc or gpedit.msc.

Why do we need this privilege for the SQL Server startup account?

There are three different memory models in 64-bit SQL Server: conventional, locked pages and large pages.

Locked pages memory model: In the locked pages memory model SQL Server uses the AllocateUserPhysicalPages and MapUserPhysicalPages functions to allocate memory. The caller's token must have the LPIM privilege, otherwise the function call fails; hence the SQL Server startup account needs LPIM to use the locked pages memory model.

Large pages memory model: When you use TF834 in Enterprise edition on systems with more than 8 GB of physical memory, SQL Server uses the large pages memory model. In this memory model SQL Server uses the VirtualAlloc API with the MEM_LARGE_PAGES allocation type. To use MEM_LARGE_PAGES in VirtualAlloc, the caller's token must have the LPIM privilege.

Memory allocated using the AWE allocator APIs or the VirtualAlloc function with MEM_LARGE_PAGES is not part of the process working set. Hence it cannot be paged out, and it is not visible in private bytes or working set in Task Manager, or in the Perfmon Process -> Private Bytes and Process -> Working Set counters.

What is the advantage of using locked pages or large pages? The SQL Server working set (BPOOL) cannot be paged by Windows even when there is system-wide memory pressure.

Disadvantage: The operating system will starve for memory when there is system-wide memory pressure. The OS has to rely completely on SQL Server to respond to the low memory notification and scale down its memory usage. SQL Server may not respond fast enough to a low memory condition at system level because the OS is already starving for memory. LPIM prevents only the BPOOL from paging. Non-BPool components can still be paged, and critical portions of SQL Server memory such as thread stacks and SQL Server images/DLLs live in the non-BPool.

So many disadvantages... But why do we still recommend LPIM in some places?

In Windows 2003 (if This fix is not applied), when there is system-wide memory pressure the Windows memory manager would trim one quarter of the working set of every process. Imagine SQL Server is using 200 GB of RAM and there is system-wide memory pressure: the Windows memory manager would move 50 GB of the SQL Server working set to the page file and we would end up with performance problems. If LPIM is enabled, the OS cannot trim. Imagine there is a faulty application or driver on the server that leaks memory fast. It might consume all the memory in the server, and the Windows memory manager might trim all of the SQL Server working set.

Known issues in Windows like the one in This, and a few in Windows 2008 mentioned in This link, can cause the Windows memory manager to trim the working set of the SQL Server process suddenly. Windows has a background process which keeps writing the contents of the working set to the page file, so when there is paging only the dirty pages need to be moved to the page file; the others are already backed up by the background process. Paging is therefore very fast, and the SQL Server working set can be moved to the page file in seconds, before SQL Server responds to the low memory resource notification from the OS, hurting performance.

In systems with a large amount of memory (Ex: 1 TB) we might get non-yielding scheduler situations when allocating memory in the conventional memory model. LPIM is the only option in this case. LPIM can also be used on servers where it might take a long time to identify the cause of the working set trim. It is always suggested to identify the cause of the trim before choosing LPIM in the first place. You can use the steps mentioned in This link to troubleshoot working set trims.

Note:

1. The Local System account has the LPIM privilege by default, so if you are using Local System as the startup account of SQL Server, then SQL Server might be using the locked pages memory model by default without your knowledge.

2. In earlier versions of SQL Server (till 2008 R2) you need TF845 with the fix in KB970070 in Standard and BI edition to make use of the locked pages memory model.

Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: Large pages SQLServer, lock pages in memory, SQLServer LPIM | 7 Comments »

SQL Server -g

Posted by Karthick P.K on March 5, 2013

I decided to write this quick blog after seeing a lot of confusion around the sqlserver -g switch. The -g switch is a no-op in 64-bit SQL Server; it is used in 32-bit SQL Server to increase the size of MTL (aka MemToLeave).

The default value of the -g switch is 256 MB, i.e. if you do not specify a value for -g it defaults to 256 MB.

SQL Server initializes its memory during startup as follows.

1. Calculate the size of MemToLeave and reserve it using the algorithm below

MTL (Memory to Leave) = (Stack size * max worker threads) + Additional space (by default 256 MB, can be controlled by -g).

Stack size = 512 KB per thread for 32-bit SQL Server and 904 KB for 32-bit SQL Server running on 64-bit systems.

I.e. with 256 worker threads: (256 * 512 KB) + 256 MB = 384 MB

The -g switch is used to increase the additional space from 256 MB to any desired value.

2. Calculate the size of BPOOL using the algorithm below.

The SQL Server buffer pool is the minimum of “Physical RAM” or “user mode memory (2 GB or 3 GB) – MTL – BUF structures”

BPool = Minimum (Physical memory, User address space – MTL) – BUF structures

BUF structures are arrays maintained by SQL Server to track the status of each buffer in BPOOL. SQL Server makes a maximum of 32 allocation requests to the OS to reserve BPool pages.

SQL Server maintains a contiguous array to track the status information associated with each buffer (8 KB page) in the BPool. In addition, SQL Server maintains a second array, a bitmap that tracks whether each page is committed or reserved.

Each bit can be 0 or 1: 1 indicates the buffer is committed and 0 indicates the page is reserved.

The size of the BUF structures is approximately 16 MB when AWE is not enabled. When AWE is enabled, the BUF structures use an additional 8 MB for each GB of RAM in the system.

3. Release the MTL region which was reserved initially. SQL Server reserves MTL at startup and releases it after BPOOL is reserved, to ensure the MemToLeave region is contiguous.
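The two formulas in steps 1 and 2 can be checked with a few lines of arithmetic. This is only a sketch of the calculation as written in this post, not of SQL Server's actual startup code; the constant and function names are made up for the example, and the 16 MB BUF figure used in the note below is the approximate non-AWE size quoted above.

```cpp
#include <cstdint>

constexpr std::int64_t kKb = 1024;
constexpr std::int64_t kMb = 1024 * kKb;
constexpr std::int64_t kGb = 1024 * kMb;

// Step 1: MTL = (stack size * max worker threads) + additional space.
// gSwitchMb is the value passed with -g (256 MB when not specified).
constexpr std::int64_t memToLeaveBytes(std::int64_t stackSizeKb,
                                       std::int64_t maxWorkerThreads,
                                       std::int64_t gSwitchMb = 256) {
    return stackSizeKb * kKb * maxWorkerThreads + gSwitchMb * kMb;
}

// Step 2: BPool = min(physical memory, user address space - MTL) - BUF structures.
constexpr std::int64_t bpoolBytes(std::int64_t physicalBytes,
                                  std::int64_t userAddressSpaceBytes,
                                  std::int64_t mtlBytes,
                                  std::int64_t bufStructuresBytes) {
    return (physicalBytes < userAddressSpaceBytes - mtlBytes
                ? physicalBytes
                : userAddressSpaceBytes - mtlBytes) - bufStructuresBytes;
}
```

`memToLeaveBytes(512, 256)` gives the 384 MB from the example above, and starting with -g512 raises it to 640 MB. On a 32-bit server with 4 GB of RAM and a 2 GB user address space, `bpoolBytes(4 * kGb, 2 * kGb, 384 * kMb, 16 * kMb)` leaves 1648 MB for the buffer pool, which shows how every MB added with -g is taken away from BPool.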

More details about the SQL Server memory architecture in https://mssqlwiki.com/sqlwiki/sql-performance/basics-of-sql-server-memory-architecture/

If you are in SQL Server 2012 read https://mssqlwiki.com/2012/10/21/sql-server-2012-memory-2/

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

Posted in Memory, SQL Server Engine, SQL Server memory | Tagged: Memtoleave calculation, sqlserver -g, sqlserver -g switch, sqlserver memory, what is sql server -g | 16 Comments »

SQL Server Operating system (SOS) – Series 3

Posted by Karthick P.K on February 11, 2013

Thread synchronization

When we discussed threads I mentioned that in multi-threaded applications each thread has to synchronize its activities with other threads. Sometimes a thread has to wait for another thread to complete before it can execute (Ex: SQL Server blocking); sometimes a thread has to synchronize with another thread and continue execution (Ex: CXPACKET). If we allow multiple threads to access the same resource without synchronization, we might get corruption or inconsistency.

Windows offers different ways to synchronize multiple threads. Before we jump into the different synchronization techniques, let us see why thread synchronization is so important using the small program below.

In the program below I have declared two globals called a and b and set their values to 0. The number of threads we are going to create is defined in a global named Threadcount (64).

We create 64 threads in the main function and each thread starts executing a function called Submain. In Submain each thread increments a and b 1000 times.

So ideally the values of a and b have to be 64,000 at the end of program execution (64 threads * 1000 increments).

Let us check what happens.

#include <windows.h>
#include <cstdio>
#include <cstdlib>

long a = 0;                     // incremented WITHOUT synchronization
long b = 0;                     // incremented while holding the spinlock
volatile long g_InUse = FALSE;  // spinlock flag: TRUE while some thread owns b
const int Threadcount = 64;     // WaitForMultipleObjects supports at most MAXIMUM_WAIT_OBJECTS (64) handles
int s = Threadcount;
bool d = FALSE;

DWORD WINAPI Submain(LPVOID)
{
       for (int L = 0; L < 1000; L++)
       {
              a++;   // not atomic: concurrent increments can be lost

              // Spin until this thread is the one that flips g_InUse from FALSE to TRUE
              while (InterlockedExchange(&g_InUse, TRUE) == TRUE)
              {
                     Sleep(0);
              }
              //Sleep(10); //-->How Spinlock can cause CPU Spike
              b++;   // only the thread that owns the lock gets here

              InterlockedExchange(&g_InUse, FALSE);   // release the lock
       }

/*
s=s-1;  //Simple synchronization technique. May be useful if you like to increase the thread count beyond MAXIMUM_WAIT_OBJECTS
       if(s==0)
       {
              d=TRUE;
       }
*/
       return 0;
}

int main()
{
       HANDLE *hThreads = new HANDLE[Threadcount];
       for (int i = 0; i < Threadcount; i++)
       {
              hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);

              if (hThreads[i] == NULL)
              {
                     printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
              }
       }

       // Wait until all the threads have finished
       WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);

       //while(!d); //Simple synchronization technique

       printf("Value of a is:%ld\n", a);
       printf("Value of b is:%ld\n", b);

       for (int i = 0; i < Threadcount; i++)
       {
              if (hThreads[i] != NULL) CloseHandle(hThreads[i]);
       }
       delete[] hThreads;
       system("pause");
       return 0;
}

Why are the values of a and b different, and why is b accurate while a is not? If you look at the program closely, access to global b is serialized by a lock built on the InterlockedExchange function, so only one thread can update b at any time, while there is no synchronization for global a, so its end value is incorrect.
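The same experiment can be sketched in portable C++11, assuming std::atomic as a stand-in for the Win32 interlocked functions. The names run_counters and CounterResult are made up for this illustration and are not part of the program above. The unsynchronized counter is written as a separate load and store so that lost updates are possible without undefined behavior, while the synchronized counter uses one atomic fetch_add.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative result type (not part of the original program).
struct CounterResult {
    long unsynchronized;  // like global a: load and store are separate steps
    long synchronized;    // like global b: one indivisible read-modify-write
};

// Starts `threads` workers; each adds 1 to both counters `iterations` times.
CounterResult run_counters(int threads, int iterations) {
    std::atomic<long> a{0};
    std::atomic<long> b{0};

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&a, &b, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Two separate steps: another thread can store in between,
                // and that update is then overwritten (lost).
                long seen = a.load(std::memory_order_relaxed);
                a.store(seen + 1, std::memory_order_relaxed);

                // One atomic step: no update can be lost.
                b.fetch_add(1);
            }
        });
    }
    for (auto& w : workers) w.join();
    return {a.load(), b.load()};
}
```

Calling run_counters(64, 1000) should always report 64,000 for the synchronized counter, while the unsynchronized one can come out lower; how much lower depends on the machine and on timing.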

Thread synchronization can be achieved in user mode or by using kernel objects.

User mode thread synchronization: Threads can be synchronized in user mode using the interlocked family of functions or using critical sections. User mode thread synchronization is faster than using kernel objects. In the above program we used an interlocked function to synchronize the threads' access to global b. Spin loops built on interlocked functions should be used with caution on multiprocessor systems and should be avoided on uniprocessor machines.

Spinlock: A method in which we continuously check whether a resource is available. In the above program globals a and b are the resources. Since we guaranteed exclusive access to global b, only one thread can access it at any time. So what about the other threads? They spin continuously, checking whether the resource has become available. Look at the portion of the above program shown below. The while loop checks the value of g_InUse. If the value is FALSE, the resource is not in use; the calling thread sets the value to TRUE so that other threads cannot access it, and continues execution. Once it completes its task, i.e. incrementing the value of b, it sets g_InUse back to FALSE so that others can access it. If the value is TRUE, some other thread is currently using the global resource b and the while loop continues to spin.

                     while (InterlockedExchange (&g_InUse, TRUE) == TRUE)
                     {
                           Sleep(0);
                     }
                     //Sleep(10); //-->How Spinlock can cause CPU Spike
                     b++;
                     InterlockedExchange (&g_InUse, FALSE);

Incorrect use of a spinlock wastes CPU and can cause extreme CPU spikes. In the above program, uncomment the line “Sleep(10); //-->How Spinlock can cause CPU Spike”, build the exe and execute it. Look at Task Manager and check the CPU utilization. It will be extremely high, because each time a thread takes the lock on g_InUse it sleeps for 10 milliseconds, increments the value of b and then releases the lock, while the other threads spin continuously to check whether the resource is available, thus causing the CPU spike. In real code a thread would not sleep after taking a lock, but assume it is performing some task which takes time: the other threads keep spinning and consuming CPU.
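For readers without a Windows build environment, the spin loop can be sketched in standard C++, assuming std::atomic<bool>::exchange in place of InterlockedExchange and std::this_thread::yield in place of Sleep(0). The SpinLock class and count_with_spinlock function are illustrative names, not code from the program above.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock (illustrative; mirrors the g_InUse loop).
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value: true means another thread
        // already holds the lock, so keep spinning.
        while (in_use_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // stands in for Sleep(0)
        }
    }
    void unlock() { in_use_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> in_use_{false};
};

// Each worker increments a plain long while holding the spinlock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long b = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &b, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++b;            // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return b;
}
```

The longer the work between lock() and unlock(), the more time the waiting threads burn in the while loop, which is the CPU spike described above.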

Critical section: Like the interlocked family of functions, a critical section is used to guarantee exclusive access to a resource. The major difference between the two is that when a critical section is owned by another thread, the calling thread is immediately placed in a wait state, so the thread transitions from user mode to kernel mode, and this transition is expensive (about 1000 CPU cycles as per Jeffrey Richter). When the thread which owns the critical section releases it, one of the waiting threads is signaled and scheduled. Programmers should choose wisely between interlocked functions and critical sections. The above program caused a severe CPU spike after we uncommented the line “Sleep(10); //-->How Spinlock can cause CPU Spike”; let us do the same implementation using a critical section in the program below.

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

long a = 0;                   // incremented without any synchronization
long b = 0;                   // incremented only inside the critical section
const int Threadcount = 64;   // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
int s = Threadcount;
bool d = FALSE;
CRITICAL_SECTION gcs;         // guards b

// Thread entry point; the signature matches LPTHREAD_START_ROUTINE.
DWORD WINAPI Submain(LPVOID x)
{
       for (int L = 0; L < 1000; L++)
       {
              a++;   // unsynchronized read-modify-write: updates can be lost

              EnterCriticalSection(&gcs);   // waiters block instead of spinning
              Sleep(10); //-->How Spinlock can cause CPU Spike
              b++;
              LeaveCriticalSection(&gcs);
       }

/*
       s=s-1;  //Simple synchronization technique. May be useful if you want more threads than WaitForMultipleObjects supports (MAXIMUM_WAIT_OBJECTS, 64)
       if(s==0)
       {
              d=TRUE;
       }
*/
       return 0;
}

int main()
{
       HANDLE *hThreads = new HANDLE[Threadcount];
       InitializeCriticalSection(&gcs);
       for (int i = 0; i < Threadcount; i++)
       {
              hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);
              if (hThreads[i] == NULL)
              {
                     printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
              }
       }

       // Wait for all the threads to finish before deleting the critical section.
       WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);
       DeleteCriticalSection(&gcs);
       //while(!d); //Simple synchronization technique

       printf("Value of a is:%ld\n", a);
       printf("Value of b is:%ld\n", b);

       for (int i = 0; i < Threadcount; i++)
       {
              if (hThreads[i] != NULL) CloseHandle(hThreads[i]);
       }
       delete[] hThreads;
       system("pause");
       return 0;
}

After building the above program, run the executable and you will notice that it doesn’t consume high CPU. Does this mean a critical section is better than interlocked functions? No, it depends. In this exe the lock is held for a long time, so a critical section is ideal. If each thread could get access to the resource after spinning once (or) twice, then interlocked functions would definitely be the ideal choice, because we would avoid the expensive transition of each thread from user mode to kernel mode. There is also an API called InitializeCriticalSectionAndSpinCount. What is the difference between InitializeCriticalSection and InitializeCriticalSectionAndSpinCount? With a critical section initialized by InitializeCriticalSectionAndSpinCount, a thread spins up to n times trying to acquire the resource, and only if all the attempts fail does the thread transition to kernel mode.
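The spin-then-wait idea can be imitated in standard C++. The sketch below is only an illustration of the concept, assuming std::mutex::try_lock for the spinning phase and a blocking std::mutex::lock as the fallback; it is not how Windows implements critical sections, and SpinThenBlockLock and count_with_spin_then_block are made-up names.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Illustrative spin-then-block lock: try a bounded number of times without
// waiting, and fall back to a blocking wait only if every attempt fails.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spin_count) : spin_count_(spin_count) {}

    void lock() {
        for (int attempt = 0; attempt < spin_count_; ++attempt) {
            if (mutex_.try_lock()) return;  // acquired while spinning
        }
        mutex_.lock();  // spinning failed: block until the owner releases
    }
    void unlock() { mutex_.unlock(); }

private:
    std::mutex mutex_;
    int spin_count_;
};

// Each worker increments a plain long while holding the lock.
long count_with_spin_then_block(int threads, int iterations, int spin_count) {
    SpinThenBlockLock lock(spin_count);
    long b = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &b, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockLock> guard(lock);
                ++b;
            }
        });
    }
    for (auto& w : workers) w.join();
    return b;
}
```

A spin count of 0 makes the class behave like a plain blocking lock; a larger count pays off only when the lock is normally held for a very short time.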

Thread deadlock: Similar to SQL Server locks, what happens when two threads each wait to acquire a critical section that is owned by the other? Since there is no timeout, the threads will wait forever and will never get scheduled. In SQL Server we have the deadlock monitor to detect this condition and roll back one of the transactions, but Windows doesn’t offer any such facility.
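Because nothing detects such a deadlock for us, the usual defense is to make sure it cannot form in the first place. One common approach, shown here as a sketch in standard C++ rather than anything from the Win32 API, is to acquire both locks through std::lock, which uses a deadlock-avoidance algorithm. Resource, move_units and run_opposing_threads are hypothetical names for this illustration.

```cpp
#include <mutex>
#include <thread>

// Hypothetical resource guarded by its own lock.
struct Resource {
    std::mutex lock;
    long value;
    explicit Resource(long v) : value(v) {}
};

// Needs both locks at once. std::lock acquires them without deadlocking,
// even when another thread asks for the same two locks in the opposite order.
void move_units(Resource& from, Resource& to, long units) {
    std::unique_lock<std::mutex> first(from.lock, std::defer_lock);
    std::unique_lock<std::mutex> second(to.lock, std::defer_lock);
    std::lock(first, second);
    from.value -= units;
    to.value += units;
}

// Two threads move units in opposite directions at the same time.
long run_opposing_threads(int iterations) {
    Resource x(1000), y(1000);
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) move_units(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) move_units(y, x, 1); });
    t1.join();
    t2.join();
    return x.value + y.value;  // total is unchanged if no update was lost
}
```

If move_units instead locked from.lock and then to.lock one after the other, the two threads above could each hold one lock and wait forever for the other.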

Orphan or unreleased critical section: When a thread takes a critical section it is expected to release it. Assume a flaw in the code or an exception causes a thread to abort after taking a critical section and before releasing it. The critical section taken by the terminated thread is never released, and all the other threads will wait on it indefinitely.
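In C++ the exception half of this problem is usually handled by tying the release to a scope guard. The sketch below assumes std::mutex and std::lock_guard as stand-ins for the critical section; g_lock, work_that_fails and lock_is_free_after_failure are illustrative names. Note that a guard like this does not help when a thread is killed with TerminateThread, because destructors never run in that case.

```cpp
#include <mutex>
#include <stdexcept>

std::mutex g_lock;  // plays the role of the critical section gcs

// The guard's destructor releases the lock on every exit path,
// including the exception thrown below.
void work_that_fails() {
    std::lock_guard<std::mutex> guard(g_lock);
    throw std::runtime_error("simulated failure while holding the lock");
}

// Returns true if the lock can still be acquired after the failure.
bool lock_is_free_after_failure() {
    try {
        work_that_fails();
    } catch (const std::runtime_error&) {
        // The failure is handled here; the lock was already released.
    }
    if (g_lock.try_lock()) {
        g_lock.unlock();
        return true;
    }
    return false;
}
```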

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter


Posted in SQL Server Engine, SQLServer SOS | Tagged: Inside SQLserver OS, SOS scheduler, SQLOS, SQLServer operating system, SQLServer scheduler, SQLServer SOS, UMS, User mode sceduler, What is SQLSOS? | 2 Comments »

SQL Server Operating system (SOS) – Series 2

Posted by Karthick P.K on January 13, 2013

Context Switching:
When a thread is yielded from the CPU, Windows saves the CONTEXT (current state such as CPU registers, program counter etc.) of the current thread and loads the CONTEXT of the new thread which will run on the CPU.
Why? A thread can be yielded in the middle of executing its instructions, so how does it resume from the same point when it is rescheduled on the CPU? Its context is saved while yielding and loaded again when the thread is rerun.
Note: yielding from CPU = coming out of CPU

If there is heavy context switching, the system can end up spending more time on context switching than on doing meaningful work.

Yielding:
When a thread moves out of the CPU it is called yielding.

When can a thread yield?
A thread which is running on the CPU can yield under the following key conditions:

Voluntary yield
The thread decides to yield by itself because of the logic in the code it executes. Generally a thread yields voluntarily by calling Sleep(0) or SwitchToThread. When a thread voluntarily yields it is placed at the end of the runnable list.
Ex: A SQL Server thread will yield after sorting 64K of sort records.

Preempted
A thread which is running on the CPU is forced to yield when a thread with higher priority is ready to run. When a thread is preempted it is placed at the beginning of the runnable list.

Quantum end
Every thread executed by the operating system gets a time slice called a quantum. When a thread completes its quantum it is yielded and the next thread is run. If no other thread is ready to run, then the thread runs for another quantum.

Termination
A thread terminates when it finishes execution (its thread function returns or it calls ExitThread), or when it is forcibly ended by a call to TerminateThread.

Thread and process priorities

Windows supports thread priority levels ranging from 0 (lowest) to 31 (highest). If all the threads have the same priority they are scheduled on a round-robin basis, but in reality threads running on the OS have different priorities. Among all the threads which can be run, the Windows scheduler picks the thread with the highest priority to run first. The priority of a thread can be changed using the WINAPI SetThreadPriority. Similarly, the priority of a process can be set while creating the process (WINAPI CreateProcess, dwCreationFlags), by using the WINAPI SetPriorityClass after the process is created, or by using tools like Task Manager.

Let us attach the debugger to the SQL Server process and see how to view the threads and thread stacks, and how SQL Server threads wait without being scheduled.
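The selection rules described so far (highest priority first, round robin within a priority, a voluntarily yielding thread goes to the end of the runnable list and a preempted thread to the beginning) can be captured in a toy model. This is only a teaching sketch in standard C++, not the Windows scheduler; ReadyQueues and its method names are made up for the illustration.

```cpp
#include <array>
#include <deque>
#include <string>

// Toy model of priority-based thread selection (illustrative only).
class ReadyQueues {
public:
    // A thread that became ready, or that yielded voluntarily, joins the
    // end of the runnable list for its priority.
    void make_ready(const std::string& name, int priority) {
        if (valid(priority)) queues_[priority].push_back(name);
    }

    // A preempted thread goes to the beginning of the runnable list.
    void requeue_preempted(const std::string& name, int priority) {
        if (valid(priority)) queues_[priority].push_front(name);
    }

    // Picks the first thread of the highest non-empty priority level.
    // Returns an empty string when nothing is ready to run.
    std::string pick_next() {
        for (int p = 31; p >= 0; --p) {
            if (!queues_[p].empty()) {
                std::string next = queues_[p].front();
                queues_[p].pop_front();
                return next;
            }
        }
        return "";
    }

private:
    static bool valid(int priority) { return priority >= 0 && priority < 32; }
    std::array<std::deque<std::string>, 32> queues_;  // one list per priority 0..31
};
```

For example, with two threads at priority 15 and one at priority 4, pick_next keeps alternating between the two priority 15 threads for as long as they are requeued with make_ready, and the priority 4 thread runs only once both of them have stopped being ready.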

Download the windows debugger from below link

Windbg 32-bit package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

Windbg X64 package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_amd64_6.11.1.404.msi

1. Start SQL Server in your test system.

2. Attach the debugger to the SQL Server process. Refer below image. If you have more than one instance, use the process ID to attach to the correct SQL Server process.

3. In the command window type

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

4. Type .reload /f and hit enter. This forces the debugger to load all the symbols immediately.

5. Verify that symbols are loaded for SQL Server by using the debugger command lmvm

lmvm sqlservr

6. Type

!peb to display the information about process from process environment block.

7. Type ~ to display all the threads in the process. The first column in the output is the thread ordinal assigned by the debugger; the Id field shows the process ID and the thread ID.

8. To look at the stack of a specific thread, switch to the thread using the thread ordinal (or) the thread ID.

The debugger command ~ displays all the threads of the process.

Thread ordinal: A decimal value, starting from 0, used by the debugger to identify the thread (first column in the below output).

Thread ID: The ID assigned to each thread by the operating system (4th column in the ~ output). You can switch to a thread using its thread ID with the debugger command ~~[ThreadID]s

In the below image I have printed all the threads and the stack of thread ordinal 8, which is the scheduler monitor thread.

9. Type g in command prompt to resume the process.

10. Connect to SQL Server from Management Studio. Run select * from sysprocesses. Every session which has a non-zero value for KPID has a valid Windows thread associated with it. When a query chooses a parallel plan there will be more than one row for the same session, and each row will have a different KPID.

11. To look at the thread stack of a session which is currently executing a task, or of a background process, convert the KPID value associated with the session in sysprocesses to hexadecimal and type the command below in the command window of the debugger.

~~[Hex value of a thread]s

Type kC

12. Let us create a small blocking scenario to understand how threads wait in SQL Server.

Session-1

create table a (A int)
go
insert into a values (1);
go
begin transaction
update a set A =1+1

Session-2

select * from a

-- Session-2 will be blocked. Look at the stack of the blocked thread.

13. Run select * from sysprocesses where blocked<>0

Identify the KPID of the session which is blocked.

14. Convert the KPID into a hexadecimal value.

15. Break into the debugger to execute the debugger commands (CTRL+BREAK, or the 7th icon in the menu bar).

16. Look at the stack of the thread which is waiting for the lock (blocked) by using the hex value of the KPID

~~[Hex value of KPID]s

17. The thread from the second session which we created is waiting for an event using WaitForSingleObject. When the first session releases the lock, this thread will be signaled and will resume execution.
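Steps 11, 14 and 16 need the KPID from sysprocesses converted to hexadecimal before it can be used in the ~~[ThreadID]s command. A small helper can do the conversion and build the command text; the function name windbg_switch_command and the KPID 5028 used below are hypothetical and only for illustration.

```cpp
#include <sstream>
#include <string>

// Builds the WinDbg thread-switch command for a decimal KPID.
// Example: a KPID of 5028 is 13a4 in hexadecimal.
std::string windbg_switch_command(unsigned long kpid) {
    std::ostringstream out;
    out << "~~[" << std::hex << kpid << "]s";
    return out.str();
}
```

For a KPID of 5028 this yields ~~[13a4]s, which can be pasted into the debugger command window and followed by kC to print the stack.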

We will look at WaitForSingleObject, WaitForMultipleObjects, events (manual and auto reset) etc. in more detail in forthcoming blogs.


Posted in SQL Server Engine, SQLServer SOS | Tagged: Inside SQLOS, SOS, SQL Server operating system, SQL Server SOS, SQLOS, SQLServer scheduler, UMS, User mode sceduler, What is SQLSOS? | 2 Comments »

SQL Server Operating system (SOS) – Series 1

Posted by Karthick P.K on January 10, 2013

Before we start studying SQLOS we will recollect some of the basic OS concepts.

What is process?

A process is an instance of a service or application which is running. Each process has an address space that contains all the executables, DLLs, data, thread stacks etc. The operating system maintains a kernel object for each process to manage it. One or more threads run under the context of the process and execute the code in its address space. Each thread executes code and maintains its own set of CPU registers and its own stack. When a process is created, the primary thread of the process is created and starts executing main() or another similar function. The primary thread can create additional threads using the CreateThread function; lpStartAddress (the thread entry point) defines the function to be executed by the newly created thread.

Ex: Ureadfile will be the thread entry point for the new thread created by the code below.

CreateThread(0,0,(LPTHREAD_START_ROUTINE )Ureadfile,(LPVOID)&PSUreadfile[i], 0, NULL);

What is thread?

Threads execute the code in a process. There can be one (single-threaded) or more (multi-threaded) threads in every process. In multi-threaded applications each thread has to synchronize its activities with the other threads. Ex: Allowing one thread to modify a global while another is reading it can cause race conditions.

SQL Server uses synchronization techniques like Spinlocks, Latches, Events Etc.

Threads states

Threads can be in the core states below (there are other states which we will discuss as needed).

Waiting

The wait state means the thread is waiting for some resource. A thread in this state is not eligible to be scheduled by the OS.

SQL Server threads can be in the wait state in multiple places. A thread requesting a lock has to wait till it is available, and sleeps until it is signaled that the lock is available.

A thread can call WaitForSingleObject and wait without competing for CPU resources.

Ex:

LMHandle = CreateMemoryResourceNotification(LowMemoryResourceNotification);

WaitForSingleObject( LMHandle,INFINITE);

In the above example the thread will wait till there is a LowMemoryResourceNotification from Windows.
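The same "sleep until signaled" pattern can be tried out in portable C++. The sketch below assumes a condition variable as a stand-in for a Windows event object; ManualResetEvent and waiter_wakes_after_signal are illustrative names, and the event here is set by the test itself rather than by a real low memory notification.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative stand-in for a manual-reset event: wait() blocks without
// consuming CPU until some other thread calls set().
class ManualResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> guard(m_);
            signaled_ = true;
        }
        cv_.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return signaled_; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// One thread waits on the event; the caller signals it.
bool waiter_wakes_after_signal() {
    ManualResetEvent low_memory;
    bool woke = false;
    std::thread waiter([&] {
        low_memory.wait();   // comparable to WaitForSingleObject(handle, INFINITE)
        woke = true;
    });
    low_memory.set();        // the notification arrives
    waiter.join();
    return woke;
}
```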

Running

The thread is running on the CPU.

Ready

The thread is ready to run and waiting for its CPU slice. In SQL Server, threads which are ready to run on a scheduler stay in the runnable list of the scheduler till they get a chance to run.

Quantum
Every thread executed by the operating system gets a time slice on the CPU called a quantum. The thread is yielded from the scheduler after its quantum is completed.

Scheduling

Preemptive scheduling:

The operating system can interrupt thread execution at any time: it can halt the running thread and schedule another thread to run.

Non-Preemptive scheduling:

The operating system cannot interrupt thread execution. The worker owns the scheduler until it yields to another worker on the same CPU. If the thread which runs on the CPU (scheduler) doesn’t yield in time, it monopolizes the CPU until it finishes.

Windows 3.x and DOS used non-preemptive scheduling. In non-preemptive scheduling, context switching is generally reduced because the operating system does not interrupt code execution, and it is easier to implement a multi-threaded application because synchronization may be less of an issue. The drawback is that a bad application can easily ‘hang’ the entire system if one of its threads does not yield the CPU to allow other applications’ threads to execute.
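A toy cooperative scheduler makes the difference easy to see. This is only a sketch in standard C++ and not how SQLOS or Windows is implemented; Worker, run_cooperatively and make_worker are hypothetical names. Each worker is a function that runs until it decides to yield and reports whether it still has work left.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A cooperative worker: one call runs until the worker chooses to yield.
// The return value is true while the worker still has work left.
using Worker = std::function<bool()>;

// Runs the workers in runnable-list order and returns the order in which
// they got the (simulated) CPU. Nothing can interrupt a worker mid-call.
std::vector<std::string> run_cooperatively(
        std::deque<std::pair<std::string, Worker>> runnable) {
    std::vector<std::string> trace;
    while (!runnable.empty()) {
        std::pair<std::string, Worker> current = std::move(runnable.front());
        runnable.pop_front();
        trace.push_back(current.first);
        bool has_more_work = current.second();  // runs until it yields
        if (has_more_work) {
            runnable.push_back(std::move(current));  // back of the runnable list
        }
    }
    return trace;
}

// Builds a worker that needs `slices` turns on the CPU to finish.
Worker make_worker(int slices) {
    return [slices]() mutable { return --slices > 0; };
}
```

With worker A needing two turns and worker B needing one, the trace is A, B, A. A worker whose function never returns would keep the loop stuck on it forever, which is exactly the ‘hang’ described above.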


Posted in SQL Server Engine, SQLServer SOS | Tagged: Inside SQLOS, SOS, SQLOS, SQLServer operating system, SQLServer SOS, UMS, What is SOS | 2 Comments »

SQL Server fails to start with error "Failed allocate pages: FAIL_PAGE_ALLOCATION 1" During startup

Posted by Karthick P.K on January 6, 2013

SQL Server fails to start, and if you look at the SQL Server error log you will find "Failed allocate pages: FAIL_PAGE_ALLOCATION" followed by SQL Server generating an exception dump, similar to the SQL Server error log below.

Note: This blog is applicable when you get out of memory errors during startup (or) with Event ID: 2019 in the system event log. For general troubleshooting of SQL Server out of memory errors, follow the steps in Troubleshooting SQLServer Memory

{

2013-01-02 12:31:20.91 Server Microsoft SQL Server 2008 R2 (SP1) – 10.50.2500.0 (Intel X86)

Jun 17 2011 00:57:23

Enterprise Edition on Windows NT 5.2 <X86> (Build 3790: Service Pack 2)

2013-01-02 12:31:20.91 Server (c) Microsoft Corporation.

2013-01-02 12:31:20.91 Server Server process ID is 1583.

2013-01-02 12:31:20.91 Server Authentication mode is MIXED.

2013-01-02 12:31:20.91 Server Logging SQL Server messages in file ‘C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\Log\ERRORLOG’.

2013-01-02 12:31:20.91 Server This instance of SQL Server last reported using a process ID of 9240 at 1/3/2013 7:31:20 PM (local) 1/4/2013 12:31:20 AM (UTC). This is an informational message only; no user action is required.

2013-01-02 12:31:20.91 Server Registry startup parameters:

-d C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\DATA\master.mdf

-e C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\Log\ERRORLOG

-l C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\DATA\mastlog.ldf

2013-01-02 12:31:20.92 Server SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.

2013-01-02 12:31:20.92 Server Detected 24 CPUs. This is an informational message; no user action is required.

2013-01-02 12:31:20.94 Server Address Windowing Extensions is enabled. This is an informational message only; no user action is required.

2013-01-02 12:31:27.33 Server Failed allocate pages: FAIL_PAGE_ALLOCATION 1

2013-01-02 12:31:27.33 Server

Memory Manager KB

—————————————- ———-

VM Reserved 1534584

VM Committed 51576

AWE Allocated 0

Reserved Memory 1024

Reserved Memory In Use 0

2013-01-02 12:31:27.33 Server Error: 17311, Severity: 16, State: 1. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.

2013-01-02 12:31:27.33 Server Using ‘dbghelp.dll’ version ‘4.0.5’

2013-01-02 12:31:27.34 Server **Dump thread – spid = 0, EC = 0x00000000

2013-01-02 12:31:27.34 Server ***Stack Dump being sent to C:\Microsoft SQL Server\MSSQL10_50.MSSQLWIKIServer\MSSQL\LOG\SQLDump0008.txt

2013-01-02 12:31:27.34 Server * *******************************************************************************

2013-01-02 12:31:27.34 Server *

2013-01-02 12:31:27.34 Server * BEGIN STACK DUMP:

2013-01-02 12:31:27.34 Server * 01/03/13 19:31:27 spid 4344

2013-01-02 12:31:27.34 Server *

2013-01-02 12:31:27.34 Server * ex_handle_except encountered exception C0000005 – Server terminating

}

Why would SQL Server fail with an out of memory error (FAIL_PAGE_ALLOCATION) during startup? The only possible reason that I can think of is that the paged or nonpaged pool is empty.

How to prove that my paged / nonpaged pool is empty? Look at the system event log for Event ID: 2019.

You will find an error in the system event log similar to the one below.

{

Event Type: Error

Event Source: Srv

Event Category: None

Event ID: 2019

Date: 2013-01-02

Time: 12:31:00 PM

User: N/A

Computer: MSSQLWIKIServer

Description:

The server was unable to allocate from the system nonpaged pool because the pool was empty.

}

The above error indicates that the nonpaged pool is empty. When the nonpaged pool is empty, every application would fail. How to identify who is consuming the nonpaged pool?

Use poolmon.exe from the Windows support tools. (Steps are documented in This KB).

If your OS is Windows 2003 or above, you can simply run the exe from a command prompt and identify who is consuming (leaking) space in the paged / nonpaged pool.

Below is sample output of poolmon.exe which I collected from my test system

Memory consumption by each tag is printed in the above output. After finding the tag which is leaking memory (highest bytes), identify the driver which uses the tag with the find command or the strings utility from Sysinternals (search for the tag in the drivers folder %Systemroot%\System32\Drivers). Once you identify the driver, check whether there are any known issues with it, or contact the vendor of the driver to identify why it is consuming a large amount of paged / nonpaged pool memory.
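The tag search itself is nothing more than looking for the tag's characters inside each driver file. If you prefer to script it, here is a minimal sketch in standard C++; contains_tag and driver_file_contains_tag are made-up helper names, and the tag "Leak" used in the usage note is a hypothetical example, not a real pool tag from the output above.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// True if the raw bytes contain the pool tag as a plain character sequence.
bool contains_tag(const std::string& bytes, const std::string& tag) {
    return !tag.empty() && bytes.find(tag) != std::string::npos;
}

// Reads one file (for example a .sys file from the drivers folder) in
// binary mode and searches it for the tag. Returns false if the file
// cannot be opened.
bool driver_file_contains_tag(const std::string& path, const std::string& tag) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    return contains_tag(bytes, tag);
}
```

Calling driver_file_contains_tag for every .sys file in the drivers folder with a tag such as "Leak" narrows down the candidate drivers in the same way the find command or the strings utility does.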


Posted in Configuration, Memory, SQL Server Engine, SQL Server memory, Startup failures | Tagged: 17311, Event ID: 2019, Failed allocate pages: FAIL_PAGE_ALLOCATION 1, Severity: 16 State: 1. (Params:)., The server was unable to allocate from the system nonpaged pool because the pool was empty | 26 Comments »

Debugging memory Leaks using Debug diagnostic tool.

Posted by Karthick P.K on December 6, 2012

In my previous post (SQL Server memory leak) I explained how to identify the modules which are leaking memory using the ‘!heap’ commands. Sometimes we may not be able to find the cause by displaying memory with the ‘d’ (display memory) commands to look for patterns, or by using the search memory command (s).

In such scenarios we can use the Debug Diagnostic Tool or UMDH to track memory leaks. This blog explains how to identify memory leaks using the Debug Diagnostic Tool.

Download and install Debug Diagnostic Tool from http://www.microsoft.com/en-us/download/details.aspx?id=26798

1. Go to Tools -> Options -> Preferences -> select Record call stacks immediately when monitoring the leaks.

2. Go to the rules tab and select add rule.

3. Choose Native (non .Net) memory leak and handle leak.

4. Select the SQL Server or any process which has to be tracked for memory leak.

5. Click next and leave the default options (you can choose auto-unload Leak track when rule is completed or deactivated).

6. Click next and Activate the rule now.

7. Leaktrack.dll is now loaded into the process for which we are tracking the allocations.

8. Now you can wait for the leak to happen again.

{

--If you are learning how to troubleshoot a SQL Server memory leak, follow the steps from the previous post (https://mssqlwiki.com/2012/12/04/sql-server-memory-leak/) to leak the memory.

--Download HeapLeak.dll from this link.

--Create an extended stored procedure in SQL Server

sp_addextendedproc 'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

--Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

}

9. Once you suspect memory has leaked, go to the rules and take a full user dump by right-clicking the leak rule.

10. After the dump is captured, go to the advanced analysis tab, choose Add data files and select the dump which we generated.

11. Go to Tools -> Options -> set the symbol path for analysis. The default Microsoft symbol path is below.

srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;c:\Release

Important: Replace c:\Release with the symbol path of the DLLs which you have loaded in SQL Server (optional)

12. In the available analysis scripts select memory pressure analyzers (memory analysis.asp).

13. Click start analysis.

14. Analysis might take a while depending on the time it takes to load the symbols. Once the analysis is completed it generates and opens an HTML report.

This HTML report is stored in C:\Program Files\DebugDiag\Reports\ by default and can be used for later reference.

I have attached a sample report which I collected when leaking memory using heapleak.dll in This link. You can use it for reference.

The report generated by the Debug Diagnostic Tool memory pressure analyzer has the analysis summary and the below Table Of Contents

sqlservr.exe__…………dmp

Virtual Memory Analysis Report

Heap Analysis Report

Leak Analysis Report

Outstanding allocation summary

Detailed module report (Memory)

Detailed module report (Handles)

15. The analysis summary is a good place in the report to start, and it names the module which is leaking memory. Look at the below report.

16. The report clearly indicates that HeapLeak.dll has 255 MB of outstanding allocations. In heapleak.dll “Sub” is the function which allocated this memory, at offset 23.

17. Look at the virtual memory summary. It gives a complete picture of memory distribution in the virtual address space. In the below summary, memory reserved is 1.57 GB, which is normal in 32-Bit SQL Server, but native heaps are 272.94 MB, which is not normal.

Look at the heap summary: there are 50 heaps.

18. Now look at the outstanding allocation summary. It gives the top 10 modules by allocation count and allocation size. In the below summary HeapLeak has 26,182 allocations with a size of 255.6 MB.

Note: In this report it is HeapLeak, but in real life it might be any module which is leaking memory

19. You can also look at the detailed module report (Memory). It gives the memory allocations from each module along with the function and source line which allocated the memory (if you set the symbols for all the modules loaded).

By now we are sure that the Sub function in HeapLeak.dll allocated 255 MB at line number 14 and has not released it. The report also gives call stack samples that show the code path when the function was doing allocations. Refer This sample HTML report file.
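The outstanding allocation summary is easier to read once you picture the bookkeeping behind leak tracking in general: remember every allocation together with who made it, forget it when it is freed, and whatever is left is outstanding. The sketch below shows only that general idea in standard C++; it is not how LeakTrack.dll is implemented, and AllocationTracker with its method names is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Illustrative allocation bookkeeping: tracks live allocations and sums
// the outstanding bytes per module.
class AllocationTracker {
public:
    // Called when `module` allocates `size` bytes at address `p`.
    void on_alloc(const void* p, std::size_t size, const std::string& module) {
        live_[p] = {size, module};
    }

    // Called when the memory at address `p` is freed.
    void on_free(const void* p) { live_.erase(p); }

    // Outstanding (allocated but not yet freed) bytes for each module.
    std::map<std::string, std::size_t> outstanding_by_module() const {
        std::map<std::string, std::size_t> totals;
        for (const auto& entry : live_) {
            totals[entry.second.module] += entry.second.size;
        }
        return totals;
    }

private:
    struct Record {
        std::size_t size;
        std::string module;
    };
    std::map<const void*, Record> live_;
};
```

A module that allocates and frees correctly drops out of the totals, while a module like HeapLeak, which never frees, keeps growing with every call.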


SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out


Posted in Debugging, Memory, Performance, SQL Server memory | Tagged: debug diagnostics tool memory leak, how to find a leak, memory leak, memory leak in, SQLServer memory leaks, what are memory leaks | 5 Comments »

SQL Server memory leak

Posted by Karthick P.K on December 4, 2012

What is memory leak?

When a process allocates memory it is supposed to de-allocate it and release it back to the OS. If it fails to de-allocate the memory because of a flaw in the code, that is called a leak, and it can cause memory pressure for both the operating system and the application.

Myth about SQL Server memory leak

SQL Server memory management is designed to dynamically grow and shrink its memory based on the amount of available memory on the system and the Max server memory setting in SQL Server.

Many times system admins look at the memory usage of SQL Server and assume that SQL Server is leaking memory if they find its memory usage is high.

This is incorrect. SQL Server is a server-based application, and its memory manager is designed to keep growing its memory usage as needed (exception: large pages); it will not scale its usage down unless there is a low memory notification from Windows. We can control the memory usage of SQL Server using the Max server memory setting. This setting limits the Bpool usage of SQL Server and doesn’t control its overall memory usage. Portions of SQL Server memory are allocated outside the Bpool (aka MTL or MTR). We do not have a way to control how much memory SQL Server can use outside the Bpool, but non-Bpool memory usage is normally low and can be easily estimated by studying the components running in SQL Server.

Ex: Suppose you want SQL Server to use only 10GB of RAM on the server. Consider how much memory SQL Server might need outside the Bpool and set the “max server memory” setting accordingly. In this case, if you estimate that SQL Server will use 1.5GB outside the Bpool, then set Max server memory to 8.5GB.
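The arithmetic in this example is simple, but since the setting is entered in megabytes it is worth writing it down once. A tiny sketch follows; the function name max_server_memory_mb is made up for illustration and the 1.5GB non-Bpool figure is the estimate assumed in the example above.

```cpp
// Total memory budget for SQL Server minus the estimated memory used
// outside the Bpool gives the value for "max server memory", in MB.
long max_server_memory_mb(long total_budget_mb, long estimated_non_bpool_mb) {
    return total_budget_mb - estimated_non_bpool_mb;
}
```

With a 10GB budget (10240 MB) and a 1.5GB estimate (1536 MB) the result is 8704 MB, i.e. 8.5GB.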

What can cause SQL Server Memory leak?

A memory leak happens when code has logic to allocate memory but doesn’t de-allocate it. If any of the components in SQL Server is causing a memory leak, it can be identified easily using DMVs like sys.dm_os_memory_allocation, sys.dm_os_memory_clerks and sys.dm_os_memory_objects etc., but most of the memory leaks in SQL Server are caused by 3rd party DLLs which are loaded in the SQL Server process.

Note: All the memory allocations by non SQL Server DLLs loaded in SQL Server happen in “Mem to Leave” (outside the Bpool) and they are called direct Windows allocations (DWA)

When there are out of memory conditions in SQL Server and you suspect there is a memory leak, the first thing to determine is who is consuming the memory. If SQL Server is not using the majority of the memory in MemToLeave and you still get Mem to Leave errors, there is probably a leak and it is caused by some DLL loaded in SQL Server. Refer Section 1 (MTL error) in https://mssqlwiki.com/sqlwiki/sql-performance/troubleshooting-sql-server-memory/

The query below can be used to determine the actual memory consumed by SQL Server itself in MTL.

select sum(multi_pages_kb) from sys.dm_os_memory_clerks
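To see which SQL Server components account for that total, the same DMV can be grouped by clerk type. This is a minimal sketch, assuming SQL Server 2005/2008, where the multi_pages_kb column exists (SQL Server 2012 replaced it with pages_kb).

```sql
-- Top consumers of multi-page (MTL) memory among SQL Server's own clerks.
SELECT TOP (10)
       type,
       SUM(multi_pages_kb) AS multi_pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY SUM(multi_pages_kb) DESC;
```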

If the memory consumption by SQL Server is very low and you still see SQL Server memory errors like the ones below, focus on leaks.

Ex:

SQL Server 2000

WARNING: Failed to reserve contiguous memory of Size= 65536.

WARNING: Clearing procedure cache to free contiguous memory.

Error: 17802 “Could not create server event thread.”

SQL Server could not spawn process_loginread thread.

SQL Server 2005/2008

Failed Virtual Allocate Bytes: FAIL_VIRTUAL_RESERVE 122880

How to identify and troubleshoot the memory leak?

There are multiple ways in Windows to identify who is leaking memory in a process. In this blog we will discuss how to identify the memory leak using

1. Windows debugger 2. Debug diagnostics tool for Windows and 3. UMDH.

Let us create a sample DLL to load in the SQL Server process to leak memory, and see how to use the tools mentioned above to troubleshoot the leak.

Download HeapLeak.dll from This link and install the Microsoft Visual C++ 2010 Redistributable Package from these links (32-Bit or 64-Bit) to make this DLL work.

--Create an extended stored procedure in SQL Server

exec sp_addextendedproc  'HeapLeak','C:\HeapLeakdll\HeapLeak.dll'

--Let us execute this Extended SP 30 times and leak memory.

exec HeapLeak

go 30

We will also enable the trace flags below in SQL Server to automatically generate a filter dump when there are out-of-memory errors, and see how to identify who is leaking.

dbcc traceon (2551,-1) -- 2551 is used to enable filter dump.

dbcc traceon (8004,-1) -- 8004 is used to take memory dump on first occurrence of OOM condition

--Note: Both the trace flags listed above are undocumented, so use them at your own risk; there is no guarantee that these trace flags will work in future versions of SQL Server.
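Because these flags are enabled globally, it is worth confirming that they are on before the test and turning them off once the dump has been captured. A small sketch, using only the two flags named above:

```sql
-- Check whether the two flags are currently enabled globally.
DBCC TRACESTATUS (2551, 8004, -1);

-- After the OOM dump has been captured, switch the flags off again.
DBCC TRACEOFF (2551, -1);
DBCC TRACEOFF (8004, -1);
```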

Once the trace flags are enabled, we have to cause an out-of-memory error in SQL Server to generate the OOM memory dump. We have already leaked around 300 MB of memory from MTL by executing the extended SP above 30 times.

Let us execute the script below, which creates XML handles. Memory for XML handles is allocated from MTL, so we will get out-of-memory errors very soon because the extended stored procedure we executed has already leaked the memory.

(Do not run the XML script below on its own without executing HeapLeak first. The script by itself will also cause an OOM error because of the handle created on each execution, but that memory is accounted as a SQL Server allocation, so it will not help us understand how to debug leaks caused by 3rd-party DLLs.)

Note: 1. The SQL Server memory dump will be generated in the SQL Server error log folder.
2. The size of MTL in 32-bit SQL Server is 256 MB + (max worker threads * 0.5 MB), so approximately 384 MB unless modified using the -g switch.

DECLARE @idoc int
 
DECLARE @doc varchar(1000)
 
SET @doc ='<ROOT>
<Customer CustomerID="VINET" ContactName="Paul Henriot">
<Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
     <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
   </Order>
</Customer>
<Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
   <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
   <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
   </Order>           
</Customer>
</ROOT>'
 
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc
 
go 10000

We will receive the error below after a few executions.

Msg 6624, Level 16, State 12, Procedure sp_xml_preparedocument, Line 1

XML document could not be created because server memory is low.

To analyze the dump, download and install the Windows Debugger from http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

Step 1 (Load the memory dump file to debugger):

Open WinDbg. Choose File menu -> select Open crash dump -> select the dump file (SQLDump000#.mdmp)

Note: You will find SQLDump000#.mdmp in your SQL Server error log folder when you get the exception or assertion.

Step 2 (Set the symbol path to Microsoft symbols server):

In the command window, type:

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3 (Load the symbols from Microsoft symbols server):

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4 (check if symbols are loaded):

Verify if symbols are loaded for SQL Server by using the debugger command lmvm

0:028> lmvm sqlservr

start end module name

01000000 02ba8000 sqlservr (pdb symbols) c:\websymbols\sqlservr.pdb\93AACB610C614E1EBAB0FFB42031691D2\sqlservr.pdb

Loaded symbol image file: sqlservr.exe

Mapped memory image file: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image name: sqlservr.exe

Timestamp: Fri Oct 14 15:35:29 2005 (434F82E9)

CheckSum: 01B73B9B

ImageSize: 01BA8000

File version: 2005.90.1399.0

Product version: 9.0.1399.0

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0409.04e4

CompanyName: Microsoft Corporation

ProductName: Microsoft SQL Server

InternalName: SQLSERVR

OriginalFilename: SQLSERVR.EXE

ProductVersion: 9.00.1399.06

FileVersion: 2005.090.1399.00

FileDescription: SQL Server Windows NT

LegalTrademarks: Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

Comments: NT INTEL X86

Step 5 : (!address to display the memory information)

Use !address command to display the memory information of the process from dump.

0:028> !address -summary

——————– Usage SUMMARY ————————–

TotSize ( KB) Pct(Tots) Pct(Busy) Usage

686a7000 ( 1710748) : 81.58% 81.80% : RegionUsageIsVAD

579000 ( 5604) : 00.27% 00.00% : RegionUsageFree

4239000 ( 67812) : 03.23% 03.24% : RegionUsageImage

ea6000 ( 15000) : 00.72% 00.72% : RegionUsageStack

1e000 ( 120) : 00.01% 00.01% : RegionUsageTeb

122d0000 ( 297792) : 14.20% 14.24% : RegionUsageHeap

0 ( 0) : 00.00% 00.00% : RegionUsagePageHeap

1000 ( 4) : 00.00% 00.00% : RegionUsagePeb

1000 ( 4) : 00.00% 00.00% : RegionUsageProcessParametrs

1000 ( 4) : 00.00% 00.00% : RegionUsageEnvironmentBlock

Tot: 7fff0000 (2097088 KB) Busy: 7fa77000 (2091484 KB)

——————– Type SUMMARY ————————–

TotSize ( KB) Pct(Tots) Usage

579000 ( 5604) : 00.27% : <free>

4239000 ( 67812) : 03.23% : MEM_IMAGE

5fc000 ( 6128) : 00.29% : MEM_MAPPED

7b242000 ( 2017544) : 96.21% : MEM_PRIVATE

——————– State SUMMARY ————————–

TotSize ( KB) Pct(Tots) Usage

1b7bd000 ( 450292) : 21.47% : MEM_COMMIT

579000 ( 5604) : 00.27% : MEM_FREE

642ba000 ( 1641192) : 78.26% : MEM_RESERVE

Largest free region: Base 00000000 – Size 00010000 (64 KB)

Look at RegionUsageHeap: it is around 297792 KB, and the largest free region is just 64 KB. We know SQL Server doesn’t use heaps extensively, so normally the heap allocated by SQL Server will not go beyond a few MB. In this case it is consuming around 290 MB, so other components which use MTL can easily fail.

Let us try to understand why the heap is around 297792 KB and try to identify if there is a pattern.

Step 6: (Let us use !heap -s to display summary information about the heap)

0:028> !heap -s

LFH Key : 0x672ddb11

Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast

(k) (k) (k) (k) length blocks cont. heap

—————————————————————————–

000d0000 00000002 1024 896 896 6 1 1 0 0 L

001d0000 00008000 64 12 12 10 1 1 0 0

002c0000 00001002 1088 96 96 2 1 1 0 0 L

002e0000 00001002 64 52 52 3 2 1 0 0 L

007c0000 00001002 64 64 64 56 1 0 0 0 L

00d10000 00001002 256 24 24 8 1 1 0 0 L

340b0000 00001002 64 28 28 1 0 1 0 0 L

340c0000 00041002 256 12 12 4 1 1 0 0 L

342a0000 00000002 1024 24 24 3 1 1 0 0 L

34440000 00001002 64 48 48 40 2 1 0 0 L

61cd0000 00011002 256 12 12 4 1 1 0 0 L

61d10000 00001002 64 16 16 7 1 1 0 0 L

61d20000 00001002 64 12 12 4 1 1 0 0 L

62a90000 00001002 1024 1024 1024 1016 2 0 0 0 L

62b90000 00001002 1024 1024 1024 1016 2 0 0 0 L

62c90000 00001002 256 40 40 7 1 1 0 0 LFH

00770000 00001002 64 16 16 2 2 1 0 0 L

63820000 00001002 64 24 24 3 1 1 0 0 L

63830000 00001001 10240 10240 10240 160 21 0 0 bad

64230000 00001001 10240 10240 10240 160 21 0 0 bad

64c30000 00001001 10240 10240 10240 160 21 0 0 bad

65630000 00001001 10240 10240 10240 160 21 0 0 bad

66030000 00001001 10240 10240 10240 160 21 0 0 bad

66a30000 00001001 10240 10240 10240 160 21 0 0 bad

67430000 00001001 10240 10240 10240 160 21 0 0 bad

68130000 00001001 10240 10240 10240 160 21 0 0 bad

68b30000 00001001 10240 10240 10240 160 21 0 0 bad

69530000 00001001 10240 10240 10240 160 21 0 0 bad

69f30000 00001001 10240 10240 10240 160 21 0 0 bad

6a930000 00001001 10240 10240 10240 160 21 0 0 bad

6b330000 00001001 10240 10240 10240 160 21 0 0 bad

6bd30000 00001001 10240 10240 10240 160 21 0 0 bad

6c730000 00001001 10240 10240 10240 160 21 0 0 bad

6d130000 00001001 10240 10240 10240 160 21 0 0 bad

6db30000 00001001 10240 10240 10240 160 21 0 0 bad

6e530000 00001001 10240 10240 10240 160 21 0 0 bad

6ef30000 00001001 10240 10240 10240 160 21 0 0 bad

6f930000 00001001 10240 10240 10240 160 21 0 0 bad

70330000 00001001 10240 10240 10240 160 21 0 0 bad

70d30000 00001001 10240 10240 10240 160 21 0 0 bad

7a160000 00001001 10240 10240 10240 160 21 0 0 bad

7ab60000 00001001 10240 10240 10240 160 21 0 0 bad

7b560000 00001001 10240 10240 10240 160 21 0 0 bad

7d0d0000 00001001 10240 10240 10240 160 21 0 0 bad

7e030000 00001001 10240 10240 10240 160 21 0 0 bad

7ea30000 00001001 10240 10240 10240 160 21 0 0 bad

67f90000 00001003 256 16 16 14 1 1 0 bad

71850000 00001003 256 4 4 2 1 1 0 bad

71890000 00001003 256 4 4 2 1 1 0 bad

67fd0000 00001002 64 16 16 4 1 1 0 0 L

718d0000 00001003 256 40 40 3 1 1 0 bad

71910000 00001003 256 4 4 2 1 1 0 bad

71950000 00001003 256 4 4 2 1 1 0 bad

71990000 00001003 256 4 4 2 1 1 0 bad

67ff0000 00001002 64 16 16 4 1 1 0 0 L

719d0000 00001003 1792 1352 1352 5 2 1 0 bad

71a10000 00001003 256 4 4 2 1 1 0 bad

71a50000 00001003 256 4 4 2 1 1 0 bad

71a90000 00001002 64 16 16 1 0 1 0 0 L

—————————————————————————–

If you look at the above output you can clearly identify a pattern. There are multiple heaps created, and each of them is 10 MB. But how do we identify who actually created them?
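Before digging into an individual heap, a quick piece of arithmetic shows how much of the heap usage the pattern explains. The sketch below takes its numbers from the debugger output above (28 heaps with 10240 KB committed, RegionUsageHeap of 297792 KB); the 256 max worker threads used for the MTL size is an assumption that matches the "approximately 384 MB" mentioned earlier.

```sql
DECLARE @heap_count int, @heap_kb int, @region_heap_kb int, @mtl_kb int;

SET @heap_count     = 28;       -- heaps showing 10240 KB committed in !heap -s
SET @heap_kb        = 10240;    -- committed size of each such heap, in KB
SET @region_heap_kb = 297792;   -- RegionUsageHeap from !address -summary
SET @mtl_kb         = (256 + 256 * 0.5) * 1024;  -- 256 MB + 256 threads * 0.5 MB (assumed default)

SELECT @heap_count * @heap_kb AS leaked_heap_kb,          -- 286720 KB, i.e. 280 MB
       @mtl_kb                AS default_mtl_kb,          -- 393216 KB, i.e. 384 MB
       100.0 * @heap_count * @heap_kb / @region_heap_kb AS pct_of_heap_usage,
       100.0 * @heap_count * @heap_kb / @mtl_kb         AS pct_of_default_mtl;
```

The 10 MB heaps account for roughly 96% of RegionUsageHeap and close to three quarters of a default-sized MTL, which is why they are the place to look.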

Step 7:

Let us pick one of the 10 MB heaps and display all the entries (allocations) within it using !heap with the -h parameter.

The heap I have picked is 63830000.

0:028> !heap -h 63830000

Index Address Name Debugging options enabled

19: 63830000

Segment at 63830000 to 64230000 (00a00000 bytes committed)

Flags: 00001001

ForceFlags: 00000001

Granularity: 8 bytes

Segment Reserve: 00100000

Segment Commit: 00002000

DeCommit Block Thres: 00000200

DeCommit Total Thres: 00002000

Total Free Size: 00005048

Max. Allocation Size: 7ffdefff

Lock Variable at: 00000000

Next TagIndex: 0000

Maximum TagIndex: 0000

Tag Entries: 00000000

PsuedoTag Entries: 00000000

Virtual Alloc List: 63830050

UCR FreeList: 63830588

FreeList Usage: 00000000 00000000 00000000 00000000

FreeList[ 00 ] at 63830178: 6422de88 . 638ad7e0 Unable to read nt!_HEAP_FREE_ENTRY structure at 638ad7e0

(1 block )

Heap entries for Segment00 in Heap 63830000

63830608: 00608 . 00040 [01] – busy (40)

63830648: 00040 . 02808 [01] – busy (2800)

641b6698: 02808 . 02808 [01] – busy (2800)

……………………………………

Step 8: (Let us pick one of the heap entries (allocations) and try to identify what is in it)

0:028> db 641b6698

641b6698 01 05 01 05 93 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

641b66a8 6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20 ng the heap for

641b66b8 64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b demo at MSSQLWIK

641b66c8 49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00 I.COM………..

641b66d8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b66e8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b66f8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

641b6708 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

0:028> db 63830648

63830648 01 05 08 00 89 01 08 00-49 61 6d 20 66 69 6c 69 ……..Iam fili

63830658 6e 67 20 74 68 65 20 68-65 61 70 20 66 6f 72 20 ng the heap for

63830668 64 65 6d 6f 20 61 74 20-4d 53 53 51 4c 57 49 4b demo at MSSQLWIK

63830678 49 2e 43 4f 4d 00 00 00-00 00 00 00 00 00 00 00 I.COM………..

63830688 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

63830698 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306a8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

638306b8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 …………….

Similarly, you can dump multiple heap allocations to identify a pattern.

Now if you look at the memory dumped, you see a string which might help you identify the DLL which created the heap. There is a pattern in the heaps above: all the heap allocations contain the string below.

“Iam filing the heap for demo at MSSQLWIKI.COM”

Note: You can use L followed by a size to dump more memory with the db or dc commands, for example: db 63830648 L1500

Step 9:

Let us open the DLL which we loaded in SQL Server for testing using Notepad and see if there is a string which matches the pattern.

Yes, there is, which proves that this DLL has caused the leak. In real-world cases you may have to look at many different heap allocations to identify the pattern.

This is one way to find leaks from the memory dump after the leak has actually happened. It may not always be easy to find a pattern and identify the module which allocated the memory. In such scenarios you may have to track the leak using tools like the Debug Diagnostic tool, UMDH etc. In my next blog I will post how to track memory leaks using the Debug Diagnostic tool.

Continued in Debugging memory Leaks using Debug diagnostic tool

SQL Server 2012 Memory

Troubleshooting SQL Server Memory

A significant part of SQL Server process memory has been paged out

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, Memory, Performance, SQL General, SQL Server Engine | Tagged: memory leak, Memory leaks in SQL Server, MTL erros in SQL Server, sqlserver memory, tracking memory leaks in SQL Server | 46 Comments »

SQL Server NUMA load distribution

Posted by Karthick P.K on November 22, 2012

When port affinity is not configured, all connections to SQL Server enter through a single port and are tied to nodes on a round-robin basis.

We might end up with an imbalance of workload on NUMA systems under the conditions below.

1. When a connection is tied (or affinitized) to a node, all the work from that connection is completed on that same node if the plans are serial. SQL Server does not consider the CPU load across NUMA nodes when picking a node for serial plans; it uses the node on which the connection was made. A parallel query can use any NUMA node, regardless of the node its connection came in on. When all the queries execute from connections made to the same node and the plans are also serial, we might end up overloading one node while the others are not fully used.

2. The state of each node is maintained internally by SQL Server and updated every 2 seconds, so there is a remote possibility that all parallel queries sometimes end up on the same node and cause a spike on that node while the other nodes are unused.

3. When there is an imbalance between the number of online schedulers in each node (Ex: 16 CPUs in Node1 and 4 CPUs in Node2) and all plans are serial (assume we have set max DOP to 1), we might end up overloading the schedulers in the node with the fewest schedulers while the schedulers on the other node are underused. Similarly, memory is shared equally across nodes irrespective of the number of schedulers on each node, so in this case the 16 schedulers of the first node would get half of the memory and the 4 schedulers of the second node would get the remaining half. So choose the CPU affinity carefully (especially when you have installed SQL Server with a limited processor license on a system with a larger number of CPUs).

Image 1: sys.dm_os_schedulers (6 – CPU’S on node-0 and 1- CPU on node-1. Look at current task count)

Image 2 (Look at the current and pending task in node 0 and in node 1)
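The per-node picture described by the images can be reproduced with a query along these lines. It is a sketch rather than the exact query used for the screenshots: it sums the scheduler counters per NUMA node and, separately, counts connections by the node they were affinitized to.

```sql
-- Scheduler load per NUMA node (user schedulers only).
SELECT parent_node_id,
       COUNT(*)                  AS online_schedulers,
       SUM(current_tasks_count)  AS current_tasks,
       SUM(runnable_tasks_count) AS runnable_tasks,
       SUM(work_queue_count)     AS pending_tasks,
       AVG(load_factor)          AS avg_load_factor
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;

-- How connections are spread across nodes.
SELECT node_affinity,
       COUNT(*) AS connections
FROM sys.dm_exec_connections
GROUP BY node_affinity
ORDER BY node_affinity;
```

A node with few schedulers but a task count similar to a much larger node is the imbalance described in point 3.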

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/


Disclaimer

The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in Debugging, Memory, Performance, SQL Server Engine, SQL Server memory | Tagged: configuring NUMA nodes in sqlserver, CPU Affinity, SQLSERVER NUMA, Work load distribution in NUMA | 3 Comments »

SQL Server Query optimization

Posted by Karthick P.K on November 6, 2012

SQL Server Query optimization (or) Tuning slow queries in SQL Server.

How do you troubleshoot (or) tune slow queries in SQL Server so that they run faster, and resolve the error “sql server -2147217871 Query timeout expired”?

A query is considered to be slow when it executes for a longer duration than expected. The total duration of the query can be broken into compile time, CPU time and wait time.

Before you start troubleshooting a query which is running for a long duration, identify whether the query is slow because it is long waiting, long running, or long compiling.

Compile time: Time taken to compile the query. Compile time can be identified by looking at

1. CompileTime=”n” in the XML plan

2. SQL Server parse and compile time when SET STATISTICS TIME ON is enabled.

CPU time: Time spent by the query on the CPU (Execution time - (compile time + wait time)). CPU time can be identified by looking at

1. The CPU column in profiler.

2. CPU time under SQL Server Execution Times when statistics time is on.

Execution time: Time taken by the query for complete execution (Execution time = CPU time (for compilation + execution) + wait time). The total duration of the query can be identified by using

1. The Duration column in profiler

2. Elapsed time under SQL Server Execution Times when statistics time is on.

What is long waiting query?

A query is considered to be a long waiting query when it spends most of its time waiting for some resource.

How to identify if the query is long waiting?

A long waiting query can be identified by comparing the CPU and Duration columns in profiler (or) CPU and elapsed time when statistics time is on.

When a query is waiting for a resource (such as a lock, network I/O, page I/O etc.) it does not consume CPU. So if the duration is much higher than CPU (the difference between duration and CPU is the wait time), it indicates that the query has spent a large amount of time waiting for some resource.

Let us see an example of long waiting query. I have collected profiler trace while executing the query.

set statistics io on

set statistics time on

--Place your query here

select top 10000 * from a

set statistics io off

set statistics time off

Look at the Duration and CPU columns in the profiler: CPU = 256 and Duration = 1920. So this query has spent the majority of its time waiting for some resource.

Look at the output of statistics time and statistics I/O in the above image.

SQL Server has spent only 2 milliseconds compiling the query and 256 milliseconds on CPU, but the overall duration was 1920 milliseconds, so the query has spent most of its time waiting for some resource.
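The same CPU versus duration comparison can be made for cached plans without a profiler trace. This is a sketch that uses sys.dm_exec_query_stats as a substitute for the Profiler columns; the times in that DMV are in microseconds, and the "approximate wait" is simply elapsed time minus CPU time, as in the example above.

```sql
-- Cached statements whose elapsed time is well above their CPU time,
-- i.e. candidates for "long waiting" queries.
SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time  / 1000 AS total_cpu_ms,
       qs.total_elapsed_time / 1000 AS total_duration_ms,
       (qs.total_elapsed_time - qs.total_worker_time) / 1000 AS approx_wait_ms,
       st.text AS batch_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.total_elapsed_time > qs.total_worker_time
ORDER BY qs.total_elapsed_time - qs.total_worker_time DESC;
```

For parallel plans CPU can exceed duration, so this difference is only meaningful for serial plans.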

Identify the resource on which this query is waiting using one of the steps listed below.

1. Look at the waittype column of sysprocesses for the spid which is executing the query, while the query is executing.

2. If there is no other activity on the server, collect sys.dm_os_wait_stats output before and after the query execution and identify the wait (this will not help in tuning queries that run for a short duration).

3. Collect XEvents to gather the wait stats of the individual query.
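The first two steps can be sketched as follows. The session id 55 and the temp table name #waits_before are placeholders of mine; sys.dm_exec_requests is used here as the DMV counterpart of sysprocesses.

```sql
-- Step 1: from a second connection, while the query is running.
SELECT session_id, status, wait_type, wait_time, wait_resource,
       last_wait_type, cpu_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE session_id = 55;   -- hypothetical spid of the slow query

-- Step 2: on an otherwise idle server, snapshot the wait stats ...
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

-- ... run the query under test here ...

-- ... then report only the waits that grew during the run.
SELECT a.wait_type,
       a.waiting_tasks_count - b.waiting_tasks_count AS waits_delta,
       a.wait_time_ms        - b.wait_time_ms        AS wait_ms_delta
FROM sys.dm_os_wait_stats AS a
JOIN #waits_before AS b
  ON a.wait_type = b.wait_type
WHERE a.wait_time_ms > b.wait_time_ms
ORDER BY wait_ms_delta DESC;

DROP TABLE #waits_before;
```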

Once you identify the resource on which the query is waiting, tune that resource. Most of the time, slow queries are waiting on one of the resources below.

PAGEIOLATCH_* or WRITELOG: This indicates an I/O resource bottleneck. Follow the detailed troubleshooting steps mentioned in This Link to fix the I/O bottleneck. If you find SQL Server issuing excessive I/O, create the necessary indexes.

a. Logical reads + physical reads in the statistics I/O output (refer to the above image), or Reads and Writes in profiler, indicate the I/O posted by this query. If you see very high reads compared with the result set returned by the query, it is an indication of missing indexes or a bad plan. Create the necessary indexes (you can use DTA for index recommendations).
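To check whether the I/O subsystem itself is slow, per-file stall times can be read from sys.dm_io_virtual_file_stats. This is a rough sketch and not the procedure behind the link above; the averages are cumulative since the instance started.

```sql
-- Average read/write stall per I/O for every database file.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.num_of_reads,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads END   AS avg_read_stall_ms,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY vfs.io_stall DESC;
```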

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting for access to a database page, but the page is not undergoing physical I/O.

a. This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. Look at the wait resource of the spid; the wait_resource is the page that is being accessed (the format is dbid:file:pageno).

b. We can use DBCC PAGE to identify the object or the type of the page on which we have the contention. It also helps us determine whether the contention is for allocation, data or text pages.

c. If the pages that SQL Server is most frequently waiting on are in the tempdb database, check the wait resource column for a page number in dbid 2 (Ex: 2:1:1 or 2:1:2). Enable TF 1118, increase the number of tempdb data files and size them equally (you may be facing the tempdb allocation latch contention mentioned in http://support.microsoft.com/kb/328551).

d. If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table. In this case we need to choose a different clustered index key to spread the work across different pages.
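Steps b and c can be sketched as below. DBCC PAGE is undocumented, so treat this as an illustration; the page 2:1:1 is just the example wait resource from step c.

```sql
-- Send DBCC output to the client session instead of the error log.
DBCC TRACEON (3604);

-- DBCC PAGE (dbid, fileid, pageno, print option); 0 prints the page header only.
-- The header (for example m_type and the object id) tells you what kind of page it is.
DBCC PAGE (2, 1, 1, 0);

DBCC TRACEOFF (3604);

-- Number and size of tempdb data files (size is stored in 8 KB pages).
SELECT name,
       physical_name,
       size * 8 / 1024 AS size_mb
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS';
```

If the file sizes differ, allocations will favour the larger file, which defeats the purpose of adding files.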

LATCH_*: Non-buffer latch waits can be caused by a variety of things. We can use the wait resource column in sysprocesses to determine the type of latch involved (KB 822101).

a. A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata. Refer to KB 929728 for more information.

b. Auto grow and auto shrink while the query is executing.

c. Queries going for excessive parallelism.

Blocking (LCK*): Use the query in This Link to identify the blocking. Tune the head blocker.
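If the linked query is not at hand, blocked requests and the head blocker can be found with a sketch like this one. It is my own minimal version, not the query behind the link.

```sql
-- Requests that are currently blocked, and who is blocking them.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.wait_resource,
       st.text AS batch_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.blocking_session_id <> 0;

-- Head blockers: sessions that block others but are not blocked themselves.
SELECT DISTINCT r.blocking_session_id AS head_blocker
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0
  AND r.blocking_session_id NOT IN
      (SELECT session_id
       FROM sys.dm_exec_requests
       WHERE blocking_session_id <> 0);
```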

ASYNC_NETWORK_IO (or) network I/O: Keep the result set returned by the query small. For detailed troubleshooting refer to This Link.

RESOURCE_SEMAPHORE waits: Make sure there is no memory pressure on the server. Follow the steps in This Link for detailed troubleshooting.

SQL Trace: Stop all the profiler traces running on the server. Identify the traces which are running on the server using the query in This Link.
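Running traces can be listed from sys.traces and stopped with sp_trace_setstatus. In this sketch the trace id 2 is a placeholder; take the real id from the first query and leave the default trace alone unless you have a reason to stop it.

```sql
-- Traces defined on the server; status = 1 means running.
SELECT id, status, path, is_rowset, is_default, start_time, event_count
FROM sys.traces;

-- Stop, then close and delete the definition of, a hypothetical trace id 2.
EXEC sp_trace_setstatus @traceid = 2, @status = 0;  -- stop
EXEC sp_trace_setstatus @traceid = 2, @status = 2;  -- close and delete
```

Rows with is_rowset = 1 are traces being read by a client such as Profiler.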

CXPACKET: Set the max degree of parallelism. But remember, the CXPACKET wait type is not always a problem.

a. For servers that have eight or fewer processors, use the following configuration where N equals the number of processors: max degree of parallelism = 0 to N.

b. For servers that use more than eight processors, use the following configuration: max degree of parallelism = 8. Refer to This Link.
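The recommendation above can be applied at server level with sp_configure, or for a single query with a hint. A minimal sketch, assuming a server with more than eight processors and reusing the table a from the earlier example:

```sql
-- Server-wide setting ('max degree of parallelism' is an advanced option).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- Per-query override: force a serial plan for one statement only.
SELECT TOP 10000 * FROM a OPTION (MAXDOP 1);
```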

SOS_SCHEDULER_YIELD: Identify if there is a CPU bottleneck on the server. This wait means that the thread is waiting for CPU.

a. The quantum target of a SQL Server worker thread is 4 ms, which means the thread (worker) is expected to yield back to the SQL Server scheduler when it exceeds 4 ms. Before it yields, it checks if there are any other runnable threads. If there are, the thread at the top of the runnable list is scheduled, and the current thread goes to the tail of the runnable list; it gets rescheduled when the threads already waiting in the SOS scheduler's runnable list finish their execution or quantum. The time a thread spends in the runnable list waiting for its quantum is accounted as SOS_SCHEDULER_YIELD. You will see this wait type when multiple threads are waiting to get CPU cycles. Follow the troubleshooting steps mentioned in This Link.
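Time spent in the runnable list is reported as signal wait time, so the share of signal waits in total waits gives a rough indication of CPU pressure. The sketch below is an illustration of that idea, not the procedure behind the link; the numbers are cumulative since the instance started.

```sql
-- Share of total wait time that was spent waiting for CPU (runnable list).
SELECT SUM(signal_wait_time_ms) AS signal_wait_ms,
       SUM(wait_time_ms)        AS total_wait_ms,
       CAST(100.0 * SUM(signal_wait_time_ms)
            / NULLIF(SUM(wait_time_ms), 0) AS decimal(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```

A consistently high percentage, together with non-zero runnable_tasks_count in sys.dm_os_schedulers, points to threads queuing for CPU.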

Important: On SQL Server instances with more than 1 CPU, it is possible for CPU to be higher than the duration, because CPU is the sum of the time spent by the query on all CPUs when a parallel plan is chosen, whereas the duration is the actual (elapsed) duration of the query.

What is long running query?

A query is considered to be a long running query when it spends most of its time on CPU and not waiting for some resource.

How to identify if the query is long running ?

A long running query can be identified by comparing the CPU and Duration columns in profiler (or) CPU and elapsed time when statistics time is on. If CPU and duration are close, then the query is considered to be long running. If the query is long running, identify where the query spends its time: it could be compiling or post compilation (executing the query). Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on; refer to the above image).

High Compile time:

Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on). Compile time will normally be a few milliseconds. Follow the steps below if you see high compile time.

1. Identify if you have a large token perm cache (TokenAndPermUserStore); refer to http://support.microsoft.com/kb/927396

2. Create necessary indexes and stats. Tune the query manually (or) in DTA and apply the recommendations.

3. Reduce the complexity of the query. A query which joins many tables (or) has a large number of values in IN clauses can take a while to compile.

4. You can reduce the number of compiles by using the forced parameterization option.
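Forced parameterization from step 4 is a database-level setting. In this sketch the database name YourDatabase is a placeholder; test the change first, because it affects every query in the database.

```sql
-- Turn on forced parameterization for one database.
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION FORCED;

-- Verify the setting (1 = forced, 0 = simple).
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = N'YourDatabase';

-- Revert to the default if it does not help.
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION SIMPLE;
```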

High CPU time:

Compare the duration of the query with the compile time (CompileTime in the XML plan (or) SQL Server parse and compile time when statistics time is on). If the compile time is very low compared to the duration, then follow the steps below.

1. Update the stats of tables and indexes used by the query (if the stats are up to date, the estimated rows and actual rows will be approximately the same in the execution plan. If there is a huge difference, the stats are outdated and require an update).

2. Identify if the query has used a bad plan because of parameter sniffing (check whether ParameterCompiledValue and ParameterRuntimeValue are different in the XML plan). Refer to THIS LINK to know more about parameter sniffing.

3. If updating the stats and fixing the parameter sniffing doesn’t resolve the issue, it is more likely that the optimizer is not able to create an efficient plan because of a lack of indexes and correct statistics. Run the query which is driving the CPU in Database Tuning Advisor and apply the recommendations. (You will find missing index details in the XML plan, but DTA is more efficient.)

4. If the query which is running longer and consuming CPU is a linked server query, try changing the security of the linked server to ensure the linked server user has ddl_admin or dba/sysadmin on the remote server. More details regarding the issue in THIS LINK.

5. Ensure the optimizer is not aborting early and creating a bad plan. For details refer to THIS LINK.

6. Ensure the query which is spiking the CPU doesn’t have plan guides (the XML plan will have a PlanGuideDB attribute, and sys.plan_guides will have entries) or query hints (index=, OPTION (XXX JOIN), or a join hint such as INNER (hint) JOIN).

7. Ensure that SET options are not changed.
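A few of the checks above can be run directly in T-SQL. This sketch reuses the table a from the earlier example and assumes it lives in the dbo schema; substitute the tables and statement you are actually tuning.

```sql
-- Step 1: see when statistics were last updated, then refresh them.
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.a');

UPDATE STATISTICS dbo.a WITH FULLSCAN;

-- Step 2: rule out a sniffed plan by compiling a fresh plan for one run.
SELECT TOP 10000 * FROM a OPTION (RECOMPILE);

-- Step 6: list any plan guides defined in the current database.
SELECT name, scope_type_desc, is_disabled, query_text, hints
FROM sys.plan_guides;
```

If the run with OPTION (RECOMPILE) is much cheaper than the normal execution, the cached plan was probably compiled for unrepresentative parameter values.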


Posted in Performance, SQL General, SQL Query | Tagged: query optimization, query performance tuning, Query tuning, query tuning in sql server, sql performance, sql query optimizer, sql query tuning, sql server -2147217871, sql server query tuning, Tuning sql server query | 11 Comments »

SQL Server 2012 Memory

Posted by Karthick P.K on October 21, 2012

SQL Server 2012 has made many changes to the memory manager to govern SQL Server memory consumption more efficiently than earlier versions. The important changes to SQL Server 2012 memory which every DBA should be aware of are documented in this blog. If you are not familiar with the SQL Server memory architecture of earlier versions, I would recommend reading THIS ARTICLE before you continue with the changes in the Denali memory manager.

Max Server Memory

In previous versions of SQL Server “Max Server Memory” controlled the Maximum physical memory Single page allocator (BPOOL) can consume in SQL Server user address space.