MSSQLWIKI

Karthick P.K on SQL Server


SQL Server Operating system (SOS) – Series 3

Posted by Karthick P.K on February 11, 2013

Thread synchronization

When we discussed threads, I mentioned that in multi-threaded applications each thread has to synchronize its activities with other threads. Sometimes a thread has to wait for another thread to complete before it can execute (Ex: SQL Server blocking); sometimes a thread has to synchronize with other threads and then continue execution (Ex: CXPACKET waits in parallel queries). If we allow multiple threads to access the same resource without synchronization, the resource can become corrupt or inconsistent.

Windows offers different ways to synchronize multiple threads. Before we jump into the different synchronization techniques, let us see why thread synchronization is so important using the small program below.

In the program below I have declared two globals called a and b and set their values to 0. The number of threads we are going to create is defined in a global named Threadcount (64).

We create 64 threads in the main function, and each thread starts executing a function called Submain. In Submain each thread increments the values of a and b 1000 times.

So ideally the values of a and b should both be 64,000 at the end of program execution (64 threads * 1000 increments).

Let us check what happens.

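#include <windows.h>
#include <stdio.h>      /* printf */
#include <stdlib.h>     /* system */

LONG a = 0;                 // incremented without any synchronization
LONG b = 0;                 // incremented only while holding the spinlock
LONG g_InUse = FALSE;       // spinlock flag: TRUE while some thread is using b
const int Threadcount = 64; // WaitForMultipleObjects waits on at most MAXIMUM_WAIT_OBJECTS (64) handles
LONG s = Threadcount;       // used only by the commented-out countdown below
volatile bool d = false;    // used only by the commented-out countdown below

// Thread routine with the signature CreateThread expects
DWORD WINAPI Submain(LPVOID x)
{
    for (int L = 0; L < 1000; L++)
    {
        a++;    // read-modify-write with no lock: concurrent updates can be lost

        // Spin until this thread is the one that changed g_InUse from FALSE to TRUE
        while (InterlockedExchange(&g_InUse, TRUE) == TRUE)
        {
            Sleep(0);   // give up the rest of the time slice and try again
        }
        //Sleep(10); //-->How Spinlock can cause CPU Spike
        b++;    // only the thread that owns g_InUse gets here

        InterlockedExchange(&g_InUse, FALSE);   // release the spinlock
    }

    /*
    // Simple synchronization technique. May be useful if you want more threads than
    // WaitForMultipleObjects supports (MAXIMUM_WAIT_OBJECTS = 64).
    // InterlockedDecrement keeps the countdown itself thread safe.
    if (InterlockedDecrement(&s) == 0)
    {
        d = true;
    }
    */
    return 0;
}

int main()
{
    HANDLE *hThreads = new HANDLE[Threadcount];
    for (int i = 0; i < Threadcount; i++)
    {
        hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);

        if (hThreads[i] == NULL)
        {
            printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
        }
    }
    SetLastError(0);

    // Wait until every thread has finished
    WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);

    //while(!d); //Simple synchronization technique

    printf("Value of a is:%ld\n", a);
    printf("Value of b is:%ld\n", b);

    for (int i = 0; i < Threadcount; i++)
    {
        if (hThreads[i] != NULL)
            CloseHandle(hThreads[i]);
    }
    delete[] hThreads;
    system("pause");
    return 0;
}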

 
[Screenshot: console output showing the final values of a and b]

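Why are the values of a and b different, and why is b accurate while a is not? If you look at the program closely, access to global b is protected by a spinlock built on the InterlockedExchange function, so only one thread can increment b at any time. There was no synchronization for global a: a++ reads the value, adds 1 and writes it back, so two threads can read the same value and both write back the same result, losing one increment. That is why the end value of a is incorrect.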
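To make the lost update concrete, here is a small C++ sketch that replays by hand one possible interleaving of two threads running a++ at the same time. It is a sequential simulation (no real threads), so its result is deterministic; the function and variable names are hypothetical.

```cpp
#include <cassert>

// Replays by hand one interleaving of two threads that each execute a++ once.
// a++ is really three steps: read a into a register, add 1, write the result back.
long lost_update_demo()
{
    long a = 0;       // the shared global
    long reg1 = 0;    // thread 1's register
    long reg2 = 0;    // thread 2's register

    reg1 = a;         // thread 1 reads a (0)
    reg2 = a;         // thread 2 reads a (0) before thread 1 has written back
    reg1 = reg1 + 1;  // thread 1 computes 1
    reg2 = reg2 + 1;  // thread 2 computes 1
    a = reg1;         // thread 1 writes 1
    a = reg2;         // thread 2 writes 1, overwriting thread 1's result

    assert(reg1 == reg2);  // both threads wrote the same value
    return a;              // 1 instead of 2: one increment was lost
}
```

With 64 threads doing this thousands of times, many such collisions happen, which is why a ends below 64,000.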

Thread synchronization can be achieved in user mode or by using kernel objects.

User mode thread synchronization: Threads can be synchronized in user mode using the interlocked family of functions or critical sections. User mode thread synchronization is faster than using kernel objects. In the above program we used an interlocked family function to synchronize the threads' access to global b. Spinlocks built on interlocked functions should be used with caution on multiprocessor systems and should be avoided on uniprocessor machines, where a spinning thread only delays the thread that owns the lock.
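If you do not have a Windows build environment, the same idea can be tried with portable C++11: std::atomic<long>::fetch_add is an atomic read-modify-write comparable to an interlocked increment such as InterlockedIncrement. This is a sketch using the C++ standard library rather than the Windows API used in the post; the function name and the counts are illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each thread increments a shared counter `iterations` times with an atomic
// read-modify-write, so concurrent increments never overwrite each other.
long count_with_atomic_increment(int threads, int iterations)
{
    assert(threads > 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i)
            {
                counter.fetch_add(1);  // atomic, like an interlocked increment
            }
        });
    }
    for (auto &w : workers)
    {
        w.join();  // wait for every thread, like WaitForMultipleObjects(..., TRUE, INFINITE)
    }
    return counter.load();
}
```

Calling count_with_atomic_increment(64, 1000) returns 64000, the value the unsynchronized global a failed to reach.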

Spinlock: A method by which a thread continuously checks whether a resource is available. In the above program, globals a and b are the resources. Since we guaranteed exclusive access to global b, only one thread can access it at any time. So what about the other threads? They continuously spin, checking whether the resource has become available. Look at the portion of the above program below. InterlockedExchange atomically sets g_InUse to TRUE and returns its previous value. If the previous value was FALSE, the resource was not in use; the calling thread has now set it to TRUE, so other threads cannot access it, and it continues execution. Once it completes its task, i.e. incrementing the value of b, it sets g_InUse back to FALSE so others can access it. If the previous value was TRUE, some other thread is currently using the global resource b and the while loop continues to spin.

while (InterlockedExchange (&g_InUse, TRUE) == TRUE)
{
    Sleep(0);
}
//Sleep(10); //-->How Spinlock can cause CPU Spike
b++;
InterlockedExchange (&g_InUse, FALSE);
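The spin loop above can be expressed in portable C++11 as a small test-and-set spinlock: std::atomic<bool>::exchange(true) returns the previous value just as InterlockedExchange(&g_InUse, TRUE) does, and std::this_thread::yield() plays the role of Sleep(0). This is a sketch under that analogy, not the Windows implementation; the class and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock. exchange(true) atomically stores true and
// returns the previous value, like InterlockedExchange(&g_InUse, TRUE).
class SpinLock
{
public:
    void lock()
    {
        while (in_use_.exchange(true, std::memory_order_acquire))
        {
            std::this_thread::yield();  // plays the role of Sleep(0)
        }
    }
    void unlock()
    {
        in_use_.store(false, std::memory_order_release);  // like InterlockedExchange(&g_InUse, FALSE)
    }

private:
    std::atomic<bool> in_use_{false};
};

// Increments a plain (non-atomic) counter, but only while holding the spinlock.
long count_with_spinlock(int threads, int iterations)
{
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long b = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i)
            {
                lock.lock();
                ++b;            // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto &w : workers)
    {
        w.join();
    }
    return b;
}
```

The acquire/release orderings make the increment done by one lock owner visible to the next owner, which is what the Interlocked calls (full barriers) guarantee in the Windows version.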

 

Incorrect use of a spinlock can waste CPU and cause extreme CPU spikes. In the above program, uncomment the line "Sleep(10); //-->How Spinlock can cause CPU Spike", build the exe and execute it. Look at Task Manager and check the CPU utilization. It will be extremely high: each time a thread takes the lock on g_InUse, it sleeps for 10 milliseconds, increments the value of b and then releases the lock, while the other threads continuously spin to check if the resource is available, thus causing the CPU spike. In a real application a thread may not sleep after taking a lock, but assume it is performing some task that takes time; the other threads will keep spinning and consuming CPU.
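The CPU cost of spinning can be observed directly. The hypothetical C++ function below holds a std::atomic<bool> lock for a given number of milliseconds while a second thread spins on it (without even calling Sleep(0)) and counts how many times it found the lock busy; each of those iterations is CPU time spent doing no useful work. The exact count varies from run to run and machine to machine; only that it is non-zero is guaranteed by the code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// The calling thread owns the lock, lets a second thread start spinning on it,
// keeps the lock for hold_ms milliseconds, then releases it. Returns how many
// times the waiting thread found the lock busy.
long spins_while_lock_held(int hold_ms)
{
    assert(hold_ms >= 0);
    std::atomic<bool> in_use{true};   // the lock starts out owned by this thread
    std::atomic<long> spins{0};

    std::thread waiter([&] {
        while (in_use.exchange(true, std::memory_order_acquire))
        {
            spins.fetch_add(1);       // lock was busy: burn CPU and try again
        }
        in_use.store(false, std::memory_order_release);  // got it; release at once
    });

    while (spins.load() == 0)         // make sure the waiter is already spinning
    {
        std::this_thread::yield();
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(hold_ms));  // like Sleep(10) under the lock
    in_use.store(false, std::memory_order_release);                  // release the lock
    waiter.join();
    return spins.load();
}
```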

Critical section: Like the interlocked family functions, a critical section is also used to guarantee exclusive access to a resource. The major difference between the interlocked functions and a critical section is that when the critical section is owned by another thread, the calling thread is immediately placed in a wait state, so the thread transitions from user mode to kernel mode, and this transition is expensive (about 1000 CPU cycles according to Jeffrey Richter). When the thread that owns the critical section releases it, one of the waiting threads is signaled and scheduled. Programmers should decide carefully when to use interlocked family functions and when to use critical sections. The above program caused a severe CPU spike after we uncommented the line "Sleep(10); //-->How Spinlock can cause CPU Spike"; let us do the same implementation using a critical section in the program below.

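#include <windows.h>
#include <stdio.h>      /* printf */
#include <stdlib.h>     /* system */

LONG a = 0;                 // incremented without any synchronization
LONG b = 0;                 // incremented only inside the critical section
const int Threadcount = 64; // WaitForMultipleObjects waits on at most MAXIMUM_WAIT_OBJECTS (64) handles
LONG s = Threadcount;       // used only by the commented-out countdown below
volatile bool d = false;    // used only by the commented-out countdown below
CRITICAL_SECTION gcs;       // protects b

// Thread routine with the signature CreateThread expects
DWORD WINAPI Submain(LPVOID x)
{
    for (int L = 0; L < 1000; L++)
    {
        a++;                            // still unsynchronized
        EnterCriticalSection(&gcs);     // if another thread owns gcs, wait instead of spinning
        Sleep(10);                      // hold the lock for a while, as in the spinlock version
        b++;
        LeaveCriticalSection(&gcs);     // allow one waiting thread to acquire gcs
    }

    /*
    // Simple synchronization technique. May be useful if you want more threads than
    // WaitForMultipleObjects supports (MAXIMUM_WAIT_OBJECTS = 64).
    // InterlockedDecrement keeps the countdown itself thread safe.
    if (InterlockedDecrement(&s) == 0)
    {
        d = true;
    }
    */
    return 0;
}

int main()
{
    HANDLE *hThreads = new HANDLE[Threadcount];
    InitializeCriticalSection(&gcs);    // must be initialized before any thread uses it
    for (int i = 0; i < Threadcount; i++)
    {
        hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);

        if (hThreads[i] == NULL)
        {
            printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
        }
    }
    SetLastError(0);

    // Wait until every thread has finished
    WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);
    //while(!d); //Simple synchronization technique

    DeleteCriticalSection(&gcs);        // safe only once no thread can use it any more

    printf("Value of a is:%ld\n", a);
    printf("Value of b is:%ld\n", b);

    for (int i = 0; i < Threadcount; i++)
    {
        if (hThreads[i] != NULL)
            CloseHandle(hThreads[i]);
    }
    delete[] hThreads;
    system("pause");
    return 0;
}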

 

After building the above program, run the executable and you will notice that it doesn't consume much CPU. (Because each of the 64,000 iterations holds the lock for at least 10 milliseconds, it also takes more than ten minutes to finish.) Does this mean a critical section is better than interlocked functions? No, it depends. In this exe the lock is held for a long time, so a critical section was ideal. If each thread would have got access to the resource after spinning once or twice, then interlocked functions would definitely have been the better choice, because we would have avoided transitioning each thread from user mode to kernel mode, which is expensive. There is also an API called InitializeCriticalSectionAndSpinCount. What is the difference between InitializeCriticalSection and InitializeCriticalSectionAndSpinCount? With InitializeCriticalSectionAndSpinCount, a thread that finds the critical section owned spins up to the specified number of times trying to acquire it, and only if all attempts fail does it transition to kernel mode and wait. On single-processor systems the spin count is ignored.
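The spin-then-wait behaviour described for InitializeCriticalSectionAndSpinCount can be approximated in portable C++ by trying std::mutex::try_lock() a fixed number of times before falling back to a blocking lock(). This is an analogy with hypothetical names, not how CRITICAL_SECTION is implemented; in particular, whether std::mutex::lock() itself spins or enters the kernel is up to the standard library implementation.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spin-then-wait lock: try to acquire up to spin_count times
// without blocking; only if every attempt fails, fall back to a blocking wait.
class SpinThenWaitLock
{
public:
    explicit SpinThenWaitLock(int spin_count) : spin_count_(spin_count) {}

    void lock()
    {
        for (int i = 0; i < spin_count_; ++i)
        {
            if (m_.try_lock())
            {
                return;  // acquired while spinning: no wait needed
            }
        }
        m_.lock();       // spinning failed: block until the owner releases
    }

    void unlock() { m_.unlock(); }

private:
    std::mutex m_;
    int spin_count_;
};

long count_with_spin_then_wait(int threads, int iterations, int spin_count)
{
    assert(threads > 0 && iterations >= 0 && spin_count >= 0);
    SpinThenWaitLock lock(spin_count);
    long b = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i)
            {
                std::lock_guard<SpinThenWaitLock> guard(lock);  // calls lock()/unlock()
                ++b;
            }
        });
    }
    for (auto &w : workers)
    {
        w.join();
    }
    return b;
}
```

A spin_count of 0 behaves like a plain blocking lock, which corresponds to a critical section with no spin count.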

Thread deadlock: Similar to SQL Server locks, what happens when two threads each wait to acquire a critical section owned by the other? If there is no timeout, the threads wait forever and never get scheduled. In SQL Server we have the deadlock monitor to detect this condition and roll back one of the transactions, but Windows doesn't offer any such facility.
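The deadlock described above can be reproduced in portable C++ with two locks taken in opposite order. A plain lock() would hang forever here; the sketch uses std::timed_mutex::try_lock_for so that each thread gives up after a timeout instead. Names and the timeout value are illustrative assumptions, and the code is an analogy for Windows critical sections, not Windows code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>

// Two threads each hold one lock and then wait for the other's lock.
// Returns how many of the two threads gave up after `timeout`.
int deadlock_timeouts(std::chrono::milliseconds timeout)
{
    assert(timeout.count() >= 0);
    std::timed_mutex lock_a;
    std::timed_mutex lock_b;
    std::atomic<int> holding{0};   // threads that hold their first lock
    std::atomic<int> tried{0};     // threads that finished waiting for their second lock
    std::atomic<int> timeouts{0};

    auto worker = [&](std::timed_mutex &first, std::timed_mutex &second) {
        std::lock_guard<std::timed_mutex> hold(first);
        holding.fetch_add(1);
        while (holding.load() < 2)      // wait until the other thread holds its first lock
            std::this_thread::yield();

        if (second.try_lock_for(timeout))
            second.unlock();            // not reached: the other thread still holds it
        else
            timeouts.fetch_add(1);      // gave up instead of waiting forever

        tried.fetch_add(1);
        while (tried.load() < 2)        // keep holding `first` until both attempts are over
            std::this_thread::yield();
    };                                  // `hold` releases `first` here

    std::thread t1(worker, std::ref(lock_a), std::ref(lock_b));
    std::thread t2(worker, std::ref(lock_b), std::ref(lock_a));
    t1.join();
    t2.join();
    return timeouts.load();
}
```

Both threads time out, so deadlock_timeouts(std::chrono::milliseconds(50)) returns 2; with lock() instead of try_lock_for the program would never finish.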

Orphaned or unreleased critical section: When a thread takes a critical section, it is expected to release it. Assume a flaw in the code or an exception causes a thread to abort after taking a critical section and before releasing it: the critical section taken by the terminated thread is never released, and all the other threads will wait on it indefinitely.
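An orphaned lock is easy to demonstrate in C++ with std::mutex. In this sketch the exception is caught in the same thread instead of terminating it, because a thread that terminates while owning a std::mutex is undefined behaviour in C++; the effect is the same, though: the unlock() call is skipped and other threads can no longer acquire the lock. The second function shows std::lock_guard, which releases the lock automatically while the exception unwinds. Function names are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <mutex>
#include <stdexcept>

// Returns true if a *different* thread can currently acquire `m`.
// (Calling try_lock on a std::mutex the caller already owns is undefined.)
bool free_for_other_threads(std::mutex &m)
{
    return std::async(std::launch::async, [&m] {
        if (m.try_lock())
        {
            m.unlock();
            return true;
        }
        return false;
    }).get();
}

void work_that_throws() { throw std::runtime_error("flaw in code"); }

// Manual lock/unlock: the exception skips unlock(), leaving the lock orphaned.
bool orphaned_after_exception(std::mutex &m)
{
    try
    {
        m.lock();
        work_that_throws();
        m.unlock();                   // never reached
    }
    catch (const std::exception &) {}
    bool orphaned = !free_for_other_threads(m);
    m.unlock();                       // clean up so the demo itself does not leak the lock
    assert(free_for_other_threads(m));
    return orphaned;
}

// RAII: lock_guard's destructor releases the lock while the exception unwinds.
bool released_after_exception_with_guard(std::mutex &m)
{
    try
    {
        std::lock_guard<std::mutex> guard(m);
        work_that_throws();
    }
    catch (const std::exception &) {}
    return free_for_other_threads(m);
}
```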

If you liked this post do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group MSSQLWIKI

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.

Posted in SQL Server Engine, SQLServer SOS

SQL Server Operating system (SOS) – Series 2

Posted by Karthick P.K on January 13, 2013

Context Switching:
When a thread is yielded from the CPU, Windows stores the CONTEXT (the current state, such as CPU registers, program counter etc.) of the current thread and loads the CONTEXT of the new thread that will run on the CPU.
Why? A thread can be yielded in the middle of executing its instructions, so how will it resume from the same point when it is rescheduled on the CPU? Its context information is stored when it yields and loaded again when the thread is rerun.
Note: yielding from CPU = coming out of CPU

If there is a high rate of context switching, the system spends more time doing context switches than doing meaningful work.
 
Yielding:
When a thread moves out of the CPU, it is called yielding.

When can a thread yield?
A thread running on the CPU can yield out of it under the following key conditions:

Voluntary yield
The thread decides to yield by itself because of the logic in the code it executes. Generally a thread voluntarily yields by calling Sleep(0) or SwitchToThread. When a thread voluntarily yields, it is placed at the end of the runnable list.
Ex: A SQL Server thread will yield after sorting 64K sort records.

Preempted
A thread running on the CPU is forced to yield when a thread with higher priority is ready to run. When a thread is preempted, it is placed at the beginning of the runnable list.

Quantum end
Every thread executed by the operating system gets a time slice called a quantum. When a thread completes its quantum, it is yielded and the next thread runs. If no other thread is ready to run, the thread runs for another quantum.

Termination
A thread is terminated when it finishes execution, or when it is destroyed by calling TerminateThread.
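The voluntary yield described above can be sketched in portable C++ as a worker that gives up the CPU after every fixed batch of records. This is only an illustration under assumptions: std::this_thread::yield() stands in for Sleep(0) or SwitchToThread, "64K" is taken to mean 65,536 records, and the function name is hypothetical; it is not how SQLOS itself implements yielding.

```cpp
#include <cassert>
#include <thread>

// Processes `items` records and voluntarily yields after every `batch` records.
// Returns how many times the worker yielded.
long process_with_voluntary_yield(long items, long batch)
{
    assert(items >= 0 && batch > 0);
    long yields = 0;
    for (long i = 1; i <= items; ++i)
    {
        // ... process record i (hypothetical work) ...
        if (i % batch == 0)
        {
            std::this_thread::yield();  // portable hint, roughly comparable to SwitchToThread()
            ++yields;
        }
    }
    return yields;
}
```

For example, 200,000 records with a batch of 65,536 give three voluntary yields.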

Thread and process priorities

Windows supports thread priority levels ranging from 0 (lowest) to 31 (highest). If all threads have the same priority, they are scheduled on a round-robin basis, but in reality threads running on the OS have different priorities. Among all the threads that are ready to run, the Windows scheduler picks the thread with the highest priority first. The priority of a thread can be changed using the WinAPI SetThreadPriority. Similarly, the priority class of a process can be set while creating the process (dwCreationFlags of the WinAPI CreateProcess), using the WinAPI SetPriorityClass after the process is created, or using tools like Task Manager.
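To make the two rules in this paragraph concrete (the highest-priority ready thread runs first; equal-priority threads take turns one quantum at a time), here is a toy scheduler in C++. It is a deliberately simplified model with hypothetical names; real Windows scheduling also involves preemption, dynamic priority boosts and multiple processors, none of which are modeled here.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyThread
{
    std::string name;
    int priority;     // 0 (lowest) .. 31 (highest)
    int quanta_left;  // how many quanta of work remain
};

// Each step runs the highest-priority ready thread for one quantum. A thread
// that still has work goes to the end of the list, so equal-priority threads
// alternate (round robin). Returns the order in which quanta were run.
std::vector<std::string> schedule(std::vector<ToyThread> ready)
{
    std::vector<std::string> order;
    while (!ready.empty())
    {
        std::size_t pick = 0;  // first thread with the highest priority
        for (std::size_t i = 1; i < ready.size(); ++i)
        {
            if (ready[i].priority > ready[pick].priority) pick = i;
        }
        ToyThread t = ready[pick];
        assert(t.priority >= 0 && t.priority <= 31);
        ready.erase(ready.begin() + pick);
        order.push_back(t.name);  // t runs for one quantum
        if (--t.quanta_left > 0)
        {
            ready.push_back(t);   // quantum end: back to the end of the list
        }
    }
    return order;
}
```

With A (priority 8, two quanta of work), B (priority 8, one quantum) and C (priority 10, one quantum), the order is C, A, B, A: C runs first because of its higher priority, then A and B take turns.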

Let us attach the debugger to the SQL Server process and see how to view the threads and thread stacks, and how SQL Server threads wait without being scheduled.

Download the Windows debugger from the links below:

WinDbg 32-bit package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi

WinDbg x64 package:

http://msdl.microsoft.com/download/symbols/debuggers/dbg_amd64_6.11.1.404.msi

 

1. Start SQL Server on your test system.

2. Attach the debugger to the SQL Server process (refer to the image below). If you have more than one instance, use the process ID to attach to the correct SQL Server process.

 

[Screenshot: attaching the debugger to the SQL Server process]

3. In the command window, type:

.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

4. Type .reload /f and press Enter. This forces the debugger to load all the symbols immediately.

5. Verify that symbols are loaded for SQL Server by using the debugger command lmvm:

lmvm sqlservr

6. Type

!peb to display information about the process from the process environment block.

7. Type ~ to display all the threads in the process. Each line of the output shows the debugger's thread ordinal and the thread ID.

8. To look at the stack of a specific thread, switch to the thread using its thread ordinal or thread ID.

The debugger command ~ displays all the threads of the process.

Thread ordinal: a decimal value, starting from 0, that the debugger uses to identify the thread (first column in the output below). You can switch to a thread using its ordinal with the debugger command ~[ordinal]s, for example ~8s.

Thread ID: the ID assigned to each thread by the operating system. You can switch to a thread using its thread ID with the debugger command ~~[ThreadID]s (4th column in the ~ output, after the process ID and the dot).

In the image below I have printed all the threads and the stack of thread ordinal 8, which is the scheduler monitor thread.

 

[Screenshot: ~ output listing the threads, and the stack of thread ordinal 8 (scheduler monitor)]

9. Type g in the command window to resume the process.

10. Connect to SQL Server from Management Studio and run select * from sysprocesses. Every session with a non-zero KPID has a valid Windows thread associated with it. When a query uses a parallel plan, there will be more than one row for the same session, and each row will have a different KPID.

11. To look at the thread stack of a session that is currently executing a task or a background process, convert the KPID value associated with the session in sysprocesses to hexadecimal, break into the debugger, and type the command below in the debugger command window.

~~[Hex value of a thread]s

Then type kC to display the stack.

12. Let us create a small blocking scenario to understand how threads wait in SQL Server.

Session-1

create table a (A int)
go
insert into a  values (1);
go
begin transaction
update a set A =1+1

Session-2

select * from a

Session-2 will be blocked. Let us look at the stack of the blocked thread.

13. Run select * from sysprocesses where blocked<>0

Identify the KPID of the blocked session.

14. Convert the KPID into a hexadecimal value.

15. Break into the debugger so that you can execute debugger commands (CTRL+BREAK, or the 7th icon on the toolbar).

16. Look at the stack of the thread that is waiting for the lock (blocked) by using the hex value of the KPID:

~~[Hex value of KPID]s

kC

[Screenshot: kC stack of the blocked thread waiting in WaitForSingleObject]

 

17. The thread above, from the second session we created, is waiting for an event using WaitForSingleObject. When the first session releases the lock, this thread will be signaled and will resume execution.

We will look at WaitForSingleObject, WaitForMultipleObjects, events (manual reset and auto reset) etc. in more detail in forthcoming blogs.
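Steps 11 and 14 both require converting the decimal KPID from sysprocesses into hexadecimal before switching threads in WinDbg. The small C++ helper below sketches that conversion; the function name is hypothetical, and it only builds the command text, it does not talk to the debugger.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds the WinDbg command that switches to an OS thread, given the decimal
// KPID shown in sysprocesses. ~~[...]s expects the thread ID in hexadecimal.
std::string switch_to_kpid_command(unsigned long kpid)
{
    assert(kpid != 0);  // KPID 0 means no Windows thread is attached to the session
    char buf[32];
    std::snprintf(buf, sizeof buf, "~~[%lx]s", kpid);
    return buf;
}
```

For example, a KPID of 4660 becomes ~~[1234]s, because 4660 decimal is 0x1234.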
