MSSQLWIKI

Karthick P.K on SQL Server


SQL Server Operating system (SOS) – Series 3

Posted by Karthick P.K on February 11, 2013

Thread synchronization

When we discussed threads, I mentioned that in multi-threaded applications each thread has to synchronize its activities with other threads. Sometimes a thread has to wait for another thread to complete before it can execute (e.g. SQL Server blocking); sometimes a thread has to synchronize with another thread and then continue execution (e.g. CXPACKET). If we allow multiple threads to access the same resource without synchronization, the result can be corruption or inconsistency.

Windows offers several ways to synchronize multiple threads. Before we jump into the different synchronization techniques, let us see why thread synchronization is so important, using the small program below.

In the program below I have declared two globals called a and b and set both of them to 0. The number of threads we are going to create is defined in a global named Threadcount (64).

We create 64 threads in the main function, and each thread starts executing a function called Submain. In Submain each thread increments a and b 1000 times.

So ideally the value of both a and b should be 64,000 at the end of program execution (64 threads * 1000 increments).

Let us check what happens.

#include <windows.h>
#include <stdio.h>

long a = 0;                      // incremented without any synchronization
long b = 0;                      // incremented only while holding the spinlock
volatile LONG g_InUse = FALSE;   // spinlock flag: TRUE while some thread owns b
const int Threadcount = 64;      // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
int s = Threadcount;             // used only by the commented-out technique below
volatile bool d = false;         // used only by the commented-out technique below

// Thread entry point; CreateThread expects the DWORD WINAPI (LPVOID) signature.
DWORD WINAPI Submain(LPVOID x)
{
    for (int L = 0; L < 1000; L++)
    {
        a++;   // unsynchronized read-modify-write: increments can be lost

        // Spin until this thread is the one that changes g_InUse from FALSE to TRUE
        while (InterlockedExchange(&g_InUse, TRUE) == TRUE)
        {
            Sleep(0);
        }
        //Sleep(10); //-->How Spinlock can cause CPU Spike
        b++;

        InterlockedExchange(&g_InUse, FALSE);   // release the lock
    }

    /*
    s=s-1;  //Simple synchronization technique. May be useful if you like to increase the thread count beyond MAXIMUM_WAIT_OBJECTS (64)
    if(s==0)
    {
        d=true;
    }
    */
    return 0;
}

int main()
{
    HANDLE *hThreads = new HANDLE[Threadcount];
    for (int i = 0; i < Threadcount; i++)
    {
        hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);
        if (hThreads[i] == NULL)
        {
            printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
            return 1;
        }
    }

    // Wait until all threads have finished
    WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);

    //while(!d); //Simple synchronization technique

    for (int i = 0; i < Threadcount; i++)
    {
        CloseHandle(hThreads[i]);
    }
    delete[] hThreads;

    printf("Value of a is:%ld\n", a);
    printf("Value of b is:%ld\n", b);
    system("pause");
    return 0;
}

 

Why are the values of a and b different, and why is b accurate while a is not? If you look at the program closely, every increment of global b happens while the thread holds a lock built on the InterlockedExchange function, so only one thread can access b at any time. There is no synchronization for global a, so two threads can read the same old value, each add 1, and write back the same result, which loses one of the increments. That is why the end value of a is incorrect.
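For a plain counter such as a, a lock is not strictly needed: an atomic increment is enough. The sketch below is a portable illustration using the C++ standard library (std::atomic and std::thread) rather than the Win32 calls used in this post; on Windows, InterlockedIncrement plays the same role. The function name countWithAtomic is made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: starts `threads` workers that each add 1 to a shared
// counter `perThread` times, and returns the final counter value.
long countWithAtomic(int threads, int perThread)
{
    assert(threads > 0 && perThread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&counter, perThread] {
            for (int n = 0; n < perThread; ++n) {
                counter.fetch_add(1);   // one indivisible read-modify-write
            }
        });
    }
    for (auto &w : workers) {
        w.join();                       // wait for every worker to finish
    }
    return counter.load();
}
```

Calling countWithAtomic(64, 1000) mirrors the 64 threads and 1000 increments of the program above, and no increment is lost because each fetch_add is indivisible.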

Thread synchronization can be achieved in user mode or by using kernel objects.

User mode thread synchronization: Threads can be synchronized in user mode using the interlocked family of functions or using critical sections. User mode thread synchronization is faster than using kernel objects. In the above program we used an interlocked function to synchronize the threads' access to global b. Spinlocks built on the interlocked functions should be used with caution on multiprocessor systems and should be avoided on uniprocessor machines, because on a single CPU the spinning thread only burns its time slice while the thread that owns the lock cannot run to release it.

Spinlock: A method by which we continuously check whether a resource is available. In the above program globals a and b are the resources. Since we guaranteed exclusive access to global b, only one thread can access it at any time. So what about the other threads? They continuously spin to check whether the resource has become available. Look at the portion of the above program shown below. The while loop atomically sets g_InUse to TRUE and checks the value it had before. If the previous value was FALSE, the resource was not in use; the calling thread has now set the value to TRUE so other threads cannot access the resource, and it continues execution. Once it completes its task, i.e. incrementing the value of b, it sets g_InUse back to FALSE so others can access it. If the previous value was TRUE, some other thread is currently using the global resource b and the while loop continues to spin.

while (InterlockedExchange (&g_InUse, TRUE) == TRUE)
{
    Sleep(0);
}
//Sleep(10); //-->How Spinlock can cause CPU Spike
b++;
InterlockedExchange (&g_InUse, FALSE);
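The same acquire, work, release pattern can be wrapped in a small class. The following is a minimal sketch in portable C++, assuming std::atomic_flag as a stand-in for the InterlockedExchange call on g_InUse and std::this_thread::yield as a rough counterpart of Sleep(0). SpinLock and countWithSpinLock are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock: test_and_set atomically sets the flag and returns
// its previous value, much like InterlockedExchange(&g_InUse, TRUE).
class SpinLock {
public:
    void lock() {
        // A previous value of true means another thread holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();   // give up the CPU, then try again
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);   // mark the resource free
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain long from several threads, guarded by the spinlock.
long countWithSpinLock(int threads, int perThread)
{
    assert(threads > 0 && perThread >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&lock, &counter, perThread] {
            for (int n = 0; n < perThread; ++n) {
                lock.lock();
                ++counter;               // protected, like b in the post
                lock.unlock();
            }
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return counter;
}
```

Because the protected work is a single increment, the lock is held very briefly, which is the situation where spinning is a reasonable choice.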

 

Incorrect use of a spinlock wastes CPU and can cause extreme CPU spikes. In the above program uncomment the line "Sleep(10); //-->How Spinlock can cause CPU Spike", build the exe and execute it. Look at Task Manager and check the CPU utilization. It will be extremely high, because each time a thread takes the lock on g_InUse it sleeps for 10 milliseconds, increments the value of b and then releases the lock, while the other threads continuously spin to check whether the resource is available, thus causing the CPU spike. In a real application a thread would not sleep after taking a lock, but assume it is performing some task which takes time: the other threads will keep spinning and consuming CPU.

Critical section: Like the interlocked family of functions, a critical section is used to guarantee exclusive access to a resource. The major difference between the two is that when a critical section is owned by another thread, the calling thread is immediately placed in a wait state. The thread therefore transitions from user mode to kernel mode, and this transition is expensive (about 1000 CPU cycles, according to Jeffrey Richter). When the thread which owns the critical section releases it, one of the waiting threads is signaled and scheduled. Programmers should choose wisely between interlocked functions and critical sections. The above program caused a severe CPU spike after we uncommented the line "Sleep(10); //-->How Spinlock can cause CPU Spike". Let us do the same implementation using a critical section in the program below.

#include <windows.h>
#include <stdio.h>

long a = 0;                   // incremented without any synchronization
long b = 0;                   // incremented only inside the critical section
const int Threadcount = 64;   // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles
int s = Threadcount;          // used only by the commented-out technique below
volatile bool d = false;      // used only by the commented-out technique below
CRITICAL_SECTION gcs;

// Thread entry point; CreateThread expects the DWORD WINAPI (LPVOID) signature.
DWORD WINAPI Submain(LPVOID x)
{
    for (int L = 0; L < 1000; L++)
    {
        a++;   // unsynchronized read-modify-write: increments can be lost

        EnterCriticalSection(&gcs);   // waiting threads block instead of spinning
        Sleep(10);                    // hold the lock for 10 ms, as in the spinlock test
        b++;
        LeaveCriticalSection(&gcs);
    }

    /*
    s=s-1;  //Simple synchronization technique. May be useful if you like to increase the thread count beyond MAXIMUM_WAIT_OBJECTS (64)
    if(s==0)
    {
        d=true;
    }
    */
    return 0;
}

int main()
{
    HANDLE *hThreads = new HANDLE[Threadcount];
    InitializeCriticalSection(&gcs);   // must happen before any thread uses gcs
    for (int i = 0; i < Threadcount; i++)
    {
        hThreads[i] = CreateThread(NULL, 0, Submain, NULL, 0, NULL);
        if (hThreads[i] == NULL)
        {
            printf("\nThread creation failed for thread %d with error %lu", i, GetLastError());
            return 1;
        }
    }

    // Wait until all threads have finished, then it is safe to delete gcs
    WaitForMultipleObjects(Threadcount, hThreads, TRUE, INFINITE);
    DeleteCriticalSection(&gcs);
    //while(!d); //Simple synchronization technique

    for (int i = 0; i < Threadcount; i++)
    {
        CloseHandle(hThreads[i]);
    }
    delete[] hThreads;

    printf("Value of a is:%ld\n", a);
    printf("Value of b is:%ld\n", b);
    system("pause");
    return 0;
}

 

After building the above program, run the executable and you will notice that it doesn't consume high CPU. Does this mean a critical section is better than the interlocked functions? No. It depends. In this exe the lock is held for a long time, so a critical section was ideal. If each thread could get access to the resource after spinning once or twice, the interlocked functions would be the better choice, because we would avoid the expensive transition of each thread from user mode to kernel mode. There is also an API called InitializeCriticalSectionAndSpinCount. What is the difference between InitializeCriticalSection and InitializeCriticalSectionAndSpinCount? With InitializeCriticalSectionAndSpinCount, a thread spins up to n times trying to acquire the critical section, and only if all attempts fail does it transition to kernel mode and wait.
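The spin-then-wait idea can be illustrated in portable C++. This is only a sketch of the concept, not how Windows implements critical sections: it assumes std::mutex as the blocking lock, and lockWithSpinCount and countWithSpinThenBlock are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: tries to take the mutex up to spinCount times without
// blocking, and falls back to a blocking lock() only if every attempt fails.
void lockWithSpinCount(std::mutex &m, int spinCount)
{
    for (int i = 0; i < spinCount; ++i) {
        if (m.try_lock()) {
            return;          // acquired while spinning, no wait needed
        }
    }
    m.lock();                // all attempts failed: block until it is free
}

// Increments a plain long from several threads using the hybrid lock.
long countWithSpinThenBlock(int threads, int perThread, int spinCount)
{
    assert(threads > 0 && perThread >= 0 && spinCount >= 0);
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&m, &counter, perThread, spinCount] {
            for (int n = 0; n < perThread; ++n) {
                lockWithSpinCount(m, spinCount);
                ++counter;
                m.unlock();
            }
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return counter;
}
```

With short hold times most acquisitions succeed during the spin phase; with long hold times, as in the Sleep(10) test, the threads quickly fall through to the blocking path instead of burning CPU.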

Thread deadlock: Similar to SQL Server locks, what happens when two threads each wait to acquire a critical section that is owned by the other? Since there is no timeout, both threads wait forever and never get scheduled again. In SQL Server we have the deadlock monitor to detect this condition and roll back one of the transactions, but Windows doesn't offer any such facility.
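Since nothing detects the deadlock for us, the code has to avoid it, for example by always acquiring both locks together. The sketch below is a portable C++ illustration that assumes std::mutex in place of critical sections and uses std::lock, which acquires several mutexes without deadlocking regardless of the order in which callers name them. Account, transfer and runOpposingTransfers are hypothetical names for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical resource: a balance protected by its own lock.
struct Account {
    std::mutex m;
    long balance = 1000;
};

// Moves `amount` between two accounts while holding both locks.
void transfer(Account &from, Account &to, long amount)
{
    std::lock(from.m, to.m);   // takes both locks with deadlock avoidance
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions, the classic deadlock setup
// if each thread locked "its" account first and then the other one.
long runOpposingTransfers(int rounds)
{
    assert(rounds >= 0);
    Account x, y;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance + y.balance;   // the total is preserved
}
```

A simpler alternative with the same effect is to agree on a fixed order (for example, always lock the lower address first) so that no two threads ever hold the locks in opposite order.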

Orphan or unreleased critical section: When a thread takes a critical section it is expected to release it. Assume a flaw in the code or an exception caused a thread to abort after taking a critical section and before releasing it. The critical section taken by the terminated thread is never released, and all the other threads will wait on it indefinitely.
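For the exception case, a common defence in C++ is to tie the release to a scope so that it runs during stack unwinding. The following is a minimal portable sketch, assuming std::mutex and std::lock_guard instead of EnterCriticalSection and LeaveCriticalSection; the names g_lock, updateAndFail and lockIsFreeAfterException are hypothetical. Note that this does not help when a thread is killed outright (for example with TerminateThread), because no destructors run in that case.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex g_lock;   // hypothetical shared lock for this example

// Takes the lock and then fails. The guard's destructor still releases
// the lock while the exception unwinds the stack.
void updateAndFail()
{
    std::lock_guard<std::mutex> guard(g_lock);
    throw std::runtime_error("failure while holding the lock");
}

// Returns true if the lock can be acquired again after the failure.
bool lockIsFreeAfterException()
{
    bool threw = false;
    try {
        updateAndFail();
    } catch (const std::runtime_error &) {
        threw = true;
    }
    assert(threw);                       // the failure path was exercised
    bool acquired = g_lock.try_lock();   // would fail if the lock were orphaned
    if (acquired) {
        g_lock.unlock();
    }
    return acquired;
}
```

Had updateAndFail called g_lock.lock() directly and thrown before the matching unlock(), the lock would stay held and every later caller would wait forever, which is the orphaned lock described above.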


Thank you,

Karthick P.K

Disclaimer
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.
