MSSQLWIKI

Karthick P.K on SQL Server

Archive for the ‘SQL Server I/O’ Category

SQL Server Latch & Debugging latch time out

Posted by Karthick P.K on September 7, 2012

  

 

In a multithreaded process what would happens when a one thread updates a data or index page in memory while second thread is reading the same page?

What will happen when 1st  thread reads a data/index page in memory while 2nd thread is freeing the same page from memory?

Answer: We would end up with data or data structure inconsistency. To avoid inconsistency SQL Server uses Synchronization Mechanisms like Locks,Latches and Spinlocks.

 

We will discuss few key points about latches and how to debug latch timeout dumps in this blog.

What is Latch ?

They control the concurrent access to data pages and structures by multiple threads.  Latches provide physical data consistency of data pages and provide synchronization for data structures. Latches are not controllable by user like locks.

 

Types of the Latch:

Buffer (BUF) Latch

Used to synchronize access to BUF structures and their associated database pages.

 

Buffer “IO” Latch

A subset of BUF latches used when the BUF and associated data/index page is in the middle of an IO operation (Reading page from disk or writing page to disk).

 

Non-Buffer (Non-BUF) Latch

These are latches that are used to synchronize general in-memory data structures generally used by queries/tasks executed by parallel threads, auto grow operations , shrink operations etc. 

 

Latch modes

 

Keep (KP) Latches

Used to ensure that the page is not released from memory while it is in use. 

Shared (SH) Latches

Used for read-only access to data structures and prevent write access by others threads.

This mode allows shared access. 

SH is compatible with KP, SH, and UP.  It should be noted that although in general SH implies read-only access, it is not always the case. For buffer latches SH is the minimum mode required in order to read a data page.

Update (UP) Latches

Allows read access to the data structure(Compatible with SH and KP), but prevents other EX-latch access. 

Used for write operations when torn page detection is off and when AWE is not enabled.

Exclusive (EX) Latches

Prevents any read activity from occurring on the latched structure. EX is only compatible with KP.

Used during read IO during write IO when torn page detection is on or AWE is enabled.

Destroy (DT) Latches

Used when removing BUFs from the buffer pool, either by adding them to the free list or unmapping AWE buffers. 

 

 

Latch compatibility

  KP SH UP EX DT
KP Y Y Y Y N
SH Y Y Y N N
UP Y Y N N N
EX Y N N N N
DT N N N N N

 

 

How do you identify Latch contention?

 

Latch contention can be identified using below wait types in sysprocesses.

 

PAGEIOLATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on a physical I/O of a buffer pool page to complete. 

                                            1. PAGEIOLATCH_* are commonly solved by tuning the queries which are performing heavy IO (Commonly by adding, changing and removing indexes (or) statistics to reduce the amount of physical IO).

                                 2. Identifying if there is disk bottleneck and fixing them (Pageiolatch wait times (ex > 30 ms))

                      

                      

PAGELATCH_*: This waittype in sysprocesses indicates that SQL Server is waiting on access to a database page, but the page is not undergoing physical IO. 

1.       This problem is normally caused by a large number of sessions attempting to access the same physical page at the same time. We should Look at the wait resource of the spid. The wait_resource is the page number (the format is  dbid:file:pageno)

          that is being accessed. 

2.       We can use DBCC PAGE to identify object or type of the page in which we have the contention. Also it will help us to determine  whether contention  is for allocation, data or text.

3.       If the pages that SQL Server is most frequently waiting on are in tempdb database ,check the wait resource column for a page number in dbid 2. You may be facing tempdb allocation latch contention mentioned in    http://support.microsoft.com/kb/328551

4.       If the page is in a user database, check to see if the table has a clustered index on a monotonic key such as an identity where all threads are contending for the same page at the end of the table.  In this case we need to choose a different

          clustered index key to spread the work across different pages.

 

LATCH_*:    Non-buf latch waits can be caused by variety of things.  We can use the wait resource column in sysprocesses to determine the type of latch involved(KB 822101). 

1.       A very common LATCH_EX wait is due to running a profiler trace or sp_trace_getdata Refer KB 929728 for more information.

2.       Auto Grow and auto shrink.

 

 

 

When a latch is requested by thread and If  that latch cannot be granted immediately because of some other thread holding a incompatible latch on same page or data structure then  the requestor must wait for the latch to be grantable.  Warning messages like one below is printed in SQL Server error log and a mini dump with all the threads is captures if the wait interval reaches 5 minutes (waittime 300). The warning message differs for buffer and non-buffer latches.

 

 

844: Time out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p.  Continuing to wait.

 

846: A time-out occurred while waiting for buffer latch — type %d, bp %p, page %d:%d, stat %#x, database id: %d, allocation unit Id: %I64d%ls, task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Not continuing to wait.

 

847: Timeout occurred while waiting for latch: class ‘%ls’, id %p, type %d, Task 0x%p : %d, waittime %d, flags 0x%I64x, owning task 0x%p. Continuing to wait.

 

Break up of above warning

type

The latch mode of the current latch acquire request.  This is a numerical value with the following mapping:  0 – NL (not used); 1 – KP; 2 – SH; 3 – UP; 4 – EX; 5 – DT.

 

task

Task for which we are trying to acquire latch.

 

Waittime

The total time waited for this latch acquire request in seconds.

 

owning task

The address of the Task that owns the latch, if available.

 

bp (Buffer latches only)

The address of the BUF structure corresponding to this buffer latch.

 

page (Buffer latches only.)

The page id for the page currently contained in the BUF structure.

 

database id (Buffer latches only.)

The database id for the page in the BUF.

 

 

Like troubleshooting blocking issues in SQL Server when there is a latch contention or timeout dump identify the owner of latch and troubleshoot why the latch is held by the owner for long time.

 

When there is latch timeout dump you will see a warning message similar to one below. Warning error message printed in SQL server errorlog before the dump is very important to find the owner thread of latch.

 

{

 

2012-01-18 00:52:03.16 spid69      A time-out occurred while waiting for buffer latch — type 4, bp 00000000ECFDAA00, page 1:6088, stat 0x4c1010f, database id: 4, allocation unit Id: 72057594043367424, task 0x0000000006E096D8 : 0, waittime 300, flags 0x19,

owning task 0x0000000006E08328. Not continuing to wait.

spid21s     **Dump thread – spid = 21, PSS = 0x0000000094622B60, EC = 0x0000000094622B70

spid21s     ***Stack Dump being sent to E:\Data\Disk1\MSSQL.1\MSSQL\LOG\SQLDump0009.txt

spid21s     * *******************************************************************************

spid21s     * BEGIN STACK DUMP:

spid21s     *   02/28/12 00:32:03 spid 21

spid21s     * Latch timeout

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_COUNT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

Timeout occurred while waiting for latch: class ‘ACCESS_METHODS_HOBT_VIRTUAL_ROOT’, id 00000002D8C32E70, type 2, Task 0x00000000008FCBC8 : 7, waittime 300, flags 0x1a, owning task 0x00000000050E1288. Continuing to wait.

}

From the error message above we can easily understand we are trying to acquire latch on database id: 4, page 1:6088 (6088 page of first file) and we timed out because task 0x0000000006E08328 (owning task 0x0000000006E08328 in warning message)  is holding a latch on it.

Note: Task is simply a work request to be performed by the thread. (such as system tasks, login task, Ghost cleanup task etc.). Threads which execute the task will take required latches on need.

Let us see how to analyze latch timeout dump and get the owning thread of the Latch using the  owning task 0x0000000006E08328.

To analyze the dump download and Install Windows Debugger from This  link

Step 1:

Open Windbg .  Choose File menu –> select Open crash dump –>Select the Dump file (SQLDump000#.mdmp)

Step 2:

on command window type
.sympath srv*c:\Websymbols*http://msdl.microsoft.com/download/symbols;

Step 3:

Type .reload /f and hit enter. This will force debugger to immediately load all the symbols.

Step 4:

Verify if symbols are loaded for  SQL Server by using the debugger command lmvm

0:002> lmvm sqlservr
start             end                 module name
00000000`01000000 00000000`03679000   sqlservr T (pdb symbols)          c:\websymbols\sqlservr.pdb\21E4AC6E96294A529C9D99826B5A7C032\sqlservr.pdb
    Loaded symbol image file: sqlservr.exe
    Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe
    Image name: sqlservr.exe
    Timestamp:        Wed Oct 07 21:15:52 2009 (4ACD6778)
    CheckSum:         025FEB5E
    ImageSize:        02679000
    File version:     2005.90.4266.0
    Product version:  9.0.4266.0
    File flags:       0 (Mask 3F)
    File OS:          40000 NT Base
    File type:        1.0 App
    File date:        00000000.00000000
    Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

Step 5:

Use the below command to search thread stack to identify the thread which has reference to the owning task and it will be the thread which is owning the latch. Replace 0X0000000006E08328 with owning task in your errorlog

~*e .echo ThreadId:; ?? @$tid; r? @$t1 = ((ntdll!_NT_TIB *)@$teb)->StackLimit; r? @$t2 = ((ntdll!_NT_TIB *)@$teb)->StackBase; s -d @$t1 @$t2 0X0000000006E08328 

ThreadId:
unsigned int 0x93c
ThreadId:
unsigned int 0x9a0
ThreadId:
unsigned int 0x9b4
00000000`091fdaf0  06e08328 00000000 00000000 00000000  (……………
00000000`091fdcb8  06e08328 00000000 091fdd70 00000000  (…….p…….
00000000`091fded0  06e08328 00000000 06e0e798 00000000  (……………
00000000`091fdf38  06e08328 00000000 00000002 00000000  (……………
00000000`091fec60  06e08328 00000000 0168883a 00000000  (…….:.h…..
00000000`091ff260  06e08328 00000000 000007d0 00000000  (……………
00000000`091ff2d0  06e08328 00000000 00000020 00000000  (……. …….
00000000`091ff5f8  06e08328 00000000 800306c0 00000000  (……………
00000000`091ff6c0  06e08328 00000000 00000000 00000000  (……………
00000000`091ff930  06e08328 00000000 00000000 00000001  (……………
00000000`091ff9b8  06e08328 00000000 00000000 00000000  (……………
00000000`091ffa38  06e08328 00000000 00000000 00000000  (……………
00000000`091ffc10  06e08328 00000000 03684080 00000000  (……..@h…..
00000000`091ffc90  06e08328 00000000 00000000 00000000  (……………
ThreadId:
unsigned int 0x9b8
ThreadId:
unsigned int 0x9bc
ThreadId:
unsigned int 0x9c0
……………
…………..

 

Step 6:

From the above out put we see thread 0x9b4 has reference to the pointer of owning task and it will be the thread which is owning the latch. Let us switch to the thread(0x9b4 ) which is executing the owning task and

then go through the stack to see why the thread is owning the latch for long time.

Step 7:

   ~~[0x9b4]s      ==> Switching to the thread (Replace 0x9b4 with your thread id which has reference to the po
ntdll!ZwWaitForSingleObject+0xa:
00000000`77ef047a c3              ret

Step 8:

0:002> kC  ==>  Print the stack
Call Site
ntdll!ZwWaitForSingleObject
kernel32!WaitForSingleObjectEx
sqlservr!SOS_Scheduler::SwitchContext
sqlservr!SOS_Scheduler::Suspend
sqlservr!SOS_Event::Wait
sqlservr!BPool::FlushCache
sqlservr!checkpoint2
sqlservr!alloca_probe
sqlservr!ProcessCheckpointRequest
sqlservr!CheckpointLoop
sqlservr!ckptproc
sqlservr!SOS_Task::Param
::Execute
sqlservr!SOS_Scheduler::RunTask
sqlservr!SOS_Scheduler::ProcessTasks
sqlservr!SchedulerManager::WorkerEntryPoint
sqlservr!SystemThread::RunWorker
sqlservr!SystemThreadDispatcher::ProcessWorker
sqlservr!SchedulerManager::ThreadEntryPoint
msvcr80!endthreadex
msvcr80!endthreadex

From the above stack we can understand that the thread which is owning the latch is executing checkpoint and flushing cache (Dirty buffers) to disk. If flushing buffers to disk (checkpoint) is taking a long time, then obviously there is disk bottleneck.

Similarly for any other latch time out issues first identify the owner thread of latch, read the stack of owner thread to understand the task performed by owner thread and troubleshoot the performance of task performed by owner thread.

If you want to see the stack of thread which is waiting, then pickup the task (task 0x0000000006E096D8 )from latch timeout warning message in errorlog instead of owning task (task 0x0000000006E08328) and use the command mentioned in step 5.

I hope this post will help you to learn and debug the latch timeout issues.

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Related posts: Non-yielding IOCP Listener, Non-yielding Scheduler and non-yielding resource monitor known issues and fixes

 

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in Debugging, SQL General, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , , | 22 Comments »

SQL Server: Ghost records

Posted by Karthick P.K on August 30, 2012

 

What is Ghost record?

When a record is deleted from a clustered index data page or non-clustered index leaf page or a versioned heap page or a forwarded record is recalled, the record is logically removed by marking them as deleted but not physically removed from the page immediately.

Pages which are marked as deleted but actually not deleted physically are called Ghost Records.

 

Why?

Allowing delete or update operation run faster and to ensure faster rollback if the transaction is cancelled or roll backed.

 

Who will remove the records which are marked for deletion?

Ghostcleanuptask: SQL Server Ghostcleanuptask thread physically removes the records which are marked as deleted.

 

How Ghost cleanup task works?

1.       Ghostcleanuptask thread wakes up every 10 seconds.

2.       Sweep databases one by one starting from master.

3.       Skip the database if it is not able to take a shared lock for database (LCK_M_S) or database is not in Open read/write state.

4.       Scans the PFS pages of the current database to get the pages which has ghost records.

PFS Page: A PFS page occurs once in 8088 pages. SSQL Server will attempt to place a PFS page on the first page of every PFS interval(8088Pages). The only time a PFS page is not the first page in its interval is in the first interval for a file.

In this case, the file header page is first, and the PFS page is second. (Page ID starts from 0 so the first PFS page is at Page ID 1)

clip_image002_thumb

5.       Remove the records which are marked as deleted (ghosted) physically.

 

 

The GHOST_VERSION_RECORD is used in cases where versioned heap records, or BLOB fragments are deleted and when versioned forwarded heap rows are recalled back to their originating page. The records must be ghosted rather than deleted so the versioning chain can be maintained. The GHOST_VERSION_RECORDs cannot be expunged until all transactions that have references to the original data record have finished.

 

 

Let us see how the ghost records are marked for deletion and deleted later using below example.

 

Note: We will be using few trace flags for testing purpose. Use this trace flags and below steps only in your testing servers.

Trace flag 3604 Directs SQL Server to send output some of the command’s to client executing the command. Ex.  When we execute  DBCC PAGE or DBCC TRACESTATUS(-1) we will not see results in SSMS or any other client unless Trace flag 3604 is on

Trace flag   661 Disables ghost record cleanup task.  When this trace flag is enabled Ghost cleanup task will not run. 

Trace flag   662 along with 3605 prints the output of Ghost cleanup task to SQL Server error log ( DBCC TRACEON (662, 3605,-1))

 

 

 

Step1:

//Create a database named Ghost test

create database Ghosttest

go

use Ghosttest

go

 

Step 2:

//Create a table named ‘a’ with clustered and non-clustered index.

CREATE TABLE [dbo].[a](

      [a] [char](3000) NULL,

      [b] [char](3000) NULL,

      [c] [int] NULL

) ON [PRIMARY]

 

CREATE Clustered INDEX [CIX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

CREATE  INDEX [IX_a] ON [dbo].[a]

(

      [c] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

 

 

 

Go

 

Step 3:

//Insert 1000 records in to the table.

insert into  a values (‘Ghost’,‘Record’,1);

go 1000

 

Step 4:

//Enable trace flag 3604 to see the output of DBCC command from client like SSMS.

 

dbcc traceon (3604,-1)

 

go

 

Step 5:

// Disable Ghost record cleanup task to see how the ghost records are marked in data/index pages.   (Don’t enable in production)

 

dbcc traceon (661,-1)

go

 

Step 6:

//Delete the 1000 records which we inserted in table ‘a’ to generate ghost records (Records marked for deletion but not physically deleted).

 

delete from a

go

 

Step 7:

//Ghost cleanup task first scans the PFS pages in  current database to get the pages which has ghost records. Let us dump the PFS pages to see how the pages with ghost records are marked

// Watch the text which are in red or search for Has Ghost in the output of DBCC page.

DBCC PAGE(‘Ghosttest’,1,1,3)

Go

 

PAGE: (1:1)

 

 

BUFFER:

 

 

BUF @0x00000000FEC0ED80

 

bpage = 0x00000000CA024000           bhash = 0x0000000000000000           bpageno = (1:1)

bdbid = 9                            breferences = 0                      bcputicks = 0

bsampleCount = 0                     bUse1 = 62975                        bstat = 0xc0010b

blog = 0x79797979                    bnext = 0x0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000CA024000

 

m_pageId = (1:1)                     m_headerVersion = 1                  m_type = 11

m_typeFlagBits = 0x1                 m_level = 0                          m_flagBits = 0x0

m_objId (AllocUnitId.idObj) = 99     m_indexId (AllocUnitId.idInd) = 0    Metadata: AllocUnitId = 6488064

Metadata: PartitionId = 0            Metadata: IndexId = 0                Metadata: ObjectId = 99

m_prevPage = (0:0)                   m_nextPage = (0:0)                   pminlen = 0

m_slotCnt = 1                        m_freeCnt = 2                        m_freeData = 8188

m_reservedCnt = 0                    m_lsn = (50:335:27)                  m_xactReserved = 0

m_xdesId = (0:0)                     m_ghostRecCnt = 0                    m_tornBits = 412090936

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED           PFS (1:1) = 0x44 ALLOCATED 100_PCT_FULL

DIFF (1:6) = CHANGED                 ML (1:7) = NOT MIN_LOGGED           

 

PFS: Page Alloc Status  @0x000000002205A000

 

(1:0)        – (1:3)        =     ALLOCATED 100_PCT_FULL                             

(1:4)        – (1:5)        = NOT ALLOCATED   0_PCT_FULL                             

(1:6)        – (1:7)        =     ALLOCATED 100_PCT_FULL                             

(1:8)        -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:9)        -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:10)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:11)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:12)       -              =     ALLOCATED 100_PCT_FULL           IAM Page  Mixed Ext

(1:13)       – (1:14)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:15)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:16)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:17)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:18)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:19)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:20)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:21)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:22)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:23)       – (1:25)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:26)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:27)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:28)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:29)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:30)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:31)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:32)       -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:33)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:34)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:35)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:36)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:37)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:38)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:39)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:40)       – (1:41)       =     ALLOCATED   0_PCT_FULL                             

(1:42)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:43)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:44)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:45)       – (1:46)       = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:47)       -              = NOT ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:48)       – (1:50)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:51)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:52)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:53)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:54)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:55)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:56)       – (1:61)       =     ALLOCATED   0_PCT_FULL                             

(1:62)       -              = NOT ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:63)       -              = NOT ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:64)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:65)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:66)       – (1:71)       =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:72)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:73)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:74)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:75)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:76)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:77)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:78)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:79)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:80)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:81)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:82)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:83)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:84)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:85)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:86)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:87)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:88)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:89)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:90)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:91)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:92)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:93)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:94)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:95)       -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:96)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:97)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:98)       -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:99)       -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:100)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:101)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:102)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:103)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:104)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:105)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:106)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:107)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:108)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:109)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:110)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:111)      – (1:116)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:117)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:118)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:119)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:120)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:121)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:122)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:123)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:124)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:125)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:126)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:127)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:128)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:129)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:130)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:131)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:132)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:133)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:134)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:135)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:136)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:137)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:138)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:139)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:140)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:141)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:142)      – (1:143)      =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:144)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:145)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:146)      -              =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:147)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:148)      -              =     ALLOCATED  50_PCT_FULL                     Mixed Ext

(1:149)      -              =     ALLOCATED   0_PCT_FULL           IAM Page  Mixed Ext

(1:150)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:151)      -              =     ALLOCATED 100_PCT_FULL                     Mixed Ext

(1:152)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:153)      – (1:158)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:159)      -              =     ALLOCATED   0_PCT_FULL                     Mixed Ext

(1:160)      – (1:383)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:384)      – (1:385)      =     ALLOCATED   0_PCT_FULL Has Ghost           Mixed Ext

(1:386)      – (1:391)      = NOT ALLOCATED   0_PCT_FULL                             

(1:392)      – (1:497)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:498)      – (1:499)      =     ALLOCATED   0_PCT_FULL                             

(1:500)      – (1:655)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:656)      -              =     ALLOCATED   0_PCT_FULL                             

(1:657)      – (1:663)      =     ALLOCATED   0_PCT_FULL Has Ghost                   

(1:664)      – (1:671)      = NOT ALLOCATED   0_PCT_FULL                             

 

Step 8:

//Dump one of the page which is marked as having ghost record in PFS page. I have picked Page 158 for example.

//Look at m_ghostRecCnt

 

dbcc page(‘Ghosttest’,1,158,3)

go

 

 

PAGE: (1:158)

 

 

BUFFER:

 

 

BUF @0x00000000FEC02CC0

 

bpage = 0x00000000C9E22000           bhash = 0x0000000000000000           bpageno = (1:158)

bdbid = 9                            breferences = 0                      bcputicks = 933

bsampleCount = 1                     bUse1 = 64101                        bstat = 0xc0010b

blog = 0x2159bb79                    bnext = 0x0000000000000000          

 

PAGE HEADER:

 

 

Page @0x00000000C9E22000

 

m_pageId = (1:158)                   m_headerVersion = 1                  m_type = 1

m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x0

m_objId (AllocUnitId.idObj) = 28     m_indexId (AllocUnitId.idInd) = 256 

Metadata: AllocUnitId = 72057594039762944                                

Metadata: PartitionId = 72057594038845440                                 Metadata: IndexId = 1

Metadata: ObjectId = 2105058535      m_prevPage = (1:157)                 m_nextPage = (1:160)

pminlen = 3508                       m_slotCnt = 2                        m_freeCnt = 1054

m_freeData = 7134                    m_reservedCnt = 0                    m_lsn = (37:160:37)

m_xactReserved = 0                   m_xdesId = (0:1983)                  m_ghostRecCnt = 2

m_tornBits = 602498023              

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED          

PFS (1:1) = 0x68 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED           

 

Slot 0 Offset 0x60 Length 3519

 

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Record Size = 3519                  

Memory Dump @0x000000001CEBA060

 

 

//There are 2 ghost records in page 158 (m_ghostRecCnt=2)

 

Step 8:

//Additionally we can query sys.dm_db_index_physical_stats to find the number of ghost records.

 

SELECT object_name(object_id) as Name, record_count, GHOST_RECORD_COUNT,Version_ghost_record_count,INDEX_TYPE_DESC, ALLOC_UNIT_TYPE_DESC

FROM sys.dm_db_index_physical_stats (DB_ID(N‘Ghosttest’), NULL, NULL, NULL , ‘DETAILED’) where index_level=0

go

clip_image004_thumb

 

 

Step 8:

//Let us enable trace flag 662 and 3605  to print the output of ghost cleanup task to SQL error log

  

dbcc traceon (662,3605,-1)

go

 

Step 9:

//Let us disable trace flag 661 to start the Ghost cleanup task again and let us disable 3604  which we enabled to print output of DBCC command in client (SSMS).

 

dbcc traceoff (661,3604,-1)

go

 

Purging page Dbid 9, File 1, Page 144

Purging page Dbid 9, File 1, Page 146

Purging page Dbid 9, File 1, Page 153

Purging page Dbid 9, File 1, Page 154

Purging page Dbid 9, File 1, Page 155

Purging page Dbid 9, File 1, Page 156

Purging page Dbid 9, File 1, Page 157

Purging page Dbid 9, File 1, Page 158

Purging page Dbid 9, File 1, Page 160

Purging page Dbid 9, File 1, Page 161

 

We will see the pages purged by the ghost cleanup task in the SQL Server error log.

Step 10:

//  We can also capture Ghost cleanup task in profiler by selecting SQL Transaction and filter by object name like  “GhostCleanupTask”

image

 

 

If you liked this post, do like us on Facebook at https://www.facebook.com/mssqlwiki and join our Facebook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K |My Facebook Page |My Site| Blog space| Twitter

Posted in DBCC, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , , , | 15 Comments »

I/O requests taking longer than 15 seconds to complete on file

Posted by Karthick P.K on August 27, 2012

Do you see warnings like one below in your SQL Server error log?

SQL Server has encountered  x occurrence(s) of I/O requests taking longer than 15 seconds to complete on file .

The OS file handle is 0x000006A4. The offset of the latest long I/O is: 0x00000

(or)

BobMgr::GetBuf: Sort Big Output Buffer write not complete after n seconds.

This indicates SQL Server I/O Bottlenecks. SQL Server performance highly relies on the Disk performance.  SQL Server I/O Bottleneck can be identified through

1. PAGEIOLATCH_xx or WRITELOG wait types in Sys.Sysprocesses and other DMV’s

2. I/O taking longer than 15 seconds in SQL Server Error log.

{

SQL Server has encountered X occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [ ] in database [IOTEST (7). The OS file handle is 0x000006A4. The offset of the latest long I/O is:
0x000001

}

3. By looking at I/O latch wait statistics in sys.dm_os_wait_stats

{

Select  wait_type,         waiting_tasks_count,         wait_time_ms  from    sys.dm_os_wait_stats where    wait_type like ‘PAGEIOLATCH%’
order by wait_type

}

4. By looking at pending I/O requests and isolating the disks,File and database in which we have I/O Bottleneck.

{

select     database_id,     file_id,     io_stall,     io_pending_ms_ticks,     scheduler_address from    sys.dm_io_virtual_file_stats(NULL, NULL)t1,         sys.dm_io_pending_io_requests as t2
where    t1.file_handle = t2.io_handle

}

Following are common reasons for I/O Bottleneck in SQL Server.

1. SQL Server is spawning more I/O requests than what I/O disk subsystem could handle.

2 . There could be an Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem (or) Compression and  so the Disks are performing very slow and hence SQL Server is affected.

3. Some other process on the system is saturating the disks with I/O requests. Common application includes AV Scan,System Backup Etc. So I/O requests posted by SQL Server becomes slow.

 

How to  troubleshoot?

1.  Exclude SQL Server files from antivirus scan.

2. Do not place SQL Server FILES on compressed drives.

3. Distribute SQL Server data files and transaction log files across drives.

4. If the “I/O request taking longer” warning is for tempdb , Enable trace flag 1118 and increase the tempdb data files refer:http://support.microsoft.com/kb/2154845

5. If none of the above resolves the issue collect the below perfmon counters.

 

Perfmon counters can help us in understanding “If disk is slow” or  “SQL Server is spawning more I/O then what disk could handle” or “Some other process is saturating disk with I/O”

Note:It is important to get  throughput of the disk subsystem in MB/SEC before we look at disk counters. Normally it will be more than 150 MB for SAN disk and greater 50 MB for Single disk .When you look at the perfmon counter look at Max value.

 

Avg. Disk sec/Transfer –> Time taken to perform the I/O operation

Ideal value for Disk sec/Transfer is 0.005-0.010 sec. If you consistently notice this counter is beyond 0.015 then there is a serious I/O bottleneck.

1. Look for Disk Bytes /sec when Avg. Disk sec/Transfer  is greater than 0.015. If it is below 200 MB for SAN disk and Below 50 MB for Single disk then the problem is with I/O subsystem Engage hardware vendor.

2. If the Disk Bytes /sec  is greater than  200 MB for SAN disk or greater than 50 MB for Single disk when the  Avg. Disk sec/Transfer  is greater than 0.015. Look at the Process:IO Data Bytes/Sec for the same time and identify which process is spawning I/O. If the identified process is not SQL Server involve the team which supports that process. If the  the identified process is SQL Server tune SQL Server queries which are I/O intensive by creating dropping indexes etc.

 

Disk Bytes /sec  –> Total read and write to disk per second in bytes.

Collect the values for each logical disks in which SQL Server files are placed and look at the Max value for this counter ideally it has to be greater than the throughput of the disk subsystem. If you don’t have the throughput for the disk then this value to be greater than 200MB for SAN or greater than 50 MB for single disk.

If it is below the expected value you can consider that your disks are not performing well. Involve the hardware vendor. 

Important: Value for this counter will be low when there is no I/O happening on the drives. So you have to look at the this counter during the time you see I/O warnings or When Disk sec/Transfer >0.010 for the same drive. 

 

Process:IO Data Bytes/Sec –> Total read and write to disk per second in bytes by each process.

Collect this counter for all the processes running on the server. This counter will help us understand if any other process is saturating the disk with excessive I/O.

Example: Let us consider a disk with max throughput of 250MB per second. If antivirus is spawning 200MB of I/O per second and if SQL Server data files are placed in same drive and SQL Server is spawning 150MB obviously there will be I/O waits.

 

Buffer Manager: Page Read/sec + Page Writes/sec –>Total read and write to disk per second in bytes by SQL Server process.

Note: If you are analyzing the .BLG file collected and not live perfmon focus on Maximum value for each counter don’t look at average.  

 

If (Avg. Disk sec/Transfer> ==0.015 ) and ( (Disk Bytes /sec < 150MB (For San)) or (Disk Bytes /sec < 50MB (For Local) or (Disk Bytes /sec < Speed of disk as per Vendor ))

{

There is Issue with I/O subsystem (or) driver/firmware issue (or) Misconfiguration in I/O Subsystem.

}

If (Disk sec/Transfer > ==0.015 Consistently) and ( (Disk Bytes /sec >= 150 (For San)) or (Disk Bytes /sec >= 50MB (For Local) or (Disk Bytes /sec >= Speed of disk as per Vendor ))

{

Identify the process which is posting excessive I/O request using Process:IO Data Bytes/Sec.

If ( Identified process == SQLServer.exe )

{

Identify and tune the queries which is Spawning excessive I/O.

(Reads+Writes column in profiler, Dashboard reports or sys.dm_exec_query_stats and sys.dm_exec_sql_text

can be used to identify the query). Use DTA to tune the query

}

If ( Identified process != SQLServer.exe )

{

Engage the owner of application which is spawning excessive I/O

}

}

Many thanks to Joseph Pilov from whom I learned many techniques like the one above.

 

If you liked this post, do like us on FaceBook at https://www.facebook.com/mssqlwiki and join our FaceBook group https://www.facebook.com/mssqlwiki#!/groups/454762937884205/

Thank you,

Karthick P.K  |My Facebook Page |My Site| Blog space| Twitter

Posted in Performance, SQL Server Engine, SQL Server I/O | Tagged: , , , , , , | 22 Comments »

Multi Threaded OVELAPPED and Nonbuffered I/O Example

Posted by Karthick P.K on March 4, 2012

How to Read file using Multiple threads OVERLAPPED and Nonbuffered I/O

Multithreaded Overlapped I/O and Nonbuffered I/O example

Nonbuffered I/O :Allows application to bypass the Windows cache manager or disable system caching of data being read from or written to the file .So there is no intermediate buffer or cache and gives direct control over data I/O buffering to application.

Things to remember:

1. When we use non buffered I/O (FILE_FLAG_NO_BUFFERING in createfile) reads and writes has to be in multiple of bytes per sector.

2. GetDiskFreeSpace-Retrieves information about the specified disk, including Bytes per sector (dwBytesPerSector) of disk.

3. When using non buffered I/O (FILE_FLAG_NO_BUFFERING) file offset in the OVERLAPPED structure in Readfile/Writefile if specified, must be number of bytes that is an integer multiple of the Bytes per sector

4. Buffers used for read and write operations should be physical sector-aligned, which means aligned on addresses in memory that are integer multiples of Bytes per sector.

5. GetFileSizeEx-Retrieves the size of the specified file. When multiple threads are used to read or write file. Each thread needs to have its own overlapped structure and Event has to be created for hEvent member of each overlapped structure.

If the hEvent member of the OVERLAPPED structure is NULL, the system uses the state of the hFile handle to signal when the operation has been completed. This will cause confusion when multiple threads are using same file handle to read or write file.

It is safer to use an event object because of the confusion that can occur when multiple simultaneous overlapped operations are performed on the same file, named pipe, or communications device.In this situation, there is no way to know which operation caused the object’s state to be signaled.

#include "windows.h"
#include "stdlib.h"
#include <windows.h>
#include <string>
#include <winbase.h>
#include <iostream>
using namespace std;
#include <psapi.h>
#pragma comment(lib,"psapi.lib")
#include <time.h>

#define ThreadPerProc 1 //If you want more than 1 thread per processor increase this value.

int Dootherwork();

struct SUreadfile
{
int x;
DWORDLONG  dwOffset;
DWORDLONG  dwTotalBytespostedNOffset;

};
  SUreadfile *PSUreadfile;
  int Ureadfile(SUreadfile *PSUreadfile1);

  int  x=0;
  int  i=0;
  SYSTEM_INFO si;
  DWORD dwSectorsPerCluster;
  DWORD dwBytesPerSector;
  DWORD dwNumberOfFreeClusters;
  DWORD dwTotalNumberOfClusters;

  //
  LONGLONG BuffSize=0;
  OVERLAPPED *iAIO=NULL;
  DWORDLONG  dwTotalBytesposted=0;
  DWORDLONG  dwTotalBytespostedwithinoffset=0;
  HANDLE hIFile;
  HANDLE hOFile;
  HANDLE *hEvent;
 char dateStr [9];
 char timeStr [9];

int main(int argc, char* argv[])
{

  _strdate( dateStr);
  _strtime( timeStr );
  printf("\n Date:%s\t", dateStr);
  printf("Time is%s\t", timeStr);
  if (argc<3)
  {
		printf("Usage is: To stimulate asynch I/O. Accept two parameters Source file and Destination file");
		return 1;
  }

 //Open source file
 hIFile=CreateFile((LPCSTR)argv[1],GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL
                           | FILE_FLAG_OVERLAPPED
                           | FILE_FLAG_NO_BUFFERING
                           ,NULL); //FILE_FLAG_OVERLAPPED  - to process input or output asynchronously
								   //FILE_FLAG_NO_BUFFERING-The file or device is being opened with no system caching for data reads and writes.
 if (INVALID_HANDLE_VALUE==hIFile)
		  {
			printf("Unable to open file %s.  Error=%d\n",argv[1], GetLastError());
			return 1;
		  }
//Create destination file
 hOFile=CreateFile((LPCSTR)argv[2],GENERIC_WRITE,FILE_SHARE_READ,NULL,CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL,NULL);
	 if (INVALID_HANDLE_VALUE==hOFile)
		  {
			printf("Unable to create file %s.  Error=%d\n",argv[1], GetLastError());
			return 1;
		  }

//GetSystemInfo- Retrieves information about the current system. Output is pointer to SYSTEM_INFO structure
 GetSystemInfo(&si);
 /*GetDiskFreeSpace-Retrieves information about the specified disk, including dwBytesPerSector . When using FILE_FLAG_NO_BUFFERING
 file offset in the OVERLAPPED structure and nNumberOfBytesToRead in Readfile
 if specified, must be   number of bytes that is an integer multiple of the dwBytesPerSector*/
 GetDiskFreeSpace(NULL,&dwSectorsPerCluster,&dwBytesPerSector,&dwNumberOfFreeClusters,&dwTotalNumberOfClusters);
 LARGE_INTEGER *fsize;
 fsize =new LARGE_INTEGER;

 dwBytesPerSector=dwBytesPerSector*4; //for larger files if you want to read data greater then dwBytesPerSector in one shot increase the size of dwBytesPerSector.

//GetFileSizeEx-Retrieves the size of the specified file.

 GetFileSizeEx (hIFile,fsize);
 if (fsize->QuadPart==0)
      {
      printf("\nUnable to get the size of file. Error:%d ",GetLastError());
      return 1;
      }
    else
      {
          printf("\nFile size is: %lld Bytes",fsize->QuadPart);

		  if (fsize->QuadPart > dwBytesPerSector)
          {
			  /*Later in this program we create 1 thread per processor and read the files parellely.
			  Below logic is used to calculate the bytes each thread will read
			  remember when we use non buffered I/O (FILE_FLAG_NO_BUFFERING) reads has to be in multiple of bytes per sector. So no of bytes each thread
			  will read has to be rouded to   bytes per sector
			  */

			  BuffSize=  fsize->QuadPart +  ( (si.dwNumberOfProcessors*dwBytesPerSector)  -  (fsize->QuadPart % (si.dwNumberOfProcessors*dwBytesPerSector))  );
			  BuffSize=BuffSize/si.dwNumberOfProcessors;
          }
          else
          {
              BuffSize=fsize->QuadPart; //If the file size is less then dwBytesPerSector there is no nead to create multiple threads
          }
      }

	iAIO =new OVERLAPPED[si.dwNumberOfProcessors*ThreadPerProc];// Create overlapped structure for each thread
	HANDLE *h;
    h = new HANDLE[(si.dwNumberOfProcessors*ThreadPerProc)]; // Pointer  array to hold thread handles
	PSUreadfile= new SUreadfile[256];
	int OffsetHigh=0;
	hEvent=new HANDLE[(si.dwNumberOfProcessors*ThreadPerProc)];
	/*Create event for hEvent member of each overlapped structure If the hEvent member of the OVERLAPPED structure is NULL, the system
	uses the state of the hFile handle to signal when the operation has been completed.
	This is will cause confusion when multiple threads are using same file handle to read or write file.
	It is safer to use an event object because of the
	confusion that can occur when multiple simultaneous overlapped operations are performed on the same file, named pipe, or communications device.
	In this situation, there is no way to know which operation caused the object's state to be signaled.*/

	//Create threads and and assign the starting file offset to each thread for read operation
	for ( i=0;i<(si.dwNumberOfProcessors*ThreadPerProc);i++)
			{
			ZeroMemory(&PSUreadfile[i],sizeof(PSUreadfile[i]));
 			PSUreadfile[i].x=i;
			ZeroMemory (&iAIO[i],sizeof(iAIO[i]));
			iAIO[i].Internal=0;
			iAIO[i].InternalHigh=0;
			iAIO[i].Offset=0;
			iAIO[i].OffsetHigh=0;
			iAIO[i].Pointer=0;
			iAIO[i].Offset= 0;
			PSUreadfile[i].dwOffset=dwTotalBytesposted;
			hEvent[i] = CreateEvent(NULL, TRUE, FALSE, NULL);
			if(hEvent)
			{
			 iAIO[i].hEvent = hEvent[i];
			}
			 else
			{
			 printf("\nCreate event failed with error:%d",GetLastError());
			}

			h[i]=CreateThread(0,0,(LPTHREAD_START_ROUTINE  )Ureadfile,(LPVOID)&PSUreadfile[i],  0,  NULL);

			if (!h[i])
			{
			printf("Thread creation failure :", GetLastError());
			}

			dwTotalBytesposted=dwTotalBytesposted+BuffSize;

		}

	WaitForMultipleObjects((si.dwNumberOfProcessors), h,TRUE,INFINITE);  // Wait for all the threads to complete
  _strdate( dateStr);
  _strtime( timeStr );
  printf("\n Date:%s\t", dateStr);
  printf("Time is%s\t", timeStr);

}

int Ureadfile(SUreadfile *PSUreadfile1)
{

	int z=PSUreadfile1->x;
	SIZE_T dwSize=dwBytesPerSector;
	if (PSUreadfile[z].dwOffset>MAXDWORD)
	{
		iAIO[z].Offset=(PSUreadfile[z].dwOffset%MAXDWORD)-(PSUreadfile[z].dwOffset/MAXDWORD); //Offset start from zero
		iAIO[z].OffsetHigh=PSUreadfile[z].dwOffset/MAXDWORD;

	/*iAIO[z].Offset (Offset of overlapped structure) is the low-order portion of the file position at which to start the I/O request.
	Data type of Offset is DWORD so the 	maximum size is 2^32 (4294967296-1). If you want to read the file from offset
	which is greater than 4294967296 bytes increase the OffsetHigh */
	}
	else
	{
	iAIO[z].Offset=PSUreadfile[z].dwOffset;
	}

	DWORD dwBytesRead=0;
	LONGLONG dwTotalBytesRead=0;
	BOOL	RF;
	BOOL	WF;
	wchar_t *IBuffer = (wchar_t *) VirtualAlloc(NULL,  dwSize, MEM_RESERVE| MEM_COMMIT,PAGE_READWRITE);

	 while (BuffSize>dwTotalBytesRead)
	{

		RF=0;

		RF=ReadFile(hIFile,IBuffer,dwSize, &dwBytesRead,&iAIO[z]); // pass  a pointer to an OVERLAPPED structure (iAIO)

		if ((RF==0) && GetLastError()==997)      //ERROR_IO_PENDING                 997L
        {

			/*bWait parameter of GetOverlappedResult can be set to TRUE or FALSE.
			If this parameter is TRUE, and the Internal member of the lpOverlapped structure is STATUS_PENDING, the function
			does not return until the operation has been completed. If this parameter is FALSE and the operation is still pending, the function
			returns FALSE and the GetLastError function returns ERROR_IO_INCOMPLETE	*/
            while( !GetOverlappedResult( hIFile,&iAIO[z],&dwBytesRead,TRUE))
                {

                    if (GetLastError()==996)//ERROR_IO_INCOMPLETE  (Not signaled)            996L
                    {
                    printf("\nI/O pending: %d .",GetLastError());
                    //Dootherwork();
					/*If the bWait parameter of GetOverlappedResult is set to false thread can do other work while the I/O is progressing.
					Change the bWait parameter to FALSE in GetOverlappedResult and un-comment Dootherwork function above to stimulate clean overlapped I/O.
					*/
                    }
                    else if  (GetLastError()==38) //ERROR_HANDLE_EOF                 38L
                    {
                    printf("\nEnd of file reached.");
                    break;
                    }
                    else
                    {
                    printf("GetOverlappedResult failed with error:%d,Offset:%d",GetLastError(),iAIO[z].Offset);
                    break;
                    }

                }

        }
        else if ((RF==0)  && GetLastError()!=997 &&   GetLastError()!=38 )
        {
            printf ("\nError reading file :%d offset-%d",GetLastError(),iAIO[z].Offset);
            return 0;
        }

		else if ((RF==0)  && GetLastError()==38 )
        {
            printf ("\nEnd of file reached file :%d offset:%d",GetLastError(),iAIO[z].Offset);
            return 0;
        }

		WF= WriteFile(hOFile,IBuffer,iAIO[z].InternalHigh,NULL,&iAIO[z]);  //Write the buffers which we read to new file.

		if (!WF)
		 {
			printf("\nWrite file operation failed. Error:%d",GetLastError());
		 }

	    dwTotalBytesRead=dwTotalBytesRead + iAIO[z].InternalHigh;

		//Increase the offset if we hit the max DWORD limitation
		if (iAIO[z].Offset+iAIO[z].InternalHigh > (MAXDWORD* (iAIO[z].OffsetHigh+1)))
		{
printf("\nThread Id-%d completed ReadFile operation from %lld till  %lld of %lld bytes",z,(iAIO[z].Offset-dwTotalBytesRead-iAIO[z].InternalHigh),iAIO[z].Offset+iAIO[z].InternalHigh,dwTotalBytesRead);
					iAIO[z].OffsetHigh++;
			        iAIO[z].Offset= (iAIO[z].Offset + iAIO[z].InternalHigh-MAXDWORD-iAIO[z].OffsetHigh); //Offset starts from zero
		}
		else
		{
		iAIO[z].Offset=iAIO[z].Offset+iAIO[z].InternalHigh;
		}

		if (dwSize >iAIO[z].InternalHigh)
		{
		  printf("\nEnd of file reached %ld-%ld",iAIO[z].InternalHigh,dwSize);
		 break;
		}

	}

	printf("\nThread Id:%d completed ReadFile operation  of %lld bytes",z,dwTotalBytesRead);
  return 1;
}

int Dootherwork()
{

	x=x+1;
	printf("\nWe are doing other work when overlapped I/O read is in progress-%d -Sleeping for 1 Milli second",x);
	Sleep(1);
	return 0;
}

Posted in Programming, SQL Server I/O | Tagged: , , , , | 6 Comments »

Asynchronous I/O example

Posted by Karthick P.K on February 15, 2012

How to Read file using asynchronous read operations – Overlapped I/O example

Asynchronous I/O facility in windows allows an application to initiate an I/O operation and continue other operation’s while I/O completes. This will improve the performance of an application because it allows the application to do multiple operations at once.

1. FILE_FLAG_OVERLAPPED switch is used in CreateFile to do Asynchronous I/O operation.

2. OVERLAPPED structure – when we call ReadFile/ReadFileEx or WriteFile/WriteFileEx we pass Pointer to an OVERLAPPED structure that specifies the starting position of I/O operation.

3. GetOverlappedResult – Status of a pending asynchronous operation can be checked using HasOverlappedIoCompleted or GetOverlappedResult Win32 API functions. bWait parameter in GetOverlappedResult can be set to true to wait infinitely till the Asynchronous I/O operation completes.

4. Createevent – Used to create an event object to assign for hEvent member of the OVERLAPPED structure passed into ReadFile/WriteFile and GetOverlappedResult. When multiple Asynchronous I/O happens   in parallel then each Asynch I/O must have its own OVERLAPPED structure

#include "windows.h"
#include "stdlib.h"
#include <windows.h>
#include <string>
#include <winbase.h>
#include <iostream>
using namespace std;
#include <psapi.h>
#pragma comment(lib,"psapi.lib")
#define BUF_SIZE 8192*100*100 // BuffSize for read file
int  x=0; 
int Dootherwork();

int main(int argc, char* argv[])
{
  if (argc<2)
  {
        printf("Usage is: To stimulate asynch I/O");
        return 1;
  }

HANDLE hIFile=CreateFile((LPCSTR)argv[1],GENERIC_READ,FILE_SHARE_READ,NULL,OPEN_EXISTING,FILE_FLAG_OVERLAPPED,NULL); //FILE_FLAG_OVERLAPPED-To process input or output asynchronously
    
 if (INVALID_HANDLE_VALUE==hIFile) 
          {
            printf("Unable to open file %s.  Error=%d\n",argv[1], GetLastError());
            return 1;
          }

     OVERLAPPED iAIO;  
     ZeroMemory (&iAIO,sizeof(iAIO));

      DWORD dwBytesRead=0;
      DWORD dwTotalBytesRead=0;
      BOOL RF;
      BOOL OLR=0;
      DWORD *lpFileSizeHigh;
      DWORD rfs; 
      
      lpFileSizeHigh = new DWORD;
      LARGE_INTEGER *fsize;
      fsize =new LARGE_INTEGER;
      rfs=GetFileSizeEx (hIFile,fsize);
      LONGLONG BuffSize=0;
      HANDLE hEvent;
      hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
      if(hEvent)
      {
      iAIO.hEvent = hEvent; 
      }
      else
      {
      printf("\nCreate event failed with error:%d",GetLastError());
      }
  
    if (fsize->QuadPart==0)
      {
      printf("\nUnable to get the size of file. Error:%d ",GetLastError());
      return 1;
      }
    else 
      { 
          printf("\nFile size is: %lld Bytes",fsize->QuadPart);
          if (fsize->QuadPart > BUF_SIZE)
          {
          BuffSize=BUF_SIZE;
          }
          else
          {
          BuffSize=fsize->QuadPart;
          }
      }
    wchar_t *IBuffer = (wchar_t *)HeapAlloc(GetProcessHeap(),  HEAP_ZERO_MEMORY,  BuffSize);
  

  while (fsize->QuadPart>dwTotalBytesRead)
  {

      SetLastError(0);
      
      RF=ReadFile(hIFile,IBuffer,BuffSize, NULL,&iAIO); // pass  a pointer to an OVERLAPPED structure (iAIO)
        
      if ((RF==0) && GetLastError()==997)      //ERROR_IO_PENDING                 997L   
        {
            printf ("\nAsynch readfile started. I can do other operations now");
             
            while( !GetOverlappedResult( hIFile,&iAIO,&dwBytesRead,FALSE))
                {
                    if (GetLastError()==996)//ERROR_IO_INCOMPLETE  (Not signaled)            996L 
                    {
                    printf("\nI/O pending: %d .",GetLastError());
                    Dootherwork();
                    }
                    else if  (GetLastError()==38) //ERROR_HANDLE_EOF                 38L
                    { 
                    printf("\nEnd of file reached.");
                    break;
                    } 
                    else
                    {
                    printf("GetOverlappedResult failed with error:%d",GetLastError());
                    break;
                    }

                } 
    



        }
        else if ((RF==0)  && GetLastError()!=997 )
        {
            printf ("Error reading file :%d",GetLastError());
            return 0;
        }
      
         

        dwTotalBytesRead=dwTotalBytesRead + iAIO.InternalHigh;
        iAIO.Offset=dwTotalBytesRead;
        printf("\nReadFile operation completed for %lld bytes",dwTotalBytesRead);
    
  }
  printf("\nReadFile  completed.  %d bytes read",dwTotalBytesRead);

 ResetEvent(iAIO.hEvent);
 HeapFree(GetProcessHeap(),0,IBuffer);
 CloseHandle(hEvent);

 CloseHandle(hIFile);
 return 0;
}


int Dootherwork()
{

    x=x+1;
    printf("\nWe are doing other work when overlapped I/O read is in progress-%d -Sleeping for 1000 Milli second",x);
    Sleep(1000);
    return 0;
}

 

 

Thanks

Karthick P.K

Posted in Programming, SQL Server Engine, SQL Server I/O | Tagged: , , , , , | 2 Comments »