Part of 'Microsoft SQL Server 2000 RDBMS Performance Tuning Guide for Data Warehousing'

http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx

Optimizing Disk I/O Performance

When configuring a SQL Server that will contain only a few gigabytes of data and will not sustain heavy read or write activity, balancing SQL Server I/O activity across hard drives is not a major concern. To build larger SQL Server databases, however, that will contain hundreds of gigabytes or even terabytes of data and/or sustain heavy read/write activity, it is necessary to plan the configuration around maximizing SQL Server disk I/O performance by load balancing across multiple hard drives.

Optimizing Transfer Rates

One of the most important aspects of database performance tuning is I/O performance tuning. SQL Server is certainly no exception. Unless SQL Server is running on a machine with enough RAM to hold the entire database, I/O performance will be determined by how fast reads and writes of SQL Server data can be processed by the disk I/O subsystem.

Because transfer rates, I/O throughput, and other factors that affect I/O performance are constantly improving, we will not provide specific numbers for the speeds you should expect from your storage system. To better understand the capabilities of your storage hardware, work with your preferred hardware vendor to determine the optimum performance to expect.

What we do want to emphasize is the difference between sequential I/O operations (also commonly referred to as "serial" or "in disk order") in contrast to nonsequential I/O operations. We also want to draw attention to the dramatic effect read-ahead processing can have on I/O operations.

Sequential and Nonsequential Disk I/O Operations

It is worthwhile to explain what these terms mean in relation to a disk drive. Generally, a single hard drive consists of a set of drive platters. Each platter provides surfaces for read/write operations. A set of arms with read/write heads is used to move across the platters and read/write data from/to the platter surfaces. With respect to SQL Server, these are the two important points to remember about hard drives.

First, the read/write heads and associated disk arms need to move in order to locate and operate on the area of the hard drive platter that SQL Server requests. If the data is distributed around the hard drive platter in nonsequential locations, it takes significantly more time for the hard drive to move the disk arm (seek time) and for the platter to spin under the read/write head (rotational latency) to locate the data. This contrasts with the sequential case, in which all of the required data is co-located on one contiguous physical section of the hard drive platter, so the disk arm and read/write heads move a minimal amount to perform the necessary disk I/O. The time difference between the nonsequential and the sequential case is significant: about 50 milliseconds for each nonsequential seek, in contrast to approximately two to three milliseconds for sequential seeks. Note that these times are rough estimates and will vary with how far apart the nonsequential data is spread around the disk, how fast the hard disk platters can spin (RPM), and other physical attributes of the hard drive. The main point is that sequential I/O is good for performance and nonsequential I/O is detrimental to performance.
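To make the difference concrete, the following sketch applies the rough figures above. The per-seek times are illustrative assumptions taken from the text (about 50 ms nonsequential, roughly 2.5 ms sequential); real drives vary widely.

```python
# Rough positioning-time comparison using the approximate figures above.
NONSEQ_MS = 50.0   # assumed cost per nonsequential seek (seek + rotation)
SEQ_MS = 2.5       # assumed cost per sequential seek (midpoint of 2-3 ms)

def total_seek_time_ms(num_ios: int, sequential: bool) -> float:
    """Estimate total positioning time for a series of disk I/Os."""
    per_io = SEQ_MS if sequential else NONSEQ_MS
    return num_ios * per_io

# 1,000 I/Os performed nonsequentially vs. sequentially:
print(total_seek_time_ms(1000, sequential=False))  # 50000.0 ms
print(total_seek_time_ms(1000, sequential=True))   # 2500.0 ms
```

Even with generous assumptions, the nonsequential workload spends roughly 20 times longer just positioning the heads.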

Second, it is important to remember that it takes almost as much time to read or write 8 kilobytes (KB) as it does to read or write 64 KB. Within the range of 8 KB to about 64 KB, disk arm plus read/write head movement (seek time and rotational latency) accounts for the majority of the time spent on a single disk I/O transfer operation. So, mathematically speaking, it is beneficial to perform 64-KB disk transfers as often as possible when more than 64 KB of SQL Server data needs to be transferred, because a 64-KB transfer is essentially as fast as an 8-KB transfer yet moves eight times as much SQL Server data. Remember that the read-ahead manager does its disk operations in 64-KB chunks (referred to as a SQL Server extent). The log manager performs sequential writes in larger I/O sizes as well. The main point to remember is that making good use of the read-ahead manager and separating SQL Server log files from other nonsequentially accessed files both benefit SQL Server performance.
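A back-of-the-envelope model shows why larger transfers win. The positioning and transfer costs below are invented illustrative values, not measurements; the point is only that a near-fixed per-I/O positioning cost dominates at these transfer sizes.

```python
# Model: each I/O pays a fixed positioning cost plus a small per-KB
# transfer cost, so fewer, larger I/Os move the same data much faster.
POSITION_MS = 8.0          # assumed per-I/O positioning cost (seek + rotation)
TRANSFER_MS_PER_KB = 0.02  # assumed media transfer cost per KB

def time_to_move_kb(total_kb: int, io_size_kb: int) -> float:
    """Total time to move total_kb of data using fixed-size I/Os."""
    ios = -(-total_kb // io_size_kb)  # ceiling division
    per_io = POSITION_MS + io_size_kb * TRANSFER_MS_PER_KB
    return ios * per_io

# Moving 1 MB (1024 KB) of data:
print(time_to_move_kb(1024, 8))   # 128 I/Os -> 1044.48 ms
print(time_to_move_kb(1024, 64))  # 16 I/Os  ->  148.48 ms
```

With these assumptions, 64-KB transfers move the same megabyte about seven times faster, which is why the read-ahead manager's 64-KB extent reads pay off.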

As a rule of thumb, most hard drives can deliver performance that is as much as 2 times better when processing sequential I/O operations as compared to processing nonsequential I/O operations. That is, operations that require nonsequential I/O take twice as long to carry out as sequential I/O operations. What this tells us is that, if possible, you should avoid situations that may lead to random I/O occurring within your database. While it should always be the goal to perform I/O operations sequentially, situations like page splitting or out of sequence data do tend to cause nonsequential I/O to occur.

To encourage sequential I/O it is important to avoid situations that cause page splitting. It is also helpful to devise a well thought out data loading strategy. You can encourage data to be laid out sequentially on disk by employing a partitioning strategy that separates data and indexes. It is important that you set up jobs to periodically check for fragmentation in your data and indexes, and that you use utilities provided with SQL Server to resequence the data when it becomes too fragmented. More information about doing these operations appears later in this document.

Note: Logs generally are not a major concern because transaction log data is always written sequentially to the log file in sizes ranging up to 32 KB.

RAID

RAID (redundant array of inexpensive disks) is a storage technology often used for databases larger than a few gigabytes. RAID can provide both performance and fault tolerance benefits. A variety of RAID controllers and disk configurations offer tradeoffs among cost, performance, and fault tolerance. This topic provides a basic introduction to using RAID technology with SQL Server databases and discusses various configurations and tradeoffs.

Performance. Hardware RAID controllers divide reads/writes of all data from Windows NT 4.0 and Windows 2000 and applications (like SQL Server) into slices (usually 16-128 KB) that are then spread across all disks participating in the RAID array. Splitting data across physical drives like this distributes the read/write I/O workload evenly across all physical hard drives participating in the RAID array. This increases disk I/O performance because the hard disks participating in the RAID array are, as a whole, kept equally busy, instead of some disks becoming a bottleneck due to uneven distribution of the I/O requests.
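The slicing scheme can be sketched as a simple mapping from logical offset to member disk. The slice size and disk count below are assumptions chosen for illustration; controllers differ in both.

```python
# Sketch of how a striping controller might map logical offsets to
# member disks: consecutive fixed-size slices rotate across the array.
SLICE_KB = 64   # assumed stripe slice size (controllers often use 16-128 KB)
NUM_DISKS = 4   # assumed number of disks in the array

def disk_for_offset(offset_kb: int) -> int:
    """Return the index of the disk holding the byte at offset_kb."""
    slice_index = offset_kb // SLICE_KB
    return slice_index % NUM_DISKS

# Eight consecutive 64-KB slices rotate evenly across all four disks:
print([disk_for_offset(kb) for kb in range(0, 512, 64)])
# -> [0, 1, 2, 3, 0, 1, 2, 3]
```

Because consecutive slices land on different spindles, a large sequential read keeps every disk in the array busy at once.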
Fault tolerance. RAID also provides protection from hard disk failure and accompanying data loss by using two methods: mirroring and parity.

Mirroring is implemented by writing information onto a second (mirrored) set of drives. If there is a drive loss with mirroring in place, the data for the lost drive can be rebuilt by replacing the failed drive and rebuilding the mirrorset. Most RAID controllers provide the ability to do this failed drive replacement and remirroring while Windows and SQL Server are online. Such RAID systems are commonly referred to as "Hot Plug" capable drives.

One advantage of mirroring is that it offers the best performance among RAID options if fault tolerance is required. Bear in mind that each SQL Server write to the mirrorset results in two disk I/O operations, one to each side of the mirrorset. Another advantage is that mirroring provides more fault tolerance than parity RAID implementations. Mirroring can enable the system to survive at least one failed drive and may be able to support the system through failure of up to half of the drives in the mirrorset without forcing the system administrator to shut down the server and recover from the file backup.

The disadvantage of mirroring is cost. The disk cost of mirroring is one extra drive for each drive worth of data. This essentially doubles your storage cost, which, for a data warehouse, is often one of the most expensive components needed. Both RAID 1 and its hybrid, RAID 0+1 (sometimes referred to as RAID 10 or 0/1) are implemented through mirroring.

Parity is implemented by calculating recovery information about data written to disk and writing this parity information on the other drives that form the RAID array. If a drive should fail, a new drive is inserted into the RAID array and the data on that failed drive is recovered by taking the recovery information (parity) written on the other drives and using this information to regenerate the data from the failed drive. RAID 5 and its hybrids are implemented through parity. The advantage of parity is cost. To protect any number of drives with RAID 5, only one additional drive is required. Parity information is evenly distributed among all drives participating in the RAID 5 array.

The disadvantages of parity are performance and fault tolerance. Due to the additional costs associated with calculating and writing parity, RAID 5 requires four disk I/O operations for each write, compared to two disk I/O operations for mirroring. Read I/O operation costs are the same for mirroring and parity. In terms of fault tolerance, however, a parity array can survive only one failed drive before the array must be taken offline and recovery from backup media must be performed to restore data.
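The write costs described above can be summarized as per-level multipliers. This is a minimal sketch of that arithmetic, using the figures stated in the text (two physical I/Os per mirrored write, four per RAID 5 write, reads costing one physical I/O on either).

```python
# Physical disk I/Os generated per logical write, per the text:
# mirroring writes both sides; RAID 5 reads data + parity, then
# writes data + parity (four I/Os total).
WRITE_MULTIPLIER = {"RAID 0": 1, "RAID 1": 2, "RAID 0+1": 2, "RAID 5": 4}

def physical_ios(logical_reads: int, logical_writes: int, level: str) -> int:
    """Physical disk I/Os generated by a logical read/write workload."""
    return logical_reads + logical_writes * WRITE_MULTIPLIER[level]

# 1,000 reads plus 1,000 writes:
print(physical_ios(1000, 1000, "RAID 1"))  # 3000
print(physical_ios(1000, 1000, "RAID 5"))  # 5000
```

For a write-heavy SQL Server workload, the gap between the two multipliers is exactly the RAID 5 write penalty discussed later under RAID Levels.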

General Rule of Thumb: Be sure to stripe across as many disks as necessary to achieve solid disk I/O performance. System Monitor will indicate if there is a disk I/O bottleneck on a particular RAID array. Be ready to add disks and redistribute data across RAID arrays and/or small computer system interface (SCSI) channels as necessary to balance disk I/O and maximize performance.

Effect of On-Board Cache of Hardware RAID Controllers

Many hardware RAID controllers have some form of read and/or write caching. Used with SQL Server, this caching can significantly enhance the effective I/O handling capacity of the disk subsystem. The principle of these controller-based caching mechanisms is to gather smaller and potentially nonsequential I/O requests coming in from the host server (SQL Server) and to batch them together with other I/O requests for a few milliseconds, so that the batched I/Os can form larger (32-128 KB) and possibly sequential I/O requests to send to the hard drives. In keeping with the principle that sequential and larger I/O is good for performance, this helps produce more disk I/O throughput given the fixed number of I/Os that hard disks are able to provide to the RAID controller. It is not that the RAID controller caching magically allows the hard disks to process more I/Os per second. Rather, the RAID controller cache reorders and combines incoming I/O requests to make the best possible use of the underlying hard disks' fixed amount of I/O processing ability.

These RAID controllers usually protect their caching mechanism with some form of backup power. This backup power can help preserve the data written in cache for some period of time (perhaps days) in case of a power outage. If the database server is also supported by an uninterruptible power supply (UPS), the RAID controller has more time and opportunity to flush data to disk in the event of power disruption. Although a UPS for the server does not directly affect performance, it does provide protection for the performance improvement supplied by RAID controller caching.

RAID Levels

As mentioned above, RAID 1 and RAID 0+1 offer the best data protection and best performance among RAID levels, but cost more in terms of disks required. When cost of hard disks is not a limiting factor, RAID 1 or RAID 0+1 are the best choices in terms of both performance and fault tolerance.

RAID 5 costs less than RAID 1 or RAID 0+1 but provides less fault tolerance and less write performance. The write performance of RAID 5 is only about half that of RAID 1 or RAID 0+1 because of the additional I/O needed to read and write parity information.

The best disk I/O performance is achieved with RAID 0 (disk striping with no fault tolerance protection). Because RAID 0 provides no fault tolerance protection, it should never be used in a production environment, and it is not recommended for development environments. RAID 0 is typically used only for benchmarking or testing.

Many RAID array controllers provide the option of RAID 0+1 (also referred to as RAID 1/0 and RAID 10) over physical hard drives. RAID 0+1 is a hybrid RAID solution. On the lower level, it mirrors all data just like normal RAID 1. On the upper level, the controller stripes data across all of the drives (like RAID 0). Thus, RAID 0+1 provides maximum protection (mirroring) with high performance (striping). These striping and mirroring operations are transparent to Windows and SQL Server because they are managed by the RAID controller. The difference between RAID 1 and RAID 0+1 is on the hardware controller level. RAID 1 and RAID 0+1 require the same number of drives for a given amount of storage. For more information on RAID 0+1 implementation of specific RAID controllers, contact the hardware vendor that produced the controller.

The illustration below shows differences between RAID 0, RAID 1, RAID 5, and RAID 0+1.

Note: In the illustration above, in order to hold four disks worth of data, RAID 1 (and RAID 0+1) need eight disks, whereas RAID 5 only requires five disks. Be sure to involve your storage vendor to learn more about their specific RAID implementation.

Level 0

This level is also known as disk striping because of its use of a disk file system called a stripe set. Data is divided into blocks and spread in a fixed order among all disks in an array. RAID 0 improves read/write performance by spreading operations across multiple disks, so that operations can be performed independently and simultaneously. RAID 0 is similar to RAID 5, except RAID 5 also provides fault tolerance. The following illustration shows RAID 0.


Level 1

This level is also known as disk mirroring because it uses a disk file system called a mirror set. Disk mirroring provides a redundant, identical copy of a selected disk. All data written to the primary disk is written to the mirror disk. RAID 1 provides fault tolerance and generally improves read performance (but may degrade write performance). The following illustration shows RAID 1.


Level 2

This level adds redundancy by using an error correction method that spreads parity across all disks. It also employs a disk-striping strategy that breaks a file into bytes and spreads it across multiple disks. This strategy offers only a marginal improvement in disk utilization and read/write performance over mirroring (RAID 1). RAID 2 is not as efficient as other RAID levels and is not generally used.

Level 3

This level uses the same striping method as RAID 2, but the error correction method requires only one disk for parity data. Use of disk space varies with the number of data disks. RAID 3 provides some read/write performance improvement. RAID 3 also is rarely used.

Level 4

This level employs striped data in much larger blocks or segments than RAID 2 or RAID 3. Like RAID 3, the error correction method requires only one disk for parity data. It keeps user data separate from error-correction data. RAID 4 is not as efficient as other RAID levels and is not generally used.

Level 5

Also known as striping with parity, this level is the most popular strategy for new designs. It is similar to RAID 4 in that it stripes the data in large blocks across the disks in an array, but it differs in that it writes the parity across all the disks. Data redundancy is provided by the parity information. The data and parity information are arranged on the disk array so that the two are always on different disks. Striping with parity offers better performance than disk mirroring (RAID 1). However, read performance degrades when a stripe member is missing (for example, when a disk fails). RAID 5 is one of the most commonly used RAID configurations. The following illustration shows RAID 5.

Level 0+1

This level is also known as "mirrored stripes." This level uses a striped array of disks, which are then mirrored to another identical set of striped disks. For example, a striped array can be created using four disks. The striped array of disks is then mirrored using another set of four striped disks. RAID 0+1 provides the performance benefits of disk striping with the disk redundancy of mirroring. RAID 0+1 provides the highest read/write performance of any of the RAID levels at the expense of using twice as many disks. The following illustration shows RAID 0+1.

Level 1+0

Similar results can be achieved using a slightly modified RAID approach known as "striped mirrors". Whereas RAID 0+1, described above, is essentially a mirror of striped sets, RAID 1+0 is a stripe that spans a mirrored set. Both RAID approaches offer the performance improvements of RAID 0 with redundancy characteristics of RAID 1 without requiring parity calculations. In deciding which approach is correct, bear in mind that RAID 1+0 often holds an edge in terms of fault tolerance and rebuild performance.

Online RAID Expansion

This feature allows disks to be added dynamically to a physical RAID array while SQL Server remains online. Additional disk drives are automatically integrated into the RAID storage. Disk drives are added by installing them into physical positions called hot plug drive slots, or hot plug slots. Many hardware vendors offer hardware RAID controllers that are capable of providing this functionality. Data is automatically re-striped across all drives evenly, including the newly added drive, and there is no need to shut down SQL Server or Windows. You can take advantage of this functionality by leaving hot plug slots free in the disk array cages. If SQL Server is regularly overtaxing a RAID array with I/O requests (this will be indicated by Disk Queue Length for the Windows logical drive letter associated with that RAID array), it is possible to install one or more new hard drives into the hot plug slots while SQL Server is still running. The RAID controller will move some existing SQL Server data to these new drives so data is evenly distributed across all drives in the RAID array. Then the I/O processing capacity of the new drives (75 nonsequential/150 sequential I/Os per second, for each drive) is added to the overall I/O processing capacity of the RAID array.
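The capacity gain from online expansion follows directly from the per-drive throughput figures quoted above (roughly 75 nonsequential and 150 sequential I/Os per second per drive; these are the text's rough estimates, not guarantees for any particular drive).

```python
# Estimate the aggregate I/O capacity gained by adding drives to an
# array, using the per-drive figures quoted in the text (assumptions).
NONSEQ_IOPS_PER_DRIVE = 75   # nonsequential I/Os per second per drive
SEQ_IOPS_PER_DRIVE = 150     # sequential I/Os per second per drive

def array_capacity(num_drives: int, sequential: bool) -> int:
    """Aggregate I/Os per second the array's member drives can sustain."""
    per_drive = SEQ_IOPS_PER_DRIVE if sequential else NONSEQ_IOPS_PER_DRIVE
    return num_drives * per_drive

# Growing a 6-drive array to 8 drives via two hot plug slots:
print(array_capacity(8, False) - array_capacity(6, False))  # +150 nonsequential/sec
print(array_capacity(8, True) - array_capacity(6, True))    # +300 sequential/sec
```

Because the controller re-stripes existing data across the new drives, the whole array, not just new data, benefits from the added spindles.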

System Monitor and RAID

In System Monitor (Performance Monitor in Microsoft Windows NT 4.0), information can be obtained for both logical and physical disk drives. The difference is that logical disks in System Monitor are associated with what Windows reads as a logical drive letter. Physical disks in System Monitor are associated with what Windows reads as a single physical hard disk.

In Windows NT 4.0, all disk counters for Performance Monitor were turned off by default because they could have a minor impact on performance. In Windows 2000 the physical disk counters are turned on by default and the logical disk counters are turned off by default. Diskperf.exe is the Windows command that controls the types of counters that can be viewed in System Monitor.

In Windows 2000, to obtain performance counter data for logical drives or storage volumes, you must type diskperf -yv at the command prompt, and then press ENTER. This causes the disk performance statistics driver used for collecting disk performance data to report data for logical drives or storage volumes. By default, the operating system uses the diskperf -yd command to obtain physical drive data.

The syntax for Diskperf.exe in Windows 2000 is as follows:

diskperf [-y[d|v] | -n[d|v]] [\\computername]

Parameters

(none)

Reports whether disk performance counters are enabled and identifies the counters enabled.

-y

Sets the system to start all disk performance counters when you restart the computer.

-yd

Enables the disk performance counters for physical drives when you restart the computer.

-yv

Enables the disk performance counters for logical drives or storage volumes when you restart the computer.

-n

Sets the system to disable all disk performance counters when you restart the computer.

-nd

Disables the disk performance counters for physical drives.

-nv

Disables the disk performance counters for logical drives.

\\computername

Specifies the name of the computer on which you want to view or set disk performance counters.

With Windows NT 4.0 and earlier, diskperf -y was used for monitoring hard drives, or sets of hard drives and RAID controllers, that were not using Windows NT software RAID. When utilizing Windows software RAID, use diskperf -ye so that System Monitor will report physical counters across the Windows NT stripesets correctly. When diskperf -ye is used in conjunction with Windows NT stripesets, logical counters will not report correct information and should be disregarded. If logical disk counter information is required in conjunction with Windows NT stripesets, use diskperf -y instead. With diskperf -y, logical disk counters will be reported correctly for Windows NT stripesets, but physical disk counters will not report correct information and should be disregarded.

Note: Changes made with the diskperf command do not take effect until Windows has been restarted (both for Windows 2000 and earlier versions of Windows NT).

Considerations for Monitoring Hardware RAID

Because RAID controllers present multiple physical hard drives as a single RAID mirrorset or stripeset to Windows, Windows reads the grouping as though it were a single physical disk. The resulting abstracted view of the actual underlying hard drive activity can cause performance counters to report information that can be misleading.

From a performance tuning perspective, it is very important to be aware of how many physical hard drives are associated with a RAID array. This information will be needed when determining the number of disk I/O requests that Windows and SQL Server are sending to each physical hard drive. Divide the number of disk I/O requests that System Monitor reports as being associated with a hard drive by the number of actual physical hard drives known to be in that RAID array.

To get a rough estimate of I/O activity for each hard drive in a RAID array, it is also important to multiply the number of disk write I/Os reported by System Monitor by either two (RAID 1 and 0+1) or four (RAID 5). This will give a more accurate account of the number of actual I/O requests being sent to the physical hard drives, because it is at this physical level that the I/O capacity numbers for hard drives apply. This method, however, will not calculate the hard drive I/O exactly, when the hardware RAID controller is using caching, because caching can significantly affect the direct I/O to the hard drives.
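The divide-by-drives and write-multiplier rules above can be combined into one estimate. This is a rough sketch only; as noted, controller caching will skew the result, and the counter names and workload numbers here are illustrative.

```python
# Rough per-drive physical I/O estimate from System Monitor counters,
# applying the write multipliers and divide-by-drives rule in the text.
WRITE_MULTIPLIER = {"RAID 1": 2, "RAID 0+1": 2, "RAID 5": 4}

def per_drive_ios(reads_per_sec: float, writes_per_sec: float,
                  num_drives: int, level: str) -> float:
    """Approximate physical I/Os per second hitting each member drive."""
    physical = reads_per_sec + writes_per_sec * WRITE_MULTIPLIER[level]
    return physical / num_drives

# Example: 400 reads/sec + 100 writes/sec against a 5-drive RAID 5 array
# (100 writes become 400 physical I/Os, so 800 total across 5 drives).
print(per_drive_ios(400, 100, 5, "RAID 5"))  # 160.0 I/Os per drive per second
```

Comparing that per-drive figure against what each physical drive can sustain shows whether the array is approaching its I/O ceiling.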

When monitoring disk activity, it is best to concentrate on disk queuing instead of on the actual I/O for each disk. Disk I/O speeds depend on the transfer rate capability of the drives, which cannot be adjusted. Because there is little you can do other than buy faster, or more, drives, there is little reason to be concerned with the amount of I/O that is actually occurring. However, you do want to avoid too much disk queuing. Significant disk queuing reveals that you have an I/O problem. Because Windows cannot read the number of physical drives in a RAID array, it is difficult to accurately assess disk queuing for each physical disk. A rough approximation can be determined by dividing the Disk Queue Length by the number of physical drives participating in the hardware RAID disk array for the logical drive being observed. It is optimal to attempt to keep the disk queue number below two for hard drives containing SQL Server files.
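The queue-depth approximation described above reduces to a single division plus the suggested threshold of two. A minimal sketch, assuming you already have the Disk Queue Length value from System Monitor and know the array's drive count:

```python
# Approximate per-drive queuing for a hardware RAID array, as described
# above: observed queue length divided by physical drives, flagged
# against the suggested threshold of 2 per drive.
def per_drive_queue(disk_queue_length: float, num_drives: int) -> float:
    """Average queue depth per physical drive in the array."""
    return disk_queue_length / num_drives

def is_bottlenecked(disk_queue_length: float, num_drives: int) -> bool:
    """True if queuing suggests the array needs more spindles."""
    return per_drive_queue(disk_queue_length, num_drives) >= 2.0

# A queue length of 12 on a 4-drive array averages 3 per drive: too high.
print(per_drive_queue(12, 4), is_bottlenecked(12, 4))  # 3.0 True
```

When this check flags an array, the remedies discussed earlier apply: add drives via hot plug slots or redistribute data across arrays and SCSI channels.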

Software RAID

Windows 2000 supports software RAID to address fault tolerance by providing mirrorsets and stripesets (with or without fault tolerance) through the operating system when a hardware RAID controller is not used. You can set up RAID 0, RAID 1, or RAID 5 functionality using operating system procedures. Most large data warehouses use hardware RAID, but in the event that your installation is relatively small or you choose not to implement hardware RAID, software RAID can provide some data access and fault tolerance advantages.

Software RAID does utilize some CPU resources, because Windows has to manage the RAID operations that the hardware RAID controller would typically manage for you. Thus, performance with the same number of disk drives and Windows software RAID may be a few percent less than with hardware RAID, especially if the system processors are nearly 100 percent utilized for other purposes. By reducing the potential for I/O bottlenecks, Windows software RAID will generally help a set of drives service SQL Server I/O better than if the drives are used without software RAID. Software RAID should allow for better CPU utilization by SQL Server because the server will wait less often for I/O requests to complete.

Disk I/O Parallelism

An effective technique for improving the performance of large SQL Server databases that are stored on multiple disk drives is to create disk I/O parallelism, which is the simultaneous reading from and writing to multiple disk drives. RAID implements disk I/O parallelism through hardware and software. The next topic discusses using partitioning to organize SQL Server data to further increase disk I/O parallelism.