Introduction to SCSI Disk Performance
|Originally published July, 1995|
|© 1995, 2005 Carlo Kopp|
The performance of SCSI devices is an area which is generally not well understood in the wider computing community, which has in turn produced a situation where commercial hype very often becomes the deciding factor in determining equipment selection. This is unfortunate, because a little insight can often go a long way to resolving uncertainties in this important area of system performance.
This discussion will focus on the performance of SCSI disks, because these ultimately become a key constraint to overall system performance, unlike ancillary peripherals such as tape drives or CD-ROM drives. Because the latter devices are usually much slower than modern disks, both in access time and transfer rate, their performance will be constrained almost wholly by the device rather than the interface.
Earlier features examined the performance of Unix filesystems, and the inner workings of the SCSI bus. This feature will attempt to build on what was discussed in earlier issues, in the hope of providing the reader with a solid insight into the central technical issues.
SCSI Disk Performance
SCSI Disks are intelligent devices, as the disk drive has an embedded controller which is usually built with either a microprocessor or a custom controller chip. In practice most drives can be broken down into a sealed head/disk assembly (usually termed a HDA) and a "logic" board, which contains not only the electronics which control the motor and drive the heads, but also the aforementioned bus interface.
The last 5 years have seen considerable gains in the sophistication of the electronics embedded in a SCSI disk drive. Most change has occurred in the SCSI interface, which has in most instances become more intelligent, thereby supporting a much larger SCSI command set, and faster, by virtue of support for fast or fast and wide SCSI-2 protocol.
The area in the disk drive which has received less publicity than it perhaps should have is the now almost ubiquitous disk cache. The cache in a modern SCSI disk drive is analogous in function to the caches which have become commonplace on modern processor boards.
The central idea of all caching is to provide a small memory which is very fast in comparison with the main storage. Whenever a block is read, the cache controller logic first looks into the cache to see if the block is already there. If it is, it can be read immediately; if not, the drive must locate the block and then read it. In the latter situation, a copy of the block is then tucked away in the cache.
The significance of the cache lies in its much faster access time. Finding a block on a disk drive platter surface is a long and tedious task, if one thinks in the timescales otherwise applicable to modern computer hardware. To seek the heads across the whole surface, say from the outermost to the innermost track, typically takes 20 to 30 milliseconds. The head must then sit over the track in question until the block in question passes beneath it, at which time the data is read off the surface serially. Accessing a cache, on the other hand, takes tens to hundreds of nanoseconds, i.e. it is about 200,000 times faster. The time overhead of the SCSI bus transfer is not significant in comparison to the HDA access time.
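The size of this gap is easy to check with a little arithmetic. The sketch below picks one figure from each of the ranges quoted above (a 20 millisecond seek and a 100 nanosecond cache access); both values and the function name are illustrative assumptions, not measurements of any particular drive.

```python
def cache_speedup(hda_time_s: float, cache_time_s: float) -> float:
    """Return how many times faster a cache access is than an HDA access."""
    return hda_time_s / cache_time_s


# Illustrative figures chosen from the ranges quoted in the text (assumptions).
FULL_SEEK_S = 20e-3      # 20 ms seek across the platter surface
CACHE_ACCESS_S = 100e-9  # 100 ns cache access

ratio = cache_speedup(FULL_SEEK_S, CACHE_ACCESS_S)
print(f"cache access is roughly {ratio:,.0f} times faster than a full seek")
```

Choosing other points within the quoted ranges moves the ratio by an order of magnitude either way, which is why the figure is best read as a rough guide.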
Drive Head Disk Assembly Performance
While the drive cache will in many instances successfully mask the performance limitations of the HDA, where the block is not cache resident the performance of the HDA dominates the access time of the drive. Drive manufacturers usually specify the following parameters when quantifying the performance of a drive HDA.
As is immediately obvious, the time to access an arbitrary block is dependent both upon the block's relative angular position on the surface, and upon how many tracks must be jumped to access the block. This effect is termed the geometrical dependency of access time. If the block is geometrically near to the heads, the access time will be much shorter than if it is far away.
The mechanical nature of the disk drive imposes fundamental constraints as to how much access time can be improved. Seek time can be shortened by using a more powerful actuation motor and lighter heads and their supporting arms. Improvements in this area have resulted in about a twofold improvement in seek times, over the last decade or so. Smaller drive geometries are the biggest factor here, as this allows both smaller head machinery as well as lesser distances to traverse.
Rotational latency can be improved by spinning the disk at higher RPM. Whilst most mainstream disks still turn at 3,600 RPM, thus yielding a worst case latency of about 16.7 milliseconds (the time for one full revolution), drives which turn at 4,500, 5,500 or higher RPM are becoming more common. A notional 7,200 RPM drive would have one half the rotational latency of the 3,600 RPM drive.
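The relationship between spindle speed and rotational latency can be sketched in a few lines of Python. The worst case is one full revolution; the average figure assumes the wanted block is equally likely to be anywhere on the track, which is an assumption of this sketch rather than a statement from any drive data sheet. The function names are hypothetical.

```python
def worst_case_latency_ms(rpm: float) -> float:
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_latency_ms(rpm: float) -> float:
    """Half a revolution: expected wait if the block position is uniformly random."""
    return worst_case_latency_ms(rpm) / 2.0


for rpm in (3600, 4500, 5500, 7200):
    print(f"{rpm:>5} RPM: worst case {worst_case_latency_ms(rpm):5.2f} ms, "
          f"average {average_latency_ms(rpm):5.2f} ms")
```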
In any case, performance improvements in this area are not much greater than what we have seen with the internal combustion engine over the last few decades, another almost ubiquitous mechanical contrivance. Newtonian physics has in both instances proven to be a very difficult barrier to surmount.
It follows therefore that conventional disk drives will always have access times of the order of milliseconds, unless some genius finds a method of spinning a drive at 100,000 RPM and building a set of heads 10 or more times lighter than what we are accustomed to seeing today. An interesting point to ponder is the effect of a 100,000 RPM drive falling out of balance, and the reader is left to calculate the number of Joules of energy released by such a device if it mechanically fails.
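For readers who would like a starting point for that calculation, the following Python sketch computes the rotational kinetic energy of a single platter, E = 1/2 I w^2, treating it as a uniform solid disk with I = 1/2 m r^2. The platter mass and radius are purely hypothetical round numbers, the central hole, hub and spindle are ignored, and the function name is invented for this example.

```python
import math


def platter_kinetic_energy_j(rpm: float, mass_kg: float, radius_m: float) -> float:
    """Rotational kinetic energy in Joules of a uniform solid disk."""
    omega = rpm * 2.0 * math.pi / 60.0           # angular velocity, rad/s
    inertia = 0.5 * mass_kg * radius_m ** 2      # moment of inertia, kg m^2
    return 0.5 * inertia * omega ** 2


# Hypothetical platter: 30 g, 47.5 mm radius (assumed values, not from the text).
MASS_KG = 0.030
RADIUS_M = 0.0475

for rpm in (3600, 7200, 100_000):
    energy = platter_kinetic_energy_j(rpm, MASS_KG, RADIUS_M)
    print(f"{rpm:>7} RPM: {energy:10.1f} J per platter")
```

Because the energy grows with the square of the spindle speed, the result for 100,000 RPM is several hundred times that for 3,600 RPM, whatever platter dimensions are assumed.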
Most manufacturers will specify average or typical access times when stating the performance of a HDA. These numbers are alas only useful to our brethren in the sales community, because they never include a specification of the type of filesystem used and/or the access pattern of the data on the surface. Without the latter qualifiers, the average access time is about as useful a metric of drive performance as MIPS or MFLOPS are of a CPU. Sadly, the author continues to encounter individuals who proudly proclaim to have purchased a drive with an "access time of X milliseconds...".
In practice the only two metrics with any meaning are the track-to-track access time, and the operating RPM of the HDA. Should you select the drive with the best figures in these two areas, all other things being equal, then you are reasonably safe.
To complete this discussion of disk HDA performance we must still examine one parameter, which is head transfer rate or bandwidth. This is a metric of the rate at which data is read off the surface of the drive platter, and in the days of low drive densities and cacheless drives, it was a critical limitation to drive performance. Modern high capacity drives have transfer rates of the order of Megabytes/sec, which is very close to the burst SCSI transfer rate, and thus this usually invisible parameter is of little significance. Were it otherwise, an additional time to transfer data would have to be added to the access time.
The performance enhancing effects of a drive cache can be quite difficult to analytically model, as they are critically dependent upon the statistics of block accesses to the disk. As these are dependent upon both the filesystem design and the idiosyncrasies of the applications running, performance of any given drive type may vary significantly from system to system. A closer look will make this quite evident.
The basic metric of a cache's performance is its hit rate. The cache hit rate is defined as the fraction of the total number of drive accesses which result in the cache returning the requested block, rather than the drive HDA doing so. With a high cache hit rate the average access time can be cut dramatically, given the disparity between the time it takes the HDA to find a block and the time it takes the drive cache to do so. The average access time then becomes:
<access time> = (cache hit rate) x (SCSI read time from cache) + (cache miss rate) x (<HDA access time>)
As the time it takes for the SCSI controller to read from the cache is both short and relatively consistent, a constant is employed. As the HDA access time is dependent upon the factors previously discussed, a mean figure (<>) is used. The cache miss rate is (1 - cache hit rate).
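The equation can be explored numerically with a short Python sketch. The 0.5 millisecond cache read time and the 15 millisecond mean HDA access time used below are illustrative assumptions only, and the function name is hypothetical.

```python
def mean_access_time_ms(hit_rate: float, cache_read_ms: float,
                        mean_hda_ms: float) -> float:
    """Average access time from the hit rate equation in the text."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit rate must lie between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_read_ms + miss_rate * mean_hda_ms


# Illustrative figures (assumptions, not taken from any data sheet).
CACHE_READ_MS = 0.5
MEAN_HDA_MS = 15.0

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    t = mean_access_time_ms(hit_rate, CACHE_READ_MS, MEAN_HDA_MS)
    print(f"hit rate {hit_rate:4.2f}: mean access time {t:5.2f} ms")
```

Under these assumed figures the miss term dominates until the hit rate becomes very high, so the mean HDA access time still matters on a well cached drive.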
The hit rate equation is important because it graphically illustrates both the effects of cache performance as well as the interaction between the filesystem block placement strategy and the HDA's performance.
There are a number of possible strategies for managing a cache on a disk drive, and these are worth examining to gain insight into why cache performance can be good, bad or indifferent.
The simplest and most common strategy used for caching on block reads is that of read lookahead caching. In a read lookahead caching strategy, once the drive head is positioned over the track containing the sought data block, every block on the track or some portion thereof is stored in the cache, although only the block requested is returned to the reading party. Should the subsequent read involve the following block on the track, it has already been cached and thus can be returned to the reading party in much less than a millisecond. This strategy is very effective where the filesystem is capable of placing consecutive pages in a file into consecutive disk block addresses. The BSD FFS/UFS and newer BSD EFS both do exactly this, and thus gain accordingly. Where on the other hand the filesystem likes to fragment files all over the disk (eg DOS FAT), the caching strategy will break down and poor hit rates result.
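The effect of block placement on a read lookahead cache can be illustrated with a toy simulation in Python. This is a minimal sketch under simplifying assumptions: the cache holds only the tail of the most recently read track, every track has 32 blocks, and the "fragmented" file is modelled as uniformly random block addresses. None of these figures comes from a real drive, and the names are hypothetical.

```python
import random

BLOCKS_PER_TRACK = 32  # assumed track size


def lookahead_hit_rate(requests, blocks_per_track=BLOCKS_PER_TRACK):
    """Fraction of reads served from a single-track read lookahead cache."""
    if not requests:
        return 0.0
    cached = set()  # block addresses currently held in the cache
    hits = 0
    for block in requests:
        if block in cached:
            hits += 1
        else:
            # Miss: the head goes to the track, and the requested block plus
            # the remainder of that track replace the cache contents.
            track_end = (block // blocks_per_track + 1) * blocks_per_track
            cached = set(range(block, track_end))
    return hits / len(requests)


# A file laid out in consecutive blocks versus one scattered over the disk.
sequential = list(range(1024))
rng = random.Random(1995)
scattered = [rng.randrange(100_000) for _ in range(1024)]

print(f"consecutive placement: {lookahead_hit_rate(sequential):.3f}")
print(f"scattered placement:   {lookahead_hit_rate(scattered):.3f}")
```

With consecutive placement only the first block of each track misses, whereas the scattered pattern almost never finds its next block in the cache.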
A different but not uncommon family of caching strategies is that of adaptive caching strategies. An adaptive cache will maintain a long term history of which blocks are most frequently accessed, and those at the top of the list are (pre-emptively) held in the cache with the reasonable expectation that they will be frequently accessed. The adaptive caching strategies tend to offer little to operating systems such as Unix, which have very good buffer caching mechanisms within the operating system. Conversely, operating systems without proper caching and filesystem block placement strategies (e.g. DOS/FAT) can gain significantly from adaptive strategies, as the disk drive is in effect doing the job which should be done by the operating system. It is not unreasonable to say that the massed proliferation of DOS derivative systems with the questionable FAT filesystem has been the driving force in the development of SCSI drive caches.
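A frequency based scheme of this kind can be sketched in Python as follows. This is a deliberately naive model and not the algorithm of any particular drive: it keeps an access count for every block ever seen and holds the most frequently accessed blocks in a fixed number of cache slots. The workload, slot count and names are hypothetical.

```python
from collections import Counter


def adaptive_hit_rate(requests, cache_slots):
    """Fraction of reads served by a cache holding the most frequently read blocks."""
    if not requests:
        return 0.0
    history = Counter()  # long term access count per block
    cached = set()
    hits = 0
    for block in requests:
        if block in cached:
            hits += 1
        history[block] += 1
        # Re-rank after every access and keep the top of the list in cache.
        cached = {b for b, _ in history.most_common(cache_slots)}
    return hits / len(requests)


# Hypothetical workload: four "hot" blocks scattered across the disk,
# interleaved with cold blocks that are each read only once.
hot = [10, 5000, 90_000, 42]
workload = []
for i in range(250):
    workload.append(hot[i % len(hot)])
    workload.append(200_000 + i)

print(f"adaptive hit rate: {adaptive_hit_rate(workload, cache_slots=4):.3f}")
```

In this workload the hot blocks are far apart on the disk, so track locality offers nothing, yet the access history still lets the cache serve nearly all of the repeated reads.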
A good way of generalising these two approaches to caching is to describe read lookahead strategies as locally optimising strategies, and adaptive caching schemes as globally optimising strategies. BSD FFS derived Unix filesystems implement both effective global and local block placement strategies, and as it turns out, the locally optimising read lookahead strategy meshes very well with the filesystem's behaviour, more so in the EFS, which itself executes clustered reads of multiple consecutive blocks.
Write caching on drives is another interesting area to examine. In the Unix environment write caching becomes particularly important, as the highly effective operating system buffer cache will soak up most of the block read traffic before it hits the disk drive. What this means is that in most Unix systems the physical disk traffic will be dominated by write traffic.
The idea of write caching is that the disk drive tucks the block received from the host into the cache, and then completes the SCSI transaction signalling the host that the block has been written. The drive logic then, at its own pace, flushes the block to the disk surface. A clever controller will accumulate several blocks to write, and then use an elevator algorithm to achieve efficiency in head seeks.
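The benefit of the elevator algorithm can be shown with a small Python sketch. It models pending writes simply as track numbers, sweeps the head upward from its current position and then back down, and compares total head travel against servicing the writes in arrival order. The track numbers and function names are hypothetical, and real controller firmware will differ in detail.

```python
def elevator_order(pending_tracks, head_track):
    """Order pending writes so the head sweeps upward, then back downward."""
    upward = sorted(t for t in pending_tracks if t >= head_track)
    downward = sorted((t for t in pending_tracks if t < head_track), reverse=True)
    return upward + downward


def head_travel(order, head_track):
    """Total number of tracks crossed when servicing writes in the given order."""
    travel = 0
    position = head_track
    for track in order:
        travel += abs(track - position)
        position = track
    return travel


# Hypothetical batch of cached writes, listed in order of arrival.
pending = [98, 183, 37, 122, 14, 124, 65, 67]
head = 53

swept = elevator_order(pending, head)
print("arrival order :", pending, "travel", head_travel(pending, head))
print("elevator order:", swept, "travel", head_travel(swept, head))
```

For this batch the elevator ordering crosses fewer than half as many tracks as the arrival ordering, which is the efficiency gain the drive obtains by accumulating writes before flushing them.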
The observant reader will at this point ask the obvious question: what happens to my block of data if the power fails? Well, unless the said disk drive uses a non-volatile cache memory, it is bye-bye data block! A non-volatile cache allows the drive to flush the pages still waiting to be written once power is restored after a failure. In this fashion data integrity is retained, albeit in a cumbersome fashion.
Most drives the author has worked with in recent times employ conventional volatile caches. Fortuitously, write caching is a facility which the SCSI standard nominates as configurable, and therefore it can usually be turned off. While this incurs an evident performance penalty, this is a pain which must be accepted, particularly if a database or critical commercial system is involved. The alternative is an enabled cache and a UPS.
It is worth reiterating here that the SCSI command set has provisions for locking nominated block locations in cache memory, as well as a prefetch command which loads a block into cache in anticipation of an ensuing request to the location in question. Whether either is implemented depends on the drive manufacturer; whether they are usable depends on the writer of the device driver used.
|$Revision: 1.1 $|
|Last Updated: Sun Apr 24 11:22:45 GMT 2005|
|Artwork and text © 2005 Carlo Kopp|