Disk arrays and data striping
A disk array organizes multiple independent disks into what
appears to the system as one large, high-performance disk.
Some array configurations also write redundant parity
information alongside the data, which makes them
fault-tolerant: they can continue working even if a disk
fails while in use. Data blocks are split up and written to
the disks in parallel, which speeds up access.
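The text does not name a parity scheme. The sketch below assumes simple XOR parity, the scheme used by RAID levels 4 and 5, to show how a block can be split across disks and how one lost stripe can be rebuilt from the survivors. The function names `stripe`, `parity` and `rebuild` are hypothetical illustrations, not any real array controller's API.

```python
from functools import reduce

def stripe(block: bytes, num_disks: int) -> list[bytes]:
    """Split one data block into num_disks equal-sized stripes, one per disk."""
    if len(block) % num_disks != 0:
        raise ValueError("block length must be a multiple of num_disks")
    unit = len(block) // num_disks
    return [block[i * unit:(i + 1) * unit] for i in range(num_disks)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def parity(stripes: list[bytes]) -> bytes:
    """Parity stripe: the XOR of every data stripe (assumed XOR scheme)."""
    return reduce(xor_bytes, stripes)

def rebuild(surviving: list[bytes], parity_stripe: bytes) -> bytes:
    """Recover a single lost stripe by XOR-ing the parity with all survivors."""
    return reduce(xor_bytes, surviving, parity_stripe)

# Usage: stripe a 32-byte block over 4 data disks, then "lose" disk 2.
block = bytes(range(32))
stripes = stripe(block, 4)
p = parity(stripes)
survivors = stripes[:2] + stripes[3:]
assert rebuild(survivors, p) == stripes[2]
```

Because XOR is its own inverse, XOR-ing the parity with every surviving stripe cancels them out and leaves exactly the missing one; a second simultaneous failure, however, cannot be recovered this way.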
Once the read/write heads are in position, the time taken to execute a read or write is determined by how long it takes the data area on the disk surface to pass under the heads. Reading or writing an 8KB block therefore takes about eight times as long as a 1KB block. However, if the 8KB block is written to a disk array with eight disks, the data is split into eight 1KB stripes, which are written to the individual disks in parallel. In this way, disk arrays achieve a higher data transfer rate than a single drive.
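The arithmetic above can be checked with a transfer-only model. This is a sketch under stated assumptions: the media rate `RATE_KB_PER_MS` is a hypothetical figure chosen for round numbers, and seek time and rotational latency are deliberately left out, as in the paragraph.

```python
def transfer_time_ms(size_kb: float, rate_kb_per_ms: float) -> float:
    """Time for size_kb of data to pass under the heads at a fixed media rate.
    Seek time and rotational latency are deliberately ignored here."""
    return size_kb / rate_kb_per_ms

def striped_transfer_time_ms(size_kb: float, num_disks: int,
                             rate_kb_per_ms: float) -> float:
    """Each disk transfers one stripe of size_kb / num_disks at the same time,
    so the request finishes after a single stripe's transfer time."""
    return transfer_time_ms(size_kb / num_disks, rate_kb_per_ms)

RATE_KB_PER_MS = 2.0  # hypothetical media rate, chosen for round numbers

one_kb = transfer_time_ms(1, RATE_KB_PER_MS)              # 0.5 ms
eight_kb = transfer_time_ms(8, RATE_KB_PER_MS)            # 4.0 ms
striped = striped_transfer_time_ms(8, 8, RATE_KB_PER_MS)  # 0.5 ms
print(eight_kb / one_kb, eight_kb / striped)              # 8.0 8.0
```

In this idealized model the 8KB request on eight disks takes no longer than a 1KB request on one disk, an eightfold gain.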
In practice, throughput does not scale linearly with the number of disks, because of seek time (each disk must still position its heads before transferring its stripe), on-board disk caches, and the cost of generating parity.
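The effect of seek time alone can be sketched by adding a fixed positioning overhead to each request. This is a simplified model under stated assumptions: every disk is assumed to pay the same overhead in parallel, the 8 ms overhead and 2 KB/ms rate are hypothetical figures, and caches and parity generation are not modeled at all.

```python
def request_time_ms(size_kb: float, num_disks: int,
                    rate_kb_per_ms: float, overhead_ms: float) -> float:
    """Positioning overhead (seek + rotation) is paid once per request on every
    disk and does not shrink as disks are added; only the transfer is split."""
    return overhead_ms + (size_kb / num_disks) / rate_kb_per_ms

def speedup(size_kb: float, num_disks: int,
            rate_kb_per_ms: float, overhead_ms: float) -> float:
    """Single-disk request time divided by striped request time."""
    single = request_time_ms(size_kb, 1, rate_kb_per_ms, overhead_ms)
    return single / request_time_ms(size_kb, num_disks, rate_kb_per_ms, overhead_ms)

# Hypothetical figures: 8KB request, 2 KB/ms media rate, 8 ms positioning overhead.
for disks in (1, 2, 4, 8):
    print(disks, round(speedup(8, disks, 2.0, 8.0), 2))
```

With zero overhead the speedup on eight disks is exactly 8; with the overhead included it stays well below the number of disks, because the fixed positioning cost dominates once the transfer portion has been divided up.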
Data striping also tends to spread I/O load evenly across all the disks in a system, which avoids disk hot spots. A hot spot arises when one disk is saturated with I/O requests while the others sit idle.
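The load-spreading effect can be illustrated with a round-robin mapping of logical blocks to disks. The mapping rule `logical_block % num_disks` and the function names are assumptions for illustration; real arrays may use larger stripe units or different layouts.

```python
from collections import Counter

def disk_for_block(logical_block: int, num_disks: int) -> int:
    """Round-robin placement: consecutive blocks go to consecutive disks."""
    return logical_block % num_disks

def load_per_disk(blocks, num_disks: int) -> list[int]:
    """Count how many of the requested blocks land on each disk."""
    counts = Counter(disk_for_block(b, num_disks) for b in blocks)
    return [counts.get(d, 0) for d in range(num_disks)]

# A busy region of 400 consecutive blocks, which could sit on a single disk
# without striping, is shared equally by 4 disks.
print(load_per_disk(range(1000, 1400), 4))  # [100, 100, 100, 100]
```

The even split depends on the access pattern: a workload that strides through blocks by exactly the number of disks would still land every request on one disk under this mapping.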
However, organizing multiple disks into an array raises the risk of data loss, because the probability that at least one disk fails in a given period grows with the number of disks. For example, consider a drive with a rated mean time between failures (MTBF) of 100,000 hours. If drives fail independently at a constant rate, there is about a 63% probability (1 - 1/e) that a single drive fails within 100,000 hours. An array of 10 such drives has an effective MTBF of only about 10,000 hours (just over a year of 24-hour operation), with the same 63% probability that at least one drive fails within that time.
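A quick way to check reliability figures like these is the exponential failure model, which assumes each drive fails independently at a constant rate of 1/MTBF. This is an assumption: real drives also show early-life and wear-out failures. Under this model the chance that at least one of N drives fails within t hours is 1 - e^(-N*t/MTBF), so the array behaves like a single drive with an MTBF of MTBF/N. The function names below are hypothetical.

```python
import math

def prob_any_failure(hours: float, mtbf_hours: float, num_disks: int = 1) -> float:
    """Probability that at least one of num_disks drives fails within `hours`,
    assuming independent failures at a constant rate of 1 / MTBF per drive."""
    return 1.0 - math.exp(-num_disks * hours / mtbf_hours)

def median_time_to_first_failure(mtbf_hours: float, num_disks: int = 1) -> float:
    """Time by which the chance of at least one failure reaches 50%."""
    return mtbf_hours * math.log(2) / num_disks

print(round(prob_any_failure(100_000, 100_000), 3))      # one drive: 0.632
print(round(prob_any_failure(10_000, 100_000, 10), 3))   # array of 10: 0.632
print(round(median_time_to_first_failure(100_000, 10)))  # hours: 6931
```

Under the same model, the point at which the chance of a failure in a 10-drive array reaches 50% is about 6,931 hours, roughly ten months of continuous operation.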