Redundant Array of Independent Disks (RAID) describes array configurations and applications for multiple inexpensive hard disks, providing fault tolerance (redundancy) and improved access rates (RAID concept). RAID provides a way to access multiple individual disks as if the array were one larger disk, spreading data access out over these multiple disks, reducing the risk of losing all data if one drive fails, and improving access time. RAID is commonly used in large file servers and in transaction or application servers, where data accessibility is critical and fault tolerance is required. Increasingly, RAID is also being used in desktop systems for CAD, multimedia editing and playback where higher transfer rates are needed. The capability of an array to tolerate hard disk faults depends entirely on the RAID level implemented. There are at least ten types of RAID, presenting a myriad of feature tradeoffs that must be appropriately mapped to critical implementation requirements.
Below are the ten major types of RAID used today and their key characteristics (RAID, TechTarget):
RAID 0. Has striping but no redundancy of data.
RAID 1. Also known as disk mirroring and consists of at least two drives that duplicate the storage of data. There is no striping.
RAID 2. Uses striping across disks with some disks storing error checking and correcting (ECC) information.
RAID 3. Uses striping and dedicates one drive to storing parity information. The embedded error checking information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Input/output (I/O) operations address all drives at the same time.
RAID 4. Uses large stripes, which means records can be read from any single drive. Because all write operations have to update the parity drive, no I/O overlapping is possible for writes.
RAID 5. Includes a rotating parity array so that all read and write operations can be overlapped. RAID-5 stores parity information but not redundant data (but parity information can be used to reconstruct data). RAID-5 requires at least three and usually five disks for the array.
RAID 6. Similar to RAID-5 but includes a second parity scheme that is distributed across different drives.
RAID 7. Includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer.
RAID 10. Offers an array of stripes in which each stripe is a RAID-1 array of drives.
RAID 53. Offers an array of stripes in which each stripe is a RAID-3 array of disks.
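The XOR-based recovery mentioned for the parity levels above can be illustrated with a minimal sketch. It assumes three data drives plus one parity drive and treats each drive's contents as a short byte string; the function name xor_blocks and the sample data are hypothetical, chosen only for illustration, and do not reflect any particular controller's implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical contents of three data drives (one stripe's worth each).
data = [b"RAID", b"disk", b"XOR!"]

# The parity drive stores the XOR of all data drives.
parity = xor_blocks(data)

# Simulate losing the second drive: XOR the survivors with the parity
# block to reconstruct the missing contents.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

assert rebuilt == data[1]
```

Because XOR is its own inverse, the same routine serves both to compute parity and to rebuild any single missing block, which is why these levels tolerate exactly one drive failure.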
With RAID 0, data is striped across each disk during read/write operations, typically doubling disk access speeds (Achieving fault tolerance by using RAID). However, it does not offer any fault tolerance, so if a single disk in a RAID 0 array is lost, all data is lost and will need to be recovered from backup. For this reason, RAID 0 might be a good option for high-performance workstations, but is not appropriate for mission-critical servers.
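As a rough illustration of striping, the sketch below maps logical block numbers onto disks in round-robin order. It assumes a stripe unit of exactly one block and a two-disk array; the function name locate_block is hypothetical, and real controllers use configurable stripe sizes.

```python
def locate_block(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk).

    Assumes round-robin striping with a stripe unit of one block.
    """
    return logical_block % num_disks, logical_block // num_disks

# Layout of the first six logical blocks on a two-disk RAID 0 array.
layout = [locate_block(block, 2) for block in range(6)]
for block, (disk, offset) in enumerate(layout):
    print(f"logical block {block} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on alternating disks, so a sequential transfer can keep both disks busy at once. The same mapping also shows why losing one disk is fatal: every other block of every file is gone.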
RAID 1 allows two or more disks to mirror each other (Achieving fault tolerance by using RAID). This configuration produces slow writes, but relatively quick reads, and facilitates high data availability on servers because a single disk can be lost without any loss of data. When more than two disks make up the mirror, the RAID 1 array can lose multiple disks as long as a complete mirrored pair is not lost. On the downside, the amount of physical disk space required is twice the space required to store the data. Therefore, Level 1 is most often used for applications that require very high data availability.
Level 2 is no longer used today because it was made obsolete by the use of ECC within a hard disk (Single RAID levels). It was expensive and required many drives and a complex, specialized controller. The performance of RAID 2 was also low in transactional environments due to the bit-level striping.
The dedicated parity disk presents a performance bottleneck when using RAID 3, especially for random writes, because it must be accessed any time anything is sent to the array (Single RAID levels). In contrast, RAID 5 improves write performance by using distributed parity. RAID 3 differs from RAID 4 only in the smaller size of the stripes sent to disks. RAID 3 is suitable for applications working with large files that require high transfer performance with redundancy, especially serving or editing large files and multimedia.
RAID 4 is like RAID 3 except that it uses blocks instead of bytes for striping, and like RAID 5 except that it uses dedicated parity instead of distributed parity (Single RAID levels). RAID 4 is typically used for the same applications as RAID 3 and RAID 5, but is not as commonly used because it is a compromise between these competing levels.
RAID 5 operates much more slowly than RAID 0 because parity must be calculated for all write operations (Achieving fault tolerance by using RAID). RAID 5 volumes are well suited for reads and work well in large query or database mining applications where reads occur much more frequently than writes. RAID 5 is also useful when a high degree of fault tolerance is required without the cost of the additional disk space needed by RAID 1. In addition, a RAID 5 volume is significantly more efficient than a mirrored volume when larger numbers of disks are used. The space required for storing the parity information is equivalent to 1/number of disks; a 10-disk array uses 1/10 of its capacity for parity information. Consequently, the fraction of disk space that is used for parity decreases as the number of disks in the array increases.
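The 1/number-of-disks relationship can be checked with a short calculation. This is a sketch under the assumption that all disks in the array are the same size; the function names and the two-way mirror constant are introduced here for illustration only.

```python
from fractions import Fraction

def raid5_parity_fraction(num_disks):
    """Fraction of total RAID 5 capacity consumed by parity (equal-size disks)."""
    if num_disks < 3:
        raise ValueError("RAID 5 requires at least three disks")
    return Fraction(1, num_disks)

def raid5_usable_fraction(num_disks):
    """Fraction of total RAID 5 capacity left for data."""
    return 1 - raid5_parity_fraction(num_disks)

# A two-way mirror always leaves half of the raw capacity for data.
MIRROR_USABLE_FRACTION = Fraction(1, 2)

for n in (3, 5, 10):
    print(f"{n} disks: parity {raid5_parity_fraction(n)}, "
          f"usable {raid5_usable_fraction(n)} "
          f"(two-way mirror: {MIRROR_USABLE_FRACTION})")
```

Even the smallest three-disk array leaves two thirds of its raw capacity for data, compared with one half for a two-way mirror, and the gap widens as disks are added.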
Because RAID 6 calculates two sets of parity information for each parcel of data, it can handle the failure of any two drives in the array while other single RAID levels can handle at most one fault (Single RAID levels). This is the major differentiator between RAID 5 and RAID 6. RAID 6 is not frequently used because it is expensive and few companies are willing to incur the costs to insure against the rare event of two drives failing at the same time.
RAID 7 is a proprietary RAID design from Storage Computer Corporation (Single RAID levels). RAID 7 offers better random read and write performance than RAID 3 and RAID 4 because the dependence on the dedicated parity disk is greatly reduced by using additional hardware. On the negative side, RAID 7 is an expensive solution, made and supported by only one company, relegating its use to specialized high-end applications requiring top performance.
Finally, RAID 10 and RAID 53 are high-end, expensive alternatives to RAID 1 and RAID 3, respectively, to achieve higher performance (RAID tutorial). RAID 10 has higher I/O rates than RAID 1 by striping RAID 1 segments, and RAID 53 provides higher data transfer rates than RAID 3 by striping across RAID 3 array segments. RAID 10 and RAID 53 can be configured to tolerate the loss of multiple hard disks, depending on which disks fail. For example, a RAID 10 array consisting of two pairs of mirrored drives striped together can tolerate the simultaneous loss of two of the four drives, as long as they are not in the same pair.
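The four-drive RAID 10 example can be made concrete with a small sketch. It assumes two mirrored pairs with hypothetical drive labels (A1, A2, B1, B2) and a hypothetical helper, raid10_survives, that reports whether the array still holds a complete copy of the data after a given set of drives fails.

```python
from itertools import combinations

def raid10_survives(mirror_pairs, failed_drives):
    """Return True if no mirrored pair has lost all of its drives."""
    failed = set(failed_drives)
    return all(not set(pair) <= failed for pair in mirror_pairs)

# Two mirrored pairs striped together (labels are illustrative only).
pairs = [("A1", "A2"), ("B1", "B2")]
drives = [drive for pair in pairs for drive in pair]

# Enumerate every possible simultaneous two-drive failure.
outcomes = {
    combo: raid10_survives(pairs, combo)
    for combo in combinations(drives, 2)
}
for combo, ok in outcomes.items():
    print(combo, "survives" if ok else "data lost")
```

Of the six possible two-drive failures, only the two that take out both members of the same pair destroy data; the other four leave at least one copy of every stripe intact.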
Three large vendors, EMC, HP and IBM, dominate the external RAID market, as shown in Table 1, together holding more than fifty percent of the market (EMC News Release).
The pricing environment is extraordinarily aggressive, with year-to-year price declines from 2000 to 2002 averaging forty percent (Seyrafu and Rakers, 2002). Pricing/MB in the high-end RAID market in 2002 was $.05-$.06 compared to $.13-$.17 only one to two years earlier.
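As a rough consistency check on these figures, the sketch below compounds a forty percent annual decline over one and two years, starting from the earlier $.13-$.17 range. The function name is hypothetical, and the calculation assumes the decline applies uniformly each year, which the cited averages only approximate.

```python
def price_after_declines(start_price, annual_decline, years):
    """Compound a fixed fractional price decline over a number of years."""
    return start_price * (1 - annual_decline) ** years

EARLIER_LOW, EARLIER_HIGH = 0.13, 0.17  # dollars per MB, from the text
ANNUAL_DECLINE = 0.40                   # average year-to-year decline

for years in (1, 2):
    low = price_after_declines(EARLIER_LOW, ANNUAL_DECLINE, years)
    high = price_after_declines(EARLIER_HIGH, ANNUAL_DECLINE, years)
    print(f"after {years} year(s): ${low:.3f}-${high:.3f} per MB")
```

Two successive forty percent declines leave 36 percent of the starting price, which brings the earlier range down to roughly $.047-$.061 per MB, in line with the 2002 figures quoted above.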
Worldwide Disk Storage System External RAID Market Share