The post traces the history of RAID 5 and explains why it became effectively obsolete once HDD capacity started growing at an enormous rate. As spinning disks reached the terabyte scale, read speed stayed bound by the same physical limits, so the chance of failure during a rebuild approached near-certainty. Building a RAID 5 array even from 1 TB disks came to mean an almost guaranteed failure of the whole array, and fairly soon. The technology was “saved” by an unlikely ally – the SSD. Faster than hard disk drives in every respect, SSDs all but eliminate the failures described above. The post is written for the everyday reader, not just engineers, and is easy to follow without special knowledge or skills.
Technology often becomes obsolete these days, but sometimes a single change revives a principle and makes it relevant again. As HDD capacity grew, parameters such as seek speed ran into mechanical limits and could not keep up. This created several issues that nearly rendered RAID 5 useless.
IT storms forward like a jet fighter, with performance, frequency and capacity growing exponentially. Just 15 years ago megabytes sounded impressive; today we treat terabytes as if they had always been there. Such rapid development brings opportunities by the dozen every month, to personal computing as well as to business. The latter relies more and more on extensive IT infrastructure, as data grows at a similar pace. This growth drives further demand for IT development, which in turn expands the data even more.
What worked great years ago simply doesn’t fit the role any more. With modern high-capacity HDDs, RAID 5 becomes catastrophically unreliable, because the chance of a double failure during a rebuild becomes very real. A rebuild takes a long time because it involves reading literally all the data on the surviving disks, which generates a lot of random I/O – something HDDs struggle with. Mechanical limits held seek speed in place while capacity grew, so rebuilds took longer and longer. As a result, a high-capacity HDD array stays in this high-risk state for a long time, under a heavier workload and with a growing chance of total failure.
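A back-of-the-envelope estimate shows why the rebuild window ballooned. The sketch below simply divides drive capacity by an assumed effective rebuild throughput; the 50 MB/s figure is an illustrative assumption, deliberately low because rebuild reads compete with the production random I/O the paragraph describes, and is not a number from the post.

```python
# Rough rebuild-window estimate: hours = capacity / effective throughput.
# The 50 MB/s effective rate is an assumption: nominal sequential HDD
# speed is higher, but rebuild I/O competes with live random workload.

def rebuild_hours(capacity_tb: float, throughput_mb_s: float = 50.0) -> float:
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / throughput_mb_s / 3600

for tb in (0.25, 1, 4, 12):
    print(f"{tb:5} TB -> {rebuild_hours(tb):6.1f} h of lost redundancy")
```

Capacity grew by orders of magnitude while the effective throughput barely moved, so the hours of lost redundancy grew almost in lockstep with drive size.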
The other side of the problem is the unrecoverable read error (URE), which is typical for consumer-grade high-capacity SATA drives. A failing disk puts the RAID into the vulnerable rebuilding state, and a URE occurring during the rebuild brings down the whole array, wreaking havoc in production. With the usual unrecoverable-read-error rate being one bit in 10^15 for enterprise-class drives and one bit in 10^14 for desktop-class drives, the whole situation becomes really shaky. Rebuilding 36 TB of HDD data makes the RAID a literal trouble magnet, presenting a rough 50% chance of complete failure. Consequently, the growth of HDD capacity has turned RAID 5 into a house of cards.
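The “trouble magnet” claim can be checked with a simple probability model. Assuming independent bit errors at the quoted rates (a simplification – real errors cluster), the chance of hitting at least one URE while reading N bits is 1 − (1 − 1/BER)^N. This is a sketch of that model, not the post’s own calculation:

```python
import math

def p_ure(capacity_tb: float, bit_error_rate: float) -> float:
    """Probability of at least one URE while reading the full capacity,
    assuming independent bit errors (a simplifying assumption)."""
    bits = capacity_tb * 1e12 * 8  # decimal TB -> bits read
    # log1p keeps the tiny per-bit probability numerically stable
    return 1.0 - math.exp(bits * math.log1p(-1.0 / bit_error_rate))

print(f"36 TB, desktop-class  (1e14): {p_ure(36, 1e14):.0%}")  # ~94%
print(f"36 TB, enterprise-class (1e15): {p_ure(36, 1e15):.0%}")  # ~25%
```

Under these assumptions a desktop-class rebuild is almost certain to hit a URE, while an enterprise-class one still faces roughly one chance in four; the post’s rough 50% sits between the two cases.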
Using SSDs renders RAID 5 immune to these reliability issues, because the array stays in the vulnerable state for only a fraction of the time it does with HDDs. The key factors here are:
- SSDs excel at fast random data access
- Smaller capacity, which means shorter recovery
- The read-modify-write sequence runs much faster on flash
- Bit rot is eliminated by better checksums at the drive-controller level
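The first two factors can be made concrete with a rough comparison of the vulnerable window, estimated as capacity over effective rebuild throughput. The capacities and throughput figures below are illustrative assumptions, not numbers from the post:

```python
# Illustrative rebuild windows: capacity / effective rebuild throughput.
# Throughput figures are assumptions; an SSD pays no seek penalty, so
# its effective rate stays close to nominal even under concurrent load.

def window_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    return capacity_tb * 1_000_000 / throughput_mb_s / 3600

hdd = window_hours(8.0, 50)    # big HDD, throughput crushed by random I/O
ssd = window_hours(1.0, 400)   # smaller SSD, fast random access
print(f"HDD window: {hdd:5.1f} h, SSD window: {ssd:4.2f} h")
```

With numbers in this ballpark the SSD array spends well under an hour without redundancy versus nearly two days for the HDD array, which is why the URE risk becomes negligible in practice.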
Once the window of lost redundancy is almost nullified, only a small and acceptable chance of the RAID going down remains. The fact that SSDs are also prone to failure through wear-out is irrelevant here, because that failure mode is very predictable. Besides, overall SSD life can be prolonged by workload balancing, for example striping. Flash price seems to be the only drawback, and it keeps going down every year. Realistic studies predict flash will be cheaper than SAS HDDs as early as 2017.
Utilizing SSDs renders RAID 5 immune to the classic parity-RAID issues, because it shrinks the rebuild time to the point where failure risk is almost nullified. High-capacity HDDs made RAID 5 unreliable; flash has made it relevant again.