Tuesday, August 14, 2007

My new disk has just eaten planet earth!

Here are some things that trouble me
-- Disks are really no more reliable than they were 3-5 years ago
-- Manufacturers are chasing capacity, with 3x or 4x growth over the same period of time
-- So the amount of data exposed to each disk failure has grown at the same rate

I conclude that my data is being stored less reliably than it was 3-5 years ago. That really worries me. Sure, RAID-4, 5 and 6 will help. But my data is much more vulnerable whilst the RAID group is being rebuilt, since rebuild time grows with disk size: a group of 1TB disks takes proportionally longer to rebuild than a group of 73GB disks. I'm now in a much wider window to suffer a double or triple disk failure, because the disks are no more reliable. Yikes! Oh, and a double disk failure isn't even the worst of my problems. A read error whilst rebuilding a RAID group is. Yup, they happen all the time.
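To put rough numbers on that rebuild window, here is a back-of-the-envelope sketch. Everything in it is an assumption of mine, not a measurement: a constant 50 MB/s rebuild rate, an unrecoverable read error rate of 1 in 10^14 bits (a figure often quoted on spec sheets), and a RAID group with 6 surviving disks that all have to be read in full. The function names are made up for illustration.

```python
import math

GB = 10**9

def rebuild_hours(disk_bytes, rebuild_bytes_per_sec):
    """Hours to rewrite one whole disk at a constant rebuild rate."""
    return disk_bytes / rebuild_bytes_per_sec / 3600.0

def p_read_error(bytes_read, errors_per_bit):
    """Probability of at least one unrecoverable read error,
    treating every bit read as an independent trial."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed in a way that stays accurate for tiny p
    return -math.expm1(bits * math.log1p(-errors_per_bit))

REBUILD_RATE = 50 * 10**6   # assumption: 50 MB/s sustained rebuild rate
URE_PER_BIT = 1e-14         # assumption: 1 bad bit per 10^14 bits read
SURVIVORS = 6               # assumption: 6 surviving disks read in full

for size_gb in (73, 1000):
    disk = size_gb * GB
    hours = rebuild_hours(disk, REBUILD_RATE)
    risk = p_read_error(disk * SURVIVORS, URE_PER_BIT)
    print(f"{size_gb:>5} GB disks: {hours:5.1f} h rebuild, "
          f"{risk:.1%} chance of a read error during rebuild")
```

Change the three constants to match your own kit. The point is only that both the rebuild time and the read-error exposure scale with disk size while the error rate per bit stays put.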

Given the price of disks, I am seriously considering RAID-1 everywhere. Why? Well, it helps scale my I/O (see my other posts on supporting very large data warehouses), and it also gets me away from RAID group reconstruction. I simply need to resilver the disk; I don't need to recompute parity, which kills performance for the unaffected data.
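The difference between resilvering a mirror and reconstructing from parity can be shown with a toy example. This is a sketch of the general XOR-parity idea only, not any vendor's implementation; the block contents and the xor_blocks helper are invented for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per disk (toy contents), plus their parity block.
data = [b"disk0", b"disk1", b"disk2"]
parity = xor_blocks(data)

# Parity rebuild: disk 1 is lost, so EVERY surviving disk plus the parity
# has to be read and XORed to get the missing block back.
rebuilt_from_parity = xor_blocks([data[0], data[2], parity])

# Mirror resilver: only the one surviving copy is read and copied.
mirror_copy = bytes(data[1])
rebuilt_from_mirror = bytes(mirror_copy)

print(rebuilt_from_parity == data[1], rebuilt_from_mirror == data[1])
```

The parity path touches every remaining disk in the group, which is why the unaffected data suffers during a rebuild. The mirror path touches one disk.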

So before you flame me, just think about why we had the other RAID schemes. It was to increase the reliability of unreliable disks (Redundant Array of Inexpensive Disks). RAID-4 and 5 saved capacity because they stored parity, not a physical copy of the data. But I am faced with an excess of capacity, because I need to size my system for I/O, not capacity. In the old days you sized it for capacity and, as a function of that, you got an excess (well, normally at least) of I/O. What used to require a whole shelf of disks can now be stored on a single disk. Great for capacity and density, horrible because I now have 1/12th or 1/14th of the I/O: per-disk I/O performance has essentially remained unchanged since the introduction of 15k RPM disks.
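Here is the sizing argument as a quick sketch. The workload (1 TB of data needing 2000 IOPS) and the figure of 180 IOPS per 15k RPM disk are assumptions picked for illustration, and the function names are hypothetical.

```python
import math

GB = 10**9

def disks_for_capacity(data_bytes, disk_bytes):
    """Disks needed just to hold the data."""
    return math.ceil(data_bytes / disk_bytes)

def disks_for_io(required_iops, iops_per_disk):
    """Disks needed just to serve the I/O load."""
    return math.ceil(required_iops / iops_per_disk)

def disks_needed(data_bytes, required_iops, disk_bytes, iops_per_disk):
    """The system has to satisfy both constraints, so take the larger."""
    return max(disks_for_capacity(data_bytes, disk_bytes),
               disks_for_io(required_iops, iops_per_disk))

DATA = 1000 * GB        # assumption: 1 TB of data
IOPS = 2000             # assumption: workload needs 2000 IOPS
IOPS_PER_DISK = 180     # assumption: rough figure for one 15k RPM disk

for size_gb in (73, 1000):
    disk = size_gb * GB
    print(f"{size_gb:>5} GB disks: "
          f"{disks_for_capacity(DATA, disk)} for capacity, "
          f"{disks_for_io(IOPS, IOPS_PER_DISK)} for I/O, "
          f"{disks_needed(DATA, IOPS, disk, IOPS_PER_DISK)} needed")
```

With small disks the capacity count drives the purchase and the I/O comes along for free. With big disks the I/O count drives it and most of the capacity sits idle, which is the spare space RAID-1 can use.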

RAID-1. Here we go.
