Tuesday, August 14, 2007

Oh dear, you are out of capacity sir

I have to run very large (and I mean close to 250TB) data warehouses. There are a huge number of challenges with this sort of environment: getting the data loaded, keeping the performance scalable and stable, backing up and restoring... it's a much longer list!

So I have used Oracle, Sybase IQ, DB2, and MySQL. I was recently introduced to Greenplum (I don't work for them), who provide an appliance (oh, how trendy) that combines a Sun "Thumper" (X4500, I think), which has 4 x AMD processors and 48 x 500GB disks, with a clustered version of PostgreSQL. I was intrigued. But then it made me think. I have 24TB of storage in 4U of rack space, which is really dense, but I also have a large number of disks. Typically I am disk bound, so packing more disks into the same space makes a great deal of sense. I do end up wasting space (i.e. capacity) since I don't always need the storage, BUT I do need the I/O. The push for greater-capacity disks is not really helping me, so I really wish the disk manufacturers would stop that race. It's just like the camera manufacturers putting more megapixels into your point-and-shoot. Above 6MP, who cares, right? Over 500GB on a disk, who cares, if I ever want to run a mixed random read and write workload (i.e. a database workload)?
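To put rough numbers on the capacity versus I/O point, here is a minimal sketch. The figure of 100 random IOPS per disk is my own illustrative assumption, not a measurement from the appliance, and the 1000GB disk is a hypothetical comparison; the only inputs taken from above are the 48 disks and 500GB per disk.

```python
# Rough I/O density: random IOPS available per TB of raw capacity.
# ASSUMPTION: each spindle delivers about 100 random IOPS regardless of
# its capacity. This number is illustrative only, not a measured value.
ASSUMED_IOPS_PER_DISK = 100


def iops_per_tb(disk_count, disk_gb, iops_per_disk=ASSUMED_IOPS_PER_DISK):
    """Return random IOPS per TB for a shelf of identical disks."""
    capacity_tb = disk_count * disk_gb / 1000  # decimal TB, as vendors quote it
    total_iops = disk_count * iops_per_disk
    return total_iops / capacity_tb


if __name__ == "__main__":
    # Same 24TB of raw capacity, built two ways.
    print("48 x 500GB :", iops_per_tb(48, 500), "IOPS per TB")
    print("24 x 1000GB:", iops_per_tb(24, 1000), "IOPS per TB (hypothetical disk)")
```

Under that assumption, doubling the capacity per disk while holding total capacity fixed halves the I/O available per TB, which is the trade I don't want.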

So why am I mad? All they are giving me is more stored data exposed to every disk failure, since the disks are no more reliable than they were back in, like, 1999.

Here are some things that trouble me:
-- Disks are really no more reliable than they were 3-5 years ago
-- The capacity has grown dramatically
-- The failure rate per disk is the same
-- Therefore each disk failure now hits more MB of stored data
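One way to read the list above in numbers: hold the per-disk failure rate constant and see what changes as disks get bigger. This is only a sketch. The 3% annual failure rate is a made-up placeholder, not a vendor or field figure, and the 1000GB disk is a hypothetical comparison.

```python
# What one disk failure costs when per-disk reliability stays flat.
# ASSUMPTION: a 3% annual failure rate per disk, chosen only to make
# the arithmetic concrete. Substitute your own observed rate.
ASSUMED_ANNUAL_FAILURE_RATE = 0.03


def failure_exposure(disk_count, disk_gb, afr=ASSUMED_ANNUAL_FAILURE_RATE):
    """Return (expected disk failures per year, GB hit by each failure)."""
    expected_failures = disk_count * afr
    gb_per_failure = disk_gb  # a failed disk takes its whole capacity with it
    return expected_failures, gb_per_failure


if __name__ == "__main__":
    for count, size_gb in [(48, 500), (24, 1000)]:
        failures, gb = failure_exposure(count, size_gb)
        print(f"{count} x {size_gb}GB: {failures:.2f} failures/year, "
              f"{gb}GB to rebuild per failure")
```

With the same 24TB either way, the bigger disks mean fewer failures, but every failure takes out twice as much data to rebuild, and there are half as many spindles to do the rebuilding.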

So Mr. Disk manufacturer, please learn that I need I/O and I need reliability. I don't need any more stinking MB per platter.
