Using data from millions of drive days in Google datacenters, a new paper offers production lifecycle data on SSD reliability. Surprise! SSDs fail differently than disks – and in a dangerous way. Here’s what you need to know.
Key conclusions
- Ignore Uncorrectable Bit Error Rate (UBER) specs. A meaningless number.
- Good news: Raw Bit Error Rate (RBER) increases more slowly than expected with wear-out and is not correlated with UBER or other failures.
- High-end SLC drives are no more reliable than MLC drives.
- Bad news: SSDs fail at a lower rate than disks, but their UBER is higher (see below for what this means).
- SSD age, not usage, affects reliability.
- Bad blocks in new SSDs are common, and drives with a large number of bad blocks are much more likely to lose hundreds of other blocks, most likely due to die or chip failure.
- 30-80 percent of SSDs develop at least one bad block and 2-7 percent develop at least one bad chip in the first four years of deployment.
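
For readers unfamiliar with the two metrics, here is a minimal sketch of how RBER and UBER are commonly computed, assuming the usual definition of each as an error count divided by the number of bits read. The function name and the drive figures are hypothetical and are not taken from the paper. The example shows one way a per-bit rate can mislead: two drives with the same number of uncorrectable errors get very different UBER values simply because one was read more.

```python
def bit_error_rate(error_count: int, bits_read: int) -> float:
    """Errors per bit read.

    With raw (pre-ECC) bit errors this gives RBER;
    with uncorrectable errors it gives UBER.
    """
    if bits_read <= 0:
        raise ValueError("bits_read must be positive")
    return error_count / bits_read


# Hypothetical figures, for illustration only.
BITS_PER_TB = 8 * 10**12  # decimal terabyte expressed in bits

# Same number of uncorrectable errors, very different read volume.
light = bit_error_rate(error_count=2, bits_read=10 * BITS_PER_TB)
heavy = bit_error_rate(error_count=2, bits_read=1000 * BITS_PER_TB)

print(f"lightly read drive UBER: {light:.2e}")
print(f"heavily read drive UBER: {heavy:.2e}")
```

Both hypothetical drives lost the same amount of data, yet the heavily read one reports a UBER 100 times lower.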
Source:
SSD reliability in the real world: Google’s experience | ZDNet
See also:
The SSD Endurance Experiment: They’re all dead – The Tech Report