[time-nuts] Meaning of MTBF (was: Reliability of atomic clocks)

Jay Grizzard elfchief-timenuts at lupine.org
Tue Mar 29 11:10:58 EDT 2016


> It gets "interesting" when you look at the MTBF times on hard disks. Some
> of the figures quoted in hours relate to an MTBF of over 100 years. From
> what I read before, this was based on you replacing the drive at the end of
> its service life (typically 3 years for consumer drives and 5 years for
> enterprise grade disks). So no individual drive was ever expected to last
> 100 years, but if you kept replacing the drives every 3~5 years, the average
> time of an unexpected failure would be 100 years. I guess it's a bit like a
> car - the engine might run for 250,000 miles, but if you never change the
> oil or the camshaft belt, it is not going to last.
> 
> I note Seagate have dropped the use of MTBF:
> 
> http://knowledge.seagate.com/articles/en_US/FAQ/174791en?language=en_US

The article you link here actually explains what MTBF on drives is
measuring -- and it has nothing to do with when you replace your drives.

MTBF is basically expressed as "1 failure per N power-on hours". So if
you have an MTBF of 100,000 hours and you have 100 drives running
continuously, you will (on average) have one failure every 1,000 hours
(~42 days). If you have 100,000 drives, you'll have (on average) one failure
every hour. MTBF does not address the expected life of any specific
drive in any way.
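
To make the arithmetic above concrete, here is a minimal Python sketch. It
assumes every drive is powered on continuously and that failures arrive at a
constant average rate, which is what a single MTBF figure implies. The function
name hours_between_failures is my own, not something from the Seagate article.

```python
def hours_between_failures(mtbf_hours: float, drive_count: int) -> float:
    """Average wall-clock hours between failures across a whole fleet.

    Assumes all drives run continuously, so the fleet accumulates
    drive_count power-on hours for every hour of wall-clock time.
    """
    if drive_count <= 0:
        raise ValueError("drive_count must be positive")
    return mtbf_hours / drive_count


MTBF_HOURS = 100_000  # the example figure used in this message

for drives in (100, 100_000):
    hours = hours_between_failures(MTBF_HOURS, drives)
    print(f"{drives} drives: one failure every {hours:g} hours "
          f"({hours / 24:.1f} days)")
```

Note that the result only describes the fleet as a whole; it says nothing about
how long any one drive will survive.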

(It also does not address the bathtub curve that drive failures tend
to follow -- there's a high 'infant mortality' rate for new drives, 
then a number of years of service with a low failure rate, followed by an
increase in failure rate after some number of years.)
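
For anyone who wants to see the bathtub shape in numbers, one common way to
sketch it is as the sum of three hazard terms: a decreasing Weibull term for
infant mortality, a constant term for random failures, and an increasing
Weibull term for wear-out. Every parameter below is made up purely to produce
the shape; none of them come from real drive data or from any vendor.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (k / lam) * (t / lam) ** (k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative failure rate (failures per drive-year) at age t_years.

    All constants are hypothetical, chosen only to give a bathtub shape.
    """
    infant = weibull_hazard(t_years, shape=0.5, scale=50.0)  # shape < 1: falls with age
    random_failures = 0.02                                   # constant background rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=8.0)  # shape > 1: rises with age
    return infant + random_failures + wear_out


for age in (0.1, 0.5, 1, 3, 5, 8):
    print(f"age {age:>4} years: hazard {bathtub_hazard(age):.3f} per drive-year")
```

A single MTBF number corresponds to only the flat middle term, which is why it
cannot describe either end of the curve.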

FWIW, there have been a few interesting things published on drive failure
rates. One of the most interesting is a study[1] Google published in 2007,
which drew some rather unexpected conclusions (e.g. drive temperature is
not associated with failure rate, except at the higher ends of the
temperature range). Backblaze (a cloud backup provider) also publishes
regular reports on drive reliability[2], and has been doing so for a few
years now.

-j

1. http://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf

2. https://www.backblaze.com/blog/hard-drive-reliability-q4-2015/

