[time-nuts] While we're discussing backups...

Tue Aug 26 19:25:56 EDT 2008

Robert Vassar said:
> There have certainly been some amusing replies.  My only point was
> that if you are storing stuff on "spinning rust", you can't call
> it a backup if it's still spinning.  Power it off and de-cable it.
> How much further you go after that to protect it depends on your risk
> requirements.  I did like the zip-loc bag idea.

I've been trying to stay out of this, but I have some expertise in digital
asset preservation, as it has been a recent research area of mine.

(Someone referred to LOCKSS -- that's good work and a nice place to start; one
of its creators is a colleague of mine.)

A couple of points are worth making:

Diversity of all kinds is good.  This would include geography, operating
system, media, administrative control, 'players' (i.e. some way to interpret
the bits), etc.

Extra copies are good (and, yes, you can use coding to avoid 100% overhead
for every copy), but you rapidly lose the benefit of the extra copies if you
do not actively repair them quickly (enough) after they fail.   
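
To make the "coding" remark concrete, here is a minimal sketch using a single
XOR parity block, which is the simplest erasure code I can think of. It is an
illustration only, not what any particular preservation system does. The
function names (xor_blocks, make_parity, recover) are made up for this
example, and it assumes all blocks are the same length and that at most one
block is lost.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def make_parity(data_blocks):
    """Compute one parity block covering all data blocks."""
    return xor_blocks(data_blocks)


def recover(blocks, parity):
    """Rebuild the single missing block (marked None) from the survivors.

    Hypothetical helper: it handles exactly one missing block, which is
    all that a single parity block can repair.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one lost block")
    survivors = [b for b in blocks if b is not None]
    restored = list(blocks)
    restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


data = [b"time", b"nuts", b"sync"]
parity = make_parity(data)            # 1 extra block for 3 data blocks
damaged = [data[0], None, data[2]]    # pretend the second block was lost
repaired = recover(damaged, parity)
```

Three data blocks plus one parity block is roughly 33% overhead instead of the
100% a full second copy costs. The catch is the one already stated: lose a
second block before the first is repaired and the data is gone.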

And here's the rub: a significant fraction of storage failures are latent --
they go undetected until you attempt to retrieve and 'perform' the asset.  

So to make sure your copies are good, you must audit them regularly.   Given
trust between administrative domains, this can be as simple as comparing
cryptographic hashes of the bits.   (There are also schemes that work
without assuming trust.)
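
For the trusting case, a rough sketch of such an audit might look like the
following. It is an assumption-laden illustration: the choice of SHA-256, the
function names (file_digest, audit) and the idea of passing a plain list of
replica paths are all mine, not part of any particular system. It only
detects disagreement; deciding which copy is the good one, and repairing the
others, is a separate step.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(replica_paths):
    """Group replicas of the same asset by digest.

    Returns a dict mapping digest -> list of paths with that digest.
    One key means all copies agree; more than one key means at least
    one copy has silently changed and needs repair.
    """
    groups = defaultdict(list)
    for path in replica_paths:
        groups[file_digest(path)].append(path)
    return dict(groups)


def copies_agree(replica_paths):
    """True if every replica hashes to the same value."""
    return len(audit(replica_paths)) == 1
```

In practice each site would compute its own digest and only the digests would
cross the wire, so the audit costs a full read of every copy but almost no
bandwidth.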

I don't have the ref handy at the moment, but we have a model and math that
quantifies the issue around latent errors.

But don't audit too often if the auditing mechanism itself causes wear.  How
often is often enough is left as an exercise for the reader.

These are of course general principles.  You still need to look at your
threat model and the value of your data and make reasonable engineering
choices.

-ch