[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?

bob+kwlug at softscape.ca
Thu Nov 3 12:37:00 EDT 2016


To turn this on its head a bit.....

Are you lamenting the shortcomings of tar and compression or are you trying to solve for bit-rot within archives that are in this format?

If the objective is to account for and mitigate bit-rot within compressed tar files, perhaps we should look at the medium they're stored on instead.

If the compressed tar sits on disk, then you have various options. ZFS and BTRFS checksum disk blocks and, combined with redundancy, have the logic to "heal" bit-rotted sectors. So your compressed tar file _should_ never fault. Granted, you've traded the space gained from compression for the space lost to storing redundant data (i.e. mirrored or erasure-coded blocks). (I guess you still benefit, since you're not having to keep redundancy for the uncompressed data.)
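
To make the "checksum plus redundancy" idea concrete, here is a toy sketch in Python. It is not how ZFS or BTRFS are actually implemented; the block size, the two-copy mirror, the use of CRC32 and the names write_blocks/read_blocks are all assumptions made up for illustration. Each block is stored twice along with a checksum, and a read repairs whichever copy no longer matches.

```python
import zlib

BLOCK = 4096  # assumed block size, for illustration only


def write_blocks(data: bytes):
    """Split data into blocks; keep two copies of each block plus its CRC32."""
    store = []
    for i in range(0, len(data), BLOCK):
        blk = data[i:i + BLOCK]
        store.append({
            "crc": zlib.crc32(blk),
            "copies": [bytearray(blk), bytearray(blk)],  # simple mirror
        })
    return store


def read_blocks(store) -> bytes:
    """Return the data, overwriting any copy whose checksum no longer matches."""
    out = bytearray()
    for entry in store:
        # Pick the first copy that still matches the stored checksum.
        good = next(
            (c for c in entry["copies"] if zlib.crc32(c) == entry["crc"]),
            None,
        )
        if good is None:
            raise IOError("all copies of a block are damaged")
        # "Heal" the damaged copy from the good one.
        for c in entry["copies"]:
            if zlib.crc32(c) != entry["crc"]:
                c[:] = good
        out += good
    return bytes(out)
```

The mirror doubles the space used, which is the trade-off mentioned above: the checksum only detects the rot, and it is the second copy that makes the repair possible.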

Now, the more interesting case IMO is bit rot on tape. I'm 100% sure that tape technology has its own error correction mechanisms in place, but let's say you don't trust it because it's just a black box to you. Is there a way you could apply something like RAID5 to the stream of bytes being written to the tape?

I.e.: tar cvf - some_source | gzip -9v | erasure_encode > /dev/st0
And: cat /dev/st0 | erasure_decode | gunzip | tar tvf -
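
As a rough sketch of what the erasure_encode / erasure_decode stages might do, here is a RAID5-like scheme in Python. The commands in the pipeline above are hypothetical, and so is everything here: the function names encode/decode, the block size, the group size K and the framing format are assumptions of mine, not an existing tool. The stream is cut into groups of K blocks, one XOR parity block is appended to each group, and every block carries a CRC32 so the decoder knows which block to rebuild.

```python
import struct
import zlib

BLOCK = 4096  # assumed block size
K = 4         # assumed number of data blocks per parity block


def _xor(blocks) -> bytes:
    """XOR together equal-length blocks of BLOCK bytes."""
    acc = 0
    for b in blocks:
        acc ^= int.from_bytes(b, "big")
    return acc.to_bytes(BLOCK, "big")


def encode(data: bytes) -> bytes:
    """Frame data as [length][crc+block]..., adding one parity block per K blocks."""
    out = bytearray(struct.pack(">Q", len(data)))       # original length
    padded = data + bytes(-len(data) % (BLOCK * K))     # zero-pad last group
    for g in range(0, len(padded), BLOCK * K):
        blocks = [padded[g + i * BLOCK: g + (i + 1) * BLOCK] for i in range(K)]
        blocks.append(_xor(blocks))                     # parity block
        for b in blocks:
            out += struct.pack(">I", zlib.crc32(b)) + b
    return bytes(out)


def decode(stream: bytes) -> bytes:
    """Recover the data, rebuilding at most one damaged block per group."""
    (length,) = struct.unpack(">Q", stream[:8])
    frame = 4 + BLOCK
    group = frame * (K + 1)
    out = bytearray()
    for g in range(8, len(stream), group):
        blocks, bad = [], []
        for i in range(K + 1):
            f = stream[g + i * frame: g + (i + 1) * frame]
            (crc,) = struct.unpack(">I", f[:4])
            blocks.append(f[4:])
            if zlib.crc32(f[4:]) != crc:
                bad.append(i)                           # checksum mismatch
        if len(bad) > 1:
            raise IOError("more than one damaged block in a group")
        if bad:
            # XOR of the surviving blocks equals the missing one.
            blocks[bad[0]] = _xor(
                b for i, b in enumerate(blocks) if i != bad[0]
            )
        out += b"".join(blocks[:K])
    return bytes(out[:length])
```

To slot this into the pipeline, each function would be wrapped in a small script that reads sys.stdin.buffer and writes sys.stdout.buffer. Like RAID5, it survives only one bad block per group, and this sketch does not protect the length header or cope with bytes going missing from the stream. Real erasure codes such as Reed-Solomon tolerate more than one lost block per group, at the cost of more parity.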

Does pushing the problem down lower into the storage stack help?

Bob.

PS: I found this kinda interesting: https://en.wikipedia.org/wiki/Erasure_code ... Well, to be truthful, just the first paragraph defining it.





