[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?

bob+kwlug at softscape.ca
Tue Oct 25 21:02:14 EDT 2016


I know this is not what you meant, but I submit this for consideration and discussion:

$ find . -type f ! -name 00_tar.sha1sum -print0 | xargs -0 sha1sum > 00_tar.sha1sum
$ tar cvzf /dev/st0 .

This also relies on gzip itself vouching for the integrity of the archive as a whole, via the CRC-32 and length stored in its trailer. If the archive does turn out to be damaged, you can always fall back on this: http://www.gzip.org/recover.txt
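A manifest only pays off if someone checks it later. Here is a minimal sketch of both halves in bash, assuming GNU tar, gzip and coreutils `sha1sum`, and writing to an ordinary file rather than /dev/st0. The function names `make_archive` and `verify_archive` are hypothetical. `gzip -t` answers "is the archive intact?", while `sha1sum -c` answers "which files are intact?".

```shell
# Hypothetical helpers: build an archive with a per-file SHA-1 manifest, then check it.
make_archive() (            # $1 = directory to archive, $2 = absolute path of the .tar.gz
  set -euo pipefail
  cd "$1"
  # Skip the manifest itself: the shell may create it (empty) before find gets there.
  find . -type f ! -name 00_tar.sha1sum -print0 | xargs -0 -r sha1sum > 00_tar.sha1sum
  tar -czf "$2" .
)

verify_archive() (          # $1 = .tar.gz to check, $2 = scratch directory to extract into
  set -euo pipefail
  gzip -t "$1"                          # whole-archive check via gzip's CRC-32/length trailer
  mkdir -p "$2"
  tar -xzf "$1" -C "$2"
  cd "$2"
  sha1sum -c --quiet 00_tar.sha1sum     # per-file check; reports only files that fail
)
```

For a tape, as in the original, `tar -czf /dev/st0 .` takes the place of the file name, and the same `sha1sum -c` step applies after restoring.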


On a slightly different approach, I think that if you were to use the option

     -I, --use-compress-program PROG
           filter through PROG (must accept -d)

and write your own version of PROG that augments the tar stream, you could inject your own checksums into it. You could probably even craft it so that it compresses only the file contents and not the tar headers and metadata, so that, lacking PROG, you could still extract gzipped versions of the actual files. However, tar itself might have trouble with the changed byte counts in its stream, since each tar header records the length of the member's data.
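Without writing a PROG at all, the end state described above (a plain tar whose members are individually gzipped files, which is what the original question proposes) can be approximated with stock tools. A minimal sketch in bash, assuming GNU tar and gzip; the function names `pack_per_file` and `check_per_file` are hypothetical. Because each member carries its own gzip CRC, a damaged member can be identified by name.

```shell
# Hypothetical helpers: tar up individually gzipped copies, then test each member.
pack_per_file() (           # $1 = source directory, $2 = absolute path of the output .tar
  set -euo pipefail
  stage=$(mktemp -d)
  trap 'rm -rf "$stage"' EXIT
  cd "$1"
  find . -type f -print0 | while IFS= read -r -d '' f; do
    mkdir -p "$stage/$(dirname "$f")"
    gzip -c "$f" > "$stage/$f.gz"       # each member gets its own CRC-32
  done
  tar -cf "$2" -C "$stage" .
)

check_per_file() (          # $1 = .tar produced above; prints OK/DAMAGED per member
  set -uo pipefail
  tar -tf "$1" | grep '\.gz$' | while IFS= read -r m; do
    # Stream one member to stdout and let gzip verify it without writing anything.
    if tar -xOf "$1" "$m" | gzip -t 2>/dev/null; then
      echo "OK      $m"
    else
      echo "DAMAGED $m"
    fi
  done
)
```

The trade-offs are real: the staged .gz copies do not keep the originals' ownership, permissions, ACLs or xattrs (the very metadata that makes tar attractive), compression no longer spans the whole stream, and the check loop assumes member names contain no newlines. If a tar header itself is damaged, `tar -t` may still stop early.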

You would inject a hash on tar -c and verify it on tar -x (by way of the -d that tar passes to PROG), although I'm not sure how you would signal a problem (maybe output to stderr from PROG?).

My understanding is that PROG is simply a filter in the pipeline, sitting just before tar's output (or just after its input, when extracting). I've never really played with this, so I'm just theorizing. It might be an interesting experiment.
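A minimal sketch of the hash-injecting PROG idea, in bash, assuming GNU tar, gzip and coreutils `sha1sum`. The name `sha1gz` and its format (one text line holding the SHA-1 of the tar stream, followed by ordinary gzip data) are hypothetical. It is a simpler variant than compressing member data separately, which would mean rewriting tar's headers: it hashes the whole stream, and on `-d` refuses to hand any data to tar unless the hash matches, reporting the problem on stderr and through a non-zero exit status.

```shell
# sha1gz: hypothetical PROG for GNU tar's -I / --use-compress-program.
# Compress (no args): write the stream's SHA-1 as one text line, then gzip data.
# Decompress (-d): read that line, gunzip, verify, and only then emit the stream.
sha1gz() (
  set -euo pipefail
  tmp=$(mktemp)
  trap 'rm -f "$tmp"' EXIT
  if [ "${1:-}" = "-d" ]; then
    IFS= read -r want            # bash's read consumes only the first line here
    gzip -dc > "$tmp"            # the rest of stdin is an ordinary gzip stream
    got=$(sha1sum < "$tmp" | cut -d' ' -f1)
    if [ "$got" != "$want" ]; then
      echo "sha1gz: SHA-1 mismatch (expected $want, got $got)" >&2
      exit 1                     # tar should then report the child's failure
    fi
    cat "$tmp"
  else
    cat > "$tmp"                 # buffer the whole stream so it can be hashed first
    sha1sum < "$tmp" | cut -d' ' -f1
    gzip -c < "$tmp"
  fi
)
```

To try it with tar, put the function in an executable script that ends with `sha1gz "$@"` (say, a hypothetical ./sha1gz.sh), then run `tar -I ./sha1gz.sh -cf backup.tar.sg .` and `tar -I ./sha1gz.sh -xf backup.tar.sg`. Even without the script, `tail -n +2 backup.tar.sg | gzip -dc | tar -xf -` recovers the contents. Like gzip's own CRC, this only says that the archive is damaged, not which files are, and buffering needs scratch space as large as the uncompressed archive.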

(the other)Bob.


> -----Original Message-----
> From: kwlug-disc [mailto:kwlug-disc-bounces at kwlug.org] On Behalf Of B.S.
> Sent: Friday, October 21, 2016 10:07 AM
> To: Kwlug-Disc
> Subject: [kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz,
> instead of tar -cJf tarball.tar.xz a b c ?
> 
> By itself, tar has no per-file integrity (checksumming) ability: only
> its headers carry a checksum, not the file data. The tarball as a whole
> is checksummed, though, when it is compressed, as is traditional, with a
> compressor such as gzip.
> 
> I've read that tarballs can be fragile, and that when one is damaged
> there's no way to know which files remain undamaged. [Yet tar is the only
> archiver kept current with filesystem enhancements such as ACLs, xattrs,
> links, pipes, devices, etc. Zip (also an archiver) isn't, nor is gzip, et al.]
> 
> Yes, md5sums would also confirm integrity, but they add awkwardness and
> sidecar files that must themselves be tracked and validated.
> 
> So why is tar'ring gzip's instead of gzip'ping tarballs not more
> popular? [Google-fu fail.] Yes, it won't compress as much, but that
> seems a small price to pay for individual file integrity assurance.
> 
> 
> _______________________________________________
> kwlug-disc mailing list
> kwlug-disc at kwlug.org
> http://kwlug.org/mailman/listinfo/kwlug-disc_kwlug.org