[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?

B.S. bs27975.2 at gmail.com
Fri Oct 21 14:05:37 EDT 2016


On 10/21/2016 10:27 AM, William Park via kwlug-disc wrote:
> - It's easier to compress one file than lots of little ones

Not so, or at least it's chicken and egg - whether you compress lots of
little files and then concatenate them into one big one, or concatenate
lots of little files and then compress the result, it comes to the same
thing. (More workload, but the same number of 'atomic' operations.)

> - Whether you add compressed or uncompressed files to tarball,
> "integrity" of tarball is the same.

Strictly speaking, yes; but in practice, given the intent I expressed, no.

The issue is not the pass/fail integrity of the whole, but the pass/fail
integrity of each component. (If every component is intact then the whole
is intact, but a damaged whole does not mean every component is damaged.)

So a defective tar may still hold perfectly good files - but without some
form of per-file checksum, there's no way to know which files are good
and which aren't. (Which is the point.)
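
To make that concrete, here is a rough sketch of what per-member checking
could look like when the members are compressed before they go into the
tar. The file names, contents and the damaged offset are all made up, and
it assumes GNU-style tar and gzip plus a plain archive layout (512-byte
header, then data padded to 512 bytes). I use gzip only because it is
installed nearly everywhere.

```shell
#!/bin/sh
# Sketch only: file names, contents and the damaged offset are invented.
set -eu
work=$(mktemp -d)
cd "$work"

printf 'alpha\n'   > a
printf 'bravo\n'   > b
printf 'charlie\n' > c
gzip a b c                          # leaves a.gz b.gz c.gz
tar -cf bundle.tar a.gz b.gz c.gz   # tar of compressed members

# Simulate damage: overwrite 4 bytes inside b.gz's compressed data.
# Assumed layout: each member is a 512-byte header followed by its data
# padded to 512 bytes, so b.gz's data starts at offset 1536.
printf 'XXXX' | dd of=bundle.tar bs=1 seek=1550 conv=notrunc 2>/dev/null

# tar still lists every member; its checksum covers headers, not file data.
tar -tf bundle.tar >/dev/null

# Test each member on its own using the compressor's built-in check.
report=""
for m in $(tar -tf bundle.tar); do
    if tar -xOf bundle.tar "$m" | gzip -t 2>/dev/null; then
        report="$report$m ok
"
    else
        report="$report$m DAMAGED
"
    fi
done
printf '%s' "$report"
```

I would expect only b.gz to be flagged here, with a.gz and c.gz still
verifiable. The same loop should work for xz members by swapping
'gzip -t' for 'xz -t'.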

> - It's easier to extract file verbatim than do post-processing
> individually.

That's a practical issue, not a reason - and only because tar has made
'concatenate & compress' and 'uncompress & split' atomic, so you no
longer have to run 'tar -cf tarball.tar files ; xz tarball.tar' by hand.

Currently, to your point, the other order, 'xz files ; tar -cf
tarball.tar files.xz', has no single-step equivalent: tar has nothing
along the lines of --from-command='xz', which would make it just as easy.
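
In the absence of such an option, a small wrapper gets most of the way
there. This is only a sketch: tar_of_compressed, COMPRESSOR and SUFFIX are
names I made up, not anything tar provides, and it assumes relative file
names without odd characters.

```shell
#!/bin/sh
# Hypothetical helper, not a tar feature: compress each file on its own,
# then archive the compressed copies. The originals are left untouched.
set -eu

# usage: tar_of_compressed ARCHIVE FILE...
# Defaults to gzip/.gz; set COMPRESSOR=xz SUFFIX=.xz to use xz instead.
tar_of_compressed() {
    archive=$1
    shift
    stage=$(mktemp -d)
    for f in "$@"; do
        mkdir -p "$stage/$(dirname "$f")"
        "${COMPRESSOR:-gzip}" -c "$f" > "$stage/$f${SUFFIX:-.gz}"
        # Rotate the argument list so it ends up holding the new names.
        set -- "$@" "$f${SUFFIX:-.gz}"
        shift
    done
    # Archive the staged copies under their original relative names.
    tar -cf "$archive" -C "$stage" "$@"
    rm -rf "$stage"
}

# Small demonstration in a scratch directory.
demo=$(mktemp -d)
cd "$demo"
printf 'alpha\n' > a
printf 'bravo\n' > b
tar_of_compressed bundle.tar a b
listing=$(tar -tf bundle.tar)
printf '%s\n' "$listing"
```

Extraction is then the reverse by hand, e.g.
'tar -xOf bundle.tar a.gz | gzip -dc > a', which is exactly the
post-processing step that the one-shot 'tar -xJf' spares you.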

In any case, it isn't a reason: it doesn't explain why tar'ring gzip's
(to gain confidence in individual file integrity) isn't more prevalent.


On Fri, Oct 21, 2016 at 10:06:34AM -0400, B.S. wrote:
> By itself, tar has no individual file integrity (checksumming)
> ability - although the tarball as a whole is checksummed when it is
> used, as is traditional, with a compressor such as gzip.
>
> I've read that tarballs can be fragile. And when damaged, there's no
> way to know which files remain undamaged. [Yet tar is the only
> archiver kept current with filesystem enhancements, such as ACLs,
> xattrs, links, pipes, devices, etc. Zip (also an archiver) isn't, nor
> is gzip, et al.]
>
> Yes, md5's would also confirm integrity - while adding awkwardness
> and sidecar files to also track and validate.
>
> So why is tar'ring gzip's instead of gzip'ping tarballs not more
> popular? [Google-fu fail.] Yes, it won't compress as much, but that
> seems a small price to pay for individual file integrity assurance.




