[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?

B.S. bs27975.2 at gmail.com
Sat Nov 5 00:33:19 EDT 2016


On 11/04/2016 01:59 PM, Chris Frey wrote:
> In your original example, the comparison was between tarring pre-compressed
> files in a plain .tar file vs. compressing a tar of uncompressed files.
>
> I was saying that if you can recover from both compressed and uncompressed
> tar files, the advantage of pre-compressing may not be so stark.
>
> In my brief reading and tests, it seemed possible to recover from both.
> In both cases, it appeared that I would need some special tools: ft for
> fixing tar, and gzrt for fixing gzip. (I haven't tested gzrt yet).
>
> So for pre-compressed plain tarballs, the recovery would be:
>
> 	1) use ft to fix the tarball
> 	2) use gzrt to try to recover something from the file
>
> For tar.gz, it would be:
>
> 	1) use gzrt to recover the plain tarball
> 	2) use ft to fix the tarball and extract good files

Ah, OK, I follow now.

But the two paths giving identical results is premised on recovery 
actually working in either case - agreed, A then B or B then A, same 
difference. From what little I've seen so far, that premise is far from 
a given. With a glitched tar, you skip to the next file header and carry 
on, getting what you can out of it. A glitched gzip is more likely to be 
unrecoverable from the damage onward, since each part of the compressed 
stream depends on what came before it. So a hole in the middle of a 
large tar.gz can take out everything after it, if not the entire file, 
while a hole in the middle of a plain tar (or within a single gzipped 
member) only affects that one file.
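
For what it's worth, here is a rough sketch of that comparison using 
only Python's standard library. The sample data, the 16-byte "hole", and 
all the helper names are made up for illustration. It only shows what a 
stock reader gets back, and says nothing about what ft or gzrt might 
salvage on top of that.

```python
import gzip
import io
import random
import tarfile
import zlib

# Everything a damaged archive might throw at us while reading it.
ERRORS = (tarfile.TarError, OSError, EOFError, zlib.error)

def sample_files(count=5, size=20000):
    """Made-up test data: `count` text-like files of `size` bytes each."""
    words = [b"tar", b"gzip", b"xz", b"hole", b"block", b"header", b"data", b"sum"]
    files = {}
    for i in range(count):
        rng = random.Random(i)
        text = b" ".join(rng.choice(words) for _ in range(size))
        files[f"file{i}.txt"] = text[:size]
    return files

def build_tar(files, gzip_members):
    """Plain tar of `files`; with gzip_members=True each member is gzipped first."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            if gzip_members:
                name, data = name + ".gz", gzip.compress(data)
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def punch_hole(blob, start, length=16):
    """Simulate damage by inverting `length` bytes starting at `start`."""
    damaged = bytearray(blob)
    for i in range(start, start + length):
        damaged[i] ^= 0xFF
    return bytes(damaged)

def recover(archive, mode, files, gzip_members):
    """Names of the files that come back byte-for-byte intact."""
    intact = []
    try:
        with tarfile.open(fileobj=io.BytesIO(archive), mode=mode) as tf:
            for member in tf:
                try:
                    data = tf.extractfile(member).read()
                    if gzip_members:
                        data = gzip.decompress(data)
                except ERRORS:
                    continue  # this member is lost; try the next one
                name = member.name[:-3] if gzip_members else member.name
                if files.get(name) == data:
                    intact.append(name)
    except ERRORS:
        pass  # the reader gave up; keep whatever was recovered so far
    return intact

files = sample_files()

# Layout 1: plain tar of .gz members, hole inside the middle member's data.
tar_of_gz = build_tar(files, gzip_members=True)
with tarfile.open(fileobj=io.BytesIO(tar_of_gz)) as tf:
    middle = tf.getmembers()[len(files) // 2]
hole_at = middle.offset_data + middle.size // 2
from_tar_of_gz = recover(punch_hole(tar_of_gz, hole_at), "r:", files, True)

# Layout 2: one gzip stream around a plain tar, hole in the middle of the stream.
tar_gz = gzip.compress(build_tar(files, gzip_members=False))
from_tar_gz = recover(punch_hole(tar_gz, len(tar_gz) // 2), "r:gz", files, False)

print("tar of .gz members:", from_tar_of_gz)
print("gzipped tar:       ", from_tar_gz)
```

With the tar of .gz members, every file except the damaged one comes 
back. With the gzipped tar, what survives depends on where the hole 
lands; a stock reader typically gets no further than the damage.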

I will be most interested to hear any opinion you arrive at should you 
ever conclude your tests.

The other premise of this conversation, though, is being able to have 
confidence in any one file within a tar at any point in time. An 
integrity check comes built in with compression (gzip and xz both store 
a checksum of the data), which would be an advantage of tar'ring zips 
over zipping tars.

As said prior, though, you can get to the same place with md5sums (or 
sha1sums) and avoid compression entirely (along with any potential 
broken compression / recovery issues). That arguably makes sense on a 
compressing filesystem such as btrfs. Except, as also noted prior, I am 
seeing significantly better compression from something like xz than 
from btrfs alone.
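
A minimal sketch of the checksum route, again in Python: store a 
sha1sum-style manifest inside an uncompressed tar and re-hash the 
members against it later. The manifest name SHA1SUMS, the helper names, 
and the sample files are my own assumptions here, not any tar 
convention.

```python
import hashlib
import io
import tarfile

MANIFEST = "SHA1SUMS"  # hypothetical member name, chosen for this example

def add_bytes(tf, name, data):
    """Add `data` to the open tar `tf` as a regular file called `name`."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

def tar_with_manifest(files):
    """Build an uncompressed tar holding `files` plus a digest manifest."""
    lines = [f"{hashlib.sha1(data).hexdigest()}  {name}\n"
             for name, data in files.items()]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            add_bytes(tf, name, data)
        add_bytes(tf, MANIFEST, "".join(lines).encode())
    return buf.getvalue()

def verify(tar_bytes):
    """Return {name: bool} after re-hashing each member against the manifest."""
    results = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        manifest = tf.extractfile(MANIFEST).read().decode()
        for line in manifest.splitlines():
            digest, name = line.split("  ", 1)
            actual = hashlib.sha1(tf.extractfile(name).read()).hexdigest()
            results[name] = (actual == digest)
    return results

files = {"a": b"alpha\n" * 100, "b": b"bravo\n" * 100, "c": b"charlie\n" * 100}
archive = tar_with_manifest(files)
print(verify(archive))   # {'a': True, 'b': True, 'c': True}

# Damage one byte of b's data and check again.
with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
    pos = tf.getmember("b").offset_data
damaged = bytearray(archive)
damaged[pos] ^= 0xFF
print(verify(bytes(damaged)))   # {'a': True, 'b': False, 'c': True}
```

This only detects damage, of course; it does nothing to help recover 
the damaged file.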

At least one article I read noted you could change the contents of a tar 
file (e.g. hex edit), and as long as you stayed within any one file, tar 
would never know. Bummer.
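
That fits how the format works, as far as I understand it: the checksum 
in a tar header covers only the 512-byte header block, not the file data 
that follows. A small sketch with Python's tarfile module (the file name 
and contents are invented for the example):

```python
import io
import tarfile

def make_tar(files):
    """Build an uncompressed tar in memory from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = make_tar({"notes.txt": b"pay Alice 100 dollars\n"})
with tarfile.open(fileobj=io.BytesIO(archive)) as tf:
    member = tf.getmember("notes.txt")
    header_at, data_at = member.offset, member.offset_data

# "Hex edit" one byte of the member's data: the '1' in '100' becomes '9'.
edited = bytearray(archive)
pos = data_at + len(b"pay Alice ")
edited[pos:pos + 1] = b"9"
with tarfile.open(fileobj=io.BytesIO(bytes(edited)), mode="r:") as tf:
    content = tf.extractfile("notes.txt").read()   # no error raised
print(content)   # b'pay Alice 900 dollars\n'

# Edit one byte of the header instead: the header checksum catches this.
broken = bytearray(archive)
broken[header_at] ^= 0xFF
try:
    tarfile.open(fileobj=io.BytesIO(bytes(broken)), mode="r:")
    header_edit_detected = False
except tarfile.ReadError:
    header_edit_detected = True
print(header_edit_detected)   # True
```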

So, one way or the other, compressing or md5summing tars seems prudent.




