[kwlug-disc] So why not tar -cf tarball.tar a.xz b.xz c.xz, instead of tar -cJf tarball.tar.xz a b c ?

bob+kwlug at softscape.ca
Wed Oct 26 11:24:55 EDT 2016


> > $ find . -type f -print0 | xargs -0 sha1sum > 00_tar.sha1sum
> > $ tar cvzf /dev/st0 .
> 
> Yep. Came to a similar conclusion myself, using --to-command.

Cool. I hadn't noticed this option before. Will have to grok that for a bit and see how it could be useful.
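A minimal sketch of where that could land, assuming GNU tar (`--to-command` and the `TAR_FILENAME` variable it exports to the command are GNU extensions) and coreutils. It builds the manifest and archive as in the quoted commands, but writes to a plain file instead of /dev/st0. It then re-hashes every regular member straight out of the archive and compares the result against the manifest without extracting anything to disk. The file and directory names are made up for illustration. One tweak: the manifest excludes itself, because the `> 00_tar.sha1sum` redirect can create the file before `find` reaches it, in which case the manifest would list itself with the hash of a half-written file.

```sh
#!/bin/sh
set -eu
# Scratch tree standing in for the real data; a plain file stands in for /dev/st0.
work=$(mktemp -d)
mkdir -p "$work/data/sub"
printf 'alpha\n' > "$work/data/a.txt"
printf 'bravo\n' > "$work/data/sub/b.txt"

# 1. Manifest (excluding itself), then archive the whole tree including it.
cd "$work/data"
find . -type f ! -name 00_tar.sha1sum -print0 | xargs -0 sha1sum > 00_tar.sha1sum
tar -czf "$work/backup.tar.gz" .
cd "$work"

# 2. Re-hash each regular member straight out of the archive. GNU tar runs
#    the command once per regular file, contents on stdin, member name in
#    $TAR_FILENAME; nothing is written to disk. Output mimics sha1sum's
#    "HASH  NAME" lines, with the leading ./ stripped and the manifest dropped.
tar -xzf backup.tar.gz \
    --to-command='printf "%s  %s\n" "$(sha1sum | cut -d" " -f1)" "$TAR_FILENAME"' |
    sed 's|  \./|  |' | grep -v '  00_tar\.sha1sum$' | sort > actual.sha1sum

# 3. Normalise the manifest the same way and compare.
sed 's|  \./|  |' data/00_tar.sha1sum | sort > expected.sha1sum
if cmp -s expected.sha1sum actual.sha1sum; then
    status=match
else
    status=mismatch
fi
echo "archive vs manifest: $status"
```

Both sides have the leading ./ stripped before comparison, so the check doesn't depend on exactly how tar spells member names.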





Re: recovering a gzip file....
> Saw that. Ain't pretty. 'fall back' is a very generous description. Step
> 1, get and modify the source ...

Lol! Yeah, you'd have to be pretty desperate at this point. That said, I believe that page mentioned something more automated in gzip v2.0.





> And ... that article exactly explains why we should be tarring zips, not
> zipping tars. Getting a zip back is problematic. Once your tarball zip
...
> 
> But there goes your integrity checks. (Not really - solution another day.)

Yeah, this always bothered me about tar | gzip, but not enough to do anything about it. I guess I've never been burned by it badly enough to spend cycles thinking about it.
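To make the "tar the compressed files" argument concrete, here's a throwaway sketch (GNU tar, gzip and coreutils assumed; gzip stands in for the xz in the subject line, and the reasoning is the same). It overwrites four bytes inside the first member of a tar of individually gzipped files. tar keeps no checksum over file data, so the archive still extracts, and because each member is a separate gzip stream with its own CRC, only the damaged one fails `gzip -t`. With `tar -czf`, the same damage would typically leave everything after it unreadable. `--format=ustar` is used only so that the first member's data is guaranteed to start at byte 512; the file names are made up.

```sh
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Three sample files, compressed individually (a, b, c become a.gz, b.gz, c.gz),
# then tarred uncompressed.
seq 1 200 > a
seq 201 400 > b
seq 401 600 > c
gzip a b c
tar --format=ustar -cf tarball.tar a.gz b.gz c.gz

# Simulate media damage: overwrite 4 bytes inside a.gz. The first member's data
# starts right after its 512-byte header, so offset 512 + 40 lands in a.gz's
# compressed data (a.gz is well over 50 bytes long).
printf 'XXXX' | dd of=tarball.tar bs=1 seek=552 conv=notrunc 2>/dev/null

# tar stores no checksum over file data, so extraction still succeeds...
mkdir out
tar -xf tarball.tar -C out

# ...and each member carries its own gzip CRC, so only a.gz should fail.
bad=""
for f in a.gz b.gz c.gz; do
    if gzip -t "out/$f" 2>/dev/null; then
        echo "$f: ok"
    else
        echo "$f: damaged"
        bad="$bad $f"
    fi
done
```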




> D'OH! My eyes have passed over that I don't know how many times. I've
> been saying there oughta be a --in-command or --from-command to match
> the --to-command. Double D'OH!

Did a quick experiment today as follows:

tar -cvf /tmp/1.tar --use-compress-program /tmp/PROG.sh  /some/dir

where PROG.sh was just 'tee /tmp/2.out'

What was cool was that 1.tar and 2.out were IDENTICAL!

I don't think it would take much for PROG.sh to checksum the individual files as the stream passes through it and to add that data into the stream. I'd just need to know enough about the data structures of a tar file to do this as it flies by.
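Here's a rough sketch of the "compute something as it flies by" idea, assuming GNU tar and coreutils. The PROG.sh below is hypothetical: while gzipping, it also records the SHA-1 of the raw tar stream in a file named by `STREAM_SUM` (a variable made up for this sketch), and when tar calls it with `-d` it just decompresses. A named pipe is used instead of bash's `>( )` so that the script can `wait` for the hash to be written before it exits. Note that this produces one checksum over the whole stream, not one per file. Per-file sums would need the filter to parse the 512-byte headers as they pass, as described above.

```sh
#!/bin/sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/dir"
printf 'alpha\n' > "$work/dir/a.txt"
printf 'bravo\n' > "$work/dir/b.txt"

# Hypothetical compress filter. tar feeds the raw archive on stdin and reads
# the compressed archive from stdout; for extraction tar invokes it with -d.
cat > "$work/PROG.sh" <<'EOF'
#!/bin/sh
if [ "${1-}" = "-d" ]; then
    exec gzip -dc
fi
# Copy the raw stream into a FIFO that sha1sum reads in the background,
# while the main copy goes on to gzip.
fifo=$(mktemp -u)
mkfifo "$fifo"
sha1sum < "$fifo" | cut -d' ' -f1 > "$STREAM_SUM" &
tee "$fifo" | gzip -c
wait            # make sure the hash has been written before exiting
rm -f "$fifo"
EOF
chmod +x "$work/PROG.sh"

STREAM_SUM="$work/stream.sha1"
export STREAM_SUM
tar -cf "$work/1.tar.gz" --use-compress-program "$work/PROG.sh" -C "$work" dir

# The recorded sum should equal the SHA-1 of the decompressed archive.
recorded=$(cat "$work/stream.sha1")
actual=$(gzip -dc "$work/1.tar.gz" | sha1sum | cut -d' ' -f1)
echo "recorded=$recorded actual=$actual"
```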




> Two or three REAL beauties of tar:
> - understands most all metadata
> - REALLY good file set facilities. Perhaps superior, even.
> - --remove-files 'Pack it away' functionality. vs Duplicate file set
> facilities (e.g. to iterate gzip over all of them before handing them to
> tar), then manually remove them upon packing, tracking whether it's a
> file, a dir, a pipe, a link (hard or soft), a socket, a ...
> = Conversely, unzip'ping them upon extraction from tar.
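The pack-away and unpack round trip from that list, as a minimal sketch (GNU tar and gzip assumed; directory names are made up). It compresses every regular file in place, archives the tree with `--remove-files`, and on the way back extracts the archive and gunzips whatever comes out as `.gz`. It cheerfully assumes that nothing in the tree was already a `.gz` before packing. A real tool would have to record which files it compressed.

```sh
#!/bin/sh
set -eu
work=$(mktemp -d)
mkdir -p "$work/src/sub"
seq 1 100 > "$work/src/a.txt"
seq 1 50 > "$work/src/sub/b.txt"
cd "$work"

# Pack: find hands only regular files to gzip (links, pipes, sockets and
# directories are left for tar to store as-is); tar then archives the tree
# and deletes everything it stored.
find src -type f -exec gzip {} +
tar -cf packed.tar --remove-files src

# Unpack: extract, then decompress every .gz that comes back out.
mkdir restore
tar -xf packed.tar -C restore
find restore -type f -name '*.gz' -exec gunzip {} +
find restore -type f
```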

Hear, hear! Tar, cpio and rsync. Essentials.




> > extract gzipped versions of the actual files. Although, tar itself
> > might have some issues with the extra bytes in its stream if it keeps
> > some sort of integrity checks on data length for files.
> 
> Shouldn't matter - it's just a stream of data to tar. Even if it does
> ... perhaps one can create a sidecar file on the fly with the associated
> data. One would have to craft an accompanying --to-command prog though -
> but if the one, the other should be little work. In the end, remaining
> user transparent. (Albeit apparent?) Transparent as in automated / hands
> free.

Hehe, I guess I'm going to be learning the structure of tar files soon. Now I'm curious.

That said, I think I'd prefer to have the checksums in-line rather than in a sidecar file. I'll need to think about that for a bit.





> At some point I'll post my conclusions on things. Been beating on things
> in various ways, but nothing is to the point of summarizing or posting
> on yet.
> 
> THANK YOU! Good talk!

Cool. Let me know if you discover anything interesting.

I'm going to try to find some documentation on the default tar format and play with PROG.sh as time permits. 

Also, I wonder if any of these:

     -H, --format FORMAT
           create archive of the given format. FORMAT is one of the following:

           --format=gnu
                 GNU tar 1.13.x format

           --format=oldgnu
                 GNU format as per tar <= 1.12

           --format=pax
                 POSIX 1003.1-2001 (pax) format

           --format=posix
                 same as pax

           --format=ustar
                 POSIX 1003.1-1988 (ustar) format

           --format=v7
                 old V7 tar format

have checksums or if you can add your own formats for something like compressing each file but not the meta-info.
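For what it's worth, as far as I know none of these formats checksums the file data. What they do all have (v7 onward) is a checksum of each 512-byte header block, stored as octal text in the `chksum` field at offset 148. Here's a sketch, assuming GNU tar and coreutils, that recomputes that checksum for a one-file ustar archive. The checksum is the sum of all 512 header bytes, with the 8-byte checksum field itself counted as spaces. The file names are made up.

```sh
#!/bin/sh
set -eu
work=$(mktemp -d)
printf 'alpha\n' > "$work/a.txt"
tar --format=ustar -cf "$work/a.tar" -C "$work" a.txt

# Recompute the checksum of the first header block: the sum of all 512 bytes
# as unsigned values, with the 8-byte chksum field (offsets 148-155) counted
# as ASCII spaces (32). Printed as 6 octal digits, the way GNU tar writes it.
computed=$(head -c 512 "$work/a.tar" | od -An -v -tu1 | tr -s ' ' '\n' |
    awk 'NF { i++; s += (i > 148 && i <= 156) ? 32 : $1 } END { printf "%06o\n", s }')

# The stored value: octal text in the chksum field, minus its NUL and space.
stored=$(dd if="$work/a.tar" bs=1 skip=148 count=8 2>/dev/null | tr -d ' \000')
echo "computed=$computed stored=$stored"
```

Because the sum covers only the header, a flipped bit in a member's data goes unnoticed by tar itself. That is why anything like the PROG.sh idea has to add its own data checksums.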


The '-I PROG.sh' approach might be a valuable plug-in in its own right: it could potentially make a compressed tar archive mostly salvageable and bake in integrity checks.

BB






