lab 39 - tar and gzip
NAME
lab 39 - tar and gzip
NOTES
In lab 38 I looked at file formats I could use to store a document repository. I more or less decided I needed an archive file of individually gzipped files. After a little further digging I found that the gzip file format (RFC 1952) allows multiple gzip files to be concatenated into a single valid gzip file. A quick test on Unix shows this to be true.
% cat file1 file2 | wc
2 8 35
% gzip < file1 > t1.gz
% gzip < file2 >> t1.gz
% gunzip < t1.gz | wc
2 8 35
But the same test on Inferno did not work. After a little hacking on /appl/lib/inflate.b I got it to work, although I'm not sure I haven't broken something else in doing so. So beware.
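To show what a reader has to do to accept concatenated gzip files, here is a minimal sketch in Python rather than Limbo. It is not the change made to /appl/lib/inflate.b; the function name gunzip_all and the file contents are made up for illustration. The idea is only that when one gzip member ends, the reader starts again on whatever bytes remain.

```python
import gzip
import zlib


def gunzip_all(data: bytes) -> bytes:
    """Decompress every gzip member in data and join the results."""
    out = []
    while data:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        out.append(d.decompress(data))
        out.append(d.flush())
        # Bytes left over after this member's trailer belong to the next member.
        data = d.unused_data
    return b"".join(out)


# Stand-ins for file1 and file2 (15 and 20 bytes; contents are invented).
file1 = b"a file of text\n"
file2 = b"second file of text\n"

# Same effect as: gzip < file1 > t1.gz; gzip < file2 >> t1.gz
archive = gzip.compress(file1) + gzip.compress(file2)

restored = gunzip_all(archive)
print(len(restored))
```

Python's own gzip.decompress already accepts multi-member data, so the loop is there only to make the member boundary visible.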
Appending to a gzip file is a nice feature. What about puttar? Can I append to a tar file?
% puttar file1 > t1.tar
% puttar file2 >> t1.tar
% lstar < t1.tar
file1 1123551937 15 0
No. It stops reading after the first file. I looked at the code in /appl/cmd/puttar.b and found that it outputs zeroed blocks as a sort of null terminator for the file. I'm not sure whether that's required for a valid tar file. The Inferno commands that read tar files don't seem to care, since EOF works just as well. So I edited the file to not output the zeroed blocks, and renamed the command to putwar so as not to confuse myself. Now I can append to a tar (war) file. What's more, I can put gzip and tar together.
% putwar file1 | gzip > t1.tgz
% putwar file2 | gzip >> t1.tgz
% gunzip < t1.tgz | lstar
file1 1123553153 15 0
file2 1123553156 20 0
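The effect of the zeroed blocks on appending can be reproduced outside Inferno. The following Python sketch is an illustration only. The helper names tar_entry and names are hypothetical, and the two trailing zero blocks are the count conventional tar writers use, which is an assumption about what puttar emits. It builds tar entries by hand so the terminator can be included or left out.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def tar_entry(name: str, data: bytes) -> bytes:
    """One tar entry (header plus padded data) with no end-of-archive blocks."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    padding = b"\0" * (-len(data) % BLOCK)
    return header + data + padding


def names(archive: bytes) -> list:
    """List entry names the way a tar reader sees them."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        return tf.getnames()


# Assumed terminator: two zeroed blocks, as conventional tar writers append.
END = b"\0" * (2 * BLOCK)

e1 = tar_entry("file1", b"a file of text\n")
e2 = tar_entry("file2", b"second file of text\n")

terminated = e1 + END + e2 + END  # like: puttar file1 > t; puttar file2 >> t
unterminated = e1 + e2            # like: putwar file1 > t; putwar file2 >> t

print(names(terminated))    # the reader stops at the first zeroed block
print(names(unterminated))  # the reader runs to EOF and sees both entries
```

Python's tarfile reader, like the Inferno commands, accepts plain EOF as the end of the archive, so dropping the terminator is enough to make appending work here.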
I'll resurrect gettarentry from the last lab so I can apply a command to each file:
% gunzip < t1.tgz | gettarentry {echo $file; wc}
file1
1 4 15
file2
1 4 20
This is very close to what I want. I can process the whole archive as a stream, it is compressed, and if I know the offset of each file's gzipped entry within the archive I can jump directly to it and start the stream from there.
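Both properties, streaming over every entry and starting from a known offset, can be sketched in Python as a stand-in for the Inferno pipeline. Everything named here is hypothetical: the side index of offsets, the helpers append and each_entry, and the wc function that imitates the shell command. The file contents are invented to match the sizes in the transcripts.

```python
import gzip
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def tar_entry(name: str, data: bytes) -> bytes:
    """One tar entry with no end-of-archive blocks (what putwar writes)."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    header = info.tobuf(format=tarfile.USTAR_FORMAT)
    return header + data + b"\0" * (-len(data) % BLOCK)


def append(archive: bytearray, index: dict, name: str, data: bytes) -> None:
    """Append one gzipped tar entry and remember where it starts."""
    index[name] = len(archive)
    archive += gzip.compress(tar_entry(name, data))


def each_entry(archive, fn, offset=0):
    """Call fn(name, data) on each entry, streaming from the given byte offset."""
    src = io.BytesIO(bytes(archive))
    src.seek(offset)
    results = []
    with gzip.GzipFile(fileobj=src, mode="rb") as gz:
        # "r|" reads the tar data strictly forward, as a pipe would.
        with tarfile.open(fileobj=gz, mode="r|") as tf:
            for info in tf:
                results.append(fn(info.name, tf.extractfile(info).read()))
    return results


def wc(name: str, data: bytes):
    """Lines, words, and bytes, like the wc command."""
    return (name, data.count(b"\n"), len(data.split()), len(data))


archive, index = bytearray(), {}
append(archive, index, "file1", b"a file of text\n")
append(archive, index, "file2", b"second file of text\n")

whole = each_entry(archive, wc)                        # every entry
tail = each_entry(archive, wc, offset=index["file2"])  # start at file2
print(whole)
print(tail)
```

The offsets are positions in the compressed archive, so each one must point at the start of a gzip member. That holds here because every entry is compressed on its own before being appended.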
The remaining problems are that I don't know what metadata to store with each file, so I'm going with what tar records by default. Also, the tar format isn't handled by the sh-alphabet, which is a pity. But that doesn't matter, because now I've got something concrete to play with, which is good enough.
Time to really get processing some data.
Comments
A venti implementation in Limbo would be very cool.
what i'm trying to do is parallel programming using a lot of disks. i'm not sure how i'd go about setting up a venti to work across disks and have computations work independently of one another. so i'm doing this as something simpler.