Saturday, January 29, 2011

Corrupted zip file after using split and cat on Linux

Hello everyone,

I had to split a 2.6 GB zip file in order to send it through a slow uplink. I did this:

split -b 879m BIGFILE.zip

This created xaa, xab & xac which I uploaded to the remote server. After the transfer finished I verified each one of these 3 pieces with md5sum (both on my local system and on the server):

md5sum xaa
md5sum xab
md5sum xac

All 3 hashes on the server were identical to the ones on my local system, so the transfer went well. Now, on the remote system, when I do this:

cat xa* > BIGFILE.zip

...then I verify the hash of this BIGFILE.zip (on both systems):

md5sum BIGFILE.zip

...and both of them match.
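
The same round trip can be rehearsed on a small stand-in file before committing to a long upload. This is only a sketch: the file names original.bin and rebuilt.bin, the 3 MiB size, and the 1 MiB chunk size are assumptions chosen for illustration, not values from the post.

```shell
# Work in a scratch directory so the xa* pieces do not collide with anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a 3 MiB file of random bytes as a stand-in for BIGFILE.zip.
head -c 3145728 /dev/urandom > original.bin

# Split into 1 MiB pieces; split names them xaa, xab, xac by default.
split -b 1048576 original.bin

# Reassemble. The shell expands xa* in sorted order, so the pieces
# are concatenated in the same sequence they were written.
cat xa* > rebuilt.bin

# Compare the checksum of the original with that of the reassembled file.
sum_original=$(md5sum < original.bin | cut -d' ' -f1)
sum_rebuilt=$(md5sum < rebuilt.bin | cut -d' ' -f1)

if [ "$sum_original" = "$sum_rebuilt" ]; then
    echo "checksums match"
else
    echo "checksums differ"
fi
```

If the checksums match here, split and cat themselves are not altering the data, which is consistent with what the hashes above already show.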

Now comes the interesting part. When I try to list the contents of the zip file I get an error:

unzip -l BIGFILE.zip

I get:

Archive:  BIGFILE.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of BIGFILE.zip or
        BIGFILE.zip.zip, and cannot find BIGFILE.zip.ZIP, period.

This is totally weird. I'm using the same version of "unzip" on both systems. When I use the "unzip -l" on my local system it works.

Thanks for any help. JFA
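
For reference, the end-of-central-directory record that the error message mentions lives at the tail of a zip file and starts with the bytes 50 4b 05 06 ("PK" followed by 0x05 0x06). The sketch below builds the smallest possible zip (an empty archive, which is just that 22-byte record) and dumps the signature. The temporary file is a made-up sample, and the fixed 22-byte offset only holds when the archive has no trailing comment; with a comment the record sits further from the end.

```shell
# Build an empty zip archive: the 4-byte signature followed by 18 zero bytes.
sample=$(mktemp)
{ printf 'PK\005\006'; head -c 18 /dev/zero; } > "$sample"

# Take the last 22 bytes, keep the first 4 of those, and print them as hex.
tail_sig=$(tail -c 22 "$sample" | head -c 4 | od -An -tx1 | tr -d ' \n')

echo "signature at tail: $tail_sig"
```

Running the same tail/od pipeline against a real archive is a quick way to see whether the record is physically present, independent of whether unzip manages to find it.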

  • You should have used rsync instead. split is ghetto. That said, I have no idea what's causing your problem. Oh .. wait ... you can use rsync now. It will only transfer the differences between the files. Assuming you have ssh access on the remote machine:

    rsync -Pvz BIGFILE.zip remotehostname:/path/to/BIGFILE.zip
    

    ... and you're done.

    Dennis Williamson : The checksums indicate that the file was transmitted completely and correctly.
    From niXar
  • How did you transfer the files? If you did it via FTP, ASCII mode will hose the files. You may be able to use the -F flag of unzip to correct this, but don't bet on it.

    You may need to retransmit the files; I'd recommend doing it via scp.

    Dennis Williamson : If the files got mangled during transmission then the checksums wouldn't match.
    From Josh Budde
  • Identical MD5 hashes suggest that the transfer has worked well.

    A file size over 2 GB sounds suspiciously like a pointer-size issue. Maybe the zip tool in question doesn't handle that well? Anything above roughly 2 GB would be a negative number in a signed 32-bit integer. Can you unzip the file on the system where you zipped it? Do the two systems differ? Is one 64-bit and the problematic one 32-bit? What are the filesystems on both systems? Can you find another zip utility?

    If you have a chance to retransmit the content, you might want to use tar.gz or keep the file size below that value. gzip-compressed content should handle this better. Zip stores its table of contents (the index) at the end of the file, so the tool has to seek there before it can list anything.

    Edit: Yup, see here:

    In practice, the real limit may be 2 GB on many systems, due to UnZip's use of the fseek() function to jump around within an archive. Because fseek's offset argument is usually a signed long integer, on 32-bit systems UnZip will not find any file that is more than 2 GB from the beginning of the archive [...]

    Dennis Williamson : +1 Brilliant. Great thinking!
    From Olaf
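
The arithmetic behind the 2 GB limit can be checked in a few lines. This is a sketch under one assumption: the exact byte count of the archive is not given in the question, so 2.6 GB is taken here as 2.6 * 10^9 bytes.

```shell
# Largest value a signed 32-bit integer can hold: 2^31 - 1.
max_offset=$(( (1 << 31) - 1 ))

# Approximate archive size (assumption: 2.6 GB read as 2.6 * 10^9 bytes).
archive_size=2600000000

echo "max signed 32-bit offset: $max_offset"
echo "approximate archive size: $archive_size"

# The index sits at the end of the archive, so the offset needed to
# reach it is close to the archive size itself.
if [ "$archive_size" -gt "$max_offset" ]; then
    echo "the end of the archive lies beyond a signed 32-bit offset"
fi
```

On a build of unzip where fseek() takes a 32-bit long, the index at the end of a file this size is out of reach, which would explain why the same byte-identical file lists fine on one machine and not on the other.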
