About TT's Zip Archive Package
Version: 2.1 (15 Oct 2010)
This is a collection of REALbasic Classes to extract and create ZIP archives (known as PKZIP and Info-ZIP formats), for Mac OS X, Windows and Linux.
Written and improved since March 2003 by Thomas Tempelmann
For updates see http://www.tempel.org/RB/ZipPackage
For feedback and questions, write to: tt@tempel.org
This code is free for your own use, but you are encouraged to send me some money if you have a use for it (consider it shareware). See the end of this document for more information.
Current requirements: REALbasic 2008 Release 4 or later. The Einhugur e-CryptIt plugin is now optional, and must be acquired separately. Alternatively, and only for Windows, the "zlib" DLL needs to be acquired, e.g. from http://www.zlib.net/
Features
- Works on all platforms (OS X, Windows, Linux)
- Very fast (with the Einhugur plugin, tests on OS X show that it is just as fast as Apple's own zip tool)
- Can remove items from an archive and then even compact the archive to regain the free space.
- Special feature: ZipSnapshots allow you to make incremental backups of the same folder, by which unchanged files will not be stored multiple times. See the notes in the same-named class and the code in the demo app to learn how to use it.
- Can retrieve data even from corrupted archives (not well documented; contact me directly on this feature)
Restrictions
This package implements only a subset of the full ZIP archive specification:
- Can neither encrypt nor decrypt items.
- Supports only "stored" and "deflate" compression methods, which are the most common ones. This means that it can't read all possible ZIP archives because some of them use different compression methods, but if you're using any Zip creating tool, you usually have control over which methods should be used. And "deflate" is usually the most effective, anyways.
- No support for multi-segment archives.
- No support for the newer 64 bit Zip archive format (Zip64). That means that each file's original size and the size of the entire archive file are limited to 4GB each, and that an archive can only hold up to 65535 items.
The only thing to be sure about is that files created by this class can be read by any modern ZIP tool, such as ZipIt (for Mac OS), OS X's Archives, WinZIP (for Windows), Stuffit Expander (Mac and Windows) and, of course, by itself.
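To find out in advance whether an archive made by another tool stays within these restrictions, its directory can be inspected before handing it to the package. The following is only a minimal sketch, written in Python with the standard zipfile module rather than with this package's classes; the function names entry_problems and archive_problems are made up for this illustration.

```python
import zipfile

# Limits of the classic (non-Zip64) format as listed above.
MAX_ENTRIES = 65535
MAX_SIZE = 0xFFFFFFFF  # just under 4 GB

# "stored" is method 0, "deflate" is method 8.
SUPPORTED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def entry_problems(info):
    """Return the reasons why one entry falls outside the supported subset."""
    problems = []
    if info.flag_bits & 0x1:  # bit 0 of the general purpose flags marks encryption
        problems.append("%s: encrypted" % info.filename)
    if info.compress_type not in SUPPORTED_METHODS:
        problems.append("%s: compression method %d" % (info.filename, info.compress_type))
    if info.file_size > MAX_SIZE:
        problems.append("%s: larger than 4 GB" % info.filename)
    return problems


def archive_problems(archive):
    """Check every entry of an archive (path or file object). Empty list: nothing found."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
    problems = []
    if len(infos) > MAX_ENTRIES:
        problems.append("more than %d entries" % MAX_ENTRIES)
    for info in infos:
        problems.extend(entry_problems(info))
    return problems
```

Multi-segment archives are not covered by this sketch.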
Programming information
To actually be able to create and read compressed items in an archive you will also need a library or plugin providing the so-called "ZLib" compression functions. You have two choices:
- The free zlib external library, always available on OS X and Linux, but needs to be added manually on Windows
- Einhugur's fast e-CryptIt Engine plugin (http://www.einhugur.com/, costs money)
Quick Demo
If you are using Windows, first download the "zlib compiled DLL" from http://www.zlib.net/ and place the contained file zlib1.dll either next to "TT's ZipArchiver.rbp" (only works with RB 2009r2.1 and earlier due to a change in REALbasic 2009r3) or copy it into C:\WINDOWS\system\. If you are using Mac OS X or Linux, you do not need to do this because there is already a zlib library installed on those systems.
Open the project called "TT's ZipArchiver.rbp" and run it. The program should open a window into which you can drop files to compress or decompress instantly. It also allows you to choose options such as the encoding for the file names. Unfortunately, different tools on the various computers use different encodings, though this will not be of relevance as long as you're using file names that contain only ASCII characters. Once you use accented chars, symbols or even non-Latin scripts in your file names, you need to make sure that the encoding used for compression matches the one used to later read the archive again.
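Why the encoding matters can be shown with a few lines of code. This is only an illustrative sketch in Python, independent of this package; decode_name is a hypothetical helper, and its "try UTF-8 first, then fall back to a DOS code page" rule is a common heuristic, not necessarily what the package's encoding option does.

```python
def decode_name(raw, fallback="cp437"):
    """Decode raw file name bytes: try UTF-8 first, then a legacy code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


name = "café.txt"
as_utf8 = name.encode("utf-8")   # what a UTF-8 writing tool stores
as_dos = name.encode("cp437")    # what a tool using the DOS code page stores

# Reading UTF-8 bytes with the wrong encoding garbles the accented letter.
print(as_utf8.decode("cp437") == name)   # prints False

# With matching (or correctly guessed) encodings the name survives.
print(decode_name(as_utf8) == name)      # prints True
print(decode_name(as_dos) == name)       # prints True
```

Pure ASCII names are encoded identically in all of these encodings, which is why they are never affected.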
Overview of the classes
The only classes you need to look at are:
- ZipArchive - create/open/close archives, add and remove files
- ZipEntry - get information about items in an archive and extract them
- ZipExtraField - get or create some optional additional data for archive items
- ZipConfig - here you may configure a few options on how you want the code to behave
Or, if you only want to uncompress .zip files easily, look at:
- ZipExtractor - contains simple methods to unzip an archive
The files inside the ZipSupport folder are not of much interest to you, while the files in the StreamSupport folder may be of general use for you in other projects, as they provide generalized read and write functions for both data and resource forks.
There is also a folder called "AddWhenNotUsingEinhugurPlugin". As its name suggests, you need to add its contents to your project if you do not have the Einhugur plugin installed. You then also need to change the value of the constant "HaveEinhugurPlugin" to false and make sure you have the "zlib1.dll" available if you're running on Windows.
Creating an archive and adding files to it
First, create a new ZipArchive instance. Then you Open the archive by specifying a FolderItem along with a Boolean that is TRUE to signal that you want to write to the archive. If the FolderItem exists, it will be opened as an archive (provided it's a valid archive file), and if no file exists yet, a new archive file will be created.
dim zar as ZipArchive
zar = new ZipArchive
if not zar.Open(aZipFolderItem, true) then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
Next, you can either add single files or entire folders to the archive:
dim result as Integer
if not aFolderItem.Directory then
  // add a single file:
  result = zar.AddItemToRoot(aFolderItem, zar.MacBinaryNever)
  if result <= 0 then
    MsgBox "Error: " + zar.ErrorMessage
    return
  end
else
  // add the contents of a folder to the root:
  if not zar.AddFolderContents(aFolderItem, "", zar.MacBinaryNever, ZipArchive.MacAliasHandling.DropAll, false) then
    MsgBox "Error: " + zar.ErrorMessage
    return
  end
end
Finally, close the archive:
if not zar.Close() then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
Caution!
Since Zip archives may contain multiple files with the same file name (and path), the methods to add an item to the archive do not check if such an item already exists in the archive. If you want to avoid adding duplicates to your archive, you have to check if they already exist and remove them first. Ideally, you should also compact the archive thereafter. Since compacting takes a while, you should first remove all items, then compact, then add the new ones. Also note that removing a folder from the archive will not remove all the items that were added inside that folder. That's because a Zip file organizes the files as a simple sequence, with each entry noting the path and file name where it belongs. I might eventually add code to do all this automatically for you. If you need such a feature or need help with finding and removing existing items, contact me.
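Both points (duplicate names are legal, and a folder's items are just entries sharing a path prefix) can be made visible with a short script. This is a sketch in Python using the standard zipfile module, not this package's classes; duplicate_names and names_under are hypothetical helper names.

```python
import collections
import io
import warnings
import zipfile


def duplicate_names(archive):
    """Return the entry names that occur more than once, sorted."""
    with zipfile.ZipFile(archive) as zf:
        counts = collections.Counter(zf.namelist())
    return sorted(name for name, n in counts.items() if n > 1)


def names_under(archive, folder):
    """Return all entries stored below a folder path (matched by prefix)."""
    prefix = folder.rstrip("/") + "/"
    with zipfile.ZipFile(archive) as zf:
        return [n for n in zf.namelist() if n.startswith(prefix)]


# Build a small in-memory archive that stores the same path twice.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "first version")
    with warnings.catch_warnings():
        # zipfile warns about the duplicate name but still writes the entry
        warnings.simplefilter("ignore")
        zf.writestr("docs/readme.txt", "second version")
    zf.writestr("docs/other.txt", "unrelated")

print(duplicate_names(buf))   # prints ['docs/readme.txt']
```

To replace a folder cleanly, every name returned by a prefix search like names_under would have to be removed individually before the new items are added.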
MacBinary information
The MacBinary option allows you to preserve Macintosh-specific file information, mainly the resource forks. It also preserves some minor attributes such as the File Creation Date, because the standard Zip format only preserves the Modification Date.
A Mac file's Type and Creator codes will always be preserved, even if no MacBinary is used. You may want to pass MacBinarySmart or MacBinaryAlways instead of MacBinaryNever in order to better preserve Macintosh specific information (when running on Windows, they make no difference).
The "Smart" version will encode files only as MacBinary if the file contains a Resource Fork.
Note that many non-Macintosh Zip tools can not handle MacBinary encoding, which means that they would decode such files as a MacBinary file, hiding the data fork in them along with other information (some tools can decode this properly, though, like Stuffit Expander for Windows - they extract the data fork only, ignoring the resource fork).
Extracting all items from an archive
Create a new ZipArchive instance, then call Open, passing a FolderItem identifying the archive you want to extract from, and FALSE for reading from the archive (instead of TRUE for writing to it).
dim zar as ZipArchive
zar = new ZipArchive
if not zar.Open(theFile, false) then
  MsgBox "Error: " + zar.ErrorMessage
  return
end
You can now loop over all entries in the archive, get their "ZipEntry" instances, and then extract their files to a folder hierarchy of which you can provide the starting folder:
dim f as FolderItem, e as ZipEntry, i as Integer
for i = 1 to zar.EntryCount
  e = zar.Entry(i)
  f = e.MakeDestination(destFolder, false)
  if f.exists then
    // here you could ask the user whether to overwrite the
    // existing file (but it could be even a folder!) or
    // to skip the item or abort the entire process.
    f.Delete // this will not work if it's a non-empty folder, though!
  end
  if not e.Extract(f) then
    MsgBox "Extraction of """ + e.RawPath + """ failed: " + e.ErrorMessage
    return
  end
next
There is a little caveat here, though: Some operating systems (e.g. OS X) update a folder's modification time to the current time when a new file is added to the folder. While a zip archive usually contains entries for folders and their original timestamps, those usually come first, i.e. before their contents. Hence, even if the above loop extracts a folder entry first, setting its original timestamp, successive files extracted into the folder will modify its timestamp again. There is a work-around for this, fortunately: First, extract only the files, and then perform another extraction loop in which only the folders are extracted, thus setting their timestamps after all modifications to the folder are completed. See ZipExtractor.ExtractAllSilently() for an example.
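The two-pass idea itself does not depend on REALbasic. As an illustration only, here is a minimal sketch of it in Python with the standard zipfile module; it is not the code of ZipExtractor.ExtractAllSilently(), and the function name extract_folders_last is made up.

```python
import os
import time
import zipfile


def extract_folders_last(archive, dest):
    """Extract all files first, then apply the folders' stored timestamps."""
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()

        # Pass 1: everything that is not a folder entry. Missing parent
        # folders are created on the way and get the current time for now.
        for info in infos:
            if not info.is_dir():
                zf.extract(info, dest)

        # Pass 2: folder entries, deepest paths first, so that creating an
        # empty subfolder cannot touch a parent that was already stamped.
        folders = [info for info in infos if info.is_dir()]
        folders.sort(key=lambda info: info.filename.count("/"), reverse=True)
        for info in folders:
            path = zf.extract(info, dest)  # creates the folder if needed
            # date_time is local time (year, month, day, hour, minute, second)
            stamp = time.mktime(info.date_time + (0, 0, -1))
            os.utime(path, (stamp, stamp))
```

Folders that have no entry of their own in the archive keep the time at which they were created during extraction.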
Finally, close the archive again (no need to check for errors since we did only read from the archive):
call zar.Close()
Hint: To delete a non-empty folder, see here: http://www.declaresub.com/wiki/index.php/Delete_a_non-empty_folder
More detailed programming information
There are many more options to store and retrieve items in a Zip archive:
- Read and write the global (archive-wide) comment
- Read and add comments about individual items
- Read and write the data and resource forks separately, which may be helpful if you want to access the resource fork on Windows where the file system does not support this implicitly.
- Learn about the size of the uncompressed data for both the items in an existing archive and for data you plan to add to an archive. Helpful for progress bars. See the demo project for an example.
- Verify the integrity of individual items in an archive without extracting them.
- Remove or exchange entries in an archive, and compact the archive after removing items.
- Use a simple undo technique to revert changes made to an archive (see Mark and Rollback)
- Specify the encoding used in the archive for file names and comments.
To learn about all these, look up the functions in the classes (mostly ZipArchive and ZipEntry) and see the comments at the top of them. That's their documentation.
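For readers who want to cross-check an archive with an independent tool, three of the items above (the global comment, the total uncompressed size, and verification without extraction) can be sketched in Python with the standard zipfile module. This does not use this package's classes, and archive_summary is a hypothetical name.

```python
import io
import zipfile


def archive_summary(archive):
    """Collect the global comment, the total uncompressed size and the
    name of the first corrupt entry (None if every CRC-32 matches)."""
    with zipfile.ZipFile(archive) as zf:
        total = sum(info.file_size for info in zf.infolist())
        first_bad = zf.testzip()  # reads every entry, writes nothing to disk
        return {
            "comment": zf.comment,          # raw bytes, encoding not specified
            "uncompressed_bytes": total,    # useful as a progress bar maximum
            "first_bad": first_bad,
        }


# Small in-memory archive to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "12345")
    zf.writestr("b.txt", "123")
    zf.comment = b"nightly backup"

print(archive_summary(buf)["uncompressed_bytes"])   # prints 8
```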
Note that the methods starting with "z_" are private functions, the others are available for general use.
If you find bugs or make enhancements to this software, please contact me about them (send me your changed version), and I'll see to incorporate them into my next release to have others benefit from it, too.
Known limitations and problems
Compatibility & interchangeability considerations
There are many different Zip archiver tools around, and many have interchange problems when it comes to these particularities:
- Non-ASCII file names. The problem is that the Zip "standard" does not provide clear rules for encoding file names. That means that if a file name uses extended letters such as ç, é or even non-Roman scripts (Japanese etc.), the Zip archive does not include information on how these letters are encoded in the archive. Originally, the DOS character sets were used, but nowadays Apple's OS X uses UTF-8, while most Windows archives still use local system encodings (i.e. they depend on the language the Windows system uses). Here are some details on the uses:
- Apple's Zip tool in OS X 10.4 and later (as invoked by the Finder) always writes UTF-8 in the names in the directory. When reading an archive, it apparently also always assumes the names to be in UTF-8 format.
- ZipIt 2.2.2 is a bit more versatile: It has an option in its Preferences to choose whether to use Unicode file names or the older encodings understood better by Windows zip tools. When the option is enabled, it will encode the name in UTF-8 and set the lowest byte of the "external permissions" 32 bit value to 1. When reading an archive, it checks this byte: Only when it's non-zero does it interpret the file name as UTF-8, otherwise it uses an old encoding (e.g. MacRoman). The problem is that Apple's zip sets this same byte to zero. Therefore, non-ASCII file names, when created by Apple's zip and read by ZipIt, are not properly decoded. The other direction works, however, as Apple blindly always assumes UTF-8.
- To enable the best interchangeability, this Zip Package will, from version 1.3 on, write the value 1 into the lowest byte of the "external permissions" if UTF-8 encoding is chosen, in order to please ZipIt (former versions of this code did not do this). When reading an archive, it will also check for the special signatures that Apple's Zip tool and ZipIt write to an entry, in order to detect UTF-8 encoded file names automatically.
- Resource Forks, MacBinary format. These are still used on Mac OS systems. Early on, the semi-official pkzip spec was extended with the description of two possible ways to store resource forks in a Zip archive, one used by Info-Zip (not supported by this Zip Package) and one used by ZipIt and Stuffit (which use the MacBinary format for it and which is supported by this Zip Package). Unfortunately, when Apple added Zip archive support to the Finder in OS X 10.4, it invented yet another, and badly documented, alternative: It stores the Resource forks and other Mac-specific information in AppleSingle/AppleDouble encoded form, in separate files (e.g. in a directory structure called "__MACOSX"). The problem is that archives created this way are not understood by any other Zip tools as of now (AFAIK). This Zip Package, from version 1.3 on, is at least able to decode these Apple-created archives properly, but can't create them yet. On the other hand, Apple's zip tool can't handle MacBinary encoded files. That means that if you want to create Zip archives that contain Resource Forks and that can be unarchived by Apple's Finder (e.g. by a simple double click on the zip file), you have to archive them with Apple's zip tool! If you want this Zip Package to create the new format used by Apple, you have to add this code yourself or pay me (TT) at least a few hundred dollars, as I am otherwise not eager to do it.
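When such a Finder-created archive is listed with a tool that does not know about Apple's scheme, the companion files show up as ordinary entries. The following sketch (Python, independent of this package) separates them by name; it assumes the naming used in Finder-made archives, i.e. a top-level "__MACOSX" folder and file names prefixed with "._", and split_apple_double is a hypothetical helper name.

```python
def split_apple_double(names):
    """Separate regular entries from AppleDouble companion entries,
    judging only by the entry names."""
    regular, companions = [], []
    for name in names:
        parts = name.rstrip("/").split("/")
        if parts[0] == "__MACOSX" or parts[-1].startswith("._"):
            companions.append(name)
        else:
            regular.append(name)
    return regular, companions


names = [
    "Project/",
    "Project/Notes.txt",
    "__MACOSX/",
    "__MACOSX/Project/",
    "__MACOSX/Project/._Notes.txt",
]
regular, companions = split_apple_double(names)
print(regular)      # prints ['Project/', 'Project/Notes.txt']
print(len(companions))   # prints 3
```

The sketch only classifies names. Decoding the AppleDouble data into a resource fork and Finder information is a separate step that it does not attempt.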
Further reading
The "zip archive" standard is quite a loose one. There appear to be lots of zip programs around that do not follow the PKWARE-specs very well, and while I have tried to program this software as close to the specs as I could, you may run into compatibility problems when trying to open archives from or create archives for other zip implementations.
A universal, open-sourced, reference implementation that attempts to deal with all eventualities is the so-called info-zip code base, in C language, which is also used by Apple with OS X (even though Apple appears to use quite an outdated version of it). Here is the info-zip home page:
Info-Zip home
The original PKWARE specification for Zip archives is included with this package in the "Technical docs" folder, along with the MacBinary specs.
List of changes (Version History)
v1.0, 8 Apr 2003
v1.1, 25 Apr 2003
- Added methods to get and set the "OS Made By" property (ZipEntry: OSMadeBy, SetOSMadeBy; ZipArchive: DefaultOSMadeBy, SetDefaultOSMadeBy)
- Added methods to get and set the "Text File" flag (ZipEntry: IsTextFile, SetTextFileFlag)
- Fixed "ZipArchive.Compact": It damaged archives if duplicate ("fake") entries were used.
- Adding a snapshot won't include hidden items any more. A new "ZipSnapshots.IncludeHiddenItems" property can be set to have them added again.
- Added "ZipSnapshots.RemoveSnapshot".
- Internal fix: z_isFakeEntry does not test for "hdr.Long(20)<>0" any more since this test was not always correct.
- "ZipSnapshots.ExtractFromSnapshot" has a new parameter to specify how to deal with Alias files at the destination (note that snapshots do not contain alias files): They can be skipped (the item in the archive will not be extracted if an alias exists at its destination), followed (the file where the alias points to will be replaced) or overwritten (the alias file is replaced with the file from the archive).
- Fixed storing 0-length files: They are now using the compression mode "stored", not "deflated". This prevents Stuffit Expander and other zip tools from complaining.
- ZipEntry.ExtraField() does not return nil any more for empty fields, but returns a valid ZipExtrafield object. nil is only returned in case of an error.
- Fixed a problem where extracting or verifying certain Unix-created and other archives would cause a "Header mismatch" error.
- The "protected" flag is now ignored when extracting an item, unless the new method "ZipEntry.EnableFileLocking" is called before extraction.
- The "FInfo.fdFlags" in MacBinary headers are now supported, too.
v1.1.1, 18 June 2003
- Fixed a bug in ZipArchive.z_readDirectory() that made reading a Zip file's comment fail.
v1.1.2, 17 July 2003
- Fixed a bug in ZipEntry.z_unzip() that flagged some compressed items as being corrupt even though they were just fine. A case of having added more safety checks than necessary (and appropriate).
v1.1.3, 29 July 2004
- This release also includes Windows versions of the plugins, so that one can compile the Zip Package with a RB IDE running on Windows, too.
- Added a new Class called ZipExtractor to facilitate extraction of an entire archive and other common tasks.
- Improved detection of valid Zip archives by adding a check for a Zip header at the start of the file. This allows for quick checks whether any file is a Zip archive, even if it's not using the proper file type or extension.
- Increased the minimum and preferred memory amounts of the project files because the values were definitely too low for use in Mac OS 8+9.
v1.2, 2004-2006
- These were internal versions, not officially released
- Can now read zips with > 65535 entries, but will not modify them (they stay read-only)
- ZipEntry.Extract: added setting CreationDate to keep Norton and other tools from complaining about creation > modification date
- ZipEntry.ApplyMacBinaryInformation: Fixed setting of creation date
- Added get/set for UnixPermissions
- Does not compile with RB 4.5 any more, it requires 5.5.5 now.
- Added support for reading Apple's zip files (as created by the Finder). They need special handling because resource fork information is being stored in a new, incompatible way: Resource and Finder Info data is stored in separate "__MACOSX" folders. Note that this Zip Package can read and uncompress such archives, but not create or alter them (yet). On the other hand, Apple's zip tool can not read the Resource forks as written by all the other Zip tools such as Stuffit, ZipIt, this Zip Package and others. Shame on Apple's OS X team for not observing the long-existing de-facto standards!
v1.3.1, 22 March 2007
- Updated the classes to build with RB 2006 and later.
- Uses fewer plugins: Only the e-CryptIt (and "#TypeLib.rbx") plugin is now needed, none of the old TT plugins.
- Tested to work on Intel-Macs
- Converted all external items to use the modern naming convention with endings such as .rbo etc. in order to make them better usable in IDEs under Windows and Linux.
2 August 2007
May 2008
- Changed the parameters of the ZipProgressNotifier.ZipProgress callback method. Make sure to read the Note "Important change notice" if you're using this in your existing programs.
v2.0.0, 18 October 2009
- Unzip is now able to deal with zip files between 2GB and 4GB in size thanks to the use of Int64 and UInt32 types. Zipping was updated accordingly, too, but has not been tested for this yet.
- Added use of the free "zlib" library to replace the Einhugur e-CryptIt plugin. Note that using Einhugur's plugin makes this code faster, though (about 25-30% speed gain, i.e. it is then about as fast as other common Zip programs, at least on Mac OS X). To use "zlib", see the Note "Plugins you will need" in the ZipArchive class.
- Fixed an OS X Intel (vs. PowerPC) related bug: The FInfo (FinderInfo) flags were incorrectly stored in MacBinary encoded files, leading to nonsense such as a hidden file becoming visible and getting a (colored) label.
- ZipArchive.AddFolderContents() can now store Alias files instead of dropping them.
- Fixed "EmulateOSXRenaming": Extracting a name with a ":" in it turns it into "/" or "-" instead of dropping that character.
- Improved ZipProgressWin's performance: it does not waste excessive time updating controls all the time any more.
- Improved handling of "abort" flag so that the reported error is always "Aborted" and not some irrelevant follow-up error.
- Bugfix: When extracting MacBinary-encoded files that had no Creator and FileType (i.e. their bytes were all zero), these are correctly restored now - before they turned into "????" codes.
v2.0.1, 9 February 2010
- Removed a few unused variables, made it compile in console apps again, fixed some warnings, and fixed the parameter use in ZipArchive.AddItemByStreams. Thanks to Kem Tekinay for pointing them out to me.
v2.0.2, 25 February 2010
- CalcFolderSize function: Swapped useMacBinary and aliasHandling parameters to match the order of AddFolderContents.
- Unix Permissions are now saved to the archive by default. This obsoletes the functions PreserveUnixPermissions and PreservingUnixPermissions.
- Symlinks are now handled and stored properly on OSX and Linux.
- The DOS "hidden" attribute is now observed on all platforms.
- New parameter in AddFolderContents for choosing to include or exclude invisible items.
- Cleaned up the project so that only those files are now external that are part of the actually sharable code, meaning you can drop the entire "TT's ZipArchiver Classes" folder into your project now without getting compile errors.
- Removed GetUnixPermissions and ApplyUnixPermissions. Use ZipFolderItem.Permissions instead.
- Removed some obsolete Mac OS 9 code, including "Internet Config".
- Fixed FileReader.Skip.
- Added bs.Flush call to FileWriter.Flush.
- New internal classes: StringWriter, SymlinkReader, ZipFolderItem.
v2.0.3, 26 February 2010
- Removed the obsolete checks for "name overflows".
- Updated the requirements (i.e. RB 2008r4 or later) in this documentation.
v2.0.4, 26 February 2010
- Fixed a bug which made it crash on Mac OS X Tiger (10.4)
v2.1, 15 October 2010
- TestWindow improved: If a single folder is dropped for compression, only contents but not the folder itself are written to the archive, just like in Mac OS X.
- Previous versions wrote 4 additional bytes at the end of each compressed item. This has been corrected.
- A few modifications to support writing conforming epub files.
Copyrights and Acknowledgements
e-CryptIt Engine copyright Björn Eiríksson (www.einhugur.com)
e-CryptIt Engine uses zlib code, copyright © 1995-2002 Jean-Loup Gailly and Mark Adler.
Original zip format REALbasic code was written by Carsten Friehe for the Mieze program (http://carsten-friehe.de/).
RB code improved and reorganized by Thomas Tempelmann (http://www.tempel.org/rb/) for public release.
Some of the design and error messages were influenced by Java 1.1's ZipFile and related classes.
My thanks go to Leonard Rosenthol (author of Stuffit Zip support and maintainer of MacBinary format) and Tom Brown (author of ZipIt) for providing helpful information.
Terms of use
This RB code, written by Carsten Friehe and Thomas Tempelmann, is given to the Public Domain, which means you can do whatever you want with it. It is, however, appreciated if you would "tip" me by sending me a few dollars for my work.
It took me more than two full weeks to develop the original version of this code for the public - I myself could have done with much less for my own needs, but I wanted to provide this as a clean and complete solution so that others won't have to deal with this non-trivial task. And over the many years since, I've added a lot of new features as well. So, if you benefit from my work, please acknowledge it with a little financial support for me.
Please visit the following web address to find out how to tip me:
http://tip.tempel.org/
Enjoy!
8 April 2003, 25 February 2010
Thomas Tempelmann