Zip is a popular archive file format that lets you both combine multiple files into a single archive and store them on disk using less space. We’ll show you how to open and read zip files in your Python scripts.
Python’s “batteries included” philosophy regarding libraries is well known. What it means to you is that the functions that perform most of the basic operations you’ll need in your scripts are included in the standard library that comes with the Python distribution. Working with zip archives is no exception: the functionality is included in the zipfile module.
If you’re going to be working more than just casually with zip archives, you won’t get far without knowing how files are arranged within them. The best resource for this is the ZIP Application Note distributed by PKWARE, the company that developed the format. The application note contains a full file format specification for .zip archives.
For this article we’ll be working with a zip file called “test.zip” which contains two files: “file1.txt” and “file2.txt”. If you’re following along, you can download the file here.
First things first, we need to load the file and create a ZipFile instance:
    import zipfile

    # Passing the path lets ZipFile open the archive itself, in binary mode.
    z = zipfile.ZipFile("test.zip")
NB: If you modify a zip file, it’s important that you always close the archive. Even though we won’t be changing the file in this article, closing the archive is a good habit to get into. The following code will close a ZipFile:

    z.close()
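If you’d rather not rely on remembering to call close, here is a minimal sketch of two ways to guarantee it, assuming Python 2.7 or later, where a ZipFile can be used in a with statement. The small archive built in the setup step and the names archive_path, names_explicit and names_with are our own stand-ins for illustration; they are not part of the original test.zip.

```python
import os
import shutil
import tempfile
import zipfile

# Setup (hypothetical): build a small stand-in for test.zip in a temp directory.
tmpdir = tempfile.mkdtemp()
archive_path = os.path.join(tmpdir, "test.zip")
with zipfile.ZipFile(archive_path, "w") as setup:
    setup.writestr("file1.txt", "File One Contents\n")
    setup.writestr("file2.txt", "File Two Contents:\n")

# Option 1: explicit close. try/finally runs close() even if reading fails.
z = zipfile.ZipFile(archive_path)
try:
    names_explicit = z.namelist()
finally:
    z.close()

# Option 2: context manager. The archive is closed when the with block exits.
with zipfile.ZipFile(archive_path) as z2:
    names_with = z2.namelist()

print(names_explicit)
print(names_with)

# Remove the temporary directory created for this demonstration.
shutil.rmtree(tmpdir)
```

Both options read the same list of names; the with form is simply harder to get wrong.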
Now that we’ve got a ZipFile, what can we do with it? If all you need is to pull out the contents of the archive, then you can use the read method, which takes the name of a file in the archive and returns its contents:
    >>> print(z.read("file1.txt").decode("utf-8"))
    File One Contents
    "Testing, testing, one two three."
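Two details are worth knowing when you use read in your own scripts: on Python 3 it returns bytes rather than text, and it raises KeyError if the archive has no member with the given name. The sketch below wraps both cases. Note that read_text is a hypothetical helper of ours, the in-memory archive only stands in for test.zip, and UTF-8 is an assumed encoding.

```python
import io
import zipfile

# Setup (hypothetical): an in-memory stand-in for test.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as setup:
    setup.writestr("file1.txt", "File One Contents\n")


def read_text(archive, name, encoding="utf-8"):
    """Return a member's contents as text, or None if the member is missing."""
    try:
        data = archive.read(name)  # bytes on Python 3
    except KeyError:
        # ZipFile.read raises KeyError when no member has this name.
        return None
    return data.decode(encoding)


with zipfile.ZipFile(buffer) as z:
    text = read_text(z, "file1.txt")
    missing = read_text(z, "no-such-file.txt")

print(text)
print(missing)
```

Returning None for a missing member is just one possible design; letting the KeyError propagate is equally reasonable if a missing file should stop the script.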
There are two classes defined by the module that you must use to read zip archives. The first, ZipFile, we’ve already dealt with. ZipFile provides the methods that relate to the archive as a whole: opening and closing the archive, reading from and writing to it, and listing the files it contains.
We’ve demonstrated opening, closing and reading from archives already, and we’ll leave writing for another day. Let’s take a look at accessing the file list. The following is a short Python script that accepts a number of zip archive names as arguments and then prints the contents of each:
    import sys
    import zipfile

    # Print the member names of every archive named on the command line.
    for filename in sys.argv[1:]:
        z = zipfile.ZipFile(filename)
        print("%s:" % filename)
        for f in z.namelist():
            print("\t%s" % f)
        print("")
        z.close()
When we run it with our test archive as a parameter:
    $ python printzip.py test.zip
    test.zip:
            file2.txt
            file1.txt
The important method here is namelist, which simply returns the filenames of the archive’s contents. You can combine it with read to dump the full contents of the archive to the screen:
    >>> for f in sorted(z.namelist()):
    ...     print(f)
    ...     print("=" * 10)
    ...     print(z.read(f).decode("utf-8"))
    ...     print("")
    ...
    file1.txt
    ==========
    File One Contents
    "Testing, testing, one two three."

    file2.txt
    ==========
    File Two Contents:
    "Rock and Roll."
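Archives made by other tools often contain directory entries, which show up in namelist with a trailing slash and have no contents worth printing. As a rough sketch, the loop above could be adapted to collect each file’s text into a dictionary while skipping those entries. Here dump_contents is a hypothetical helper, the in-memory archive and its member names are made up for the example, and UTF-8 is an assumed encoding.

```python
import io
import zipfile

# Setup (hypothetical): an in-memory archive with a directory entry and two files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as setup:
    setup.writestr("docs/", "")
    setup.writestr("docs/readme.txt", "Read me first.\n")
    setup.writestr("file1.txt", "File One Contents\n")


def dump_contents(archive, encoding="utf-8"):
    """Map each file member's name to its decoded text, skipping directories."""
    contents = {}
    for name in sorted(archive.namelist()):
        if name.endswith("/"):
            continue  # directory entry: nothing to read
        contents[name] = archive.read(name).decode(encoding)
    return contents


with zipfile.ZipFile(buffer) as z:
    contents = dump_contents(z)

for name in sorted(contents):
    print(name)
    print("=" * 10)
    print(contents[name])
```

Keeping the text in a dictionary rather than printing it straight away makes it easier to reuse the contents elsewhere in a script.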
The second class defined by the zipfile module is ZipInfo. Each ZipInfo object stores information about a single file contained within the archive. If you need more information about the archive’s contents than just the filenames, then you need to inspect the ZipInfo objects.
You can get the ZipInfo object for any member by using the ZipFile.getinfo(name) method, or if you want the whole list, you can use the ZipFile.infolist() method. The following program prints the name, size, compressed size and last modification time of each file in the archives given as command line arguments:
    import datetime
    import sys
    import zipfile

    # Print name, sizes and modification time for each member of each archive.
    for filename in sys.argv[1:]:
        z = zipfile.ZipFile(filename)
        print("%s:" % filename)
        for i in z.infolist():
            dt = datetime.datetime(*i.date_time)
            print("%s\tSize: %sb\tCompressed: %sb\t\tModified: %s" %
                  (i.filename, i.file_size, i.compress_size, dt.ctime()))
        print("")
        z.close()
Then when we run the program:
    $ python printzip.py test.zip
    test.zip:
    file2.txt       Size: 37b       Compressed: 37b         Modified: Thu Oct 11 14:03:34 2007
    file1.txt       Size: 54b       Compressed: 48b         Modified: Thu Oct 11 14:03:12 2007
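In the output above, file2.txt takes up its full 37 bytes inside the archive, while file1.txt shrinks from 54 bytes to 48. If you want that comparison as a number, the following sketch looks up a single member with getinfo and works out the fraction of space saved from file_size and compress_size. The helper space_saved, the member name repetitive.txt and the in-memory archive are all our own assumptions for the example, and the setup step relies on the zlib module being available for deflate compression.

```python
import io
import zipfile

# Setup (hypothetical): an in-memory archive with one highly compressible member.
# ZIP_DEFLATED requires the zlib module, which most Python builds include.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as setup:
    setup.writestr("repetitive.txt", "Testing, testing, one two three.\n" * 50)


def space_saved(info):
    """Fraction of the original size saved by compression (0.0 for empty files)."""
    if info.file_size == 0:
        return 0.0
    return 1.0 - float(info.compress_size) / info.file_size


with zipfile.ZipFile(buffer) as z:
    info = z.getinfo("repetitive.txt")  # look up one member by name
    saved = space_saved(info)

print("%s\t%d -> %d bytes (%.0f%% saved)" %
      (info.filename, info.file_size, info.compress_size, saved * 100))
```

A member that was stored without compression, like file2.txt above, would simply report nothing saved.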
In Python, working with zip files is as easy as that. Building zip functionality into your own scripts allows you to access and store resources while using reduced disk space.