
Reading zip archives in Python

Zip is a popular archive file format that lets you combine multiple files into a single archive and, through compression, store them on disk using less space. We'll show you how to open and read zip files in your Python scripts.


Python's "batteries included" philosophy is well known: the functions that perform most of the basic operations you'll need in your scripts ship in the standard library that comes with the Python distribution. Working with zip archives is no exception; the functionality lives in the zipfile module.

If you're going to be working more than just casually with zip archives, you won't get far without knowing how files are arranged within an archive. The best resource for this is the ZIP Application Note distributed by PKWARE, the company that developed the zip file format. The application note contains a full file format specification for .zip archives.

For this article we'll be working with a zip file called "test.zip" which contains two files: "file1.txt" and "file2.txt". If you're following along, you can download the file here.
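
If the download is not available, an equivalent archive can be built locally. The following is a minimal sketch, not the original file: the member contents are assumed from the output shown later in this article, the helper name make_test_zip is hypothetical, and it relies on ZipFile's write mode, which this article does not otherwise cover.

```python
import zipfile

# Assumed contents, reconstructed from the output shown later in the article
MEMBERS = {
    "file1.txt": 'File One Contents\n\n"Testing, testing, one two three."\n',
    "file2.txt": 'File Two Contents:\n\n"Rock and Roll."\n',
}

def make_test_zip(path="test.zip"):
    """Create a small archive with two text members (hypothetical helper)."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as z:
        for name, text in MEMBERS.items():
            # writestr stores a string as a member without needing a file on disk
            z.writestr(name, text)
    return path
```

Calling make_test_zip() with no arguments writes test.zip into the current directory.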

First things first, we need to load the file and create a ZipFile instance:


import zipfile

# Open the archive for reading; ZipFile accepts a filename directly
z = zipfile.ZipFile("test.zip")
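
Opening a file that is not a valid zip archive raises an exception, which is worth handling when the filename comes from a user. A minimal sketch, assuming Python 3 (where the exception is named zipfile.BadZipFile); the helper name try_open is hypothetical:

```python
import zipfile

def try_open(path):
    """Return an open ZipFile, or None if `path` is not a valid zip archive."""
    try:
        return zipfile.ZipFile(path)
    except zipfile.BadZipFile:
        # The file exists but does not contain a readable zip structure
        return None
```

The caller is still responsible for closing the ZipFile that try_open returns.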

NB: If you modify a zip file, it's important that you always close the archive; otherwise essential records may not be written to it. Even though we won't be changing the file in this article, closing the archive is a good habit to get into. The following code will close a ZipFile once you have finished with it:


z.close()
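
In current Python versions a ZipFile can also be used in a with statement, which closes the archive automatically when the block ends, even if an exception is raised inside it. A minimal sketch, using the read method covered next; the helper name read_member is hypothetical:

```python
import zipfile

def read_member(path, name):
    """Return the raw bytes of one archive member (hypothetical helper)."""
    # Leaving the with-block calls z.close() for us, including on errors
    with zipfile.ZipFile(path) as z:
        return z.read(name)
```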

Now that we've got a ZipFile, what can we do with it? If all you need is to pull out the contents of the archive, then you can use the read method:


>>> print(z.read("file1.txt").decode())
File One Contents

"Testing, testing, one two three."

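In Python 3, read returns the member's raw bytes rather than text, and asking for a name that is not in the archive raises KeyError. The following is a minimal sketch of handling both cases; the helper name read_text and the default UTF-8 encoding are assumptions, not part of the zipfile module:

```python
import zipfile

def read_text(path, name, encoding="utf-8"):
    """Return a member's contents as text, or None if it is not in the archive."""
    with zipfile.ZipFile(path) as z:
        try:
            data = z.read(name)  # bytes in Python 3
        except KeyError:
            # No member with this name exists in the archive
            return None
    return data.decode(encoding)
```

Returning None for a missing member is a design choice made for this sketch; letting the KeyError propagate is equally reasonable when a missing file should be treated as an error.
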
There are two classes defined by the module that you must use to read zip archives. The first, ZipFile, we've already dealt with. ZipFile provides the methods that relate to the archive as a whole: opening and closing the archive, reading from and writing to it, and listing the files it contains.

We've demonstrated opening, closing and reading from archives already, and we'll leave writing for another day. Let's take a look at accessing the file list. The following is a short Python script that accepts a number of zip archive names as arguments and then prints the names of the files each one contains:


import sys, zipfile

# Print the member names of each archive named on the command line
for filename in sys.argv[1:]:
    with zipfile.ZipFile(filename) as z:
        print("%s:" % filename)
        for f in z.namelist():
            print("\t%s" % f)
        print("")

When we run it with our test archive as a parameter:


$ python printzip.py test.zip 
test.zip:
        file2.txt
        file1.txt

The important call here is the namelist method, which simply returns the filenames of the archive's contents. You can combine this with read to dump the full contents of the archive to the screen:


>>> for f in sorted(z.namelist()):
...     print(f)
...     print("=" * 10)
...     print(z.read(f).decode())
...     print("")

file1.txt
==========
File One Contents

"Testing, testing, one two three."

file2.txt
==========
File Two Contents:

"Rock and Roll."


The second class defined by the zipfile module is ZipInfo, which stores information about a single file contained within the archive. If you need more information about the archive's contents than just the filenames, then you need to inspect the ZipInfo objects.

You can get a ZipInfo object for any member by using the ZipFile.getinfo(name) method, or if you want the whole list, you can use the ZipFile.infolist() method. The following program prints the name, size, compressed size and last modification time of each file in the archives given as command line arguments:


import sys, zipfile, datetime

# Print name, sizes and modification time for every member of each archive
for filename in sys.argv[1:]:
    with zipfile.ZipFile(filename) as z:
        print("%s:" % filename)
        for i in z.infolist():
            # date_time is a (year, month, day, hour, minute, second) tuple
            dt = datetime.datetime(*i.date_time)
            print("%s\tSize: %sb\tCompressed: %sb\t\tModified: %s" %
                  (i.filename, i.file_size, i.compress_size, dt.ctime()))
        print("")

Then when we run the program:


$ python printzip.py test.zip 
test.zip:
file2.txt       Size: 37b       Compressed: 37b         Modified: Thu Oct 11 14:03:34 2007
file1.txt       Size: 54b       Compressed: 48b         Modified: Thu Oct 11 14:03:12 2007
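
When only one member is of interest, getinfo returns its ZipInfo directly. The following is a minimal sketch that uses file_size and compress_size to work out how much space compression saved; the helper name compression_saving is hypothetical:

```python
import zipfile

def compression_saving(path, name):
    """Return the fraction of space saved for one member (hypothetical helper)."""
    with zipfile.ZipFile(path) as z:
        info = z.getinfo(name)  # raises KeyError if the member is absent
    if info.file_size == 0:
        # Avoid dividing by zero for empty members
        return 0.0
    return 1.0 - info.compress_size / info.file_size
```

With the figures from the listing above, file1.txt (54 bytes stored as 48) would give a saving of about 11%, while file2.txt (37 bytes stored as 37) would give 0.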

In Python, working with zip files is as easy as that. Building zip functionality into your own scripts allows you to access and store resources while using reduced disk space.
