After Hours

Reading ID3 tags with Perl's MP3::Tag module


So you've spent years building your (hopefully legal) MP3 collection, and now you have a few terabytes of music on your hard drive. All you need is an index and, if you're a show-off, a Web page to put it on. There's no need to catalog every file by hand. Instead, reach for Perl's MP3::Tag module, which saves you time by automatically retrieving track information from the ID3 tag encoded into each MP3 file.

MP3::Tag is one of many Perl ID3 tag parsers available on CPAN. It's a little more full-featured than most, which is why I chose it for this tutorial. MP3::Tag is written entirely in Perl. It can read both the older ID3v1 tags and the newer ID3v2 tags, and it can also parse the MP3 filename for track and title information. Beyond reading tags, MP3::Tag can edit the contents of an existing tag or create a new tag altogether.

MP3::Tag is licensed under the Artistic License and is maintained by Thomas Geffert. Detailed installation instructions are provided in the download archive, but by far the simplest way to install it is to use the CPAN shell:

shell> perl -MCPAN -e shell
cpan> install MP3::Tag

If you use the CPAN shell, dependencies will be automatically downloaded for you (unless you told the shell not to download dependent modules). This tutorial uses version 0.92 of MP3::Tag.

How ID3 tagging works

An ID3 tag is a field containing bibliographical information (like title, artist, genre, year of release, and album) about an MP3 audio track, usually embedded within the MP3 file itself. An MP3 player or cataloging application can scan an MP3 file for this tag and use the information inside it to automatically display the name of the artist and track while it is playing.

The first version of the ID3 standard, called ID3v1, stored this information in a fixed-length block of 128 bytes at the end of the MP3 file. This block begins with the string TAG. It holds the track title, the artist name, the album, the year of release, a comment, and a one-byte genre code. Each text field is limited to 30 characters, except the year, which is limited to 4.
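Because the ID3v1 layout is fixed, Perl's unpack() can split the 128-byte block into its fields directly. This is a minimal sketch for illustration, not how MP3::Tag is implemented internally. The helper names parse_id3v1() and read_id3v1() are hypothetical. The sketch also handles the common ID3v1.1 variant, which borrows the last two comment bytes to store a track number.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper: parse a 128-byte ID3v1 block. Layout: "TAG" (3 bytes),
# title (30), artist (30), album (30), year (4), comment (30), genre code (1).
sub parse_id3v1 {
    my ($block) = @_;
    return undef unless defined $block && length($block) == 128;
    my ($magic, $title, $artist, $album, $year, $comment, $genre) =
        unpack('a3 a30 a30 a30 a4 a30 C', $block);
    return undef unless $magic eq 'TAG';

    my %tag = (title => $title, artist => $artist, album => $album,
               year  => $year,  genre  => $genre);

    # ID3v1.1 variant: a zero byte at comment offset 28 followed by a
    # non-zero byte means that last byte is a track number.
    my ($short, $zero, $track) = unpack('a28 C C', $comment);
    if ($zero == 0 && $track != 0) {
        $tag{track} = $track;
        $comment    = $short;
    }
    $tag{comment} = $comment;

    # Text fields are padded with NULs (some taggers pad with spaces).
    for my $k (qw(title artist album year comment)) {
        $tag{$k} =~ s/\0.*\z//s;
        $tag{$k} =~ s/\s+\z//;
    }
    return \%tag;
}

# Hypothetical helper: read the last 128 bytes of a file and parse them.
sub read_id3v1 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    return undef if ((-s $fh) || 0) < 128;
    seek($fh, -128, SEEK_END) or die "Cannot seek in $path: $!\n";
    my $got = read($fh, my $block, 128);
    close $fh;
    return (defined $got && $got == 128) ? parse_id3v1($block) : undef;
}
```

Calling read_id3v1('song.mp3') returns a hash reference of the fields, or undef when the file has no ID3v1 tag. The genre comes back as a numeric code, so mapping it to a name requires a lookup table.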

ID3v1 was followed by ID3v2, an "informal standard" that most current MP3 players recognize. ID3v2 did away with the fixed-length limitations of the earlier specification, allowing many more attributes to be stored and permitting longer field values. ID3v2 tags usually appear at the beginning of the MP3 file rather than at the end. As a result, a player can read the tag before the audio data arrives, which helps when a file is streamed.
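What makes variable-length ID3v2 tags possible is a 10-byte header at the start of the file. The header holds the string ID3, a version, flags, and the tag's total size. The size is stored as a "syncsafe" integer, in which only the low 7 bits of each of its 4 bytes are used. The following sketch decodes that header. parse_id3v2_header() is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: decode the 10-byte ID3v2 header at the start of a file.
# Layout: "ID3", major version, revision, flags, 4-byte syncsafe size.
sub parse_id3v2_header {
    my ($hdr) = @_;
    return undef unless defined $hdr && length($hdr) >= 10;
    my ($magic, $major, $rev, $flags, @size) = unpack('a3 C C C C4', $hdr);
    return undef unless $magic eq 'ID3';
    return undef if grep { $_ & 0x80 } @size;   # syncsafe bytes keep bit 7 clear
    my $size = ($size[0] << 21) | ($size[1] << 14) | ($size[2] << 7) | $size[3];
    return {
        version => "2.$major.$rev",
        flags   => $flags,
        size    => $size,    # tag size, excluding this 10-byte header
    };
}
```

Given the size, a reader can skip 10 + size bytes to reach the start of the audio data.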

The MP3::Tag Perl module can read and write both of these types of tags, making it easy for you to automatically build an HTML or text catalog of your digital audio collection, or to create an application to edit and manipulate the ID3 tags inside your MP3 files. Let's see how.

Reading ID3v1 tags

Consider the simple example in Listing A, which uses MP3::Tag to retrieve the ID3 tag information from an MP3 file. In this script, a new MP3::Tag object is instantiated by passing the MP3 filename to the object constructor. The object's get_tags() method is then used to scan the MP3 file and identify which tags are present and whether they are ID3v1 or ID3v2 tags.

If an ID3v1 tag exists, an $mp3->{ID3v1} object will be created. This object provides accessor methods for the artist, title, album, year, and genre encoded in the tag, and you call them with the usual $object->method notation.
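The steps above can be sketched as follows. This is not Listing A itself. It uses only the calls the paragraph describes: new(), get_tags(), $mp3->{ID3v1} and its title/artist/album/year/genre accessors, plus close(). The helper names read_v1_fields() and format_v1_fields() are hypothetical. MP3::Tag is loaded with require at run time, so the formatting helper works even where the module isn't installed. Check the MP3::Tag documentation for your installed version before relying on the exact accessor set.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ID3v1 fields of $file as a hashref,
# or undef if the file has no ID3v1 tag.
sub read_v1_fields {
    my ($file) = @_;
    require MP3::Tag;                        # loaded only when actually used
    my $mp3 = MP3::Tag->new($file) or return undef;
    $mp3->get_tags();                        # detects ID3v1 / ID3v2 tags
    my $info;
    if (exists $mp3->{ID3v1}) {
        my $v1 = $mp3->{ID3v1};
        # scalar() keeps each accessor out of list context in the hash.
        $info = {
            title  => scalar $v1->title,
            artist => scalar $v1->artist,
            album  => scalar $v1->album,
            year   => scalar $v1->year,
            genre  => scalar $v1->genre,
        };
    }
    $mp3->close();
    return $info;
}

# Hypothetical helper: one "Label: value" line per field.
sub format_v1_fields {
    my ($info) = @_;
    return "No ID3v1 tag found\n" unless $info;
    my $out = '';
    for my $k (qw(title artist album year genre)) {
        $out .= sprintf "%-7s %s\n", ucfirst($k) . ':', $info->{$k} // '';
    }
    return $out;
}

# Usage: print format_v1_fields(read_v1_fields($ARGV[0])) if @ARGV;
```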

Listing B is a sample of the output of the script in Listing A.

Reading ID3v2 tags

You can also use MP3::Tag to read ID3v2 tags. If an ID3v2 tag exists, the $mp3->{ID3v2} object will be created, and you can use this object to extract the relevant track information. Listing C contains an example script.

Version 2 of the ID3 specification (ID3v2) organizes a tag as a header followed by multiple "frames." Each frame holds one piece of data, such as the title or the lead artist, and together the frames describe the audio track in detail. To extract these frames from an ID3v2 tag, first use the ID3v2 object's get_frame_ids() method to get a list of all available frames. Then iterate over that list, calling get_frame() to retrieve the content of each frame. The get_frame() method returns a key-value pair. This pair is either the frame name and its value or, for more complex frames, the frame name and a reference to a hash containing more detailed information.
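To see what "a header followed by frames" means at the byte level, here is a sketch that splits the body of an ID3v2.3 tag into frames. MP3::Tag's get_frame_ids() and get_frame() do this work (and much more) for you. The hypothetical split_v23_frames() helper assumes an ID3v2.3 tag with no extended header and no unsynchronisation. ID3v2.4 stores frame sizes as syncsafe integers, which this sketch does not handle.

```perl
use strict;
use warnings;

# Hypothetical helper: split the bytes that follow the 10-byte tag header
# into frames. Each ID3v2.3 frame has a 10-byte header (4-character ID such
# as TIT2 or TPE1, 4-byte big-endian size, 2 flag bytes), then its data.
sub split_v23_frames {
    my ($body) = @_;
    my @frames;
    my $pos = 0;
    while ($pos + 10 <= length $body) {
        my ($id, $size, $flags) = unpack('a4 N n', substr($body, $pos, 10));
        last unless $id =~ /\A[A-Z0-9]{4}\z/;        # padding or garbage
        last if $pos + 10 + $size > length $body;    # truncated frame
        push @frames, { id => $id, flags => $flags,
                        data => substr($body, $pos + 10, $size) };
        $pos += 10 + $size;
    }
    return @frames;
}
```

For text frames such as TIT2 (title) and TPE1 (lead artist), the first data byte indicates the text encoding ($00 means ISO-8859-1), and the rest is the text itself.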

Listing D shows an example of the output.


Writing ID3 tags

Now that you've got the hang of reading ID3 tags, you can just as easily write new information to an ID3 tag with the MP3::Tag module. All you need to do is set new values for the various attributes and then call the write_tag() method, as illustrated in Listing E. Alternatively, you can set every field in one call with the all() method, as in Listing F.
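MP3::Tag's write_tag() takes care of the byte layout for you. To make it concrete what ends up in the file for an ID3v1 tag, here is a sketch that packs the fields into the 128-byte block. build_id3v1() is a hypothetical helper. It assumes the values are plain byte strings, since ID3v1 has no notion of Unicode. pack's "a" format pads short values with NULs and truncates long ones to the field width, which is exactly the ID3v1 limit.

```perl
use strict;
use warnings;

# Hypothetical helper: build a 128-byte ID3v1 block from named fields.
sub build_id3v1 {
    my (%f) = @_;
    return pack('a3 a30 a30 a30 a4 a30 C',
        'TAG',
        $f{title}   // '',
        $f{artist}  // '',
        $f{album}   // '',
        $f{year}    // '',
        $f{comment} // '',
        $f{genre}   // 255,    # 255 is commonly used for "no genre"
    );
}
```

Writing such a block by hand means overwriting the last 128 bytes when a TAG block is already present, or appending it otherwise. That bookkeeping is what write_tag() saves you from.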

Note: You can write ID3v2 tags as well. See the MP3::Tag documentation for more information and examples.

Creating more informative playlists

How about using all this ID3 tag information for something practical? Let's assume that you have a playlist (maybe for an online radio station) in the commonly used M3U format, listing a set of MP3 files, and you'd like to publish a Web page containing detailed track information for your listeners. All you need to do is have MP3::Tag parse each file in the playlist, extract the relevant ID3 information, and build an HTML page from it. That's exactly what the script in Listing G does.
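An M3U playlist is a text file with one track path per line. Lines beginning with # are comments or extended-M3U directives such as #EXTM3U and #EXTINF. A sketch of pulling out just the track paths follows. The helper names parse_m3u() and read_m3u_file() are hypothetical. Relative paths are returned as they appear, without being resolved against the playlist's location.

```perl
use strict;
use warnings;

# Hypothetical helper: extract track paths from M3U text, skipping blank
# lines and "#" lines (#EXTM3U, #EXTINF:...). CRLF endings are tolerated.
sub parse_m3u {
    my ($text) = @_;
    my @tracks;
    for my $line (split /\r?\n/, $text) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        next if $line eq '' || $line =~ /^#/;
        push @tracks, $line;
    }
    return @tracks;
}

# Hypothetical wrapper that reads the playlist from disk first.
sub read_m3u_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $text = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    return parse_m3u($text);
}
```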

Here, Perl's built-in file functions read the contents of the playlist file into an array, and a foreach loop iterates over the array, extracting the ID3 information from each file. This information is then incorporated into an HTML table, which can be saved to a file and published to the Web. Figure A shows what this might look like:

Figure A

An HTML playlist extracted from a group of MP3 files
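When tag text goes into an HTML table like the one in Figure A, it needs escaping, because titles such as "Rock & Roll" contain characters that are special in HTML. The sketch below builds a numbered table from a list of hash references. The helper names and the title/artist/album keys are hypothetical. The hand-rolled escaper keeps the sketch free of dependencies. HTML::Entities from CPAN does the same job more thoroughly.

```perl
use strict;
use warnings;

# Escape characters that are special in HTML text and attribute values.
sub html_escape {
    my ($s) = @_;
    $s //= '';
    $s =~ s/&/&amp;/g;     # first, so the entities below aren't re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: numbered table from hashrefs with title/artist/album.
sub playlist_table {
    my (@rows) = @_;
    my $html = "<table>\n<tr><th>#</th><th>Title</th><th>Artist</th><th>Album</th></tr>\n";
    my $n = 0;
    for my $r (@rows) {
        $n++;
        $html .= sprintf "<tr><td>%d</td><td>%s</td><td>%s</td><td>%s</td></tr>\n",
            $n, map { html_escape($r->{$_}) } qw(title artist album);
    }
    return $html . "</table>\n";
}
```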

Creating an MP3 catalog

The last script we'll look at, Listing H, searches for MP3 files in one or more user-specified directories and extracts the ID3 information embedded inside them to create an HTML index containing file names, track names, artists, and genre information.

This script combines the File::Find module with the MP3::Tag module to search a list of directories and scan the MP3 files in them for ID3 information. The main workhorse of the script is File::Find's find() function, which works like the UNIX find command and walks each directory named in the @dirs array. Every time a file is found, the user-defined displayMP3Info() subroutine is invoked. This subroutine checks whether the file has an .mp3 extension and, if so, scans it for an ID3v1 tag.
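The directory walk can be sketched on its own with File::Find's find(). This version collects matching paths instead of processing each file inside the callback. With no_chdir => 1, find() does not change into each directory, so $_ holds the full path, just as $File::Find::name does. The helper name find_mp3s() is hypothetical. This sketch matches the .mp3 extension case-insensitively, which may differ from Listing H.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: every regular file ending in .mp3 (any case) under
# the given directories, returned sorted.
sub find_mp3s {
    my (@dirs) = @_;
    my @found;
    find(
        {
            wanted   => sub { push @found, $File::Find::name if -f $_ && /\.mp3\z/i },
            no_chdir => 1,
        },
        @dirs,
    );
    return sort @found;
}
```

A catalog loop can then walk the sorted list, read each file's tag, and increment a counter as it goes.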

The information retrieved from each tag is then used to build an HTML table, which can be displayed on a Web page. A counter keeps track of how many MP3 files have been processed, and the script prints a summary when it finishes.

Important: The user account the script runs under must have permission to read and enter the directories named in @dirs. Otherwise, the script will not function correctly.
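One way to surface permission problems up front is to check each entry in @dirs before handing it to find(). The test needs read permission (-r) to list the directory and execute permission (-x) to enter it. The following sketch uses a hypothetical usable_dirs() helper. It warns about and drops any directory that fails the check, so the problem is reported before the scan starts rather than partway through it.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only directories the current user can list
# and enter; warn about the rest.
sub usable_dirs {
    my (@dirs) = @_;
    my @ok;
    for my $d (@dirs) {
        if (-d $d && -r _ && -x _) {    # "_" reuses the stat from -d
            push @ok, $d;
        }
        else {
            warn "Skipping $d: not a readable, searchable directory\n";
        }
    }
    return @ok;
}
```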

Since the script in Listing H prints its output to STDOUT, it's a good idea to redirect the output to an HTML file, as in the sample invocation below:

$ ./ > catalog.html

The resulting file might look something like Figure B:

Figure B

Another HTML playlist extracted from a group of MP3 files
