So you’ve spent the last several years building your
(hopefully legal) MP3 collection, and have a few terabytes of music on your
hard drive. All you need now is an index and, if you’re a show-off, a Web page
to put it on. No need to manually catalog each and every file you own. Instead,
you can just reach for Perl’s MP3::Tag module,
which saves you time by automatically retrieving track information from the ID3
tag encoded into each MP3 track.

MP3::Tag is one of many Perl ID3-tag parsers available on
CPAN; it’s a little more full-featured than most, which is why I chose it for
this tutorial. Written entirely in Perl, MP3::Tag can read the older ID3v1 tags
as well as the newer ID3v2 tags, and also supports parsing the MP3 filename
for track/title information. In addition to reading tags, MP3::Tag can also
edit the contents of an existing tag or create a new tag altogether.

MP3::Tag is licensed under the Artistic License and is
maintained by Thomas Geffert. Detailed installation instructions are provided
in the download archive, but by far the simplest way to install it is to use
the CPAN shell:

shell> perl -MCPAN -e shell
cpan> install MP3::Tag

If you use the CPAN shell, dependencies will be
automatically downloaded for you (unless you told the shell not to download
dependent modules). This tutorial uses version 0.92 of MP3::Tag.

How ID3 tagging works

An ID3 tag is a field containing bibliographical information
(like title, artist, genre, year of release, and album) about an MP3 audio
track, usually embedded within the MP3 file itself. An MP3 player or cataloging
application can scan an MP3 file for this tag and use the information inside
it to automatically display the name of the artist and track while it is
playing.

The first version of the ID3 standard, called ID3v1, used a fixed-length
field of 128 bytes at the end of the MP3 file to store this information. This
field, marked with the string TAG,
typically contained the track title, the artist name, the originating album,
the year of release, a comment, and the genre of audio.
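
To make the fixed-length layout concrete, here is a minimal sketch that decodes an ID3v1 block by hand, without MP3::Tag. It assumes the commonly documented field widths (a 3-byte marker, 30 bytes each for title, artist, and album, 4 for the year, 30 for the comment, and 1 for the genre); the subroutine names are hypothetical, and the sketch ignores the ID3v1.1 convention of storing a track number at the end of the comment field.

```perl
use strict;
use warnings;

# Decode a raw 128-byte ID3v1 block into a hash reference.
# Returns nothing if the block is the wrong size or lacks the "TAG" marker.
sub parse_id3v1_block {
    my ($block) = @_;
    return unless defined $block && length($block) == 128;

    # Layout: marker(3) title(30) artist(30) album(30) year(4) comment(30) genre(1).
    # "A" drops the trailing NULs and spaces used as padding; "C" reads one byte.
    my ($marker, $title, $artist, $album, $year, $comment, $genre)
        = unpack 'a3 A30 A30 A30 A4 A30 C', $block;
    return unless $marker eq 'TAG';

    return {
        title   => $title,
        artist  => $artist,
        album   => $album,
        year    => $year,
        comment => $comment,
        genre   => $genre,    # numeric index into the ID3v1 genre list
    };
}

# Read the final 128 bytes of a file and decode them.
sub read_id3v1_from_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $block = '';
    if ((-s $fh) >= 128) {
        seek $fh, -128, 2 or die "Cannot seek in $path: $!";    # 2 = SEEK_END
        read $fh, $block, 128;
    }
    close $fh;
    return parse_id3v1_block($block);
}

# Only touch the filesystem when a filename is given on the command line.
if (@ARGV) {
    my $tag = read_id3v1_from_file($ARGV[0]);
    print $tag ? "$tag->{artist} - $tag->{title}\n" : "No ID3v1 tag found\n";
}
```

In practice you would let MP3::Tag do this for you; the point here is only to show why a 128-byte field leaves so little room for long titles.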

ID3v1 was followed by ID3v2, a
so-called informal standard that most current MP3 players are able
to recognize. This newer version did away with the fixed-length limitations of
the earlier specification, allowing many more attributes to be stored and also
permitting longer field values. ID3v2 tags usually appear at the beginning of
the MP3 file instead of at the end, so that a player can read them before the
audio data arrives (when streaming, for example).
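
As a rough illustration of what sits at the start of such a file, the sketch below decodes the 10-byte ID3v2 header by hand. It assumes the header layout described in the ID3v2 informal standard (the marker ID3, two version bytes, a flags byte, and a four-byte "synchsafe" size in which each byte carries only 7 bits); the subroutine name is hypothetical.

```perl
use strict;
use warnings;

# Decode the 10-byte header that opens an ID3v2 tag.
# Returns a hash reference, or nothing if the "ID3" marker is absent.
sub parse_id3v2_header {
    my ($header) = @_;
    return unless defined $header && length($header) >= 10;

    my ($marker, $major, $revision, $flags, @size_bytes)
        = unpack 'a3 C C C C4', $header;
    return unless $marker eq 'ID3';

    # The size is "synchsafe": four bytes contributing 7 bits each.
    my $size = 0;
    $size = ($size << 7) | ($_ & 0x7F) for @size_bytes;

    return {
        version => "2.$major.$revision",
        flags   => $flags,
        size    => $size,    # length of the tag body, excluding this header
    };
}

# Only touch the filesystem when a filename is given on the command line.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    read $fh, my $header, 10;
    close $fh;
    my $info = parse_id3v2_header($header);
    print $info
        ? "ID3v$info->{version} tag, $info->{size} bytes\n"
        : "No ID3v2 tag at the start of the file\n";
}
```

Because the size is stored up front, a reader can skip the whole tag in one seek and go straight to the audio data.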

The MP3::Tag Perl module can read and write both of these types
of tags, making it easy for you to automatically build an HTML or text catalog
of your digital audio collection, or to create an application to edit and
manipulate the ID3 tags inside your MP3 files. Let’s see how.

Reading ID3v1 tags

Consider the simple example in Listing A, which uses MP3::Tag to
retrieve the ID3 tag information from an MP3 file. In this script, a new
MP3::Tag object is instantiated by passing the MP3 filename to the object
constructor. The object’s get_tags() method is then used to scan the MP3 file
and identify which tags are present and whether they are ID3v1 or ID3v2 tags.

If an ID3v1 tag exists, an $mp3->{ID3v1} object will be
created. This object exposes accessors for the artist, title, album, year, and
genre encoded into the ID3 tag, and the corresponding values can be retrieved
using standard object->method notation (for example, $mp3->{ID3v1}->artist).
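
A minimal sketch of this flow might look like the following. It assumes the MP3::Tag calls described above (new(), get_tags(), and the accessors on $mp3->{ID3v1}); the helper name id3v1_summary is hypothetical, and MP3::Tag is loaded only when a filename is supplied so that the helper can be exercised on its own.

```perl
use strict;
use warnings;

# Format the fields of an ID3v1 tag object as printable lines.
# $tag is expected to behave like $mp3->{ID3v1}.
sub id3v1_summary {
    my ($tag) = @_;
    my @lines;
    for my $field (qw(title artist album year genre)) {
        push @lines, sprintf '%-7s: %s', ucfirst($field), $tag->$field();
    }
    return @lines;
}

if (@ARGV) {
    require MP3::Tag;    # loaded at run time, only when a file is given
    my $mp3 = MP3::Tag->new($ARGV[0]) or die "Cannot open $ARGV[0]\n";
    $mp3->get_tags;      # scan the file for ID3v1 and ID3v2 tags

    if (exists $mp3->{ID3v1}) {
        print "$_\n" for id3v1_summary($mp3->{ID3v1});
    } else {
        print "No ID3v1 tag found\n";
    }
    $mp3->close;
}
```

Saved as, say, readtag.pl (a hypothetical name), it would be run as perl readtag.pl track.mp3.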

Listing B shows sample output from the script in Listing A.

Reading ID3v2 tags

You can also use MP3::Tag to read ID3v2 tags. If an ID3v2
tag exists, the $mp3->{ID3v2} object will be created, and you can use this
object to extract the relevant track information.
Listing C contains an example script.

An ID3v2 tag consists of a header
and multiple “frames.” These frames are simply pieces of data,
which together provide detailed information about the audio track. To extract
these frames from an ID3v2 tag, first call the ID3v2 object’s get_frame_ids()
method to get a list of all available frames, and then iterate over that
list, calling the get_frame() method to retrieve the content of each frame.
The get_frame() method returns the frame’s content together with its
descriptive name; the content is either a plain value or (for more complex
frames) a reference to a hash that contains more detailed information.
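
The loop described above can be sketched roughly as follows. This assumes that get_frame_ids() returns a hash reference keyed by frame ID and that get_frame() returns the frame content followed by its long name, as the MP3::Tag::ID3v2 documentation describes; the helper name id3v2_frame_lines is hypothetical, and binary frames (such as embedded pictures) are printed as-is rather than handled specially.

```perl
use strict;
use warnings;

# Turn every frame of an ID3v2 tag object into printable lines.
# $id3v2 is expected to behave like $mp3->{ID3v2}.
sub id3v2_frame_lines {
    my ($id3v2) = @_;
    my @lines;

    my $frame_ids = $id3v2->get_frame_ids;    # hash ref: frame ID => long name
    for my $id (sort keys %$frame_ids) {
        my ($info, $name) = $id3v2->get_frame($id);
        if (ref($info) eq 'HASH') {
            # Complex frame: list each sub-field beneath the frame name.
            push @lines, "$name ($id):";
            push @lines, "  $_ => $info->{$_}" for sort keys %$info;
        } else {
            $info = '' unless defined $info;
            push @lines, "$name ($id): $info";
        }
    }
    return @lines;
}

if (@ARGV) {
    require MP3::Tag;    # loaded at run time, only when a file is given
    my $mp3 = MP3::Tag->new($ARGV[0]) or die "Cannot open $ARGV[0]\n";
    $mp3->get_tags;

    if (exists $mp3->{ID3v2}) {
        print "$_\n" for id3v2_frame_lines($mp3->{ID3v2});
    } else {
        print "No ID3v2 tag found\n";
    }
    $mp3->close;
}
```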

Listing D shows an example of the output.

Writing ID3 tags

Now that you’ve got the hang of reading ID3 tags, you can just as
easily write new information to the ID3 tag with the MP3::Tag module. All you
need to do is set new values for the various attributes, and then call the
write_tag() method. This is illustrated in Listing E. Alternatively, set all
the fields in one stroke with the all() method, as in Listing F.
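
As a rough sketch of the first approach, the helper below sets individual fields and then saves them. It assumes that the ID3v1 accessors accept a new value as their argument, that write_tag() commits the change, and that new_tag('ID3v1') creates an empty tag when the file has none; the helper name update_id3v1 and the sample values are hypothetical. Try it on a copy of a file first, since it modifies the file in place.

```perl
use strict;
use warnings;

# Fields this sketch is willing to set on an ID3v1 tag.
my %ID3V1_FIELD = map { $_ => 1 } qw(title artist album year comment genre);

# Apply new field values to an ID3v1 tag object and save them.
# $tag is expected to behave like $mp3->{ID3v1}.
sub update_id3v1 {
    my ($tag, %fields) = @_;
    for my $field (sort keys %fields) {
        die "Unknown ID3v1 field: $field\n" unless $ID3V1_FIELD{$field};
        $tag->$field($fields{$field});    # an accessor called with a value sets it
    }
    return $tag->write_tag;               # commit the changes to the file
}

if (@ARGV) {
    require MP3::Tag;    # loaded at run time, only when a file is given
    my $mp3 = MP3::Tag->new($ARGV[0]) or die "Cannot open $ARGV[0]\n";
    $mp3->get_tags;

    # Create an empty ID3v1 tag if the file does not have one yet.
    $mp3->new_tag('ID3v1') unless exists $mp3->{ID3v1};

    update_id3v1($mp3->{ID3v1}, title => 'New Title', artist => 'New Artist');
    $mp3->close;
}
```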

Note: You can do this with ID3v2 tags as well. Take a look
at the documentation for more information and examples.

Creating more informative playlists

How about using all this ID3 tag information for something
practical? Let’s assume that you have a playlist (maybe for an online radio
station) containing a list of MP3 files in the commonly used M3U format, and
you’d like to publish a Web page containing detailed track information for your
listeners. All you need to do is have MP3::Tag parse each of the files in
the playlist, extract the relevant ID3 information, and build an HTML page from
it. That’s exactly what the script in Listing G does.

Here, Perl’s file functions read the contents of the
playlist file into an array, and a foreach() loop iterates over the array and
extracts the ID3 information from each file listed in it. This information is
then incorporated into an HTML table, which can be saved to a file and
published to the Web. Figure A shows what this might look like:

Figure A: An HTML playlist extracted from a group of MP3 files
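
The playlist-to-table idea can be sketched along these lines. The sketch assumes a plain M3U file with one path per line, in which lines beginning with # are comments or extended-M3U directives; the helper names read_m3u, html_escape, and html_row are hypothetical, and the table columns are just one possible choice.

```perl
use strict;
use warnings;

# Return the file paths listed in an M3U playlist, skipping blank lines
# and "#" lines (comments and extended-M3U directives).
sub read_m3u {
    my ($playlist) = @_;
    open my $fh, '<', $playlist or die "Cannot open $playlist: $!";
    my @files;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # tolerate Windows line endings
        next if $line =~ /^\s*\z/ || $line =~ /^\s*#/;
        push @files, $line;
    }
    close $fh;
    return @files;
}

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one table row from a list of cell values.
sub html_row {
    my @cells = @_;
    return '<tr>' . join('', map { '<td>' . html_escape($_) . '</td>' } @cells) . '</tr>';
}

if (@ARGV) {
    require MP3::Tag;    # loaded at run time, only when a playlist is given
    print "<table>\n", html_row(qw(Title Artist Album)), "\n";
    for my $file (read_m3u($ARGV[0])) {
        my $mp3 = MP3::Tag->new($file) or next;
        $mp3->get_tags;
        if (my $tag = $mp3->{ID3v1}) {
            print html_row($tag->title, $tag->artist, $tag->album), "\n";
        }
        $mp3->close;
    }
    print "</table>\n";
}
```

Escaping the tag values matters here because track titles often contain characters such as & that would otherwise break the generated page.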

Creating an MP3 catalog

The last script we’ll look at, Listing H, searches for MP3 files in one
or more user-specified directories and extracts the ID3 information embedded
inside them to create an HTML index containing file names, track names, artists,
and genre information.

This script combines the very powerful File::Find
module with the MP3::Tag module to search a list of directories and scan the MP3
files in them for ID3 information. The main workhorse of the script is the
find() function, which works like the UNIX find program and walks the
directory trees named in the @dirs array. Every time a file is found,
the user-defined displayMP3Info() subroutine is invoked. This subroutine checks
whether the file has an .mp3 extension and, if so, scans it for an ID3v1 tag.

The information retrieved from the tag is then used to build
an HTML table, which can be displayed on a Web page. A counter keeps track of
how many MP3 files have been processed, and the script displays a summary when
it finishes.
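
The directory walk can be sketched as follows. It uses File::Find’s find() with a callback, much as described above, but collects the matching paths into a list first and prints plain text rather than HTML to keep the example short; the helper name find_mp3_files is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Return a sorted list of every .mp3 file beneath the given directories.
sub find_mp3_files {
    my @dirs = @_;
    my @found;
    find(
        sub {
            # File::Find sets $_ to the current file name and
            # $File::Find::name to its path relative to the start directory.
            push @found, $File::Find::name if -f $_ && /\.mp3\z/i;
        },
        @dirs
    );
    my @sorted = sort @found;
    return @sorted;
}

if (@ARGV) {
    require MP3::Tag;    # loaded at run time, only when directories are given
    my $count = 0;
    for my $file (find_mp3_files(@ARGV)) {
        my $mp3 = MP3::Tag->new($file) or next;
        $mp3->get_tags;
        if (my $tag = $mp3->{ID3v1}) {
            printf "%s: %s - %s\n", $file, $tag->artist, $tag->title;
        }
        $mp3->close;
        $count++;
    }
    print "$count MP3 file(s) processed\n";
}
```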

Important: Remember that the user the script runs as must have permission to
enter and read the directories named in @dirs, or else the script will not
function correctly.

Since the script in Listing H prints its output to STDOUT,
it’s a good idea to redirect the output to an HTML file, as in the sample
invocation below:

$ ./script.pl > catalog.html

The resulting file might look something like Figure B:

Figure B: Another HTML playlist extracted from a group of MP3 files