Recently, an old friend of mine rang me up to ask for help.
He’d been working as a journalist for many years, and had recently received
reprint rights to a number of his earlier columns. He was eager to publish his
past work on the Web; however, his columns were all saved as plain-text files
and he had neither the time nor the inclination to learn HTML and convert them
to Web pages. Since I was the only geek in his phone book, he’d called me to
see if I could help him.

“Let me take care of it”, I said. “Call me
back in an hour”, I said. And sure enough, when he called back a couple of
hours later, I had a solution waiting for him. It involved a little bit of PHP,
and it earned me his eternal thanks and a crate of wine.

So what did I do in that hour? That’s where this article
comes in. I’m going to show you how you can use PHP
to quickly transform plain ASCII text into perfectly readable HTML markup.

To begin, let’s look at an example of one of the raw text
files my friend wanted to convert:

Green for Mars!
John R. Doe

The idea of little green men from Mars, long a staple of science fiction, may soon turn out to be less fantasy and more fact.

Recent samples sent by the latest Mars exploration team indicate a high presence of chlorophyll in the atmosphere. Chlorophyll, you will recall, is what makes plants green. It’s quite likely, therefore, that organisms on Mars will have, through continued exposure to the green stuff, developed a greenish tinge on their outer exoskeleton.

An interview with Dr. Rushel Bunter, the head of ASDA’s Mars Colonization Project blah blah…

What does this mean for you? Well, it means blah blahblah…

Track follow-ups to this story online at http://www.mars-connect.dom/. To see pictures of the latest samples, log on to http://www.asdamcp.dom/galleries/220/

Fairly standard text: it has a title (or “slug”),
a byline, and several paragraphs of text. Transforming this document into HTML
takes three things: HTML line and paragraph break markers to preserve the
original layout on a Web page, conversion of special punctuation characters
into their HTML equivalents, and clickable hyperlinks in place of the raw URLs.

Here’s the PHP code (Listing A) to accomplish all of the above:

Listing A

<?php
// set source file name and path
$source = "toi200686.txt";

// read raw text as array
$raw = file($source) or die("Cannot read file");

// retrieve first and second lines (title and author)
// trim the trailing newline and escape special characters
$slug = htmlspecialchars(trim(array_shift($raw)));
$byline = htmlspecialchars(trim(array_shift($raw)));

// join remaining data into string
$data = join('', $raw);

// replace special characters with HTML entities
// replace line breaks with <br />
$html = nl2br(htmlspecialchars($data));

// replace multiple spaces with single spaces
$html = preg_replace('/\s\s+/', ' ', $html);

// replace URLs with <a href...> elements
// (stop at "<" so a trailing <br /> is not swallowed into the link)
$html = preg_replace('/\s(\w+:\/\/)([^\s<]+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $html);

// start building output page
// add page header (minimal HTML wrapper around the style rules)
$output = <<<HEADER
<html>
<head>
<style>
.slug {font-size: 15pt; font-weight: bold}
.byline { font-style: italic }
</style>
</head>
<body>
HEADER;

// add page content
$output .= "<div class='slug'>$slug</div>";
$output .= "<div class='byline'>By $byline</div><p />";
$output .= "<div>$html</div>";

// add page footer
$output .= <<<FOOTER
</body>
</html>
FOOTER;

// display in browser
echo $output;

// write output to a new .html file
file_put_contents(basename($source, substr($source, strpos($source, '.'))) . ".html", $output) or die("Cannot write file");
?>

Let’s see how this works:

  1. The first step is to read the raw
    ASCII file into a PHP array. This is easily accomplished with the file() function, which turns every
    line of the file into an element of a numerically-indexed array.
  2. Next, the title and author lines
    (I assume these are the first two lines of the file) are extracted from
    the array into separate variables using the array_shift()
    function. The remaining members of the array are then concatenated into a
    single string. This string will now contain the entire body of the article.
  3. Special characters like &, < and > within
    the body are converted into their HTML equivalents using the htmlspecialchars() function. To preserve the
    original formatting of the article, line and paragraph breaks are
    marked with HTML <br /> elements
    using the nl2br()
    function. Runs of whitespace within the article body are compressed into a
    single space using a regular expression with preg_replace().
  4. URLs within the body are detected
    using regular expressions, and are surrounded by <a href=…></a>
    elements. This turns the URLs into clickable hyperlinks when the page is
    viewed in a Web browser.
  5. The output HTML page is then
    constructed using standard HTML rules. The article title, author and body
    are formatted using CSS style rules. Although this script doesn’t do it,
    this is the point at which you would customize the appearance of the final
    page, perhaps by adding graphical elements, colors or other whiz-bangs to
    the template.
  6. Once the HTML page has been
    constructed, it can be sent to the browser or saved to a static file with file_put_contents(). Note that when saving, the
    original file name is decomposed and a new file (named filename.html) is
    created for the newly-minted Web page. You can then publish this Web page
    to a Web server, save it to a CD-ROM or edit it further.
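
Steps 3 and 4 are easy to try in isolation on a short string. What follows is a minimal sketch rather than part of Listing A: the function name convert_body() is hypothetical, and the URL pattern assumes a URL ends at whitespace or at a "<" character, so that the <br /> tags added by nl2br() are not pulled into the link.

```php
<?php
// convert_body() is a hypothetical helper name, not part of Listing A.
function convert_body(string $text): string
{
    // Escape special characters, then mark each line break with <br />.
    $html = nl2br(htmlspecialchars($text));

    // Collapse runs of two or more whitespace characters into one space.
    $html = preg_replace('/\s\s+/', ' ', $html);

    // Wrap anything that looks like scheme://... in an <a> element.
    // Assumption: a URL ends at whitespace or at the "<" of a <br /> tag.
    return preg_replace(
        '/(^|\s)(\w+:\/\/[^\s<]+)/',
        '$1<a href="$2" target="_blank">$2</a>',
        $html
    );
}

echo convert_body("Fish & chips <cheap>\nSee http://www.mars-connect.dom/ now");
```

In the output, the ampersand and angle brackets appear as entities, the line break gains a <br /> tag, and the URL is wrapped in a link. Punctuation that directly follows a URL, such as a full stop ending a sentence, is still treated as part of the link, so check the converted pages for that case.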

Note: When using
this script to create and save HTML files to disk, ensure that the script has
write privileges on the directory to which the files are being saved.
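
One way to catch a permissions problem before the write is attempted is to test the destination directory first. This is only a sketch: can_save_to() is a hypothetical helper name, and the path in the usage line is an example, not something the original script defines.

```php
<?php
// can_save_to() is a hypothetical helper, not part of Listing A.
// It reports whether the directory that would hold $path exists
// and is writable by the user the script runs as.
function can_save_to(string $path): bool
{
    $dir = dirname($path);
    return is_dir($dir) && is_writable($dir);
}

// Example path (an assumption): a file next to this script.
$target = __DIR__ . '/toi200686.html';
echo can_save_to($target) ? "OK to write\n" : "Directory is not writable\n";
```

If the check fails, adjust the directory permissions or point the script at a different output directory before running the conversion.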

As you can see, assuming you have ASCII plain-text data
files in a standard format, you can convert them fairly quickly into usable Web
pages with PHP. And if you have an existing Web site into which you plan to
inject your new Web pages, it’s also quite easy to tweak the template used by
the page generator to match the look and feel of your existing Web site. So go
on, try it out for yourself!
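
As a starting point for that kind of tweak, the page template can be kept separate from the conversion code and filled in with placeholders. The sketch below is one possible approach, not the method used in Listing A: render_page() and the {{slug}}, {{byline}} and {{body}} placeholder names are all hypothetical, and the markup stands in for whatever your existing site uses.

```php
<?php
// render_page() and the {{...}} placeholders are hypothetical names.
function render_page(string $template, string $slug, string $byline, string $body): string
{
    // strtr() replaces each placeholder once and does not rescan its output.
    return strtr($template, [
        '{{slug}}'   => htmlspecialchars(trim($slug)),
        '{{byline}}' => htmlspecialchars(trim($byline)),
        '{{body}}'   => $body, // assumed to be HTML already
    ]);
}

// Stand-in template; replace with markup that matches your own site.
$template = <<<'TPL'
<html>
<head><title>{{slug}}</title></head>
<body>
<div class="slug">{{slug}}</div>
<div class="byline">By {{byline}}</div>
<div>{{body}}</div>
</body>
</html>
TPL;

echo render_page($template, "Green for Mars!\n", "John R. Doe\n", 'Body text');
```

Keeping the template in one string (or one file) means a change to the site design only has to be made in one place before the columns are regenerated.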