Script Teez: Creating multilingual applications with PHP

If you need to develop multilingual PHP applications, you're in luck! Vincent Danen takes you on a simple tour of how he used PHP to create his own multilingual Bugzilla-like application.

For the last two months, I have been developing a Bugzilla-clone in PHP called Anthill. Anyone who has ever looked at Bugzilla, a bug-tracking database written in Perl, can imagine that this is quite an undertaking. I'm glad to say that after two months of development, it's almost ready for its premiere. In fact, by the time you read this, the first version should be out.

Now, this Script Teez is not to blow my own horn in any way. What we are going to look at here is a challenge I faced in developing Anthill, and it may very well be a challenge you will face in developing your own PHP applications.

Besides needing Anthill to work much like Bugzilla and to offer most of Bugzilla's features, I wanted Anthill to be multilingual. Where Bugzilla is written only for the English-speaking crowd, I wanted people to be able to run Anthill in French, Italian, German, and any other language I could get it translated into. This became quite the challenge. Let me illustrate how I overcame it.

The first thing you need to keep in mind is that making your PHP application multilingual is a two-step process. There are two tacks you can take: You can make it multilingual right from the beginning or you can go back when the code is nearly complete and make it multilingual. I took the latter route; you'll see why. So, the first step is to do the actual code-writing, with all strings in your native language. Once this is complete, you can look at making it multilingual. Let me stress right now that this is not the easiest job, but the benefit is well worth it. I know that Anthill can be used and appreciated by people in various countries, regardless of what language they use.

Typically, when you write something to the Web page, you will use straight HTML or a print() statement like this:
<?php print("Hello world!"); ?>

To make this multilingual, you need to create a language file that contains all of the strings, like the one above, that are present in your application. In our case, we might use the following to make the above an independent string:
<?php print(_HWORLD); ?>

You would define _HWORLD in your language file. In English, the value of _HWORLD would be "Hello world!" while in another language, it would be the other language's equivalent.

Your language file will look something like this. Let's call it english.lng:
<?php // English language file english.lng
   // LC_ALL is a predefined constant and must not be quoted
   setlocale(LC_ALL, 'en_US');

   define("_HWORLD", "Hello world!");
   define("_ASTRING", "This is another string");

?>


The define() function defines a global constant. The first argument is the constant's name; the second is its value. Because this is a constant rather than a variable, you do not prefix the name with the $ character. Constants are persistent; once defined, their values never change. Each of your other language files must define the same constant names, with the strings themselves translated. If you keep the names the same across language files, you can see how
<?php print(_HWORLD); ?>

is more portable and allows for a multilingual string instead of just printing "Hello world!" in English.
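
As an illustration, a second language file might look like the sketch below. The file name french.lng, the locale name, and the translations are assumptions made for this example; only the constant names are carried over from english.lng.

```php
<?php
// Hypothetical french.lng: same constant names as english.lng, translated values.
// 'fr_FR' is an assumed locale name; which locales exist depends on the server.
setlocale(LC_ALL, 'fr_FR');

define("_HWORLD", "Bonjour le monde !");
define("_ASTRING", "Ceci est une autre chaîne");
```

With this file included instead of english.lng, the same print(_HWORLD) call prints the French greeting without any change to the application code.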

Now you need to determine how you are going to allow the administrator of the site to pick the default language. You can do this with a few lines of PHP code nested in a form. If you have an administrator page that allows the admin to select different options, you can add one for the language and have PHP dynamically tell the admin which languages are available by using something like this.
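
A minimal sketch of such a form segment follows, using the variable names ($file, $tmp, $name, $foo) from the walkthrough below. The wrapper function language_options() is a hypothetical name, and two choices are assumptions of this sketch rather than Anthill's actual code: preg_replace() stands in for eregi_replace(), which no longer exists in PHP 7 and later, and htmlspecialchars() escapes the file name before it is printed.

```php
<?php
// Sketch: build one <option> tag per *.lng file found in a language directory.
// language_options() is a hypothetical helper, not taken from Anthill.
function language_options(string $dir, string $current): string
{
    $html = '';
    $handle = opendir($dir);
    if ($handle === false) {
        return $html; // directory missing or unreadable
    }
    // Compare against false explicitly so a file named "0" does not end the loop.
    while (($file = readdir($handle)) !== false) {
        $tmp  = substr($file, -4);                     // last four characters
        $name = preg_replace('/\.lng$/i', '', $file);  // strip the extension
        if ($file != '.' && $file != '..' && strtolower($tmp) == '.lng') {
            $foo  = ($current == $name) ? 'selected' : '';
            $safe = htmlspecialchars($name);
            $html .= "<option value=\"$safe\" $foo>$safe</option>\n";
        }
    }
    closedir($handle);
    return $html;
}

// Usage inside the admin form, with $root, $langdir, and $config as in the text:
// print('<select name="lang">' . language_options($root . $langdir, $config['lang']) . '</select>');
```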

This might seem a little convoluted, but it's actually really simple. First, we have to open the directory that we store our language files in. Let's assume you've previously defined $root to be your Web root directory (typically /var/www/html) and $langdir to be the directory that contains your language files, relative to the root directory. Here, we open that directory; for instance, if $root is "/var/www/html" and $langdir is "/backend/language", we open the directory /var/www/html/backend/language. Then we process each file in the directory by reading the directory with readdir().

We then assign to the variable $tmp the last four characters of the file name held in $file, which is the current entry in the directory. The next step is to strip .lng from the file name, which we do using eregi_replace(): we replace ".lng" with "" and assign the result to $name. (eregi_replace() was removed in PHP 7; on current versions, preg_replace() with the i modifier does the same job.) This will be the name of the language to use. Of course, this assumes that you give your language files meaningful names like english.lng and spanish.lng.

Finally, we make sure that the file we're looking at is indeed an appropriate file. We dismiss out of hand the entries "." and "..", which are directories. We also make sure that the extension of the file is ".lng", which should be the last four characters we extracted when we assigned $tmp. By using strtolower(), the extension can be in upper, lower, or mixed case and will still match our expression.

If we match those criteria, we have a legitimate language file. We then assign to the variable $foo the word "selected" if this file is the currently selected language, so that it is preselected in our list. We do this by comparing against the value of $config['lang'], which is part of an array we could have filled with configuration information from a SQL database.

Then we print the option to the Web page. For instance, if the current file were english.lng, the resulting HTML code shown to the browser would be:
<option value="english" >english</option>

If $config['lang'] contained the string "english", indicating that the current language in use on the site is English, the code shown would instead be:
<option value="english" selected>english</option>

Lastly, we close the directory we opened by calling closedir().

Now, to include the language file in the code, we have to retrieve the selected language from MySQL and return it in a variable. Let's take the above example and assume that $config['lang'] contains the currently selected language, English. At the start of every page, you will need to retrieve that data from the SQL database and include the appropriate language file or your pages will have no meaning whatsoever. This would be the code to actually include the dependent language file.
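
A minimal sketch of that include step is shown here. The helper name language_file(), the character check on the language name, and the fallback to "english" are assumptions of this sketch; the check is there because a value read from storage ends up in a file path, and it is not something the original text describes.

```php
<?php
// Sketch: build the path of the language file for the configured language.
// language_file() is a hypothetical helper; $root, $langdir, and $config['lang']
// are the variables the article assumes already exist.
function language_file(string $root, string $langdir, string $lang): string
{
    // Allow only letters, digits, underscore, and hyphen so that a stored value
    // such as "../../etc/passwd" cannot point outside the language directory.
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $lang)) {
        $lang = 'english'; // assumed fallback language
    }
    return $root . $langdir . '/' . $lang . '.lng';
}

// Usage at the top of every page, once $config has been loaded from the database:
// $langfile = language_file($root, $langdir, $config['lang']);
// include($langfile);
```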

Again, we have previously defined $root and $langdir. We retrieve the language from the database, assign it to $config['lang'], and append ".lng" to the end because, as the form segment above shows, each <option> value in our <select> list is the language name without the extension. With the example values, the result is:
$langfile = "/var/www/html/backend/language/english.lng";

which is the language file to use.

This is quite an undertaking, but the end results are well worth the effort. The only serious drawback is that once you have converted all strings in your files to use define statements, your PHP code will be somewhat difficult to read. Reading code like this:
print("<p><font color=\"red\">" . _ENOBUG . "</font></p>");

can be a little difficult after a while, especially if you have a lot of strings. As a point of reference, the english.lng file in Anthill is over 10 KB, but then the entire application is over 300 KB of code.
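
One possible way to soften the readability cost, offered as a sketch rather than as what Anthill does, is to keep the markup in a single format string and pass the constant to sprintf(). The value given to _ENOBUG below is a placeholder; in practice it would come from the language file.

```php
<?php
// Sketch: separate the markup from the translated string with sprintf().
// The text assigned to _ENOBUG is a placeholder for this example only.
define("_ENOBUG", "No such bug.");

$html = sprintf('<p><font color="red">%s</font></p>', _ENOBUG);
print($html);
```

Single quotes around the format string also remove the need to escape the double quotes inside the HTML attributes.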

Conclusion
This is a very quick look at how to make your site multilingual. If you want to take a real-world example, download the source code for Anthill and see how I have implemented it in full.

The only other problem you may encounter is getting people to translate your language file, but I'll let you deal with that one.

About

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.
