Data Management

PostgreSQL database server: Does slow and steady win the race?

If you're more concerned with features and stability than performance in your database, PostgreSQL is for you. This rich, stable platform packs a slower but more powerful punch than MySQL. Find out how to put PostgreSQL to use.


When it comes to open source and freely available database servers, most people will point you to MySQL. MySQL is one of the most popular open source SQL database systems available for a variety of platforms, including Linux, Windows, and a number of *NIX variants. It's a fast and lightweight database server used as the back end for a lot of dynamic Web sites.

However, another contender in the open source database derby is PostgreSQL, which may not be as light on its feet as MySQL but offers more stability and a richer feature set. In this Daily Drill Down, I will first look at the differences between PostgreSQL and MySQL. Then, I'll show you how to build, configure, and use PostgreSQL.

How does PostgreSQL stack up against MySQL?
PostgreSQL is a more complex database server than MySQL, which results in some fairly radical differences between them. While the two are becoming more similar, PostgreSQL still has an edge in some respects. For one thing, PostgreSQL is a little hardier than MySQL and can handle about three times the load. This makes PostgreSQL a favorite for extremely high-traffic Web sites.

PostgreSQL also provides a richer feature set than MySQL and includes many powerful capabilities that may be required by the truly determined and hard-core developer. For instance, PostgreSQL has stronger transaction support than MySQL does (although this could be changing shortly). It also handles certain errors better. For instance, if you're doing an update/insert/delete sequence, and one of the sequences fails, PostgreSQL can roll back the entire operation and provide an error, leaving your database in an untainted state. With MySQL, this is not the case. It will not roll back and can leave your database in a haphazard situation that can be corrected only with additional programming checks or manual work.

However, the stability and advanced feature set do mean a cost in terms of performance. If you don't need the advanced features PostgreSQL provides or the stability and scalability it has over MySQL, you may prefer to stick with the fast and easy MySQL.

In a previous Daily Drill Down, I explained how to put MySQL to work. Now it's PostgreSQL's turn. Let's look at the steps involved in installing and configuring it.

Building PostgreSQL
We'll start by downloading the current version of PostgreSQL—which, as of this writing, is 7.1.2. You can obtain a listing of mirror sites from which to download it. The file you will be downloading is postgresql-7.1.2.tar.gz. Save this file to your /usr/local/src directory. It's a rather large download, around 8 MB, so when you're finished, change to your /usr/local/src directory and unarchive it as root using:
# cd /usr/local/src
# tar xvzf postgresql-7.1.2.tar.gz
# cd postgresql-7.1.2

Before beginning to build PostgreSQL, make sure that you meet the requirements. You will need about 30 MB for the source tree when compiling and about 5 MB for the final installation directory. An empty database takes roughly 1 MB of space, so be sure that you have plenty of room for your PostgreSQL databases.

The next step is to run the configure script. PostgreSQL installs into /usr/local/pgsql by default, so if you want to change this, you will need to use the —prefix option. For instance, if you wanted to install PostgreSQL into your main directory tree, you would use:
# ./configure —prefix=/usr

We'll install it into /usr/local/pgsql, however, just to keep your system clean and make it easy to remove if you need to. Just use the following configure command:
# ./configure —with-syslog —with-perl —with-python

This tells configure to build with syslog support so you can tell PostgreSQL to log to syslog at run time. We also tell configure to include Perl and Python support by building the perl and python interface modules. Of course, you must have Perl and Python both installed, and you must be the root user to install the interface modules.

The next step is to run the make command. PostgreSQL uses GNU make, so you can either type gmake or the more traditional make command next:
# make

Once this has completed, which can take anywhere from five minutes to half an hour, depending on the speed of your computer, you should run the regression tests. This verifies that PostgreSQL runs on your system the way it was meant to run. However, before you can do this, you must create the user postgres on your system. Using postgres, you'll be able to run the PostgreSQL server as an unprivileged user, which is more secure. To add the user, simply type:
# adduser postgres

Now that the user is available, you can run the regression tests by executing:
# su - postgres
$ cd /usr/local/src/postgresql-7.1.2
$ make check
$ exit

You may run into a few errors, but that's okay. I performed 76 tests on my system, and only one failed. If you have many more failures than that, I would try to rebuild PostgreSQL, but if you have one or two failures, you should be fine. The regression tests will warn you if an error is fatal. If all goes well, you should be able to install PostgreSQL now. To do this, issue another simple make command, once again as the root user, on the command line:
# make install

Once you install PostgreSQL, you need to configure it before it is ready for use.

Configuring PostgreSQL
The first thing you need to do is initialize the database. If you didn't install PostgreSQL into /usr/local/pgsql, replace the paths below with the appropriate ones. You will need to execute these commands as the postgres user, so su to that user and then execute the following commands:
# mkdir /usr/local/pgsql/data
# chown postgres /usr/local/pgsql/data
# su - postgres
$ cd /usr/local/pgsql/bin
$ ./initdb -D /usr/local/pgsql/data
$ ./postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
$ ./createdb test
$ ./psql test

The first step here is to create the directory /usr/local/pgsql/data, which will contain all of your databases. Now, as root, change the ownership of that directory to the postgres user. You don't need to have your databases here, however, so if you prefer them in a more proper location like /var/lib, feel free to change /usr/local/pgsql/data to /var/lib/pgsql. Just must make sure that the postgres user has read and write access to that directory.

Next, su to the postgres user and run the initdb program in the /usr/local/pgsql/bin directory. The -D parameter tells initdb the directory where your databases are to be stored.

Now you can start postmaster, which is the actual PostgreSQL database server. Here we have done a quick and simple method of starting it. Using this command, postmaster will listen only to UNIX domain sockets and not TCP sockets. If you want postmaster to accept connections via TCP sockets (which is necessary if you want to use PostgreSQL with PHP), you should start postmaster using:
$ ./postmaster -i -D /usr/local/pgsql/data >logfile 2>&1 &

Make sure that logfile in this case can be written to by the postgres user. Postmaster will not run if it can't write to the file. You may want to use /home/postgres/logfile or any other file in a directory that the postgres user has permission to write in, as the log file.

Finally, when the server is running, use the createdb program to create the database test. Then, use psql, the command-line PostgreSQL client, to connect to the server and open test.

Now you will be able to play around with the PostgreSQL server. Being a SQL server, PostgreSQL understands regular SQL commands and syntax, so you can use regular SQL to manipulate your tables and databases.

Here are some other useful commands to know when using the psql command-line client:

\h Help with SQL commands
\? Help with internal slash commands
\q Quit client

If you want to stop the postmaster service, execute:
# kill -INT `cat /usr/local/pgsql/data/postmaster.pid`

PostgreSQL access controls
PostgreSQL offers some nice access control functions. In the data directory, /usr/local/pgsql/data for this installation, a file called pg_hba.conf exists. This file is the host access control file for PostgreSQL and will allow you to determine what hosts will be able to access the server for TCP/IP connections.

By default, PostgreSQL will allow only localhost access, which means that TCP/IP connections originating from the local machine will be allowed. If your database is on a system that's remote from your Web server, for instance, and you want your PHP scripts to access it, you'll need to modify this file. By default, local connections via UNIX domain sockets and TCP/IP sockets are allowed with the configuration shown in Table A.

Table A
Local all     trust
Host all 127.0.0.1 255.255.255.255 trust

Unfortunately, this is a very liberal configuration. The first command allows UNIX domain socket connections to all databases with no authentication. The second allows TCP/IP socket connections to all databases from the IP address and netmask 127.0.0.1/255.255.255.255 with no authentication. This can be a bad thing in a multiuser environment. A better example might be the one shown in Table B.

Table B
Local all     password passwd
Host all 127.0.0.1 255.255.255.255 password passwd

This tells PostgreSQL to authenticate users by their password. The passwords are matched against the file /usr/local/pgsql/data/passwd, which is created using the pg_passwd program. See below for more on this. Now, if you want to allow connections from the IP address 192.168.1.20, you might use the configuration shown in Table C.

Table C
Host all 192.168.1.20 255.255.255.255 password passwd

The pg_hba.conf file has some detailed comments on the different authentication methods you can use. You can use password for cleartext passwords, ident for matching user ident responses, or Kerberos validation if you compiled with Kerberos support. You can even force clients to connect using SSL encryption if you have the OpenSSL libraries installed and have compiled PostgreSQL with SSL support. (See the configure command for more information on how to enable this.)

Using passwords for your clients is a good idea, and PostgreSQL makes it pretty easy. It uses encrypted passwords in the same manner as /etc/passwd and /etc/shadow. However, the passwords are stored in the /usr/local/pgsql/data/passwd file and can be manipulated with the pg_passwd program. To create users who can access the database, you will need to use the createuser program, like this:
$ createuser vdanen
Shall the new user be allowed to create database? (y/n) n
Shall the new user be allowed to create more new users? (y/n) n
CREATE USER

Now you need to create a password for that user by using the pg_passwd program, like this:
$ cd /usr/local/pgsql/data
$ ../bin/pg_passwd passwd
Username: vdanen
Password:
Re-enter password:

Here, I enter the username vdanen and type in my password. The password file you specify to pg_passwd is necessary because you can name your password file or files anything you like. If you use the file passwd here, when using password matching, you will need to use the same file in your pg_hba.conf. For instance, if you wanted to have a password file called pg_passwd, you'd need pg_hba.conf to use what's shown in Table D.

Table D
Host all 127.0.0.1 255.255.255.255 password pg_passwd

You would also have to make sure that you passed the filename pg_passwd to the pg_passwd program as the parameter. Now if I want to connect to the server, I use:
$ psql -U vdanen test

This will connect me to the local PostgreSQL server using UNIX domain sockets. It also specifies the database to connect to, in this case the test database. Whenever you connect to the server, you must specify the database to use. If you don't, the server will assume you want to access a database of the same name as your username. To use TCP/IP, from another machine I would issue:
$ psql -U vdanen -h sqlserver test

where sqlserver is the hostname of the system running PostgreSQL. Of course, you have to make sure that the host you are connecting from is permitted by the access control methods previously discussed.

If you want to use the crypt authentication method, you will need to create the table pg_shadow in the database. This is what the encrypted passwords are matched against. The passwords are stored unencrypted in the database, so it's easy to manipulate. As the postgres user, log in to PostgreSQL and type:
test=# SELECT * FROM pg_shadow;

You should see a list of users available on the database server. By default, you should see postgres and the defined user. You'll want to update the database passwd column to the password for the user by issuing:
test=# UPDATE pg_shadow SET passwd='secret' WHERE usename='vdanen';

This is standard SQL syntax and basically tells PostgreSQL to update the database pg_shadow and set the passwd column entry to secret on the row where the column usename contains the value vdanen. If you were to again issue the SELECT command, you would see the password. Don't be alarmed. Only the postgres superuser can see this table, so other users won't be able to obtain the passwords for anyone else on the system. That table is completely restricted, and users won't even be able to view their own information from it. If users want to see their own information, they can do so by viewing the table pg_users, which contains all of the same fields, except that the passwd column will always be filled with asterisks so they won't know if the user has a password set or what it might be.

Conclusion
As you can see, PostgreSQL is not a difficult database server to set up. It requires a few more steps than MySQL, and some of the administrative tools aren't quite as intuitive, but you can see the potential and power here. PostgreSQL is a stable and secure database server that would work very well for large applications needing advanced functionality and for sites expecting a large number of hits that will increase connections to the database.

It would take a little getting used to, but you could even use PostgreSQL alongside MySQL, as they listen to different TCP/IP ports. This scheme would allow you to take advantage of the strengths and avoid the weaknesses of each database server. Of course, with every passing release, the two products are becoming more alike. As time goes on, PostgreSQL will improve its speed and MySQL will improve its scalability, making the choice even tougher, at least for those new to database servers.

About Vincent Danen

Vincent Danen works on the Red Hat Security Response Team and lives in Canada. He has been writing about and developing on Linux for over 10 years and is a veteran Mac user.

Editor's Picks