If you’ve heard that Linux comes up short in the database department, think again—Linux may very well be the best available platform for serving small to intermediate-sized databases on corporate intranets and the Web. In part 1 of this series of Daily Drill Downs, I’ll attempt to demystify Linux database servers. Later, you’ll learn how easy it is to get database servers up and running—including industrial-strength products such as Sybase ASE.
In this Daily Drill Down and those that will follow, I’ll let you in on some of the best-kept secrets about Linux: First, mere mortals such as you and I can indeed install and configure Linux database servers—and what’s more, it’s easy. Second, Linux is far and away the platform of choice for serving all but the most humongous databases; you’ll learn that there’s a fantastic selection of freeware and open source database servers available. With a modicum of effort, you can put together a Linux-based database server that will rival the performance of commercial systems that could cost tens or even hundreds of thousands of dollars—and you can do it for free.
Trust me, you don’t have to be one of those cone-headed, super-brilliant human beings from the far future to understand what database servers are all about and to get one up and running on Linux. If you’re savvy enough to compile your own software from source code or to get TrueType fonts working with StarOffice, you’ll have no trouble with database servers.
To be sure, database servers are complex programs, and you’ll need some specialized training and expertise if you want to take their capabilities to the max. What’s more, you’ll need Web server expertise to make data available to users—but that’s becoming increasingly easy to do. For most organizations, about 99 percent of the payoff comes from databases that exploit just one percent of the specialized knowledge that database experts possess.
In this Daily Drill Down, you’ll learn some of the fundamentals, beginning with the myth that the lack of database software is one of the things that’s holding Linux back. In fact, as you’ll learn, Linux is the platform of choice for database development—in large measure because so many incredibly powerful but free database servers are available for Linux. Here, I’ll introduce the basic concepts of database servers and survey the available software. As this series continues, you’ll learn how to get the leading packages installed, configured, and running on your Linux system.
Database programs and database servers
Chances are you’ve used a database application such as Microsoft Access, Lotus Approach, or Claris FileMaker Pro. These are easy-to-use programs that enable users to develop their own databases in a friendly, graphical environment. You may have heard that Linux comes up short in the database department because, in general, such programs aren’t available for Linux. Sure, StarOffice has a database module, but it’s the weakest part of the suite, and even StarOffice backers concede that it’s woefully short of its Windows-based competition. If all this is true, then what’s so great about Linux as a database platform?
Here’s the answer, and it’s simple. Programs such as Access, Approach, and FileMaker Pro amount to legacy technology. From the user’s perspective, they’re big, complex programs that require users to go through a steep learning curve. Furthermore, users who create their own Access databases on their PCs will accomplish little more than lock up their data in a format that is difficult to exchange with other users. Of course, programmers can customize these products so that users can access centrally served data, but there’s a price: Programs such as Access aren’t available for all platforms. If you’re working in a cross-platform environment and you develop an application using one of these products, you’ll shut out some of your users—or worse, play into the hands of vendors who would just love to hook you into buying more of the firm’s products.
As thousands of companies have already discovered, there’s a much better way. Instead of sticking users with fat, complex clients, you give them a Web browser—which, incidentally, they already know how to use. They use the browser to access a Web site, where they’re presented with a Web-based form. By means of simple and readily available scripts, the user’s form entries are translated into queries and relayed to database servers. A query is a search question phrased in a form the database server can understand. The database server is a daemon, which is a program that runs in the background, waiting for queries. When the query is received, the database server daemon digs up the requested data and relays it back to the application that requested the information—in this case, the Web server. Finally, the Web server displays the requested data in the form of a Web page that’s generated on the fly.
Consider the advantages of the latter approach. Anyone who knows how to use a Web browser can access the data. Since Web browsers run on all the platforms currently in use, you’re making the data available to everyone in your organization—and that’s true even if they’re traveling and accessing a password-protected page using a PC in some distant hotel. Using the same techniques, you can make data available on the Web. Of course, it’s still possible to use a client other than a Web browser to access the data made available by a database server. As you’ll see later in this Daily Drill Down, you can obtain GUI-based utilities for Linux that enable users to originate queries without having to learn a query language. You can also obtain GUI programs that enable users to design and administer their own databases without needing to perform any programming.
To sum it all up, if someone tells you that Linux comes up short in the database department, don’t buy it. As you’ll see, the truth is exactly the opposite. Linux is the ideal platform for serving data on local area networks, intranets, and the Web—and what’s more, you can do so for free.
Understanding database servers
In this series, I’ll focus on SQL database servers that use the relational model. If you’re unfamiliar with the terms SQL and relationalmodel, you’ll find a short explanation here; otherwise, skip to the next section.
SQL (Structured Query Language) is a query language that enables users (or applications) to request information from the database server. Whether you type such queries manually or use a front-end program (such as a Web browser) to initiate such queries, the queries must conform to the syntax of the server’s query language. Database servers differ in their support for query languages. Although other query languages have their defenders, SQL is by far the most popular.
Databases differ in their data architectures as well. Again, although other options are available, by far the most popular database architecture is the relational model (for more information, read “Overview of the relational model” at the University of Texas at Austin Computation Center Web site). In brief, the relational model was developed to solve the central problem of earlier database architectures, in which errors arose because the same data had to be entered more than once. To eliminate data redundancy, the relational model calls for the use of two or more databases, called tables, to store data; information in the various tables can be related by means of a common field. Suppose, for example, that you’re tracking the dynamics of a local real estate market. You have one table for recent sales and another for agents; the agents table includes each agent’s contact information. Since most agents make more than one sale, a single-database architecture would force you to type the same information more than once, raising the potential for data input errors. With a two-table design, information about each agent is entered only once. There’s more to the relational model, but that’s the basic idea.
Free and open source database servers
The commercial database server market is dominated by offerings from the likes of Oracle, IBM, and Informix. All three firms have rolled out offerings for Linux, but they’re pricey, to put it mildly. Licensing terms can become burdensome, especially when usage increases. The more popular your database gets, the more you’re in hock. And if you’re underreporting your usage, you could be in line for a lawsuit. As a platform for database servers, what makes Linux especially appealing is the availability of three full-featured, high-performance database servers, and for a very attractive price tag indeed: free. And don’t think you’re getting what you pay for. These are outstanding, full-featured database servers, rivaling the feature sets and performance capabilities of the big-time commercial products. What’s more, you can use these products without fear of exposing your organization to rising costs or a lawsuit brought on by licensing infractions.
Although dozens of freeware and open source database servers are available for Linux, three outstanding products lead the field: Sybase ASE, MySQL, and PostgreSQL. Here, I’ll introduce each of these products; later in this series, I’ll show you how to get each of them running on your Linux system.
Sybase Adaptive Server Enterprise (ASE)
Ranking well below market-leading IBM and Oracle, Sybase is experiencing declining market share—which may very well explain why it’s making such a strong investment in Linux. By any standard, Sybase ASE is an outstanding database server, comparable to Oracle and IBM’s DB2 in its ability to dish out data at world-class speeds. But there’s a key difference: An older version of Sybase ASE, version 184.108.40.206, is available for a very attractive price: $0.00. That’s right: It’s free. You can download Sybase ASE 220.127.116.11 from the Sybase Web site , install it on your Linux system, and use it to serve data in any type of organization, including a for-profit business.
Sybase ASE is a full-featured product that includes not only the database server, but a host of utilities, including the Adaptive Server Enterprise runtime software and development environment, a backup server, and runtime libraries that enable programmers to develop customized clients.
Using Sybase ASE version 18.104.22.168 carries some risk. This copyrighted program’s license expressly states that Sybase is free to change the product’s licensing terms at any time, and at its sole discretion. Although it’s quite unlikely that Sybase would pull the rug out from under users who have developed substantial applications using version 22.214.171.124, the license does specify that the company can do so. If you’re concerned about the consequences, you may wish to consider GPL-ed packages such as MySQL-GPL or PostgreSQL.
Pronounced My-Ess-Que-Ell, MySQL is a database server and a collection of middleware and client utilities made available by a Swedish firm, T.c.X DataKonsultAB. For non-Microsoft systems, MySQL is generally available for free use, although the licensing terms are complex and require some study. Fortunately, the firm has released a version called MySQL-GPL under the terms of the GNU General Public License. MySQL is well supported with a variety of tools, including some GUI-based ones, for query generation, report generation, and database administration.
MySQL is designed to handle as many as 100 simultaneous accesses and to dish out data at lightning speed. It’s also designed to handle enormous databases; there’s one MySQL system that’s handling a database with 100 gigabytes of mission-critical data. But there’s a tradeoff. To achieve its excellent performance, MySQL doesn’t include certain SQL functions that degrade performance, including transaction processing. Although it’s possible to implement transaction processing with MySQL using various middleware approaches, this database server probably isn’t the best choice if you’re setting up a transaction-oriented system, such as an online retail service.
Arguably the most feature-rich database server that’s available under open source licensing, PostgreSQL offers many advanced features, including an object-oriented design and transaction processing. An outgrowth of U.S.-funded research at the University of California, Berkeley, PostgreSQL had to shake off its former reputation as an impractical, academic’s plaything—a reputation that’s no longer accurate. In addition to the server’s impressive capabilities, a wide variety of PostgreSQL-related tools are available, including a KDE query tool and extensive online documentation . Professional support is available from a commercial offshoot, PostgreSQL, Inc .
With its advanced features and the availability of high-quality, commercial support, PostgreSQL is a good choice for businesses that are planning to develop transaction processing systems, whether they’re accessible by means of an intranet or the public Internet.
Bryan Pfaffenberger, a UNIX user since 1985, is a University of Virginia professor, an author, and a passionate advocate of Linux and open source software. A Linux Journal columnist, his recent Linux-related books include Linux Clearly Explained (Morgan-Kaufmann) and Mastering Gnome (Sybex; in press). His hobbies include messing around with his home LAN and sailing the southern Chesapeake Bay. He lives in Charlottesville, VA. If you’d like to contact Bryan, send him an e-mail.
The authors and editors have taken care in preparation of the content contained herein, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for any damages. Always have a verified backup before making any changes.