In July 2001’s “A crash course in SQL,” I explained how to query a database using a few basic SQL statements, and we invited your questions for future lessons. This week, I’ll answer the question raised most often in your e-mails: How do you query two or more tables at once?
A quick review
Here’s how I sum up rudimentary SQL at the command line: You use a SELECT statement to tell the SQL engine, “Show me some records out of some table.” In the first lesson, we learned three things:
- How to specify which columns we wanted to see in the results
- How to determine which records we wanted in the result set
- How to arrange the records in the result set in a certain order
The basic, single-table SELECT command looked like this:
SELECT [some or all of the columns] FROM [a table] WHERE [the records meet some condition] ORDER BY [at least one column]
For example, if you wanted to see an alphabetical list of the names of customers who are at least 21 years of age, you’d use a statement like this one:
SELECT Lname,Fname,Age FROM Customers WHERE Age>20 ORDER BY Lname
To see the same list, sorted in descending order by age, you’d use a statement like this:
SELECT Lname,Fname,Age FROM Customers WHERE Age>20 ORDER BY Age DESC
Once you understand how to query a single table, querying two or more tables is a snap.
Don’t miss an issue
Subscribe to Jeff Davis’ Help Desk TechMail and get a bonus of Jeff’s picks for the best resources on the Web.
Selecting records from two tables
In the basic SELECT command, you tell SQL which columns to display from the source table. The format doesn’t change much when you’re querying two tables instead of one, but there are some new rules:
- You must list the columns you want to see from each table and label those column references appropriately.
- You must list the tables from which the columns come.
- Your WHERE clause should, as a general rule, “connect” the tables by requiring a matching value in at least one key column from each table.
Here’s what a typical SELECT command looks like when two tables are involved:
SELECT table1.column1, table1.column2, table2.column1, table2.column2 FROM table1, table2 WHERE (table1.column1=”somevalue”)
You use dot notation in the table and column references to eliminate ambiguity in case the two tables have columns with the same name.
In this sample WHERE clause, the expression (table1.column1=”somevalue”) identifies the records that you want to find in table1. The second condition of the WHERE clause, (table1.keyfield=table2.keyfield), is the tricky part. It says, “but only if there’s a matching record in table2.”
To illustrate how requiring a match in a key value works, consider this scenario, based on my work for a nonprofit agency.
- In this case, table1 is Mailinglist, and table2 is Donations.
- Mailinglist contains the columns: DIN (donor identification number), Lname, Fname, and Phone, among others.
- Donations contains the columns: DIN, Date, Fund, and Amount, among others.
- Mailinglist contains many thousands of records. It is used primarily for direct-mail campaigns.
- Donations contains a list of all of the monies received by the agency. Auditors compare the records in this table to the agency’s bank records.
- All records in Mailinglist and Donations contain a DIN. However, Mailinglist has a one-to-many relationship with Donations. That is, a person can have only one DIN, and the Mailinglist table has no two records with the same DIN.
- The key value that connects the tables is DIN. The Donations table has many records that contain the same DIN, but that DIN matches only one record in the Mailinglist table.
The director of the agency says, “Give me a list of all donations over $500, including the name and telephone number of the people who donated.” SQL makes it easy to deliver the first part of that request.
SELECT * FROM Donations WHERE Amount >500
To deliver the second part of the request, however, you must query not only Donations for donations over $500, but you must also query the Mailinglist table to pull the names and phone numbers. Here’s one way to generate those results:
SELECT Mailinglist.Lname, Mailinglist.Fname, Mailinglist.Phone, Donations.DIN, Donations.Date, Donations.Fund, Donations.Amount
FROM Mailinglist, Donations
WHERE (Donations.Amount >500)
AND (Donations.DIN = Mailinglist.DIN)
ORDER BY Mailinglist.Lname, Mailinglist.Fname
The first list contains explicit references to the columns we want to see in our result set (three columns from Mailinglist, four columns from Donations). The first part of the WHERE clause specifies only records in Donations where the entry in the Amount field is over 500. By requiring that SQL find a matching DIN in the Mailinglist table, the second condition of the WHERE clause ensures that we’ll get the correct name and telephone number from the Mailinglist table.
Some versions of SQL support aliasing your tables so you don’t have to type the whole table name each time you refer to one of its columns. To do so, you enter the alias immediately after the table’s name. For instance, in our sample query, to alias the Mailinglist table as “m” and the Donations table as “d,” the SELECT statement would look like this:
SELECT m.Lname, m.Fname, m.Phone, d.DIN, d.Date, d.Fund, d.Amount FROM Mailinglist m, Donations d
WHERE (d.Amount >500)
ORDER BY m.Lname,m.Fname
If your version of SQL supports aliasing, use it as long as you feel comfortable with it. Even though it requires more keystrokes, I prefer to spell out my table references, just to make it harder to type the wrong alias.
I hope this crash course answers the questions raised in the first segment. We’ve just scratched the surface of the kinds of queries you can execute on two or more tables. In future columns, I’ll show you some advanced tips for your WHERE clauses.
Insert comments here