Data Management

Data-mining tips for the SQL command line

Do you know how to thin-slice your SQL database from the command line? Use these syntax shortcuts to get the facts you need.


When you use a SQL command line, don't settle for SELECT statements with WHERE clauses that leave you scrolling through rows of data you don't need. This week, I'll show you how to use wild cards in your WHERE clauses to create queries that return records based on "fuzzy" text matches. I'll also look at two keywords that can help you perform "power searches" on your tables.

It's like, the wild card factors
In "Fine-tune WHERE clauses in SQL," I explained how to locate records in which at least one column contained a value greater than, equal to, or less than another value. When I wrote those WHERE clauses, I used operators like =, >, <, >=, and <=.

In addition to those mathematical operators, most implementations of SQL support two wildcards, the percent sign and underscore (% and _). In SQL syntax, the percent sign is a special character, a wild card that can represent any number of characters. The underscore character is a wild card that can represent any single character.

By using those wild cards with the LIKE keyword, you can perform fuzzy queries on the records in your database. Table A shows the wildcards you can use in SQL. I threw in the corresponding DOS wild cards for reference.

Table A
  Match any single character Match any string of characters
SQL _ %
DOS ? *
SQL wild card characters let you customize your searches for records in the same way that DOS wild cards help you customize searches for files.

In a WHERE clause, the LIKE keyword triggers a fuzzy lookup. To illustrate, suppose you're in charge of auditing all of the customer records whose last names begin with P. Assuming there's a field name called Lname in your table, your SELECT statement would look something like this:
SELECT * FROM Customer WHERE Lname LIKE P%

The only records in the record set resulting from this query will be for customers whose last names begin with the letter P.

This wild card comes in handy when you're not sure about the spelling of a customer's name. If you think the last name could be spelled Spaulding or Spalding, you might issue a command in the form:
SELECT * FROM Customer WHERE Lname LIKE Spa%

That command will pick up records where the last name is Spaulding, Spalding, Spanning, and any other last name that begins with the letters Spa.

Sometimes, the only thing that's variable about a string is a single character. In those cases, you'll use the underscore (_), SQL's other wild card character. For instance, the command
SELECT * FROM Customer WHERE Lname LIKE L_

will display records that contain entries in the Lname field like La, Le, Li, Lo, Lu, and Ly.

Customizing your query with the IN keyword
The IN keywords provides another great tool for data mining. To understand its use, consider the mathematical operators =, >, and <=, which allow you to select records based on broad-based rules such as exactly equal to this, greater than that, or less than or equal to some other value.

Instead of matching records based on simple value comparisons, the IN keyword lets you specify a custom list of values to be used when SQL queries your table. For example, suppose you're asked to generate a list of all of the records for customers who live in the following ZIP codes: 48012, 49014, and 49088.

Based on what you learned in SQL 101, you might write your SELECT command like this:
SELECT * FROM Customer WHERE Zip="48012" OR Zip="49014" OR Zip="49088"

However, that command requires 70 characters (including spaces). And if you want to add additional ZIP codes to this custom list, you'll have to add additional OR clauses. To save keystrokes, use the IN keyword to write the same command with only 61 characters:
SELECT * FROM Customer WHERE Zip IN ("48012","49014","49088")

Then, when you want to add additional ZIP codes to your custom query, you simply add them to the parenthetical list of options. By the way, we used quotation marks to delimit the ZIP codes because they're stored in text (character) fields in our table. If you use the IN keyword to list values that may appear in a numeric field, enter those values without the quotation marks.

Let BETWEEN do your math
The BETWEEN keyword works much the same as IN. If you're looking for records based on values that fall within a certain range, you can save keystrokes by using BETWEEN. For example, if you want to locate records with values in the Current_balance field between 1000 and 2000, inclusive, you could use a statement in the form:
SELECT * FROM Customer WHERE Current_balance >=1000 AND Current_balance<=20000

However, that statement requires 79 keystrokes. Using the BETWEEN keyword, your statement would look like this:
SELECT * FROM Customer WHERE Current_balance BETWEEN (1000,2000)

which cuts the command down to 64 keystrokes.

Share Quick Lessons
To comment on these tips or to share one of your own favorites, start a discussion thread or drop us a note.

 

Editor's Picks