Data Management

Automate the process of trimming table data in SQL Server 2005

Arthur Fuller recently instructed a colleague on how to automate the process of trimming table data in SQL Server 2005. Find out how he is able to turn an onerous task into a breeze.

A friend and colleague recently faced an interesting problem. Each month he receives a huge amount of data from a client, which he then must massage before it's usable. The tables contain millions of rows, and the structure of the tables may vary from month to month. To make matters worse, much of the data contains leading or trailing blanks or both.

After my friend did the job by hand the first time around, he quickly discovered the tedium of a manual approach. He wondered whether there was a way to automate the process of trimming all the data—the problem is that he cannot know beforehand how many columns there are or the names of the columns. Fortunately, it is possible to automate this process.

In SQL Server 2005, there are a number of very useful system views that reside in the INFORMATION_SCHEMA schema. The view of interest in this situation is INFORMATION_SCHEMA.COLUMNS.

Get SQL tips in your inbox
TechRepublic's SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system.
Automatically sign up today!

All of these system views are hidden from your scrutiny, so you can't simply right-click and Script To a query window. Microsoft did this on purpose so it is free to change the innards. If you know something about the system tables, you can guess how the view is built, but for our purposes here, the innards are not important.

You can, however, obtain a wealth of information about the columns in your tables. Here are some of the interesting ones:

  • Table_Catalog: name of the database
  • Table_Schema: Schema
  • Table_Name: name of the table
  • Column_Name: name of the column
  • Ordinal_Position: column number
  • Column_Default: default value if any
  • Is_Nullable: contains Yes or No
  • Data_Type: indicates the data type of the column

For a list of the columns on any given table, your query would look something like this:

USE AdventureWorks
SELECT*FROMINFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME ='ProductDescription'

Try this command on one of your databases, substituting its name and a table of interest. Run it to get an appreciation of its output.

Now I'll use SQL to generate the UPDATE statements I need to trim all the data in the table. I achieve this by assembling a literal for each column in the table of interest. I'll skip all the data types that are irrelevant to our needs and concentrate on the char and varchar types. See Listing A. The output of this command is a series of UPDATE statements similar to what appears in Listing B. All you have to do is collect the output and run the block of statements as a query.

In the examples above, I was interested in only one table, but if I want to process all the tables within a schema, I would simply modify the WHERE clause to interrogate the Schema_Name column instead. See Listing C. Check out Listing D to see the output of this command.

This reveals a small problem. As denoted by my naming scheme, a view got included, and it makes no sense to try to update a view when I am already updating its underlying tables. So I need a way to skip the views. Looking only at the INFORMATION_SCHEMA.COLUMNS, there is no way to avoid this because views are not distinguished from tables.

Another view, INFORMATION_SCHEMA.Tables, comes to the rescue. I just need to add an EXISTS predicate to the above query. See Listing E. This version strips out the views from the list to process.

My colleague is now a happy camper. The solution is simple to use and very flexible. He can use it on a single table, or all the tables in a schema, or even on all the tables in a database, just by modifying the WHERE clause.

As you can see, SQL Server 2005 makes it far easier to work with table and column information than was the case in SQL Server 2000. I have also demonstrated how to generate SQL code using SQL. Now you know how to turn an onerous task into a breeze.

Miss a tip?

Check out the SQL Server archive, and catch up on the most recent editions of Arthur Fuller's column.

8 comments
thisisfutile
thisisfutile

Good information and explained very well. Thanks for the examples.

slawrence
slawrence

How would you automatically run the results instead of collecting them and running them manually? Maybe put that SELECT statement into a cursor, loop through the cursor returning the UPDATE statement to a local variable(e.g. @MyUpdateQuery), and then run EXECUTE sp_executesql @MyUpdateQuery?

chapman.tim
chapman.tim

One problem...all of the views you mentioned also exist in SQL Server 2000, so it is not a new 2005 thing. They exist in the master database for every instance....you can even create your own if you want to. Here are a list of the ones that exist in SQL 2000 (all are in the INFORMATION_SCHEMA schema: OBJECT_COLUMNS SCHEMATA TABLES TABLE_CONSTRAINTS TABLE_PRIVILEGES COLUMNS COLUMN_DOMAIN_USAGE COLUMN_PRIVILEGES DOMAINS DOMAIN_CONSTRAINTS KEY_COLUMN_USAGE REFERENTIAL_CONSTRAINTS CHECK_CONSTRAINTS CONSTRAINT_TABLE_USAGE CONSTRAINT_COLUMN_USAGE VIEWS VIEW_TABLE_USAGE VIEW_COLUMN_USAGE ROUTINES PARAMETERS ROUTINE_COLUMNS Tim

artful
artful

Thank you for responding. I'm glad you found the tip useful. Few readers bother to point out what they liked, only what they didn't like. Thanks! I appreciate it. Arthur

artful
artful

That's one way to do it, although it's a little messy for my states, but that's strictly personal. Another way is to create a temporary stored procedure and build it up with the collection of statements. That might a good idea for a forthcoming tip. Although I haven't used this technique a lot, I did work on a project a while ago where it was used extensively, so I've got some great examples handy. Thanks for responding, Arthur

chapman.tim
chapman.tim

You wouldn't necessarily have to use a cursor to execute those statements. If you wanted to use dynamic sql, you could just print all of the statements out, and execute them. Of course, a cursor would work too...it just depends on what you need to do.

artful
artful

I learn something new every day. Perhaps I failed to notice them until SQL 2005 because of new layout of the object tree brought them to my attention. Thanks for pointing this out, Tim. Arthur

chapman.tim
chapman.tim

No problem at all. You can look at the design of the queries in the master db to see what they are doing. It helps a lot when you are trying to write solutions using custom queries from the system tables (which MS recommends you don't do, but I prefer using them).

Editor's Picks