Data Management

Modeling columns as rows

The logical consequent of normalization is that columns should be modeled as rows only theoretically. In practice, where the number of columns may remain constant for years, this refinement is unnecessary. But in some situations, you should think about representing these attributes as rows rather than columns.

If you think about it, the logical consequence of normalization is that you should model a table's columns, aside from the Primary Key (PK), as rows, not columns. In many applications, this step is unnecessary, but in some, you have little choice.

Let me present an example from a large database that I helped remodel. The application was an enterprise solution tailored to pulp and paper mills. Even though all clients are interested in certain attributes of a given product, each client might have an interest in an attribute that no one else cares about.

One solution is to include a lot of user-definable columns in the Products table, and then provide a means for the client to "name" said columns, indicate their data type, and so on. There are quite a few commercial products (e.g., Act!) that use this approach.

Our team decided not to go that route; instead, we chose to model all the "columns" as rows. Then, we gathered the collection of rows corresponding to any given product and assembled them as virtual columns.

Here's a simple example to show you how to do this. We'll begin with a table called Things, and then add another table called Properties. Finally we'll build an associative table called ThingProperties. Listing A contains a script to create them.

We'll populate the Thing table with the values X, Y, and Z. Then, we'll populate the Properties table with the values COLOR, DESIGNER, SHAPE, and SIZE. Listing B contains sample values for the ThingProperties table:

In a simple case, such as listing the properties and their values for any given thing, it's a simple SELECT. Let's suppose, however, that we want a list of all the small red triangular things. The table above reveals that only X and Y have the three desired property values. How do we get this list? The answer is to open the table once for each property that is specified, which in this case is three. And then, join the second and third instances to the first on the thing column. Listing C has the SQL that makes it work.

The great thing about this SQL statement is its formulaic nature. For each property value requested, open the table once. This statement is easy for a front-end application to manufacture. Alternatively, you could turn the statement into a collection of stored procedures, each taking n pairs of properties and values; then, have your front end invoke the appropriate one, depending on the number of pairs passed.

While not every application requires this kind of flexibility, if you find yourself having to deliver one, this is how to do it.

TechRepublic's free SQL Server newsletter, delivered each Tuesday, contains hands-on tips that will help you become more adept with this powerful relational database management system. Automatically subscribe today!

0 comments

Editor's Picks