Define your complex business entities through data modeling

Getting a handle on the data relationships in large systems can be tricky. While data modeling may be losing popularity, it could be the key to tackling complex data. Here's a look at this process, along with some rules for developing effective models.

While gathering business requirements and modeling business processes are both important aspects of project management, large, complex projects may call for another type of modeling: data modeling. Although you may not perform this modeling yourself, you should still have a grasp of its concepts. The applications you develop read or manipulate the data being modeled, so it's important to understand a little about how these models define the relationships between data entities.

Back in the days of DBAs
When I came through the programming ranks on the mainframe, there were flat files and databases (okay, there were VSAM files, too). Flat files were under the control of the programmer, and they could be created at will. Databases, however, were too complex for mere programmers to understand and manipulate. They required data analysts and database administrators (DBAs). Data analysts would work with you to logically model the data, and then the DBAs would implement the resulting model in a physical database.

In today’s client/server and Web development environments, the databases are much more within the control of the developers. We use SQL Server and Oracle in companies that are too small to afford data specialists, so we do the database work ourselves. Perhaps that is why a lot less data modeling seems to be going on today than in the past. However, there is still value in understanding the concepts and knowing when data modeling should be used. It may be very appropriate for large, complex applications or for applications whose databases take a heavy volume of hits.

Two main purposes of data modeling
Formal data modeling serves two fundamental purposes. First, it provides a precise language and syntax to represent the relationships between data entities. If the modeling is broad enough, all the important data the organization uses can be represented this way. As an example, how difficult would it be for your organization to agree on the precise definition of a customer? At a company I used to work for, it took months to define what a customer was, agree on the common attributes, and gain a shared understanding of how the customer related to various other entities. When the definition was complete, half the applications in the company turned out to be inaccurate, since they had defined a customer in different terms. However, from that point on, everyone could rally around the common definition.

The other main purpose of data modeling is to define entities and relationships in ways that can be used to store the underlying business data. This allows us to define files and databases so that business applications can correctly process the information. For example, you may discover that many customer attributes are relatively static. You might also realize that one customer may generate many orders. These simple facts allow you to create two database tables: one to hold the customer attributes and one to hold order information. Each can be keyed on a common customer number. Because the static customer attributes are stored once rather than repeated on every order, this saves storage space, allows for faster processing, and makes the data easier to maintain.
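To make the two-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical, chosen only for illustration; a real model would carry many more attributes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical customer and order tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        -- Relatively static attributes are stored once per customer.
        CREATE TABLE customer (
            customer_number INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            address         TEXT NOT NULL
        );
        -- Each order carries only the customer number, not the attributes.
        CREATE TABLE customer_order (
            order_number    INTEGER PRIMARY KEY,
            customer_number INTEGER NOT NULL
                            REFERENCES customer (customer_number),
            item            TEXT NOT NULL
        );
        """
    )
    # Made-up sample data: one customer with two orders.
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp', '1 Main St')")
    conn.executemany(
        "INSERT INTO customer_order VALUES (?, ?, ?)",
        [(100, 1, "widget"), (101, 1, "gear")],
    )
    conn.commit()
    return conn


def orders_for(conn: sqlite3.Connection, customer_number: int) -> list:
    """Join the two tables on the shared customer number."""
    rows = conn.execute(
        """
        SELECT c.name, o.item
        FROM customer AS c
        JOIN customer_order AS o ON o.customer_number = c.customer_number
        WHERE c.customer_number = ?
        ORDER BY o.order_number
        """,
        (customer_number,),
    )
    return rows.fetchall()
```

Calling orders_for(build_demo_db(), 1) returns one row per order, each paired with the customer name that is stored only once. If the customer's address changes, a single row in the customer table is updated and every order picks up the change through the key.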

Some simple data modeling rules
Here are some rules for data modeling that will help achieve clarity and precision:
  • Provide a clear name for all fundamental entities. Unlike process modeling, which tends to be action oriented, data modeling is more about nouns. Some entities are derived from other entities. Focus first on the fundamental entities and then the derived ones. As in our example, one fundamental entity might be the customer.
  • Include attributes that describe the entity. For a customer entity, this might include a description, a current numbering scheme, physical characteristics (if appropriate), address information, etc.
  • Look for relationships between entities. This is the key part of data modeling, as you define how the entity interacts with or relates to other entities. For instance, a customer can generate one or more orders. The customer need not have an associated order. An order must have an associated customer. A customer may also be a vendor. A customer may have one salesperson assigned. There are generally accepted ways to show these relationships. The first thing to define is the cardinality of the relationship: one-to-one, one-to-many, or many-to-many. (There are others, but these are the most common types.)
  • Diagram the relationships using a precise nomenclature. One of the purposes of data modeling is to provide clarity and precision. This is not possible if you describe the models ambiguously. Fortunately, a standard set of diagramming techniques emerged long ago to visually represent the model. Although there are a few major diagramming processes, each with its variations in how it represents the relationships, all the processes represent similar underlying concepts. For instance, a box usually represents an entity; lines that connect two entities generally show a relationship; and a crow’s foot on one end of the relationship line shows a one-to-many relationship.
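The relationship rules in the list above can also be written down in code. The following is only a sketch, assuming Python dataclasses; the class and field names are hypothetical and are not part of any standard modeling notation. It captures three of the rules: a customer may have zero or more orders, an order must have a customer, and a customer may have at most one salesperson.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Salesperson:
    name: str


@dataclass
class Customer:
    customer_number: int
    name: str
    # Zero-or-one: a customer may have one salesperson assigned.
    salesperson: Optional[Salesperson] = None
    # One-to-many: a customer can have any number of orders, including none.
    # Excluded from repr and comparison because each order points back here.
    orders: list["Order"] = field(
        default_factory=list, repr=False, compare=False
    )

    def place_order(self, order_number: int) -> "Order":
        """Create an order that is tied to this customer from the start."""
        order = Order(order_number=order_number, customer=self)
        self.orders.append(order)
        return order


@dataclass
class Order:
    order_number: int
    # Mandatory: an order cannot be constructed without a customer.
    customer: Customer
```

For example, Customer(1, "Acme Corp") is valid with no orders and no salesperson, while Order(100) on its own raises a TypeError because the customer is missing. In a diagram, the orders field corresponds to the crow's foot on the order end of the line between the two boxes.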

Depending on the particulars of your project, putting together an effective data model might require some specialized skills. If your project requires complex data modeling, you should probably bring in a specialist. But even if you do, you're not necessarily off the hook. Whatever the model ultimately looks like, you will probably have input into its design, and you'll have to work with the result. So it's still worth your time to understand why the model exists and how you can use it to your benefit.

It's all about the data
Some people take the organization of their company’s data for granted, but large systems can result in such complex relationships that the specialized skills of a data analyst or data architect are required. Even in that case, however, it is still important that developers have a fundamental understanding of not just a company’s data but also its relationships and how to represent those relationships. Here are some points to remember about data modeling:
  • Data modeling may be less prevalent today, but it is still valuable on large or complex projects where the data requirements are vague.
  • The purpose of data modeling is to precisely define the fundamental entities of the business and the relationships between them. Data modeling also provides the precision and detail required to correctly build data structures and databases to store the business information.
  • A complete visual syntax allows the data models to be clearly diagrammed. Other modeling specialists can tell right away how the data relates by viewing the resulting diagrams.
  • Expert data modeling requires specialized skills and training, and normal mortal developers may not feel comfortable tackling the process.

Project management veteran Tom Mochal is director of internal development at a software company in Atlanta. Most recently, he worked for the Coca-Cola Company, where he was responsible for deploying, training, and coaching project management and life-cycle skills for the IS division. He's also worked for Eastman Kodak and Cap Gemini America and has developed a project management methodology called TenStep. He holds a B.S. in computer science from Iowa State University.
