Data Management

Achieve the benefits of normalized data even in a small Microsoft Access database

The process of normalizing data will benefit even the smallest database project. See how we improve a TechRepublic member's wedding plans with some tips for Microsoft Access.

Grow your Microsoft Access skills with TechRepublic's Fast Track: Microsoft Access CD-ROM. Learn to use and maximize all that Microsoft Access has to offer with 75 complete lessons covering Access 97, 2000, and 2002 (XP).

TechRepublic member Navjun recently asked the following question in the TechRepublic Technical Q&A forum:

"I'm making a guest list for my wedding and I have done the following. One table with all the names and address and how many people are in each family and a field that connects to a second table with just table numbers in it. So this way I could go to this certain family and choose a table number for them to sit, so I'm able to choose the same table number for more than one family. Now I would like to make a query that would tell me how many people are at each table and warn me if I go over a certain limit."

Based on Navjun's description, I'm surmising that his database structure looks something like this:



























Closer to normalization

While this setup is serviceable for Navjun's purposes, a few slight changes to the structure and the addition of another table will make the database more in line with the rules of normalization, as well as make it easier to use for wedding invitations, catering, thank-you cards, personalized table settings, etc.

Additionally, Navjun may have large amounts of data in his current database that he isn't interested in retyping. Therefore, my proposed restructuring can be accommodated with little heartburn.

Let's take a look at Figure A and see what changes we can make.

Figure A


The first change is to create a new table called member_table. This table will be used to store the actual members of each family and will have a many-to-one relationship with the family_table. Don't forget to create the family_id field that will act as a foreign key to the family_table and allow us to relate the two tables.

Next, we make a minor change. We eliminate the number_in_family field in the family_table. We do this because we can now calculate the number of family members by counting the number of records in the member_table that share a common family_id.

That's all the changes required to get more functionality out of Navjun's original data structure. One could argue that you could further break down both the family and member tables because the structure above doesn't allow multiple address information per family, or that some of the information in the family table could be kept at the member level, etc. But this application is not a contacts database per se, and the structure above suits the needs of the wedding and then some.

One other caveat: The current structure assumes that families will be assigned as a group, and members cannot be split among tables (at least not without first designating them as a separate family).

Form some queries

Now that we have the structure, we'll take a look at the queries necessary to get our desired result—a list of the table assignments with a warning indicator for any table that is over capacity. In order to get our desired result, we'll need two queries to base our final query on. The first is a query that counts the number of members for each family and also displays the table to which they're assigned. Figure B shows the results of this query.

Figure B

Query results

The SQL used to derive the results looks like this:

SELECT DISTINCTROW [Family_Table].[Family_ID], [Family_Table].[Family_Lastname], [Family_Table].[Table_ID], Count(*) AS [Count Of Member_Table]FROM Family_Table INNER JOIN Member_Table ON [Family_Table].[Family_ID]=[Member_Table].[Family_ID]GROUP BY [Family_Table].[Family_ID], [Family_Table].[Family_Lastname], [Family_Table].[Table_ID];

In the SQL code, we're selecting distinct rows (one row for each Family_ID) from the family table, and then displaying the contents of the Family_ID, Family_Lastname, and Table_ID columns.

Additionally, we're using the expression Count(*) AS [Count Of Member_Table] to count the number of records that make up each family from the member table that we've linked to by an inner join on Family_ID. Lastly, we added a Group BY statement to group our records.

We now need a query that combines information from the family query above and the tables_description table to give us the results set shown in Figure C.

Figure C

Results set

Here's the SQL that was used to derive the results set in Figure C:

SELECT [Tables_Description].[Table_ID], [Tables_Description].[Table_Type], [Tables_Description].[Capacity], [Family_Summary_Query].[Family_ID], [Family_Summary_Query].[Family_Lastname], [Family_Summary_Query].[Count Of Member_Table]FROM Family_Summary_Query INNER JOIN (Tables_Description INNER JOIN Family_Table ON [Tables_Description].[Table_ID] =[Family_Table].[Table_ID]) ON [Family_Summary_Query].[Family_ID] =[Family_Table].[Family_ID];

Now that we have our subqueries, we can use them in our final query to get our list of table assignments. Figure D shows the results set we're trying to achieve.

Figure D

Final results set

Here's the SQL used to get the final results set:

SELECT DISTINCT td.Table_ID, td.Table_Type, td.Capacity, fsq.Family_ID, fsq.Family_Lastname, fsq.[Count of Member_Table] AS Member_Count, td.memcnt, IIF(td.memcnt>td.capacity,'*','') AS OverCapacityFlagFROM Family_Summary_Query AS fsq, [select table_id,table_type,capacity,sum([count of Member_table]) as memcntFROM [Tables_Description Query]Group BY table_id, table_type, capacity]. AS tdWHERE fsq.table_id = td.table_id;

As you can see in Figure D, our query gives us the results that Navjun has asked for: a list that shows each table (Table_ID); its total capacity (Capacity); which families are assigned to it (Family_ID and Family_Lastname); the total number of members from each family assigned to the table (Member_Count); and the total number of people assigned to each table (memcnt).

We also have a flag that's set to indicate which tables are overbooked. So, by looking at the query results, we can quickly see which tables we need to address and which families (because of size) might be causing the problem.

Note that some tables are duplicated because we're showing the various families assigned to the tables. Had we left out that information, we would have only one row per table, but we would have had to refer to a second report to see which families were at each table.

In order to dissect the query above, I need to point out a shortcut. Aliases were used for some of the query names. For example, td is used as a shortcut for table_description_query, and fsq stands for the Family_Summary_Query.

As in the queries above, we're selecting distinct rows. This time, they're coming from the td and fsq queries where the table IDs are identical (WHERE fsq.table_id = td.table_id). We're also using an immediate if statement to create our over-capacity flag (IIF(td.memcnt>td.capacity,'*','') AS OverCapacityFlag). Finally, we're grouping by table_id, table_type, and capacity.

I hope you don't find this to be a terribly long-winded explanation of how to accomplish what Navjun was asking. But this article does provide a solution that Navjun can implement quickly without reengineering the whole process. It also shows you the power of data normalization in database design.

Editor's Picks

Free Newsletters, In your Inbox