Normalizing foreign data for Access

Getting flat-file data into an Access database is simpler than you think. A good grasp of the data and a few append queries are usually all it takes.

istock-952679588accessexcel.jpg
Image: metamorworks, Getty Images/iStockphoto

Foreign data usually isn't normalized, so you can't import data and go right to work. Instead, you must divide values among normalized tables. Doing so seems like a huge job, but most of the time, a few queries and a solid grasp of the data itself are all you need.

In this article, I'll walk you through the process of normalizing the data in a flat Excel file. Once inside Access, you'll run append queries to copy data and key values from the original data set into normalized tables. I'll also throw in a couple of business rules to complicate things.

I'm using Office 365 (desktop) on a Windows 64-bit system. This process is valid in older versions, but the instructions might be different. This article assumes you have basic knowledge of Excel, Access, and normalization rules. Access is a relational database that requires normalized tables. If you don't know what normalization is, you might want to read the following articles:

Strictly speaking, you don't need to know the rules of normalization to follow, but knowledge of the rules will help clarify the process; without that foundation, you might feel a bit lost. The demonstration download contains the data in an Excel workbook and the database.

SEE: Cost comparison calculator: G Suite vs. Office 365 (Tech Pro Research)

Knowing the data

We'll use the Excel data set shown in Figure A. It's a simple data set of memberships by members and the personal information that impacts that membership. Even by Excel standards, storing two names in the same record, as shown, is problematic, but doing so in Access is a serious violation of normalization rules, which you must apply in order to work efficiently in Access.

Figure A

accessexcelnormalizationa.jpg
We'll load this data into normalized Access tables.

To begin, you might think members are the main entity. It's an easy choice, but here's where knowing your data makes or breaks things. In this case, a business rule allows membership to have multiple members. If you don't know that from the get-go, you might make some bad decisions. Memberships are the main or parent table, and members will be the child table.

Figure B shows the initial normalization scheme. (You're looking at a view of the normalized tables in Access Design view.) Most of the information—card number, address, and so on—belong to the membership. The members' table contains names, a primary key, and a foreign key (the primary key in MembershipParent) that relates members to their respective membership records. Access queries will pull the two tables together into meaningful information.

Figure B

accessexcelnormalizaionb.jpg
Normalize the data early in the process.

If you're familiar with relationships, you might think we have a simple one-to-many relationship: A single membership can have multiple members. However, we have a second business rule: We track old membership records (each membership lasts one year from the date of purchase). That means members can have more than one membership: The active one and multiple inactive ones. There are two ways to go: Archive the old membership records or support the inactive memberships alongside the active memberships using a many-to-many relationship. Let's do the latter.

We need a third table (Figure C), known as an associate table, to accommodate the many-to-many relationship. An associate table stores foreign key values from two related tables, which complicates the process of getting the Excel data into normalized Access tables.

Figure C

accessexcelnormalizaionc.jpg
The many-to-many relationship requires a third table.

Before you go further, this is the best time to clean the data. To learn more about that, you can read 9 ways to clean foreign or imported data (TechRepublic). For instance, the data set contains null values for membership 104. The member lists only one name—more on that later. It's enough to know that you need to take this step with real-world data before continuing.

Getting the source data into Access

Once you have a normalization scheme, you're ready to move the data into Access. You'll start an import at the External Data tab inside Access. However, our data set is so small, we can bypass an import task and paste the data into Access. Select the Table in Excel and press Ctrl+c to copy it to the Clipboard. Open Access, right-click in the Navigation pane and choose Paste.

At the prompt, identify whether the first row is a header row (that's Yes for the example data set). Click OK when Access confirms the import was successful. Access does a good job of discerning the appropriate data types, as you can see in Figure D. In addition, Access used the sheet name to name the new table; you can rename this table if you like.

Figure D

accessexcelnormalizaiond.jpg
Load the Excel data into Access.

Dividing the data

We're not ready to divide the data yet; first, we need to add a primary key to each record in the original data set. Think of it as a ribbon that binds all the fields in each record. Technically, Access uses a primary key to uniquely identify each record. A primary key in a related table is a foreign key.

The original card number values appear to be unique; a safer route, for our purposes, is to add an AutoNumber column to the table. To do so, open the Data set table in Design view and do the following:

  1. Find a blank row and enter a meaningful name, such as MembershipPK, into Field Name. I inserted a row at the top, but you can add the AutoNumber field to the end of the field list.
  2. From the Data Type drop-down list, choose AutoNumber.
  3. With the new row selected, click Primary Key in the Tools group (Figure E).

Figure E

accessexcelnormalizaione.jpg
Add an AutoNumber field and make it the primary key.

You can read more about the surrogate/natural key issue by reading the download 10 tips for choosing between a surrogate and natural primary key (TechRepublic).

It's time to start populating our normalized tables. This is where you'll need basic Access skills: You need to build the three tables sketched earlier. When working with an existing database, the tables may already exist though; when that's the case, you can skip the normalization process because that work's already been done for you.

We'll begin with MembershipParent. Specifically, we'll use an append query to copy all fields except the name fields and the last membership number from Dat set to MembershipParent as follows:

  1. On the Create tab, click Query Design (Queries group).
  2. Use the resulting Show Table dialog to add Data set to the query.
  3. Move the following fields to the query grid: MembershipPK, Membership Number, Membership Type, Number Children, Purchase Date, Street Address, City, State, Zip Code, Phone, and Email (Figure F). This is the source table.
  4. Click Append in the Query Type group.
  5. In the resulting Append window, select MembershipParent (the destination table) from the Table Name dropdown (Figure G), and click OK. Access will add an Append To row to the query grid.
  6. Map the Append To fields to the source fields in Data set (Figure H shows most of the fields).
  7. Click Run in the Results group and click Yes to append the rows. The MembershipParent table now has four records. None of those records include member names.

Figure F

accessexcelnormalizaionf.jpg
Add fields to the query grid.

Figure G

accessexcelnormalizaiong.jpg
Specify the destination table.

Figure H

accessexcelnormalizaionh.jpg
Map the fields in both the source and destination tables.

The most common cause of an error at this point is mismatched data types. If that happens, cancel the process, and compare data types. Make corrections as necessary and try again. It's easy to make mistakes and end up repeating yourself. One way to avoid errors is to select View in the Results group before running the query. This is especially helpful when you add criteria to the task. You can see the results without running the query and committing any changes—it's a trial run.

Now we're ready to move on to the members. Because each record in the source table contains both names, we'll run two append queries to copy both set of names into MemberChild. We'll begin with the first set as follows:

  1. On the Create tab, click Query Design (Queries group).
  2. Use the resulting Show Table dialog to add Data set to the query.
  3. Move the following fields to the query grid: 1st Last Name, 1st First Name, and MembershipPK (Figure I). The primary key value from the original data set will become a temporary foreign key.
  4. Click Append in the Query Type group.
  5. In the resulting Append window, select MemberParent from the Table Name dropdown, and click OK.
  6. Match up Data set's fields to MemberParent.
  7. Click Run in the Results group and click Yes to append the rows. The MemberChild table has four records. You moved the first set of names for each record in the main data set to the members table.

Figure I

accessexcelnormalizaioni.jpg
Create the query by dragging fields to the query grid.

Remember, some records have two names. Repeat the append query, but in step 2, add 2nd Last Name and 2nd First Name to the grid (Figure J). After clicking Run, MemberChild will have eight records. Now you have all of the members from the original data set in a normalized table.

Figure J

accessexcelnormalizaionj.jpg
Add the second set of names.

Viewing Figure J, you can see the problem with nulls—the append query included the empty set for Harkins because that record in the data source has a primary key (MembershipPK). For now, it's enough to know this sort of thing happens; you can remedy it by removing the record, or not—depending on your data and business rules. For our purposes, it does no harm, so we won't deal with it. However, it could generate erroneous data later. For example, a simple count of members would be off by at least one.

Most of the time, you can stop here because the most common relationship is a one-to-many. You could start building select queries and start working. However, if you try to run a query to pull together card numbers and members, as is, the query won't work; we need to accommodate the many-to-many relationship by populating an associate table.

The good news is that the associate table already exists—sort of. Using the MemberChild table, we can quickly build the associate table with a final append query. MemberChild contains both keys in the MembershipFK and MemberPK fields. We're one append query away from being done.

Build this append query as you have before, using Figure K as a guide. The source table is MemberChild. You're appending to MembershipMemberMMAssociate (Figure C). MemberPK will become a foreign key for the members and MembershipFK will become a foreign key for the memberships. Run the query to populate the associate table.

Figure K

accessexcelnormalizaionk.jpg
Run this last append query to populate the associate table.

With all three tables in place, you can start building queries to turn this data into meaningful information. For example, using Figure L as a guide, you can build a simple select query to display a list of members and their respective card numbers. You can trace all eight records to the original data set. You still have the empty record for card number 104; you'll need a business rule to deal with it at some point.

Figure L

accessexcelnormalizaionl.jpg
Start building queries!

From Excel to Access

Our goal was to import and normalize foreign data. It won't always be this easy, and quite often, the data will need more specialized attention once it's in Access. Consider running the Analyze Performance tool; it will help you improve your initial schema choices.

There's still lots to do. You'll want to create a lookup table for the membership types. You could create a ZIP Code lookup table. It isn't necessary, but with that in place, you could automatically fill city and state fields by entering a ZIP Code, reducing data entry. You'll need to support those inactive memberships; remember, members can have more than one inactive membership.

Send me your question about Office

I answer readers' questions when I can, but there's no guarantee. Don't send files unless requested; initial requests for help that arrive with attached files will be deleted unread. You can send screenshots of your data to help clarify your question. When contacting me, be as specific as possible. For example, "Please troubleshoot my workbook and fix what's wrong" probably won't get a response, but "Can you tell me why this formula isn't returning the expected results?" might. Please mention the app and version that you're using. I'm not reimbursed by TechRepublic for my time or expertise when helping readers, nor do I ask for a fee from readers I help. You can contact me at susansalesharkins@gmail.com.

See also

By Susan Harkins

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.