Using Excel's Text To Columns feature to clean up foreign data

Work smart when cleaning up foreign data. Text To Columns can divide data and get rid of delimiters.

Occasionally, we have to work with foreign data. If we're lucky, Excel imports it cleanly and we start using it right away. Often, the data isn't immediately usable. For instance, the records shown below aren't, not yet. First, we need to divide the first and last names into autonomous units. Second, the quotes must go. Don't be discouraged - you're going to be pleasantly surprised at how easy it is to scrub this data clean.

We'll use the Text To Column feature to separate the first and last names into two columns. To do so, select the data in A1..A3. Click the Data tab and then click the Text To Columns option in the Data Tools group. In Excel 2003, Text To Columns is on the Data menu. Excel does a good job of assessing what the data needs. In this case, the values are delimited. That means, a character separates the values. Specifically, a semi-colon separates the first and last names. For this example, you don't need to change any options, so click Next.

The default delimiter is the Tab character (but this feature remembers your last setting, so it might be different). Check Semicolon and uncheck any others (if necessary). When you do, Excel previews the change. Pay attention to this display because it'll alert you to problems. In this case, Excel doesn't update the last record. In addition, a lot depends on your text qualifier. If set to none, Excel retains the quote characters.

If set to the double quote character ("), Excel removes the quotes, but still fails to divide the last record properly. We're certainly a lot closer to being done than we were before though.

The problem is the delimiter in the last record. If you look closely, it isn't a semi-colon at all. It's a colon character. Most of the time, you won't run into this kind of obstacle (mixed delimiters). I threw it in to enhance the lesson; the lesson is to watch what's going on in the preview pane.

Fortunately, this feature's flexible enough to handle more than one delimiter. Check the Other option, enter a colon, and watch what happens in the preview pane. Excel properly divides the first and last names in all of the records and the quotation marks are gone! Click Finish to see the foreign data divided and cleaned of the text-qualifying characters (the quotation characters).

You probably weren't expecting to tidy up the data with just one feature, and it would work just as  quickly and easily for thousands of records. Of course, you could use Find and Replace to remove the quotation marks before or after dividing the data, but you don't have to because Text To Columns takes care of them for you. I'm betting you didn't know that this feature did double-duty!

Example Excel worksheets are provided to help explain this tip.


Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

Editor's Picks