Software

Using Excel's Text To Columns feature to clean up foreign data

Work smart when cleaning up foreign data. Text To Columns can divide data and get rid of delimiters.

Occasionally, we have to work with foreign data. If we're lucky, Excel imports it cleanly and we start using it right away. Often, the data isn't immediately usable. For instance, the records shown below aren't, not yet. First, we need to divide the first and last names into autonomous units. Second, the quotes must go. Don't be discouraged - you're going to be pleasantly surprised at how easy it is to scrub this data clean.

We'll use the Text To Column feature to separate the first and last names into two columns. To do so, select the data in A1..A3. Click the Data tab and then click the Text To Columns option in the Data Tools group. In Excel 2003, Text To Columns is on the Data menu. Excel does a good job of assessing what the data needs. In this case, the values are delimited. That means, a character separates the values. Specifically, a semi-colon separates the first and last names. For this example, you don't need to change any options, so click Next.

The default delimiter is the Tab character (but this feature remembers your last setting, so it might be different). Check Semicolon and uncheck any others (if necessary). When you do, Excel previews the change. Pay attention to this display because it'll alert you to problems. In this case, Excel doesn't update the last record. In addition, a lot depends on your text qualifier. If set to none, Excel retains the quote characters.

If set to the double quote character ("), Excel removes the quotes, but still fails to divide the last record properly. We're certainly a lot closer to being done than we were before though.

The problem is the delimiter in the last record. If you look closely, it isn't a semi-colon at all. It's a colon character. Most of the time, you won't run into this kind of obstacle (mixed delimiters). I threw it in to enhance the lesson; the lesson is to watch what's going on in the preview pane.

Fortunately, this feature's flexible enough to handle more than one delimiter. Check the Other option, enter a colon, and watch what happens in the preview pane. Excel properly divides the first and last names in all of the records and the quotation marks are gone! Click Finish to see the foreign data divided and cleaned of the text-qualifying characters (the quotation characters).

You probably weren't expecting to tidy up the data with just one feature, and it would work just as  quickly and easily for thousands of records. Of course, you could use Find and Replace to remove the quotation marks before or after dividing the data, but you don't have to because Text To Columns takes care of them for you. I'm betting you didn't know that this feature did double-duty!

Example Excel worksheets are provided to help explain this tip.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

11 comments
bkfriesen
bkfriesen

Great article about an incredibly useful feature. I've been using the feature for a bit where I have multiple rows in a single cell that have been created using the keystroke combination ALT/ENTER to go to a new row. It turns out that keystroke combo inserts the ASCII CR (Carriage Return)character. You can enter the CR character as your delimter with ALT 010.

awgiedawgie
awgiedawgie

Thanks! A useful little tidbit for those who weren't aware of it. Personally, I've been using text-to-columns for years to parse data I've exported from other programs - usually lists of full-path filenames. The fun part was when I finally broke down and upgraded from 2000 to 2010 and had to figure out where the feature was relocated in the ribbon bar :) I'm surprised, however, that you didn't extend the lesson to include all the pages of the feature. Like where you can format the data, or choose to skip a column of data entirely, so you don't have to mess around deleting columns later.

modeler4
modeler4

wow, it's a sad state of affairs when we need a lesson to watch a MS preview pane. Best point is that you can pick more than one delimiter. Thanks for that tip. Does 2010 take it further?

jgarcia
jgarcia

I use to save the foreign data as txt with notepad, and then load the txt in excel and parse the fields in columns. This is faster because it allows you to choose more than one field separator and you don't have to create temp files. Nice! Thank you.

dparks1940
dparks1940

Excellent tip! Thanks for posting. dp

rm
rm

Select Data, Text to Columns Please mention Libre and Open Office whenever there is a shared function with Excel!

C Ragsdale
C Ragsdale

I use this procedure a lot. It's great for working with mixed lists of data. There are also ways to reverse the process by using the concatenate formula.

mslizny
mslizny

I have not previously used this feature, although I use "Table to Text" often in MS Word. Also it helps that you pointed out that more than one condition of separating data can be chosen in the dialog box.

ssharkins
ssharkins

I don't think the feature's been improved in 2010 -- could be wrong, but have noticed anything new. I don't think it's sad though -- this feature isn't used near as much as it should, and it's more powerful than folks realize -- just letting people know. A lot of users never really have the time to learn how to use these apps efficiently -- that's where I come in. :)

Steve__Jobs
Steve__Jobs

Multiple text qualifiers would be a good start.