Software optimize

Tutorial: Excel parsing expressions

Parsing data can be a difficult task, and it gets even harder if the entries aren't consistent. Use Excel's string functions to parse inconsistent data.

You probably wouldn't store first and last names in the same cell, but you might have to work with a legacy workbook that does. Or, you might import data from a foreign source where the names are combined into one field. Fortunately, Excel has several string functions, Right(), Left(), Find(), Len(), and Mid() that can parse the name components into separate entries.

First, the easy part; parse the component to the left using the simple expression:

=LEFT(A2,FIND(" ",A2)-1)

It makes no difference whether the component is the first or last name. In the case of Robin Banks, the FIND() function returns the value 6, but the expression subtracts 1 from the results. Consequently, the expression extracts the first five characters. If you want to extract the space character, omit the -1 component.

The inconsistency of the entries—some have middle initials and some don't—makes extracting the last name, a bit more complicated. You might try the following expression, but as you can see, it doesn't work as expected:

=RIGHT(A2,LEN(A2)-FIND(" ",A2,FIND(" ",A2)+1))

If the entry doesn't contain two space characters, the second FIND() returns an error value. Use the following expression instead:

=IFERROR(RIGHT(A2,LEN(A2)-IFERROR(FIND(" ",A2,FIND(" ",A2)+1),FIND(" ",A2))),A2)

IFERROR() handles the errors, but the logic is similar.

There's one last step--returning  the middle initial:

=MID(A2,FIND(" ",A2)+1,IFERROR(FIND(" ",A2,FIND(" ",A2)+1)-FIND(" ",A2)-1,0))

If there's no middle initial, this expression returns an empty string instead of an error.

TechRepublic's Microsoft Office Suite newsletter, delivered every Wednesday, is designed to help your users get the most from Word, Excel, and Access. Automatically sign up today!

It's worth mentioning that the Text To Columns feature is an expression-less solution if the entries are consistent. In addition, to learn more about using string functions, read Save time by using Excel's Left, Right, and Mid string functions. Finally, IFERROR() is new to Excel 2007. The logic for these expressions is the same in 2003, but use ISERROR() to handle the error values.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

22 comments
dale.danell
dale.danell

Nice to have the routines to get this done. However, is not this another overdue update from Microsoft??? Ideally, the user should only have to highlight the column/row of cells desired for parsing, click on a button, and Excel would insert new columns/rows to the right/below and populate these new cells with the separated strings. Once the data is separated out, the user can delete and format at will to massage their spreadsheet into something useable. This is a very routine exercise, and should not require remembering and "developing" the algorithm each time, to accomplish the task.....

RU7
RU7

Use Substitute to determine how many spaces first. Addressing only normal order names, i.e., NOT "Last, First" format. Not addressing prefix, e.g., Mr., or suffix, e.g., Jr. Formats allowed: FirstName FirstName LastName FirstName MiddleInitial. LastName FirstName MiddleName LastName With name string in column A and data starting in row 2, In FIRST NAME column: =IF(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))=0, A2, LEFT(A2,FIND(" ",A2))) In MIDDLE NAME column: =IF(LEN(A2)-LEN(SUBSTITUTE(A2," ","")) < 2, "", MID(A2, FIND(" ", A2) + 1, FIND(" ",A2, FIND(" ", A2)+1) - (FIND(" ", A2) + 1))) In LAST NAME column: =IF(LEN(A2)-LEN(SUBSTITUTE(A2," ",""))

marjfahe
marjfahe

If the ONLY variation in the original column is that sometimes it is First & Last, and other times it is First, Middle, & Last, I have a relatively simple fix for that: Start by using Text To Columns (for this example, we'll say this ends up in columns B-D), then set up 3 additional columns (E-G) for First, Middle Last. Obviously, for First, simply use the value in column B. For Middle, use this formula: =IF(D2="","",C2) For Last, use this formula: =IF(D2="",C2,D2) Here's the way my example ended up looking: For Robin Banks, my Text To Columns was: Robin Banks "blank" - First was Robin, Middle was "blank", Last was Banks For Dan D. Lyons, my Text To Columns was: Dan D. Lyons First was Dan, Middle was D., Last was Lyons If this is your only variation, this is a very simple fix, as you can make use of Text-To-Columns, then only have two formulas to manipulate for Middle & Last Hope this helps somebody.

pweegar
pweegar

It would be nice if the author(s) would state which versions the Tutorial was aimed at. Not every one has the latest and greatest. Also, IF you use the code provided above as part of a module, you will probably need to alter the security level of the module. Otherwise, you won't be allowed to run the code (at least in Excel 2003). One last note, you will need to add the code to every workbook?

zgozvrm
zgozvrm

For those not using Excel 2007, you can simulate the IFERROR() function using VBA: Function IFERROR(exp As Variant, err As Variant) As Variant If ISERROR(exp) Then IFERROR = err Else IFERROR = exp End Function

ppg
ppg

If you are willing to use a macro you can use the InStr function to find the position of the first space(SL) in the cell and the InStrRev function to find the first space counting from the end of the string (SR). Then everything before SL is the first name, everything after SR is the last name and everything between them is the middle initial. This allows you to deal with two or more initals. If you needed to do this often you could convert the macro to a user defined fucntion.

DaveUnger
DaveUnger

Very interesting and instructive, as usual, Susan. Thanks. I often use the Split function (in combination with Replace & Trim if necessary) to achieve similar results.

ssharkins
ssharkins

I mentioned Text to Columns in the original post -- I hope I didn't edit it out by accident! Thank you for the reminder.

ssharkins
ssharkins

I try to specify when instructions are different from version to version -- I'm now trying to track 2003, 2007, and 2010. The expression logic in this article is the same for all versions. However, as I stated, if you apply the expression logic to 2003, you'll need to rethink the error handling. We can work through those expressions if 2003 users would like to do that.

ssharkins
ssharkins

PPG, if you already have such a macro, would you be willing to share it?

Rick_from_BC
Rick_from_BC

I've had to deal with similar situations, and beat my head against all the variables. I often export the 'offending' column(s) to Word. I use Word's TABLE functions to regularize the data and use the GREP functions to split the data as needed. I can then re-import the normalized data back to the spreadsheet. Playing with GREP in Word has been a huge help in using Excel. I often wish there was more access to GREP in Excel, but as long as the ability to move data between the programs is easy, I can cope.

basil.cinnamon
basil.cinnamon

Now, here's a VBA challenge for all you geeks: Names would typically come in forms like Tim Brooks Tim E. Brooks Brooks, Tim Brooks, Tim E. Brooks, Tim E. F. and maybe a few others. It would be great to have a PARSENAMES(NAMERANGE) function to which you could feed the specified NAMERANGE (e.g. A2:A50), that would scan the entire NAMERANGE, decide how many columns are needed (e.g., 2 if only first and last name, 3 if middle initial, etc.), then parse each cell in NAMERANGE into those columns according to how the data are entered (e.g., if cell contents begin with WORD+COMMA, assume it's LASTNAME-COMMA-SPACE-FIRSTNAME, etc.) One could use a CASE function to address each possible case for how the data are entered. I'd love to take a crack at it, but ain't got no time!

ssharkins
ssharkins

Parsing is one of those subjects that has lots of possibilities -- the needs are unique and therefore, many of the solutions are just as unique!

TobiF
TobiF

In Excel 2003 and 2007 I don't see any function named "Split" in the standard installation. Maybe you have some nice add-on?

marjfahe
marjfahe

...but, I added the next steps to break out the first, middle, last names with the 2 additional formulas, hoping it would help anyone.

steve.lehnhard
steve.lehnhard

I have '10 at home, but at work I am limited to 2003. Any help there would be great.

ppg
ppg

I put together a "Proof of Concept" of my idea as a Function before I posted so here it is with the following disclaimers. It probably needs more error checking and I have only run it in Excel 2007. Usage in the spreadsheet is fairly obvious" =PARSENAME(NameString,Mode) NameString is the cell reference containing the name you want to parse Mode is a string specifying which part you want to return ("First" for first name, "Middle" for middle initial(s), "Last" for last name). Mode is not case sensitive The function has a special check if there is no space in the name so you can even include Madonna in your list. Just open the Visual Basic editor and cut and paste the following into a new module. Phil ---------------------- Function ParseName(strVal As String, mode As String) Dim sL As Integer, sR As Integer sL = InStr(1, strVal, " ") If (sL = 0) Then ' no spaces sL = 1 sR = 1 Else sR = InStrRev(strVal, " ", -1) End If Select Case UCase(mode) ' Evaluate Number. Case "FIRST" ParseName = Trim(Left(strVal, sL - 1)) Case "INITIAL", "MID", "MIDDLE" ParseName = Trim(Mid(strVal, sL + 1, sR - sL)) Case "LAST" ParseName = Trim(Mid(strVal, sR, Len(strVal))) Case Else ParseName = "#VALUE!" End Select End Function

TobiF
TobiF

Mr, Mrs, Ms, Dr, jr, sr, III etc. :)

basil.cinnamon
basil.cinnamon

In my post a minute ago I used carets to specify a field, as in "open caret" - text - "close caret", but those strings disappeared entirely when the post showed up, which is why I went back and edited the post and replaced the strings with uppercase letters to make them stand out. It seems as if the posting mechanism interprets strings within carets as some form of XML and drops them if not understood!

DaveUnger
DaveUnger

Sorry, I got de-railed, I was thinking about the VBA Split function, which isn't relative to this discussion.

ssharkins
ssharkins

The software's not perfect, as you can see. Thank you for going to the trouble to make sure your post could be clearly understood!