You might receive foreign data in the form of characters strung together that you must import into Microsoft Excel. Generally, you’ll import the full dataset, even if you require only a portion of the string.
For instance, you might receive a list of transaction numbers, which in part contain the customer identification number. You need only the customer portions in Excel to create a relationship between that customer and a table that contains the customer names. That way, when reporting, viewers will see the customer names and not a meaningless number.
When this is the case, you can use Power Query’s Extract and Split Column features to extract delimited strings into their components. I’m using Microsoft 365 Desktop and Power Query in Microsoft Excel. Power Query is available in older versions through Excel 10. You can download the Microsoft Excel demo file for this tutorial.
- Why you should use Excel’s Power Query
- How to get the data into Power Query
- How to extract delimited strings using Extract options in Power Query
- How to extract delimited strings using Split Column in Power Query
Why you should use Excel’s Power Query
You can use Excel string functions, Text to Columns or Flash Fill, but here are reasons you might not:
- Text to Columns writes over the original data.
- Your data might not be in Excel, and although Power Query is available in Excel, Power Query can import data from lots of sources — not only Excel.
- The source data contains more rows that you can import into Excel.
- You might need to use Power Query for something much more complex, and extracting a subset of the original entry is just the first step.
If the data is in Excel, you might use functions or formulas, but unless you’re an expert, that will take a bit of time. Most of us can’t just rattle off the necessary syntax and get it right the first time. Power Query is quick and requires no specialized knowledge of Excel functions.
How to get the data into Power Query
We’ll work with a simple Excel sheet with a few delimited strings in a Table named TableCustomerID. You don’t have to replace the default Table name, but meaningful names are easier to work with if you have multiple Tables. However, the data must be formatted as a Table object. If it isn’t, Power Query will prompt you to convert the data range.
SEE: Here’s how to create and populate a table in Microsoft Excel’s Power Query.
Let’s suppose you have a list of customer identification numbers with three sections each. Furthermore, a hyphen character serves as a delimiter between the three sections (Figure A). You want to use the middle component of each string because that’s the section that actually identifies each customer. The other two components identify the region where the customer resides and a transaction number.
The first step is to load the data into Power Query as follows:
1. Click anywhere inside the Table.
2. Click the Data tab.
3. In the Get & Transform Data group, click From Table/Range.
That’s it. The simple Table shown in Figure A is now in Power Query.
With the data in Power Query, you can start extracting sections.
How to extract delimited strings using Extract options in Power Query
There are really two ways to extract data in Power Query. We’ll begin by using Extract options, which returns a subset of the original value. To begin:
- Click the Transform tab.
- Click the Extract dropdown in the Text Column group. As you can see in Figure B, there are several options, and they’re all self-explanatory for the most part. We’re going to look at the delimiter options, so you can see what each one does.
- After clicking the header of the Customer ID field to select the column, click the Transform tab, if necessary.
- Click Extract in the Text Column Group.
- Choose the Text Before Delimiter option. In the resulting dialog, enter the hyphen character (Figure C) and click OK.
As you can see in Figure D, this option returns only the first character(s) before the delimiter.
To reclaim the original data, delete the Extracted Text Before Delimiter step in the Applied Steps pane shown in Figure E by right-clicking the item and choosing Delete from the contextual menu.
Now, let’s do the same thing with the next option, Text After Delimiter. When prompted, enter the hyphen character and click OK to see the results shown in Figure F. This time, Power Query removes the first two characters, the first number and the first hyphen.
Once again, reclaim the original data by deleting the extract step and then choosing the Text Between Delimiters option. This time, Power Query prompts for two delimiters. In this case, they’re both the hyphen character (Figure G).
Click OK to see the results shown in Figure H.
Now, we’ll look at another way to divide the three sections of each string, but we won’t extract pieces from the strings, we’ll split the strings. Reclaim the original data before you continue.
How to extract delimited strings using Split Column in Power Query
Power Query’s Split Column helps you return more than a single piece of the string. For instance, let’s suppose you want three columns of data, one for each section. To accomplish this, use Split Column as follows:
- After selecting the column, click the Home tab.
- In the Transform group, click Split Column.
- Click the first option, By Delimiter.
- In the resulting dialog, you don’t need to do much because Power Query does a good job of discerning your needs. Make sure Power Query selects Each Occurrence of the Delimiter in the Split At section (Figure I).
- Click OK to see the results in Figure J.
This option separates each string into three columns using the delimiter character to determine where each section begins and ends.
As you can see, both Extract and Split Column help you separate data quickly. You’re likely to run into uses for both.
Subscribe to the Developer Insider Newsletter
From the hottest programming languages to commentary on the Linux OS, get the developer and open source news and tips you need to know. Delivered Tuesdays and Thursdays