Easily parse string values with .NET

The .NET Framework simplifies processing and formatting data with the String class and its Split and Join methods or regular expressions. Learn more about using these methods in your application.

Processing string values is an integral aspect of most application development projects. This often involves parsing strings into separate values. For instance, data received from an external source such as a spreadsheet often arrives in a common format like comma-separated values. The .NET String class simplifies the process of extracting the individual values between the commas.

Extracting values

The Split method of the String class extracts individual values separated by a specific character. The separator is passed to the method. A second overload accepts an additional parameter that specifies the maximum number of elements to return (extract from the string value). (Note: You can specify more than one separator in a character array.) The values pulled from the string are returned in a String array.

Here are the two overloads:

  • String.Split(char[]) in C# or String.Split(Char()) in VB.NET
  • String.Split(char[], int) in C# or String.Split(Char(), Integer) in VB.NET

The following C# snippet populates an array with values contained in a comma-separated string value:

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
string[] sites = values.Split(',');
foreach (string s in sites) {
    // Every value after the first keeps the space that followed its comma, so trim it.
    Console.WriteLine(s.Trim());
}

The following output is generated:

TechRepublic.com
CNET.com
News.com
Builder.com
GameSpot.com

The equivalent VB.NET code follows:

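Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim sites As String() = Nothing
' The c suffix makes the separator a Char literal rather than a String.
sites = values.Split(","c)
Dim s As String
For Each s In sites
    ' Every value after the first keeps the space that followed its comma, so trim it.
    Console.WriteLine(s.Trim())
Next s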

You may specify multiple separator characters by passing them in a character array. The following code splits a string of values separated by a comma, semicolon, or colon. In addition, it uses the optional second parameter to set the maximum number of items returned to four.

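// Any one of these characters ends a value.
char[] sep = { ',', ':', ';' };
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";
// Return at most four elements; the last one holds the unsplit remainder.
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
    // Trim the space that followed each separator.
    Console.WriteLine(s.Trim());
}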

The following output is generated (notice that the second parameter places the remainder of the string in the last array element):

TechRepublic.com
CNET.com
News.com
Builder.com; GameSpot.com

The equivalent VB.NET code follows:

Dim values As String
values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com"
Dim sites As String() = Nothing
' The upper bound is 2, so the array holds exactly three separators.
Dim sep(2) As Char
sep(0) = ","c
sep(1) = ":"c
sep(2) = ";"c
sites = values.Split(sep, 4)
Dim s As String
For Each s In sites
    ' Trim the space that followed each separator.
    Console.WriteLine(s.Trim())
Next s
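
Split returns every piece exactly as it appears between separators, so a piece can carry leading spaces, and two adjacent separators yield an empty string. One way to tidy the result is sketched below. SplitAndTrim and the SplitSketch class are hypothetical names introduced for illustration, not part of the .NET Framework, and the doubled comma in the input is made up to show the empty-piece case.

```csharp
using System;
using System.Collections.Generic;

public static class SplitSketch
{
    // Hypothetical helper: splits on any of the given separators,
    // trims each piece, and skips pieces that are empty after trimming.
    public static string[] SplitAndTrim(string input, params char[] separators)
    {
        List<string> result = new List<string>();
        foreach (string piece in input.Split(separators))
        {
            string trimmed = piece.Trim();
            if (trimmed.Length > 0)
            {
                result.Add(trimmed);
            }
        }
        return result.ToArray();
    }

    public static void Main()
    {
        // Assumed sample input; note the doubled comma after News.com.
        string values = "TechRepublic.com: CNET.com, News.com,, Builder.com; GameSpot.com";
        foreach (string site in SplitAndTrim(values, ',', ':', ';'))
        {
            Console.WriteLine("[" + site + "]");
        }
    }
}
```

With this input the sketch prints five bracketed site names, with no surrounding spaces and no empty entry for the doubled comma.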

While the Split method allows you to easily work with individual elements contained in a string value, you may need to format values according to a predefined format like comma-separated values. The String class makes it easy to assemble a properly formatted string.

Putting it together

The Join method of the String class accepts the string to be used as the separator as its first parameter. The values to be concatenated are passed as the second parameter in the form of a string array. A second overload accepts integer values as the third and fourth parameters. The third parameter specifies the index of the first array element to use, and the last parameter is the total number of elements to use.

The following C# code sample demonstrates assembling the values used in the previous example:

string sep = ", ";
string[] values = new String[5];
values[0] = "TechRepublic.com";
values[1] = "CNET.com";
values[2] = "News.com";
values[3] = "Builder.com";
values[4] = "GameSpot.com";
string sites = String.Join(sep, values);
Console.Write(sites);

The following output is generated:

TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com

The equivalent VB.NET follows:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values)
Console.Write(sites)

We could use the overloaded form to specify where to begin and how many elements to include in the result. The following sample begins with the third element (index 2, because element numbering begins at zero) and joins three elements, producing News.com, Builder.com, GameSpot.com:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values, 2, 3)
Console.Write(sites)

The starting index and the number of elements must describe a range that lies within the string array being used. If the starting index or the count is negative, or if their sum exceeds the number of elements in the array, an ArgumentOutOfRangeException is thrown. For this reason, it is a good idea to validate both values or to utilize a try/catch block to handle any problems.
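
The try/catch approach might look like the following minimal sketch. TryJoinRange and the JoinSketch class are hypothetical names used only for illustration, and returning null for an invalid range is an assumed convention rather than anything the Framework prescribes.

```csharp
using System;

public static class JoinSketch
{
    // Hypothetical helper: joins count elements starting at startIndex and
    // returns null instead of throwing when the range falls outside the array.
    public static string TryJoinRange(string separator, string[] values, int startIndex, int count)
    {
        try
        {
            return String.Join(separator, values, startIndex, count);
        }
        catch (ArgumentOutOfRangeException)
        {
            // Negative index or count, or startIndex + count past the end of the array.
            return null;
        }
    }

    public static void Main()
    {
        string[] values = { "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com" };

        // Valid range: elements 2, 3, and 4.
        Console.WriteLine(TryJoinRange(", ", values, 2, 3));

        // Invalid range: 3 + 3 exceeds the five elements available.
        Console.WriteLine(TryJoinRange(", ", values, 3, 3) == null);
    }
}
```

The first call prints News.com, Builder.com, GameSpot.com and the second prints True. A null array raises a different exception (ArgumentNullException), which this sketch deliberately leaves uncaught.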

While the String class provides the necessary methods, it isn't the only way to handle the parsing of a string value. Another common approach takes advantage of regular expressions.

Parsing with regular expressions

The .NET Framework provides the Regex class contained in the System.Text.RegularExpressions namespace for using regular expressions within a .NET application. Parsing is only one of the many applications of regular expressions.

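Let's examine the parsing of our sample string using regular expressions. The following ASP.NET page uses C# to parse a comma-delimited list of sites into an array. The pattern matches only commas that fall outside double-quoted text, so a quoted value may itself contain commas: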

<%@ Page Language="C#" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e) {
    if (!IsPostBack) {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        // Match a comma only when an even number of quotation marks follows it,
        // that is, when the comma sits outside any quoted value.
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        Regex r = new Regex(pattern);
        string[] sites = r.Split(values);
        foreach (string s in sites) {
            Response.Write(s);
            Response.Write("<br>");
        }
    }
}
</script>

The equivalent VB.NET code follows. VB.NET string literals have no backslash escape sequence, so the quotation marks inside the pattern cannot be escaped the way they are in C#. Instead, each quotation mark contained in the string is written as two adjacent quotation marks.

<%@ Page Language="VB" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">
Sub Page_Load
    If Not (IsPostBack) Then
        Dim values As String
        values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
        Dim pattern As String
        ' Each quotation mark in the pattern is doubled; no backslash is needed.
        pattern = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As Regex
        r = New Regex(pattern)
        Dim sites As String()
        sites = r.Split(values)
        Dim s As String
        For Each s In sites
            Response.Write(s)
            Response.Write("<br>")
        Next s
    End If
End Sub
</script>
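
Because the sample string contains no quotation marks, the regular expression returns the same pieces that splitting on a plain comma would. The console sketch below suggests where the pattern becomes useful, assuming a hypothetical input line in which one quoted value contains a comma. QuotedSplitSketch and SplitFields are illustrative names, and the pattern is written as a C# verbatim string, which doubles each quotation mark much as VB.NET does.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSplitSketch
{
    // The article's pattern as a verbatim string: match a comma only when
    // an even number of quotation marks follows it (outside any quoted value).
    private static readonly Regex CommaOutsideQuotes =
        new Regex(@",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))");

    // Hypothetical helper: splits one line on commas that are not inside quotes.
    public static string[] SplitFields(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }

    public static void Main()
    {
        // Assumed input: the second field contains a comma inside quotation marks.
        string line = "TechRepublic.com,\"CNET.com, News.com\",Builder.com";
        foreach (string field in SplitFields(line))
        {
            Console.WriteLine(field);
        }
    }
}
```

The sketch prints three lines, and the middle one is "CNET.com, News.com" with its quotation marks intact. The pattern only decides where to split; removing the surrounding quotation marks would be a separate step.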

Easily work with data

The .NET Framework makes it easy to work with data regardless of its format. A string containing values separated by a specific character is easily processed via the String class or possibly regular expressions. The method that you decide to use will depend on your specific application.

About

Tony Patton has worn many hats over his 15+ years in the IT industry while witnessing many technologies come and go. He currently focuses on .NET and Web Development while trying to grasp the many facets of supporting such technologies in a productio...
