Processing string values is part of most application development
projects, and it often means parsing a string into separate values. For
instance, data exported from an external source such as a spreadsheet commonly
arrives as comma-separated values. The .NET String class simplifies extracting
the individual values between the commas.

Extracting values

The Split
method of the String class extracts the individual values separated by a
specific character. The separator is passed to the method as one or more
characters (a character array), so you can specify more than one separator. A
second overload accepts an additional parameter that specifies the maximum
number of elements to return (extract from the string value). The values pulled
from the string are returned in a String array.

Here are the
two overloads:

  • String.Split(char[]) in C# or String.Split(Char())
    in VB.NET
  • String.Split(char[], int) in
    C# or String.Split(Char(), Integer) in VB.NET

The following C# snippet populates an array with values
contained in a comma-separated string value:

string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
string[] sites = values.Split(',');
foreach (string s in sites) {
Console.WriteLine(s);
}

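The following output is generated (each value after the first keeps the
space that followed its comma, because only the comma is treated as a separator):

TechRepublic.com
 CNET.com
 News.com
 Builder.com
 GameSpot.com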

The equivalent VB.NET code follows:

Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim sites As String() = Nothing
sites = values.Split(","c)
Dim s As String
For Each s In sites
Console.WriteLine(s)
Next s
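
Because Split treats only the comma as a separator, any whitespace around a value stays in the array element. A minimal sketch of one way to clean this up is to call Trim on each element after splitting. The class and method names here (SplitTrimDemo, SplitAndTrim) are hypothetical and used only for illustration:

```csharp
using System;

public static class SplitTrimDemo
{
    // Hypothetical helper: split on commas, then strip the
    // whitespace surrounding each extracted value.
    public static string[] SplitAndTrim(string values)
    {
        string[] parts = values.Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    public static void Main()
    {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        foreach (string s in SplitAndTrim(values))
        {
            // Brackets make any remaining whitespace visible.
            Console.WriteLine("[" + s + "]");
        }
    }
}
```

Trimming after the split keeps the separator list simple; adding the space character as a second separator would instead produce empty elements between the values.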

You may specify multiple separator characters by passing them
in a character array. The following code splits a string of values
separated by a comma, semicolon, or colon. In addition, it uses the optional
second parameter to limit the number of items returned to four.

char[] sep = new char[3];
sep[0] = ',';
sep[1] = ':';
sep[2] = ';';
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
Console.WriteLine(s);
}

The following output is generated (notice that the second
parameter places the remainder of the string in the last array element, and
that the space after each separator is preserved):

TechRepublic.com
 CNET.com
 News.com
 Builder.com; GameSpot.com

The equivalent VB.NET code follows:

Dim values As String
values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com"
Dim sites As String() = Nothing
' VB.NET arrays are declared by upper bound, so sep(2) holds three elements
Dim sep(2) As Char
sep(0) = ","c
sep(1) = ":"c
sep(2) = ";"c
sites = values.Split(sep, 4)
Dim s As String
For Each s In sites
Console.WriteLine(s)
Next s
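
When two separators sit next to each other, Split returns an empty string for the gap between them. The sketch below assumes .NET Framework 2.0 or later, where an overload of Split accepts a StringSplitOptions value; the class and method names (SplitOptionsDemo, SplitSkippingEmpties) and the sample string are made up for illustration:

```csharp
using System;

public static class SplitOptionsDemo
{
    private static readonly char[] Separators = { ',', ':', ';' };

    // Hypothetical helper: split on any of the separators and
    // drop the empty strings produced by adjacent separators.
    public static string[] SplitSkippingEmpties(string values)
    {
        return values.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Assumed sample input with adjacent separators.
        string values = "TechRepublic.com:,CNET.com;;News.com";

        // Default behavior: empty strings appear between adjacent separators.
        Console.WriteLine(values.Split(Separators).Length);        // prints 5

        // With RemoveEmptyEntries only the three site names remain.
        Console.WriteLine(SplitSkippingEmpties(values).Length);    // prints 3
    }
}
```

Note that RemoveEmptyEntries removes only zero-length strings; an element consisting of a single space would still be returned.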

While the Split method allows you to easily work with
individual elements contained in a string value, you may need to format values
according to a predefined format like comma-separated values. The String class
makes it easy to assemble a properly formatted string.

Putting it together

The Join method of the String class accepts the string to
be used as the separator as its first parameter. The values to be concatenated
are passed as the second parameter in the form of a string array. One
overload accepts integer values as the third and fourth parameters: the third
specifies the index of the first array element to use, and the fourth is the
number of elements to use.

The following C# code sample demonstrates assembling the
values used in the previous example:

string sep = ", ";
string[] values = new String[5];
values[0] = "TechRepublic.com";
values[1] = "CNET.com";
values[2] = "News.com";
values[3] = "Builder.com";
values[4] = "GameSpot.com";
string sites = String.Join(sep, values);
Console.Write(sites);

The following output is generated:

TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com

The equivalent VB.NET follows:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values)
Console.Write(sites)

We could use the overloaded form to specify where to begin
and how many elements to include in the result. The following sample begins
with the third element (index 2, since element numbering begins at zero) and
joins three elements, producing News.com, Builder.com, GameSpot.com:

Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values, 2, 3)
Console.Write(sites)

The starting index and the number of elements to join
must be valid for the string array being used. If the starting index or the
count is negative, or if together they extend past the end of the array, an
ArgumentOutOfRangeException is thrown. For this reason, it is a good idea to
validate the values or utilize a try/catch block to handle any problems.
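
As a rough illustration of both approaches, the following sketch checks the range before calling Join and also shows the exception being caught. The names JoinRangeDemo and TryJoin are hypothetical, and returning null for an invalid range is just one possible convention:

```csharp
using System;

public static class JoinRangeDemo
{
    // Hypothetical helper: returns null instead of throwing when
    // startIndex and count do not fit inside the array.
    public static string TryJoin(string sep, string[] values, int startIndex, int count)
    {
        if (values == null || startIndex < 0 || count < 0 ||
            startIndex > values.Length - count)
        {
            return null;
        }
        return String.Join(sep, values, startIndex, count);
    }

    public static void Main()
    {
        string[] values = { "TechRepublic.com", "CNET.com", "News.com", "Builder.com", "GameSpot.com" };

        // Valid range: elements 2 through 4.
        Console.WriteLine(TryJoin(", ", values, 2, 3));

        // Invalid range: three elements starting at index 3 runs past the end.
        Console.WriteLine(TryJoin(", ", values, 3, 3) == null);   // prints True

        // The same invalid call made directly, handled with try/catch.
        try
        {
            Console.WriteLine(String.Join(", ", values, 3, 3));
        }
        catch (ArgumentOutOfRangeException ex)
        {
            Console.WriteLine("Invalid range: " + ex.Message);
        }
    }
}
```

Checking the range up front avoids the cost of an exception when invalid input is expected; the try/catch version is simpler when invalid input would indicate a programming error.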

While the String class provides the necessary methods, it
isn’t the only way to handle the parsing of a string value. Another common
approach takes advantage of regular expressions.

Parsing with regular expressions

The .NET Framework provides the Regex
class contained in the System.Text.RegularExpressions
namespace for using regular expressions within a .NET application. Parsing is
only one of the many applications of regular expressions.

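Let’s examine the parsing of our sample string using regular
expressions. The following ASP.NET page uses C# to parse a comma-delimited list
of sites into an array. The pattern matches a comma only when it is followed by
an even number of double quotation marks, so commas inside a quoted value are
not treated as separators; the sample string contains no quotation marks, so
every comma matches: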

<%@ Page Language="C#" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e) {
    if (!IsPostBack) {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        Regex r = new Regex(pattern);
        string[] sites = r.Split(values);
        foreach (string s in sites) {
            Response.Write(s);
            Response.Write("<br>");
        }
    }
}
</script>

The equivalent VB.NET code follows. Notice that the
quotation marks in the pattern need different handling: VB.NET string literals
do not use the backslash as an escape character, so each quotation mark inside
the string is escaped by placing two of the characters adjacent to each other.
The backslashes that remain are passed to the regular expression engine, which
reads \" as a literal quotation mark.

<%@ Page Language="VB" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">
Sub Page_Load
If Not (IsPostBack) Then
Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim pattern As String
pattern = ",(?=(?:[^\""]*\""[^\""]*\"")*(?![^\""]*\""))"
Dim r As Regex
r = new Regex(pattern)
Dim sites As String()
sites = r.Split(values)
Dim s As String
For Each s In sites
Response.Write(s)
Response.Write("<br>")
Next s
End If
End Sub
</script>
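
The sample string above contains no quotation marks, so the pattern behaves like a plain comma split. To show where it differs, here is a minimal console sketch with an assumed input in which one value is quoted and contains a comma. The class and method names (QuotedSplitDemo, SplitOutsideQuotes) are hypothetical, and the pattern is the same one used in the pages above, written as a C# verbatim string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSplitDemo
{
    // Same pattern as above: a comma followed by an even number of
    // double quotation marks, i.e. a comma that is outside any quoted value.
    private const string Pattern = @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))";

    // Hypothetical helper: split on commas that are not inside quotes.
    public static string[] SplitOutsideQuotes(string values)
    {
        return new Regex(Pattern).Split(values);
    }

    public static void Main()
    {
        // Assumed sample: the second value is quoted and contains a comma.
        string values = "TechRepublic.com,\"CNET.com, News.com\",Builder.com";

        foreach (string s in SplitOutsideQuotes(values))
        {
            Console.WriteLine(s);
        }
        // Three elements are written; the quoted value is kept whole,
        // including its quotation marks.
    }
}
```

String.Split(',') would break the quoted value into two elements here, which is the main reason to reach for a regular expression with this kind of data. The quotation marks remain part of the element and can be removed with Trim('"') if they are not wanted.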

Easily work with data

The .NET Framework makes it easy to work with data
regardless of its format. A string containing values separated by a specific
character is easily processed via the String class or possibly regular
expressions. The method that you decide to use will depend on your specific application.

Miss a column?

Check out the .NET Archive, and catch up on the most recent editions of Tony Patton’s column.