Developer

Manipulate XML and SQL with LINQ via LINQPad

Edmond Woychowsky describes using LINQPad to write a LINQ query for inserting Excel data into a SQL database.

During a boring work meeting this week, I realized that I'm a Code Monkey, not a Data Architect. This idea struck me when it was revealed that a portion of a system's daily input would be coming in the form of an Excel file, and it was my job to update the database. Since I'm a Code Monkey, I decided to write some code.

But what kind of code of should I write? These three facts helped me come up with my answer:

  • Excel spreadsheets can be exported XML.
  • The data needs to go into a relational database.
  • I have a brand new copy of O'Reilly's LINQ Pocket Reference.

I decided that LINQ, which makes manipulating XML and a relational database easier, would be a good solution. So I downloaded and installed LINQPad, a free tool for developing LINQ (although if you want to use auto-complete, that feature is not free). One of the neat things about LINQPad is that it allows you to play with the more unusual LINQ commands on the fly; two examples are the often ignored XPathSelectElements and XPathSelectElement, which allow for the use of XPath in queries. The use of XPath in queries is probably why these two commands aren't used as often as they could be.

It's no big deal to insert to a SQL database from an XML document using LINQ because there are thousands of how-to examples on the web — unless you insert from an Excel document, at which point things start to get weird. Unfortunately, if you make believe that an XML doppelganger of an Excel spreadsheet is just like any other XML document, this is a good way to run afoul of schema issues when using LINQPad. Consider the XML in Listing A, a cut down version of the XML spreadsheet; you would think the XPath statement //Row would be enough to find each row, but alas, it isn't. Listing A

<?xml version="1.0" encoding="UTF-8"?>

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40">

<Worksheet ss:Name="Sheet1">

<Table>

<Column ss:Width="234.75"/>

<Row ss:AutoFitHeight="0">

<Cell>

<Data ss:Type="String">Alabama</Data>

</Cell>

<Cell>

<Data ss:Type="String">AL</Data>

</Cell>

</Row>

<Row ss:AutoFitHeight="0">

<Cell>

<Data ss:Type="String">Alaska</Data>

</Cell>

<Cell>

<Data ss:Type="String">AK</Data>

</Cell>

</Row>

</Table>

</Worksheet>

</Workbook>

Because of the namespace LINQPad doesn't find the rows. So, rather than wasting time looking for the solution, I decided to write the XPath as //node()[local-name(.) = 'Row'], which works as the test LINQ query (Listing B). Listing B

XDocument xmlDoc = XDocument.Load(@"m:\documents\xml\linq\book1.xml");

var rows  = from row in xmlDoc.XPathSelectElements("//node()[local-name(.) = 'Row']") select row;

foreach(var row in rows)

{

try

{

Console.Write(row.XPathSelectElement("node()[2]/node()").Value);

Console.Write("\t");

Console.WriteLine(row.XPathSelectElement("node()[1]/node()").Value);

}

catch(Exception e) {}

}

The example in Listing B also shows the logic necessary for stepping through the rows and the XPath to get the individual cell's values. The example also illustrates the necessity of knowing which column contains the information needed. For example, because the abbreviation is in column one and the name is in column two, the node()[2] and node[1] is required so that the correct Cell element/column is used.

The final part of the puzzle is the LINQ to SQL, the database insert. In this example, the database name is StatesList, and it's necessary to create a new instance, set the values, and submit. Listing C shows the completed LINQ query. Listing C

XDocument xmlDoc = XDocument.Load(@"m:\documents\xml\linq\book1.xml");

var rows  = from row in xmlDoc.XPathSelectElements("//node()[local-name(.) = 'Row']") select row;

foreach(var row in rows)

{

try

{

StatesList sl = new StatesList

{

StateABBR = row.XPathSelectElement("node()[2]/node()").Value,

StateName = row.XPathSelectElement("node()[1]/node()").Value

};

StatesLists.InsertOnSubmit( sl );

SubmitChanges();

}

catch(Exception e) {}

}

Although this technique grew out of a moment of panic and desperation, it is a workable solution to the problem at hand.

More LINQ resources on TechRepublic

Information is essentially arriving at our doorsteps in a number of formats and, rather than create an application for each format, I find that sometimes it's faster to code a quick and dirty solution. Is this the case where you work?

Get weekly development tips in your inbox Keep your developer skills sharp by signing up for TechRepublic's free Web Developer newsletter, delivered each Tuesday. Automatically subscribe today!

Editor's Picks