Monday, October 5, 2009

Develop with XML? You had better know XPath!!!

I recently had to work on a VB .NET app that made me feel great about my modest development skills. I know I have weaknesses and gaps in my development skills portfolio, but one thing that makes me shake my head and roll my eyes is when I see code that tries to work with XML without using XPath. If you work with XML, and if you don't know about XPath and when you should use it, you really need to spend a few hours learning it.

I started working with XML in 1999 with MSXML3, using DOM and XPath extensively, and then later using extremely elaborate XSLT to do presentation layer transformations for a large web site. Admittedly, back then XML was extremely overhyped, but since then, I haven't met a developer that is comfortable with XPath, let alone XSLT. I'm sure they are out there, I just haven't had the luck to meet them.

The app in question (an add on window for Dynamics GP) was developed by someone a few years ago, but was never really finished, and therefore the client never implemented it. I got the honor of finishing the required functionality and getting it to work properly.

I ignored the fact that this extremely simple, single window "application" with the most rudimentary functionality was completely over-engineered (i.e. 3 tiers). But when I came across code that read a hand coded XML configuration file, I stared in disbelief.

First, why the developer created a custom XML file instead of using the My.Settings feature in Visual Studio 2005 puzzles me. I'll give him the benefit of the doubt and assume that he was using an old version of Visual Studio for some reason. But then I saw this code, which should induce horror in any developer that knows how to work with XML:

Public Function getSQLConnection() As String

Dim xmlRdr As XmlTextReader
Dim sqlConn As String = ""
Dim sDirectory As String = System.Windows.Forms.Application.StartupPath()
Dim sXmlPath As String = sDirectory & "\xmlTrainTemplateConfig.xml"
Try
'create xml reader
xmlRdr = New XmlTextReader(sXmlPath)
'Disable whitespace so that you don't have to read over whitespaces
xmlRdr.WhitespaceHandling = WhitespaceHandling.None
'read the xml declaration
xmlRdr.Read()
'read the 'Configuration' tag
xmlRdr.Read()
'Read the 'Config' tag
xmlRdr.Read()
'Get the 'Config' Attribute Value
If xmlRdr.GetAttribute("type") = "SQL" Then
'Read element 'ConnStr'
xmlRdr.Read()
'Get the 'ConnStr' Element Value
sqlConn = xmlRdr.ReadElementString("ConnStr")
End If
'close the reader
xmlRdr.Close()
Catch EX As Exception
Dim exCustom As New Exception("Error reading xmlTrainTemplateConfig.xml: " & sXmlPath, EX)
Throw exCustom
Finally
getSQLConnection = sqlConn
End Try

End Function


Okay, very funny, I get it, practical joke, right? Seriously, so where is the real code that reads the config file?

So, let me summarize. Reading an XML file line by line, making assumptions about the order of each node/line, and then having to test attribute and element values to see if you are reading the correct node completely defeats the purpose of using an XML file. You might as well just use a text INI file and save yourself time and embarrassment.

The code sample above is, simply, how not to work with XML.

Now, let's try another approach. There are probably a dozen better ways to get the job done, but I'll throw out just one that didn't require any thought or design. Doesn't this look just a tad bit better?

Dim xmlDoc As New XmlDocument
xmlDoc.Load(sXmlPath)
sqlConn = xmlDoc.SelectSingleNode("//ConnStr").InnerText.Trim

This approach uses the most rudimentary XPath node selection expression ("//") that searches the entire XML document for any matching nodes. Technically, this is a very sloppy technique, but like I said, no thought required, and we're dealing with a private config file, so we don't need to be very rigorous.

If you aren't familiar with XPath but want to learn more, I would recommend reading this every good, very simple tutorial:

http://www.w3schools.com/XPath/default.asp

And here is the XPath chapter for the O'Reilly book "XML in a Nutshell" that appears to be available for free online:

http://oreilly.com/catalog/xmlnut/chapter/ch09.html

There are plenty of other XPath references, so it shouldn't be hard to at least learn the basics and bookmark some good references for later use.

To close, I'll leave you with a practical example of how XPath can be used with eConnect. I recently fielded an eConnect question on Experts Exchange where someone asked how they could delete specific taSopLineIvcInsert nodes from an XML document. So if you have an XML document with 500 SOP line items, and you need to delete just 5 specific items, how would you do it?

One way is to try looping through all of the 500 line nodes and check each one to see if it matches the 5 that you need to delete. But that means your code is wasting time checking 495 lines that do not need to be deleted. Wouldn't it be nice to just jump into the XML and select the 5 that you need to delete without having to read the other 495? How does this look?

xmlDoc.LoadXml(myXML)
xmlNode = xmlDoc.SelectSingleNode("//taSopLineIvcInsert[ITEMNMBR='X205']")
xmlNode.ParentNode.RemoveChild(xmlNode)



Pop Quiz: Knowing what you know about SOP lines in Dynamics GP, and now that you are an expert on XPath, what is potentially wrong with the sample code above?

I'll reward the person who gives the first correct answer with a cold beer, or hot cup of coffee, in Fargo at the GP Technical Conference in November. (Winner must be present to receive prize!)

4 comments:

Unknown said...

Steve,

Quite a good article and a pop quiz. Here's what I feel is wrong:

I am not sure about the syntax, but knowing SOP Lines and GP, ITEMNMBR is required to get the SOP Lines to be integrated and it would throw an error if this node is not present in the XML when we integrate the transaction.

Logically, we cannot delete the ITEMNMBR node at any point of time.

Well, that's all I could guess.

Thanks
Vaidy Mohan

Steve Endow said...

Hi Vaidy,

Very good point. It wasn't the flaw that I was thinking of, but you are correct. If that item does not exist in the XML, the selection will not occur and an "Object reference not set" error will occur.

It is actually one of the things I always forget to do while parsing XML: test for the presence of a node before trying to get its value. And once again, I forgot about it in my example. I'm aware of two ways to test for the presence of an XML node, but neither seems very elegant, so I think I need to do some research and see if there is a better way.

Anyway, a beer or coffee for you in Fargo! Thanks for the comment.

And I'm still waiting to see if anyone else finds the other flaw, which is a little more subtle.

Steve

Mariano Gomez said...

In it's current form, the

xmlNode.ParentNode.RemoveChild(xmlNode)

will remove the entire XML node, not the child node. The code must specify the child node as a parameter:

xmlNode.ParentNode.RemoveChild("//[ITEMNMBR='X205']")

And of course, you must save the node again:

xmlNode.Save(myXML)

To Vaidy's point,

If xmlNode IsNot Nothing Then
xmlNode.ParentNode.RemoveChild("//[ITEMNMBR='X205']")
xmlNode.Save("file.xml")
End If

Just a guess of course.

MG.-
Mariano Gomez, MVP

Steve Endow said...

Thanks guys,

Good suggestions to complete the solution.

The specific issue I was thinking of was actually that it is possible for the same line item to appear more than once in a SOP transaction.

So one search through the XML may be insufficient. In which case I would probably go with a Node List object and use SelectNodes instead of SelectSingleNodes.

But such things often come out during testing, and obviously depend on the nature of the data the client needs to process.

Thanks!

Steve