Friday, January 29, 2016

Handy (Lazy) technique for multi-value string parsing with .NET

By Steve Endow

I find that I regularly need to parse strings that contain multiple logical values.

For example, you may have some employee data with a value like:


If you need to extract the first name, last name, and employee number, it can be tedious.  For the first name, you could locate the period in the value, then use a Substring function and get the left X characters until the period.

But then what about the last name?  You would need to locate the period, then locate the first dash, then get the characters between those two positions.  And then repeat this very tedious process for employee number and department.
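For comparison, the manual approach described above might look roughly like this sketch. The sample value `Jane.Doe-12345-Sales` and the class and method names are hypothetical, chosen only to match the first.last-number-department layout discussed here.

```csharp
using System;

public static class SubstringParsing
{
    // Assumed (hypothetical) layout: First.Last-EmployeeNumber-Department
    public static string[] Parse(string employeeInfo)
    {
        // Locate each separator first; every later position depends on these.
        int period = employeeInfo.IndexOf('.');
        int firstDash = employeeInfo.IndexOf('-', period + 1);
        int secondDash = employeeInfo.IndexOf('-', firstDash + 1);

        // Extract each value relative to the separator positions.
        string firstName = employeeInfo.Substring(0, period);
        string lastName = employeeInfo.Substring(period + 1, firstDash - period - 1);
        string employeeId = employeeInfo.Substring(firstDash + 1, secondDash - firstDash - 1);
        string department = employeeInfo.Substring(secondDash + 1);

        return new[] { firstName, lastName, employeeId, department };
    }
}
```

Every value needs its own offset arithmetic, which is exactly the tedium the rest of this post tries to avoid.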

Similarly, you are probably familiar with this type of multi-value string:


If you want the natural account in the second segment, you would normally do some tedious string parsing.  To do it properly, you can't assume that the account segment lengths will always be the same, so you need to first locate the dashes, then extract each segment relative to the dashes.

In short, this is a very common, but annoyingly tedious task.

I don't know why I didn't think of this sooner, and I'm sure this is obvious and widely used by programmers smarter than me.  It seems my reluctance to parse yet another string using Substring, which I hate doing, finally overcame my laziness about thinking of a better solution.

So I thought, well, we have a string with two or more values separated by a separator character.

If we pause for a moment, what does that sound like?

Perhaps like a delimited list?

And whenever you encounter a delimited list of values, what is a common way of parsing such values?  The Split function, of course.

So normally I would use Split to deal with values like:  27,56,78,90,12,34

You use string.Split(','), and you end up with a handy array.  So this is widely used when you have multiple independent values separated by the same delimiter.
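As a small sketch of that familiar case, the list above can be split and converted to integers as follows. The class and method names are hypothetical and exist only for illustration.

```csharp
using System;

public static class NumberList
{
    // Splits a comma-delimited list such as "27,56,78,90,12,34" into integers.
    public static int[] Parse(string list)
    {
        string[] parts = list.Split(',');
        int[] numbers = new int[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Trim tolerates stray spaces around a value; int.Parse throws on non-numeric text.
            numbers[i] = int.Parse(parts[i].Trim());
        }
        return numbers;
    }
}
```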

But if we step back a bit, who cares what the values are and whether they are related, and who cares what delimiter is used?

In the case of our GL account string:


We have three related values:  Segment 1, Segment 2, Segment 3.  Because these segments are related, and not a list of independent values, for some reason I never thought to use Split.  But if we think more abstractly and ignore that relation, the string is just a generic dash-delimited list.

In which case, we can do:

string[] account = accountNumString.Split('-');

We can then reference account[0], account[1], and account[2] without having to count any character positions, locate dashes, or use Substring.
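One caution when indexing into the array: if the string has fewer segments than expected, account[1] or account[2] throws an IndexOutOfRangeException.  A minimal sketch of a guarded version is below, assuming a three-segment account; the sample value `000-1100-00` and the class and method names are hypothetical.

```csharp
using System;

public static class GlAccount
{
    // Returns the natural account (second segment) of a three-segment,
    // dash-delimited account string such as the hypothetical "000-1100-00".
    public static string NaturalAccount(string accountNumString)
    {
        string[] account = accountNumString.Split('-');

        // Check the segment count before indexing so bad input fails with a clear message.
        if (account.Length != 3)
        {
            throw new FormatException(
                "Expected 3 account segments but found " + account.Length + ".");
        }

        return account[1];
    }
}
```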

Moving on to our employee example:


Here, we have four values.  Two are separated by a period, and three are separated by dashes.  Like the GL account, let's pretend they are just delimited lists of values.

We can then use Split to extract specific values from our multi-value delimited string.

string firstName =  employeeInfo.Split('.')[0];
string lastName =   employeeInfo.Split('.')[1].Split('-')[0];
string employeeID = employeeInfo.Split('-')[1];
string department = employeeInfo.Split('-')[2];

Notice for the last name we're using a double split, which sounds like a gymnastic move.
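If the double split feels awkward, one alternative is to pass both delimiters to a single Split call, since Split accepts more than one separator character.  This is only a sketch: the class name, method name, and sample value `Jane.Doe-12345-Sales` are hypothetical, and it assumes none of the values themselves contain a period or a dash (a hyphenated last name would shift the indexes, just as it would with the chained version).

```csharp
using System;

public static class EmployeeInfoParser
{
    // Assumed (hypothetical) layout: First.Last-EmployeeNumber-Department
    public static string[] Parse(string employeeInfo)
    {
        // One call splits on either delimiter, yielding all four values in order.
        string[] parts = employeeInfo.Split('.', '-');

        if (parts.Length != 4)
        {
            throw new FormatException(
                "Expected 4 values but found " + parts.Length + ".");
        }

        return parts;
    }
}
```

With the sample value, parts[0] through parts[3] hold the first name, last name, employee number, and department.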

Here we see it in action:

By the same token, rather than sending the GL account segments to an array, you can access the segments directly through Split.

string segment1 = accountNumString.Split('-')[0];
string segment2 = accountNumString.Split('-')[1];
string segment3 = accountNumString.Split('-')[2];

So no need to locate a separator, no need to count character positions, and no need to use Substring.

I'm assuming that this is a common practice for many, but I'm often late to the party, so sadly, this only occurred to me tonight.  It only took me around 30 years to figure it out.

Steve Endow is a Microsoft MVP for Dynamics GP and a Dynamics GP Certified IT Professional in Los Angeles.  He is the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

