Friday, February 20, 2009

Project Accounting Integration Frustration - Part 2: Customer Aliases

By Steve Endow

This is a continuation of Part 1:  https://dynamicsgpland.blogspot.com/2009/02/project-accounting-integration.html


In the first article of this topic, I explained the difficulties I ran into trying to assign customer aliases while importing Customer Project Info records into PA00501. In this article I'll present one possible solution for generating customer aliases that does not require any looping in my .NET code, automatically fills in gaps in the numbering sequence, and only requires one call to the SQL database.

I only recently discovered this approach (which appears to be a standard approach to the problem that many before me have already conquered), so I haven't yet thought of possible drawbacks or complications, but so far it seems to work in concept, and even better, it is a general solution that can be applied to many different situations where you need a robust, flexible, and fast method for getting a next number.

So I was already pretty convinced that looping through an arbitrary list of customer alias options was not practical, and that I was going to have to come up with a more comprehensive solution. What is the alternative to taking a subset of choices and checking each one of them against the list of existing customer aliases?

How about if we reverse the situation. Instead of asking "Is this single value already used?", what if we ask "What are ALL of the values that are already in use?". For that new question to make sense, we have to provide some context, so let's try "Out of all of the possible options of our entire value set, what are all of the values that are already in use?" Huh?

Let's consider that our entire set of possible customer aliases is 00000-ZZZZZ. So that's a "base 36" numbering scheme, which gives us 60,466,176 possible permutations. More than enough for any real GP customer. Hmmmmm. So instead of generating a single value, and then checking it against my alias list, what if I had a list of every single possible alias value at my disposal? Well, first of all, it means that I don't have to write a routine to generate a base 36 number, which is a pain, and I could therefore also probably eliminate any looping code. But how in the world could I possibly deal with a list of 60 million values?

If I think "procedurally", I know that I can't possibly loop through that many values. So loops and cursors are out of the question. But if I think in terms of "set based" operations, where SQL Server will magically sift through my request in a split second, the number of records effectively becomes irrelevant. (There are practical limitations that I'll consider later, but bear with me for now.) Okay...so...if I have a list of every possible customer alias, and I also have a list of every customer alias that has already been assigned in GP, what does that mean? It means that I can use a set based operation to compare the two lists, aka a SQL JOIN! And, if I'm able to compare the two lists using a JOIN, that also means that I can compare them in two different ways: An inner join, and an outer join. This was the "Eureka!" moment at which I knew there was hope.

If I can do an outer join between the two lists, that means that SQL Server will instantly tell me which values have not been used yet. I can then pick any number out of that list, and I will have a valid customer alias.

Okay, so the theory sounds promising. But how can I possibly implement this? How in the world do you generate a list of every 5-digit base 36 number? And that's where my friend's story comes in. While on a road trip several years ago, a friend told me that he worked with a SQL guru that was able to generate any series of sequential numbers with a single SQL statement. To do it, he used a less well known outcast cousin of the Join Family: the Cross Join, also known as the Cartesian Join. Under almost all conventional business circumstances, cartesian joins are considered a mistake--rather than combining matching values between two tables, you combined all possible values between the two tables--not good. But when we want to generate every possible permutation of two data sets, cross joins become your savior.

So how do we use a cross join to generate 60 million values? Well, let's start with the basis for our values: base 36. We want to work with the values from 0-9 and A-Z. So if we had two tables that contained the values 0-Z, we could do a cross join between them to get all of the possible values from 00 to ZZ. So let's create a "sequence" table that has our list of possible values:

CREATE TABLE [cstbSequence](
[Seq] [char](1) NOT NULL
) ON [PRIMARY]

Next, populate that table with 36 values, from 0 to Z. (Technically, you could avoid using a sequence table, but the code gets really silly looking, and there are some conveniences to this approach.)

Now, use a cross join to produce a list of values from 00 to ZZ:

SELECT a.Seq AS S1, b.Seq AS S2
FROM cstbSequence a
CROSS JOIN
cstbSequence b

Instantly, you have 1,296 values. No looping!

But we theoretically want 00000-ZZZZZ, so how do we do that 5 times? Well, if we actually wanted to generate that list (we don't! it would take forever!), we would do something like:

SELECT a.Seq AS S1, b.Seq AS S2, c.Seq AS S3,
d.Seq AS S4, e.Seq AS S5

FROM cstbSequence a
CROSS JOIN
cstbSequence b
CROSS JOIN
cstbSequence c
CROSS JOIN
cstbSequence d
CROSS JOIN
cstbSequence e


And, like magic, we can have every possible value from 00000-ZZZZZ. So now let's move from theory to reality, where a 60 million row table is a bit excessive, impractical, and actually unnecessary. Instead of generating all of the permutations, what if we just generated the 4 digit values, from 0000 to ZZZZ? That brings us down to a much more reasonable 1.6 million records. Let's do it!

First, let's create a table to hold our list of possible values:

CREATE TABLE [cstbAliasValues](
[AliasValue] [char](5) NOT NULL
) ON [PRIMARY]
GO

CREATE UNIQUE CLUSTERED INDEX [IX_cstbAliasValues] ON [cstbAliasValues]
([AliasValue] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,

SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF,
ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]


And now let's fill up the table:

INSERT INTO cstbAliasValues
SELECT S1+S2+S3+S4 AS AliasValue FROM
(
SELECT a.Seq AS S1, b.Seq AS S2, c.Seq AS S3, d.Seq AS S4
FROM cstbSequence a
CROSS JOIN
cstbSequence b
CROSS JOIN
cstbSequence c
CROSS JOIN
cstbSequence d
) seq
ORDER BY S1, S2, S3, S4


On my modest virtual server, this operation takes about 21 seconds, but it's a one-time process.

So now we have a list of all of the 4 digit alias values, so let's compare them to our list of assigned aliases. But if we have a 4 digit list, how do we compare it to a 5 digit alias list? Well, if we go back to the requirements of this solution, I said that I wanted to be able to generate an alias that resembled my customer ID. In other words, I would like the first few characters of my alias to be the same as my customer ID. So let's stick with our example of customer ID 123456. By doing so, we are able to check our PA00501 table for aliases that, at a minimum, begin with the number 1--we really don't care about the other permutations that begin with 0 or 2-Z. One way to do that is to use a derived table, where we pretend that our alias values all begin with the number 1.

SELECT av.Alias, cpi.PAcustalias FROM
(SELECT '1'+AliasValue AS Alias FROM cstbAliasValues) av
LEFT OUTER JOIN PA00501 cpi ON av.Alias = cpi.PAcustalias

Assuming that our customer IDs range from 1-9, in this manner, we can just use a value of 1-9 to make our 1.6 million values become a 'virtual' 15 million values.

Almost there!

So now, let's add a filter to narrow down our range and just check the values that can begin with the numbers 123, and only show available aliases:

SELECT av.Alias, cpi.PAcustalias FROM
(SELECT '1'+AliasValue AS Alias FROM cstbAliasValues) av
LEFT OUTER JOIN PA00501 cpi ON cpi.PAcustalias = av.Alias
WHERE av.Alias LIKE '123%' AND cpi.PAcustalias IS NULL

And finally, since we really only need 1 available alias, we don't care about all of the available options--we just want the next available alias, so let's add a TOP clause:

SELECT TOP 1 av.Alias FROM
(SELECT '1'+AliasValue AS Alias FROM cstbAliasValues) av
LEFT OUTER JOIN PA00501 cpi ON cpi.PAcustalias = av.Alias
WHERE av.Alias LIKE '123%' AND cpi.PAcustalias IS NULL


And so there you have it. A SQL statement that will return the next available, sequential, base 36, customer alias in a single call to the database without using a single loop.

The general benefits of this solution is that it technically allows you to have multiple 'systems' issue unique numbers. In my example, GP is issuing customer alias numbers for manually entered customers, and I'm issuing customer alias values for imported customers.

Besides the obvious benefits of this approach, there are several other powerful possibilities.

For instance, let's say that you need to create a routine that will automatically generate unique serial numbers for a hundred different products in a product line. And let's say that you want to avoid using the numbers 0 and 1 and the letters O and I because the two look very similar when printed by your serial number printer. In that case, you just remove those four values from your sequence table, and when you generate your serial numbers, you are guaranteed to never have those characters. Just use the product prefix and then append the unique value to the end. No need to have convoluted logic to make sure that those 4 characters are never issued.

Or, suppose that you need to issue unique document numbers, but you never want to issue the same number twice, even if the document is physically deleted from your document table. In that case, after you issue the number, you can simply delete it from your values table.

Or maybe you needed blocks of 500 unique numbers to issue to a client application that was used by a remote sales force so that they could work offline, but you need a record of every block that was issued and to whom it was issued? Just change the TOP 1 clause to TOP 500, and add a few additional fields to your values table.

This approach does take a few more steps to deploy and implement, but if you need a long term, robust solution for unique numbering, I'm thinking it's a pretty good one.

If you can think of any drawbacks or flaws to this approach, I'm very interested in feedback. I have not yet implemented this in production, so there may be some issues that I haven't considered.


UPDATE:  This solution was implemented at a Dynamics GP customer site and has worked flawlessly for years.

No comments: