Thursday, February 9, 2012

Importing 725,000 Records Into Dynamics GP In 30 Seconds

At the MS Dynamics World Decisions Fall 2011 virtual conference, I gave a presentation titled Importing 10 Million Transactions Into Dynamics GP.  While the project discussed in that presentation was challenging, I was recently faced with a slightly different high-volume integration challenge.

The client works with several large trading partners, each of which provides a consolidated monthly payment, along with a data file that is effectively the remittance advice for the payment, indicating how the check should be applied to thousands of customers.  But this isn't a typical cash receipt.  A single payment may be up to $4 million, and the remittance advice is a 140 megabyte text file containing 725,000 rows of detailed data relating to thousands of invoices.  And that is just one of their trading partners.

The accounting staff have been manually processing these massive data files in Excel, manually entering thousands of transactions each month, then manually applying the payments to outstanding invoices, and then manually entering adjustments or memos for over-payments or under-payments.  Due to the highly detailed format of the remittance data file and quirks in the data, there was no way to directly import the file using a tool like Integration Manager, and even if they did manage to import the cash receipts, they would still have to apply the payments and manually enter the adjustments.

Although I have developed automated integrations that can process tens of thousands of transactions over the course of a day, I don't think I've ever had to process hundreds of thousands of records at once--while a user was waiting and watching the user interface.  It was obvious that performance was going to be a challenge.

I thought about a few different designs, and came to the conclusion that the best approach would be to get the raw data into a SQL table first, and then process it further from there.  I briefly considered loading the entire file into memory and then import it, but I need to perform several operations and transformations on the data that would be impractical and inefficient using ADO.NET.

So, I need to load a 140 megabyte CSV file into SQL Server programmatically.  Although I loved DTS, I am not a fan of SSIS, so programmatically running an SSIS job to bring the file in did not appeal to me--not worth the hassle.  However, I do have a C# class that does an excellent job of parsing delimited text files, and is blazing fast (it can load the 140MB CSV file into memory in 2 seconds).  So, I can read the text file and get it into a DataTable in memory, but then what?  How do I push that massive data table into SQL?

Performing over 700,000 separate insert statements is clearly out of the question--I didn't even bother to test it since I knew that it would take far too long.  I needed something faster...much faster...

After doing a little searching, I came across the .NET SqlClient SqlBulkCopy class.  Faaaaantastic.  All of the brains of the famous bcp utility, but in a stylish .NET package.  Just 5 lines of code.

using (System.Data.SqlClient.SqlBulkCopy bulkcopy = new System.Data.SqlClient.SqlBulkCopy(gpConn))
    bulkcopy.BulkCopyTimeout = 300;
    bulkcopy.DestinationTableName = tableName;

With those few magical lines of code, SqlBulkCopy pushes all 725,000 records into SQL Server in 30 seconds (on my relatively modest virtual server).

Because the client (thankfully) doesn't want to actually import 725,000 separate cash receipts, they summarize the data by customer and month, which results in 2,100 summary cash receipts.  Because I have the data in SQL now, I'm able to use a query to summarize the data.  But even there I run into a performance issue.

Because there is so much data to summarize, my query initially took several minutes to run, and sometimes even caused my .NET app to receive a command timeout error.  Having read many articles about SQL query optimization, I knew that my best bet was to create a covering index on my data table.  Sure enough, after adding the covering index, the query dropped to 4 seconds.

So in roughly 45 seconds, the 140MB file is loaded and converted into 2,100 cash receipt transactions, ready to be handed off to eConnect.  eConnect then becomes the bottleneck, taking a few minutes to create the cash receipts, apply the payments to invoices, and then create debit and credit memos for any over or under payments.

Needless to say the customer is amazed and thrilled with the new solution.  Compared to the manual process that they have been performing to get the data into GP, the custom integration is like magic.

Steve Endow is a Dynamics GP Certified Trainer and Dynamics GP Certified IT Professional in Los Angeles.  He is also the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.


Steve Chapman said...

That is a fantastic story. I have no doubt that the customer is, "amazed and thrilled."

Frank Hamelly said...

That's incredible Steve. Thanks for the post!

Marty said...

Steve, you set the standard for being a professional!