
Thursday, February 9, 2012

Importing 725,000 Records Into Dynamics GP In 30 Seconds

At the MS Dynamics World Decisions Fall 2011 virtual conference, I gave a presentation titled Importing 10 Million Transactions Into Dynamics GP.  While the project discussed in that presentation was challenging, I was recently faced with a slightly different high-volume integration challenge.

The client works with several large trading partners, each of which provides a consolidated monthly payment, along with a data file that is effectively the remittance advice for the payment, indicating how the check should be applied to thousands of customers.  But this isn't a typical cash receipt.  A single payment may be up to $4 million, and the remittance advice is a 140 megabyte text file containing 725,000 rows of detailed data relating to thousands of invoices.  And that is just one of their trading partners.

The accounting staff have been manually processing these massive data files in Excel, manually entering thousands of transactions each month, then manually applying the payments to outstanding invoices, and then manually entering adjustments or memos for over-payments or under-payments.  Due to the highly detailed format of the remittance data file and quirks in the data, there was no way to directly import the file using a tool like Integration Manager, and even if they did manage to import the cash receipts, they would still have to apply the payments and manually enter the adjustments.

Although I have developed automated integrations that can process tens of thousands of transactions over the course of a day, I don't think I've ever had to process hundreds of thousands of records at once--while a user was waiting and watching the user interface.  It was obvious that performance was going to be a challenge.

I thought about a few different designs, and came to the conclusion that the best approach would be to get the raw data into a SQL table first, and then process it further from there.  I briefly considered loading the entire file into memory and importing it directly, but I needed to perform several operations and transformations on the data that would have been impractical and inefficient using ADO.NET alone.

So, I need to load a 140 megabyte CSV file into SQL Server programmatically.  Although I loved DTS, I am not a fan of SSIS, so programmatically running an SSIS job to bring the file in did not appeal to me--not worth the hassle.  However, I do have a C# class that does an excellent job of parsing delimited text files, and is blazing fast (it can load the 140MB CSV file into memory in 2 seconds).  So, I can read the text file and get it into a DataTable in memory, but then what?  How do I push that massive data table into SQL?
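The parsing class itself is proprietary, but a minimal sketch of the same idea looks roughly like this (a simplified stand-in, not the production parser--it assumes a header row and fields with no embedded commas or quotes):

```csharp
using System;
using System.Data;
using System.IO;

public static class CsvLoader
{
    // Load a delimited file into a DataTable whose columns match the staging table.
    public static DataTable LoadCsv(string path, char delimiter = ',')
    {
        var table = new DataTable();
        using (var reader = new StreamReader(path))
        {
            // Build the columns from the header row
            string headerLine = reader.ReadLine();
            if (headerLine == null) return table;
            foreach (string header in headerLine.Split(delimiter))
                table.Columns.Add(header.Trim(), typeof(string));

            // Stream the remaining rows; ReadLine avoids holding the
            // whole 140MB file as a single string
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue;  // skip blank lines
                table.Rows.Add(line.Split(delimiter));
            }
        }
        return table;
    }
}
```

The resulting DataTable maps directly onto the staging table, which is exactly the shape SqlBulkCopy wants.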

Performing over 700,000 separate insert statements is clearly out of the question--I didn't even bother to test it since I knew that it would take far too long.  I needed something faster...much faster...

After doing a little searching, I came across the .NET SqlClient SqlBulkCopy class.  Faaaaantastic.  All of the brains of the famous bcp utility, but in a stylish .NET package.  Just a few lines of code.

// gpConn is an open SqlConnection to the GP company database, and
// dataTable holds the parsed CSV rows with columns matching the destination table
using (System.Data.SqlClient.SqlBulkCopy bulkCopy = new System.Data.SqlClient.SqlBulkCopy(gpConn))
{
    bulkCopy.BulkCopyTimeout = 300;            // allow up to 5 minutes for the bulk load
    bulkCopy.DestinationTableName = tableName; // staging table matching the DataTable schema
    bulkCopy.WriteToServer(dataTable);         // stream all rows to SQL Server
}
gpConn.Close();

With those few magical lines of code, SqlBulkCopy pushes all 725,000 records into SQL Server in 30 seconds (on my relatively modest virtual server).

Because the client (thankfully) doesn't want to actually import 725,000 separate cash receipts, the integration summarizes the data by customer and month, which results in roughly 2,100 summary cash receipts.  Since the data is now in SQL, I'm able to use a single query to summarize it.  But even there I ran into a performance issue.

Because there is so much data to summarize, my query initially took several minutes to run, and sometimes even caused my .NET app to receive a command timeout error.  Having read many articles about SQL query optimization, I knew that my best bet was to create a covering index on my data table.  Sure enough, after adding the covering index, the query time dropped to 4 seconds.
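The actual table layout is client-specific, but the pattern looks roughly like this (hypothetical table and column names for illustration--the covering index includes every column the summary query touches, so SQL Server can answer the GROUP BY from the index alone without scanning the base table):

```sql
-- Covering index: key columns match the GROUP BY, INCLUDE covers the SUM
CREATE NONCLUSTERED INDEX IX_RemitStaging_Summary
    ON dbo.RemitStaging (CustomerNumber, PaymentMonth)
    INCLUDE (PaymentAmount);

-- Collapse 725,000 detail rows into one cash receipt per customer per month
SELECT  CustomerNumber,
        PaymentMonth,
        SUM(PaymentAmount) AS ReceiptAmount
FROM    dbo.RemitStaging
GROUP BY CustomerNumber, PaymentMonth;
```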

So in roughly 45 seconds, the 140MB file is loaded and converted into 2,100 cash receipt transactions, ready to be handed off to eConnect.  eConnect then becomes the bottleneck, taking a few minutes to create the cash receipts, apply the payments to invoices, and then create debit and credit memos for any over-payments or under-payments.

Needless to say, the client is amazed and thrilled with the new solution.  Compared to the manual process that they have been performing to get the data into GP, the custom integration is like magic.

Steve Endow is a Dynamics GP Certified Trainer and Dynamics GP Certified IT Professional in Los Angeles.  He is also the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

http://www.precipioservices.com

Thursday, September 23, 2010

Partial Explanation of eConnect 2010 "Distributed Transaction" Warnings

In my last post, I discussed an issue where my Windows Event Log filled up with thousands of eConnect 2010 warning messages saying "Distributed transaction was used."

Many thanks to guru Louis for responding to the post and giving me some great info explaining what the eConnect 2010 "Distributed Transaction was used" warning meant and why it might be occurring.

First, just a reminder that eConnect 2010 is working great, and my data was importing into GP 2010 just fine--I'm only receiving warnings, not errors, so they are not interfering with my integration.  I just happened to notice the thousands of warnings because my Event Log filled up.

Louis explained that for performance reasons, distributed transactions are not used or enabled by default with eConnect 2010, so what I am seeing is eConnect 2010 issuing a warning to let me know that my import is incurring the extra overhead of a distributed transaction.  He then offered some pointers to track down why my import was invoking distributed transactions.

In the case of my customer import, I wasn't intentionally doing anything related to a distributed transaction, and honestly, before today I didn't know enough about them to know why or when they might be used.  I checked my connection string to confirm that it wasn't changing during my import, and I confirmed that I hadn't enabled any settings to turn them on.

I then went ahead and upgraded the eConnect 9 SOP invoice and cash receipt imports that would accompany the customer import.  I was afraid they would have the same problem, but to my surprise, neither wrote a single entry to the eConnect event log.  They worked perfectly.  Hmmm.

So it was only my customer import that issued the warnings, which seemed really strange.  The customer record is quite simple, so I couldn't imagine what would require a transaction.

And then I remembered something.  As part of my customer import, I am importing e-mail addresses.  And as all good students of GP and eConnect know, there is no handy e-mail address field on the customer window.  It's tucked away in the Internet Information window in GP, and handled by the taCreateInternetAddresses node (the CMPInternetAddressType serialization type) in eConnect.  So...what if sending these along with the customer record was somehow causing eConnect to invoke a distributed transaction?
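For reference, the XML node in question looks roughly like this (element names recalled from the eConnect schema, with a hypothetical customer number and address code--double-check the exact fields against the eConnect Programmer's Guide for your version):

```xml
<taCreateInternetAddresses>
  <Master_Type>CUS</Master_Type>    <!-- CUS = customer master record -->
  <Master_ID>AARON0001</Master_ID>  <!-- hypothetical customer number -->
  <Adrscode>PRIMARY</Adrscode>      <!-- address code the e-mail is attached to -->
  <INET1>ap@example.com</INET1>     <!-- INET1 maps to the E-mail field in Internet Information -->
</taCreateInternetAddresses>
```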

So I commented out the code that adds the internet address type to my customer XML, and re-ran my integration.  Voila!  No entries in my eConnect event log!  Hmmm, so for some reason the presence of the customer internet addresses is triggering a distributed transaction on one of my servers.

This is a great find, but as the title says, it's only a partial explanation.  This problem only occurs on my 32-bit server, and doesn't occur on my 64-bit server or the client's 64-bit server.  So my guess at this point is that it might be a configuration issue with my 32-bit machine, or a quirk in the 32-bit version of eConnect 2010.

Fortunately it isn't a show stopper, just a head scratcher, so it won't interfere with my client's GP 2010 upgrade schedule.  And in the process, I learned a lot more about eConnect 2010 a lot faster than I would have if everything worked perfectly.

From that learning, I have a few more ideas for eConnect 2010 posts on deck...