Wednesday, February 20, 2013

Finding Duplicate GL Entries

I faced an interesting dilemma this past week, in trying to locate duplicate GL entries integrated from another system.  Obviously, GP doesn't know they are duplicates and assigned each one a unique journal entry number.  So we had to find a way to locate the duplicates, and then only remove the duplicates (not the originating entries that needed to remain).

So what to do...

--First, I started with identifying the GL20000 that would be needed to identify the duplicates
SELECT OPENYEAR, JRNENTRY, SOURCDOC, REFRENCE, DSCRIPTN, TRXDATE,ACTINDX, ORGNTSRC, CRDTAMNT,DEBITAMT, DEX_ROW_ID as DEXROW1,


--Then I used, for the first time, ROW_NUMBER and PARTITION BY to essentially assign an index number to each duplicate.  So the first record would be 1, the first duplicate would be 2, the second duplicate would be 3 for each set of identical records.  This would help me isolate all of the "1"s as the records that should remain (the original entries) and anything greater than 1 as a duplicate to remove.  The PARTITION BY function let me reset the numbering for each set of duplicate records.
ROW_NUMBER() OVER (PARTITION BY SOURCDOC, REFRENCE, DSCRIPTN, TRXDATE,ACTINDX, ORGNTSRC, CRDTAMNT,DEBITAMT ORDER BY JRNENTRY)

--Fortunately, the integration passed information in to the originating source field that we could use to isolate entries from the integration only.

FROM GL20000 GL1 WHERE ORGNTSRC LIKE 'TheIntegrationHadAUniqueValueHere%'

--Next, I joined back to the GL20000 and grouped it to find only those record with actual duplicates (COUNT(*)>1). So anything that wasn't duplicated, wasn't returned.
AND ((GL1.REFRENCE IN (SELECT GL1.REFRENCE FROM GL20000 AS TMP GROUP BY REFRENCE, DSCRIPTN,TRXDATE,ACTINDX,ORGNTSRC,CRDTAMNT,DEBITAMT HAVING COUNT (*)>1 AND REFRENCE=GL1.REFRENCE AND DSCRIPTN=GL1.DSCRIPTN AND TRXDATE=GL1.TRXDATE AND ACTINDX=GL1.ACTINDX AND ORGNTSRC=GL1.ORGNTSRC AND CRDTAMNT=GL1.CRDTAMNT AND DEBITAMT=GL1.DEBITAMT)))
So I ended up my result set of duplicated records, I was then able to eliminate the records with a row number of 1 as the original entries.  The only thing I missed, and had to identify manually was a situation where there were duplicate debits or credits within a single journal entry.  So, for example, two debits in one journal entry both for $110.  I had a handful of those, and they caused my identified duplicate entries to not balance.  Once I identified those (using a calculation in Excel to net debits and credits for each line after sorting the duplicates together), then my identified duplicates balanced debits to credits.

I know that there may be a more elegant way to accomplish this, but this worked and reconciled to the expected duplicates so I thought I would share in case someone else comes up against this.

No comments: