Tuesday, November 4, 2008

SQL User Defined Functions


Have you ever written a SQL query only to come across a relatively small requirement that suddenly makes the query more complex than you want?

In the last few months, it's happened to me twice, but fortunately I was able to utilize a User Defined Function (UDF) to meet the requirement while simplifying my query.

The first project involved exporting data from GP. It had a requirement to split the GP SOP Order number into two fields--the alpha prefix in one field, and the numeric suffix in the other. Well, because there were different SOP order types involved, the alpha prefixes had different lengths, as did the numeric suffixes, and to top it off, some of the order numbers had a dash separating the prefix and suffix, while others did not. Examples:

ORD0000095
WAR-1025
SP001024
CONS-2933

So, how do you split a field value 'in half' when you don't know where one half ends and the other begins? Unfortunately, SQL 2005 still does not have native Regular Expression capabilities. There are some functions that provide some limited character searches (PATINDEX, CHARINDEX), but they don't allow you to search for a pattern or range like RegEx.

After doing some research, I found several references to using a .NET CLR assembly to make Regular Expressions available within SQL statements. While this approach was very appealing, it was more complicated than I wanted, as it would require a change to SQL Server security to allow assemblies to run, and would introduce CLR into the mix, something that is not typically used in GP environments.

Since I was responsible for assisting with deployment of the solution in a remote client environment, I needed something very something simple and self contained in standard SQL--the fewer pieces, the better. So I began to consider the possibility of using a cursor to loop through each character of the order number to determine where the prefix ended, and the suffix began. I'm very reluctant to use cursors in SQL unless absolutely necessary because they typically can't perform nearly as well as a set-based solution, but given my constraints, it seemed like my only option.

After checking with a few colleagues about my cursor approach, I received a recommendation to package up the prefix / suffix routine into a SQL User Defined Function. So here is where I admit that in the 10 years that I've been working with SQL Server, I have never seen the need to use a User Defined Function. Most business data that I've worked with has been pretty straightforward, with multiply and divide and a string function being as complex as things would get. I had heard of User Defined Functions, but didn't realize how simple they really were to create and use--which probably contributed to me never "seeing the need" to use them!

As SQL Server Books Online (BOL) states under "User-defined Function Basics":

The benefits of using user-defined functions in SQL Server are:

--They allow modular programming
--They allow faster execution
--They can reduce network traffic

It goes on to point out that UDFs can be written in T-SQL or a .NET language.

So after reading a little more about UDFs, I built my function. At first, I was going to have the function return the index number indicating where the numeric portion of the order number started. But after realizing how convenient the function would be, I decided to write two functions: one to return the prefix, and the other to return the suffix. (Code listed below)

Once I created the functions, using them was a snap:

SELECT DYNAMICS.dbo.GetOrderPrefix(RTRIM(ORIGNUMB)) FROM SOP30200

This approach made my already complex SQL query much simpler and cleaner than if I had tried to code it all in one query.

Like most technical solutions, there are often many approaches to solving the problem. You weigh the requirements against the platform, environment, tools, performance, constraints, your knowledge and skills, budgets, and time. The lesson I'm trying to convey is not the specific solution or function that I implemented, but rather the opportunity that User Defined Functions presents, offering the potential to reduce the complexity of your SQL queries, views, or stored procedures.

One final note. Although I chose not to use a CLR-based UDF (I would have liked to!), I would strongly recommend looking into them if it is an option in your environment. They offer the tremendous flexibility and power of the .NET CLR, and based on my research, they are extremely fast for complex computations or logic.



Here is the GetOrderPrefix prefix function:


USE [DYNAMICS]
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[GetOrderPrefix]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
BEGIN
DROP FUNCTION [dbo].[GetOrderPrefix]
END
GO

CREATE FUNCTION [dbo].[GetOrderPrefix] (@SOPNumber varchar(40))
RETURNS varchar(20)
AS
BEGIN
DECLARE @NumStart int
SET @NumStart = 0
IF LEN(RTRIM(@SOPNumber)) > 0
BEGIN
DECLARE @Position int
SET @Position = 0
--If the doc number has a dash, locate the prefix relative to the dash
IF CHARINDEX('-', @SOPNumber) > 0
BEGIN
--Locate the dash in the document number
SELECT @Position = CHARINDEX('-', @SOPNumber)
END
ELSE
--If no dash was found, loop through the characters and find the first numeric character
BEGIN
--Loop through the initial characters of the doc number to identify the end of the alpha prefix
SET @Position = 2
WHILE @NumStart = 0
BEGIN
IF ISNUMERIC(SUBSTRING(@SOPNumber, @Position, 1)) = 1
BEGIN
SET @NumStart = @Position
END
ELSE
BEGIN
SET @Position = @Position+1
END
END
END
-- Return the alpha prefix
RETURN SUBSTRING(@SOPNumber, 1, @Position - 1)
END
--If an empty string was passed in, return one back
RETURN ''
END
GO


And GetOrderSuffix:


USE [DYNAMICS]
GO

IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[GetOrderSuffix]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
BEGIN
DROP FUNCTION [dbo].[GetOrderSuffix]
END
GO


CREATE FUNCTION [dbo].[GetOrderSuffix] (@SOPNumber varchar(40))
RETURNS varchar(20)
AS
BEGIN
DECLARE @NumStart int
SET @NumStart = 0
IF LEN(RTRIM(@SOPNumber)) > 0
BEGIN

DECLARE @Position int
SET @Position = 0

--If the doc number has a dash, locate the prefix relative to the dash
IF CHARINDEX('-', @SOPNumber) > 0
BEGIN
--Locate the dash in the document number, add 1 for the starting point of the numeric suffix
SELECT @Position = CHARINDEX('-', @SOPNumber) + 1
END

ELSE

--If no dash was found, loop through the characters and find the first numeric character
BEGIN
--Loop through the initial characters of the doc number to identify the end of the alpha prefix
SET @Position = 2
WHILE @NumStart = 0
BEGIN
IF ISNUMERIC(SUBSTRING(@SOPNumber, @Position, 1)) = 1
BEGIN
SET @NumStart = @Position
END
ELSE
BEGIN
SET @Position = @Position+1
END
END
END

-- Return the numeric suffix
RETURN SUBSTRING(@SOPNumber, @Position, LEN(RTRIM(@SOPNumber))-@Position+1)
END
--If an empty string was passed in, return one back
RETURN ''
END

GO

3 comments:

Mariano Gomez said...

If the case is obtaining alpha characters, then you can use something like the code below (requires SQL Server 2005). If the
non-matching portion can be a mix of numeric and non-numeric characters, then it is a lot more difficult. In the code below you can replace the reference to the system table master..spt_values with auxiliary table with numbers.

CREATE FUNCTION dbo.GetPrefix (@string NVARCHAR(100))
RETURNS NVARCHAR(100)
BEGIN
RETURN
(SELECT CAST(
(SELECT CASE
WHEN SUBSTRING(@string, n, 1) LIKE '[0-9]'
THEN SUBSTRING(@string, n, 1)
ELSE ''
END
FROM (SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 100) AS Nums(n)
WHERE n <= LEN(@string)
FOR XML PATH('')) AS NVARCHAR(100)));
END
GO

Likewise, the following code will remove all alpha characters, leaving you with the numeric portion of the order:

CREATE FUNCTION dbo.CleanNonNumeric (@string NVARCHAR(100))
RETURNS NVARCHAR(100)
BEGIN
RETURN
(SELECT CAST(
(SELECT CASE
WHEN SUBSTRING(@string, n, 1) LIKE '[^0-9]'
THEN SUBSTRING(@string, n, 1)
ELSE ''
END
FROM (SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 100) AS Nums(n)
WHERE n <= LEN(@string)
FOR XML PATH('')) AS NVARCHAR(100)));
END

The above code uses the "^" operator in the LIKE clause.

Best regards,

MG.-
Mariano Gomez, MIS, MVP, MCP, PMP
Maximum Global Business, LLC
http://www.maximumglobalbusiness

Steve Endow said...

Thanks Mariano, excellent suggestion. I'm embarrassed that I didn't realize that LIKE supported RegEx-type ranges like [0-9], [^0-9] and [a-z].

Yes, your approach of stripping out alpha characters and returning the numeric portion, or vice versa, is an interesting alternative and makes a lot of sense.

And very interesting use of spt_values as well.

Thanks!

Steve Endow

Mariano Gomez said...

Steve,
Very good article! This is a common occurrence at every place I have consulted for in the past, especially when integrating multi-platform systems and GP.

Keep up the good work! I am reading for sure.

MG.-
Mariano Gomez, MVP
Maximum Global Business, LLC
http://www.maximumglobalbusiness.com

The Dynamics GP Blogster
http://dynamicsgpblogster.blogspot.com