|
With the use of the Internet as its own user interface, data enters the database via a number of ways: windows application, web application, and web service. It is no longer safe to put the business rules at the application level or expect the rules at the level to be implemented correctly or in the same way across all data entry points. The only safe way to ensure business rules in databases is to add the rule at the database level. Unfortunately there is no enforcement via the database table itself beyond general constraints such as size and type and nullability.
This article focuses on a methodology for ensuring data integrity. The code is Microsoft SQL Server specific however it could be applied to any database. The methods discussed in the article require access to a COM object via SQL Server as well as cursors. Any version of SQL Server that has these features will support the methods in this article.
When to Validate the Data
There are several ways to enforce business rules via stored procedures; three of which I will discuss here.
The first opportunity to check the data is at the time of data entry. By checking the data at data entry, you can deny the insert, update, or delete based on the business rules for any new data.
The second time to check the data is just after data entry via a trigger. When using a trigger data gets entered but you can be notified of the problem immediately.
The third time is on a regular schedule such as every hour or every night. This is how you can approach existing data in the database. This is also useful if you want to add a constraint that the data currently violates. You have to find and fix the data according to the new constraint before you can add the new constraint.
The ideal stored procedure should be easy to call and easy to use, and not interfere with the calling application or stored procedure. The stored procedure should report as much information to the calling application as well as to the local Database Administrator (DBA). This report to the DBA could be an email or a listing in a log file (via SQL table).
Example of a Valid Email Address
In order to illustrate how to enforce business rules, let's use a valid email address as an example. The column constraint for a valid email address will be varchar(255) since the email address length can be quite short or very long. You could argue for or against this data type definition but that isn't the point of the article so I'll leave the data type definition alone.
The email address can be validated with a regular expression. While SQL Server hasn't always provided regular expression support, it does provide the use of a COM object. For this example, we will create a stored procedure that calls a COM object to valid the email address. While there are many good and correct regular expressions to validate an email address, for this article let's use
'^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'.
Let's also add that the database shouldn't have any free email addresses since the application is for business users. It adds an interesting level of checking. The list of free email domains will be kept in a table of its own.
During Data Entry
Before the data is inserted into the table, the business rule stored procedure should be executed. If the stored procedure returned a successful result, the insert should continue. If the business rule fails, the insertion should not continue. This will help keep bad data out. However the bad data may be required for the application to continue. One solution is instead of entering bad data when the business rule fails, replace the bad data with a valid empty value or null.
A Stored Procedure for Each Business Rule
Each business rule should be defined by its own stored procedure. The stored procedure should be able to take the data and return a status number and status description. This would be used by all applications to determine what to do in the event of an error.
For example, a stored procedure for a valid email address looks like:
Create procedure spValidate_EmailAddress
(
@EmailAddress varchar(255),
@Success int OUTPUT,
@SuccessDescription varchar(255) OUTPUT
)
As
-- Success=1 is what we want
Set @Success=0 -- failure case
Set @SuccessDescription='General Failure'
-- code here for validation
If @Success <> 1
-- make entry in log file
return
This stored procedure doesn't fail in a way that would cause the calling application or stored procedure to automatically fail such as raising an error. This is done on purpose because the stored procedure shouldn't stop execution. Stopping execution would be bad for a web service where a person doesn't actually see the error.
This stored procedure should be called from every insert, update, and delete stored procedure prior to data entry and on a regular schedule after data entry. After all Business Rule stored procedures are written, they should be run for all existing data in order to find and fix those errors in data value.
spValidate_EmailAddress
In order to write this stored procedure, the machine that it runs on needs to have a regular expression engine and the SQL server needs a way to call this engine. The engine can be wrapped for calling via SQL server. There are several good articles that explain this. This article on sqlteam.com, Regular Expressions in T-SQL, explains how to add regular expressions to your database via the VB Regular Expression COM object. Based on the article, this stored procedure looks like:
Create procedure spValidate_EmailAddress
(
@EmailAddress varchar(255),
@Success bit OUTPUT,
@SuccessDescription varchar(255) OUTPUT
)
As
Declare @RegularExpression varchar(255)
Declare @IgnoreCase bit
Declare @RegularExpressionTest bit -- Success is -1
Set @Success = 1
Set @IgnoreCase = 1
Set @RegularExpressionTest = 0 --failure
Set @RegularExpression = '^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
-- code here for validation against regular expression
Set @RegularExpressionTest = find_regular_expression(@EmailAddress,@RegularExpression,@IgnoreCase)
If (@RegularExpressionTest <> -1) BEGIN
Set @Success=0
Set @SuccessDescription='Fails regular expression test'
-- enter failure into log or send email to DBA
END
-- code here for validation against free email domains
-- parse out domain name
Declare @Domain varchar(255)
Declare @IndexOfAtSymbol int
Declare @LengthOfDomain int
Set @IndexOfAtSymbol = PATINDEX ( '%@%' , @ EmailAddress )
Set @LengthOfDomain = Len(@EmailAddress) - @IndexOfAtSymbol
SELECT @Domain = SUBSTRING(@EmailAddress,@IndexOfAtSymbol+1,@LengthOfDomain)
-- check domain name against free email domain table
if (exists (select Id from tblBadDomainName where Domain = @Domain))
BEGIN
Set @Success=0
Set @SuccessDescription=@SuccessDescription + 'Fails free domain test'
-- enter failure into log or send email to DBA
END
Return
After Data Entry
There are two times to check for data integrity after the data has been inserted into the database. The first is immediately after the table has changed using a trigger. The second is on an hourly or nightly schedule. Both require that the DBA notice and fix the data. If you want to add a constraint to the table that the data currently doesn't adhere to, you will need to find it and fix it before you can add the constraint.
Using Triggers
You can immediately add a check for validity against business rules when a table is changed via a trigger. The trigger can include a call to the business rule and notify the DBA when the business rule fails. The trigger will not stop bad data from getting into the table unless you use an INSTEAD OF trigger. You also should specify which column the trigger is dependent on.
You should use a trigger when the SQL insert, update, or delete is not contained in a stored procedure but instead is passthrough SQL. This type of SQL is not actually kept in the database but rather in the calling application's code. However, a trigger can be a poor choice for validation if the trigger is set on a table that is routinely modified such as bulk insert or update or has other triggers.
The particular version of SQL Server can have other issues with triggers that may limit your interest in this method. It is important to check the trigger ability for the version you are running before using this method.
Running a Nightly Check of All Business Rules
Creating and running a nightly routine to check the data against business rules is important in order to check that the data is consistent with expectations over time. However, the check is not on one piece of data but the entire table or database. For example, if we implement the email business rule at a point months or years after the database has been created and used, there could be many email addresses that fail the check. Each needs to be found and fixed. The DBA should be able to quickly see if there are any failures of each business rule and how many.
Assuming we have many business rules, the main stored procedure for the nightly check will be:
Create procedure spValidate_NightlyCheckOfBusinessRules
As
Declare @EmailSubject varchar(8000)
Declare @CountOfInvalidEmailAddresses int
-- Run Email Validation Business Rule
Exec spValidate_EmailAddress_Cursor @CountOfInvalidEmailAddresses output
-- create subject of email to include how many failures exist
Set @EmailSubject = @EmailSubject + ‘spValidate_EmailAddress_Cursor = ‘ + Convert(varchar(10), @CountOfInvalidEmailAddresses)
-- send email
exec spSMTPMail
'fromemail@domain.com',
'fromname',
'toemail@domain.com',
'toname',
'NightlyDatabaseCheck_Delete: done.',
@EmailSubject
Return
The spValidate_EmailAddress_Cursor stored procedure will be a cursor that runs through every email address in the table and makes a list of email addresses that don't match. This list can be another sql table or an email to the DBA or just marking another column in the table where the email address column is found. If you have implemented a trigger for business rule validation, you shouldn't keep the results of the check in the same table. Regardless of how you store the results, a summary of the results (the count of bad emails) should be returned as an output parameter.
Create procedure spValidate_EmailAddress_Cursor
(
@CountOfFailures int OUTPUT
)
As
Declare @CountOfRows int
Set @CountOfRows = 0
Set @CountOfFailures=0
Declare @Id int
Declare @EmailAddress varchar(255)
Declare @Success int
Declare @SuccessDescription varchar(8000)
-- create cursor
DECLARE
EmailCursor Cursor
FORWARD_ONLY
STATIC
READ_ONLY
FOR
Select Id, EmailAddress
from
tblAddress
Open EmailCursor
-- grab row
Fetch NEXT FROM EmailCursor
INTO
@Id,
@EmailAddress
While (@@FETCH_STATUS = 0)
BEGIN
-- keep count of the records processed
Set @CountOfRows = @CountOfRows + 1
-- test for validity
Exec spValidate_EmailAddress @EmailAddress, @Success output, @SuccessDescription output
-- do something if the test fails
If (@Success<>1)
BEGIN
-- notify DBA of problem including @Id
-- keep count of the failures
Set @CountOfFailures = @CountOfFailures + 1
END
Fetch NEXT FROM BulkEmailCursor
INTO
@Id,
@EmailAddress,
END
CLOSE EmailCursor
DEALLOCATE EmailCursor
return
Summary
Validating data should be done before the data is entered and on a regular schedule after the data is entered. The data validity should follow the business rules and should not stop execution of the calling application if a failure is found. It is ultimately the DBA's responsibility for the data so the bad data should be easy to find and fix.
|