Data Management

How do I... Reject alpha characters in a SQL Server character column?

Enforcing your data's integrity is probably the single most important issue you face when designing a database. Validating user input is one way of keeping bad data from making its way into your analysis and reports. It only takes one piece of bad data to throw everything off. Susan Sales Harkins explains how to use CHECK constraints in SQL Server to control user input.

Enforcing your data's integrity is probably the single most important issue you face when designing a database. Validating user input is one way of keeping bad data from making its way into your analyses and reports. It takes only one piece of bad data to throw everything off. It's serious business and there are no shortcuts -- mistakes, even innocent ones, are easy to make. It's just too easy to enter one too many zeros or enter "6heodore" instead of "Theodore." Granted, you can't stop every single mistake at the input stage, but it's the best place to start. (We are working with SQL Server Express because it's free and easy to use, but the concept and examples are valid in SQL Server.)

This blog post is also available in PDF form as a TechRepublic download.

About CHECK

When you restrict the values a column can store, you're enforcing domain integrity. In other words, you're making sure that all of the data for a single column (or a table) meets specific conditions. SQL Server offers two built-in constraints for enforcing domain integrity:

  • CHECK is a rule that applies to data. You might use a CHECK constraint to require a value for a particular field to be greater than 1 or to contain a specific number of characters.
  • DEFAULT fills in a value when the user fails to supply one. (We are not discussing DEFAULT here.)

Using CHECK to reject alpha characters

The easiest way to reject alpha characters is to use a numeric data type, but that's not always practical. Sometimes, you want a character column to contain only numeric values. This happens when valid data consists of numeric characters, but you don't use them mathematically. For instance, phone numbers and ZIP codes comprise numeric characters, but you store them as text because you won't evaluate them in mathematical equations.

A character column accepts both numbers and letters. If domain integrity requires that such a column accept only numbers, use CHECK to make that happen. It's easier than you might think. Simply specify a numeric character for each required position. A character column that stores ZIP code values is a good example.

Using Management Studio, expand a database and add a new single-column table. Name the column PostalCode and set its data type as char(5). Name the table anything you like and expand the new table's node. To add a constraint that rejects any entry that contains an alpha character, do the following:

  • Right-click the table's Constraints node and choose New Constraint.
  • Enter (PostalCode LIKE '[0-9][0-9][0-9][0-9][0-9]') in the Expression text box to limit the first five characters to numbers. Be careful not to enter space characters between the digit component [0-9].)
  • Modify the constraint's default name to CK_PostalCodes_PostalCode.
  • Enter a description, such as "Limit ZIP code values to numbers," as shown in Figure A.
  • In the New Constraint dialog box, click Add.
  • Click Close.
  • Click Save on the Standard menu.

Figure A

A simple expression rejects any alpha character

Now, check the constraint by trying to adding a couple of records. First, run the simple query:

USE AdventureWorks

INSERT PostalCodes

VALUES ('40604')
Figure B shows what happens when the ZIP code value satisfies the CHECK constraint -- SQL Server adds a new record for the ZIP code value. Now, run this second query
USE AdventureWorks

INSERT PostalCodes

VALUES ('r0604')
The value has an alpha character -- r instead of 4 -- but the CHECK constraint catches your typo, as you can see in Figure C.

Figure B

The CHECK constraint accepts five number characters

Figure C

The CHECK constraint rejects any entry that contains an alpha character

Add CHECK programmatically

You can add a CHECK constraint programmatically when you create the table or after by executing a Transact-SQL (T-SQL) ALTER TABLE statement. The following T-SQL adds a new column to the example table:

USE AdventureWorks

ALTER TABLE dbo.PostalCodes

ADD PostalCodeExtended char(9) NULL

GO

ALTER TABLE dbo.PostalCodes

ADD CONSTRAINT CK_PostalCodes_PostalCodeExtended

CHECK (PostalCodeExtended LIKE

'[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]')

GO

The first statement adds a character column named PostalCodeExtended. The next statement uses the ADD CONSTRAINT clause to add a CHECK constraint that forces number characters. The only difference is that this column expects a nine-character ZIP code value. Similarly to the manual example, the constraint will reject any value that contains an alpha character.

Figure D shows what happens when you try to enter a value that's fewer than nine characters or that contains an alpha character. In this case, the table accepts only one value, 406045555.

Figure D

Both the data type and the constraint reject invalid values

You're out!

Laying siege to bad data from the get-go will reduce mistakes and the headaches later on. Defining the appropriate data type is your first line of defense. When data types aren't enough, SQL Server's CHECK constraint is a powerful ally.

If you've upgraded a legacy database that's still using rules (the predecessor to constraints), convert them to constraints. There's no guarantee that subsequent versions of SQL Server will continue to support rules. Converting rules before they're extinct will save you a lot of time and trouble later on.

To learn more about constraints visit:

http://msdn2.microsoft.com/en-us/library/ms190273.aspx

http://msdn2.microsoft.com/en-us/library/ms190273.aspx

Susan Sales Harkins is an independent consultant and the author of several articles and books on database technologies. Her most recent book is "Mastering Microsoft SQL Server 2005 Express," with Mike Gunderloy, published by Sybex. Other collaborations with Mike Gunderloy are "Automating Microsoft Access 2003 with VBA," "Upgrader's Guide to Microsoft Office System 2003," "ICDL Exam Cram 2," and "Absolute Beginner's Guide to Microsoft Access 2003" all by Que. Currently, Susan volunteers as the Publications Director for Database Advisors at http://www.databaseadvisors.com. You can reach her at ssharkins@gmail.com.

About

Susan Sales Harkins is an IT consultant, specializing in desktop solutions. Previously, she was editor in chief for The Cobb Group, the world's largest publisher of technical journals.

55 comments
Jaqui
Jaqui

not every country has numeric only postal codes. here in Canada, a valid postal code is 3 numbers AND 3 letters. [ my own for example: V6Z 1W1 ] if you insist on making postal / zip code sections numeric only you are chasing millions of potential customers away.

SuperBoy
SuperBoy

While the author cited avoiding math functions as a reason to avoid numerical columns, another reason is to have full control of the representation. Zip codes, and social security numbers may have leading zeros. Leading zeros are most easily stored / represented in char columns. The author might want to also link to discussion of when using char versus varchar make sense -- limited, known length char strings are best in a char column, where variable length strings (street names, names of people) are better as a column of type VARiableCHARacter.

michael.adel
michael.adel

But don't force the user to validate for you. Here is an example that I'm sure many of you have come across. You've entered a phone number into a form and the phone number gets rejected because you typed a hypen. People don't type their phone number as 4155551234 but rather (415) 555-1234 (or the like). Shouldn't the programmer deal with the parens and dashes? Or design the input so the area code, exchange and number are in seperate fields. (If you do this please use auto tab - so the cursor automatically moves to the next field.) We should make data entry easy for users and deal with the validation programmatically.

Dan Aquinas
Dan Aquinas

While ensuring valid data is always a great thing, the article misses the other half of the problem; that is, what is the mechanism(s) available to convey to the user what the problem is, and what "constraints" they have violated, and most importantly, in a manner the user understands. I do not trust that whatever verbage SQL Server provides for a "constraint violation" will meed the requirement of clearly conveying to the user what the problem was, and how they can correct it.

dminder
dminder

Is it possible to use Regex checks in constraints?? I know that they can be implemented into a program, but it would save alot of time and effort when you have cross platform programs as well as cross language programs that all use the same data..... Thanks and wonderful article!!

harveyjohn06
harveyjohn06

can it be used to check if the user entered a valid email address? like can it check if the data entered has the "@website.com" expression? if yes, then how? by the way, i'm just a second year college student so i still don't know much of these stuffs. would really appreciate your help, thanks!

The Ref
The Ref

While check constraints are very useful, what is also needed is knowing the rules for those constraints. Zip codes are location specific, so a 5 digit zip code will stop many other countries entering details. UK have 6 Alpha Numeric character postcodes, Australian postcodes have 4 digits. Phone numbers are similar, most numbers are international plus 10 digits, but there are probably exceptions. For example, I would enter '+617 1234 5678' into a phone entry field. Making constraints too rigid limits usability. On positive side, constraints are a perfect way to limit user input to prevent things like cross site scripting.

ssharkins
ssharkins

You're absolutely right about postal codes varying in form. The article, however, is about forcing only numeric characters in a character column. The article isn't about handling postal codes, that was just the example I chose, hence the article's title. However, dealing with all those possibilities is definitely a valid discussion here and I'm glad the article has prompted it. I liked the solution presented yesterday -- provide the appropriate constraints and apply them conditionally, depending upon the country value.

admin
admin

Isn't all that compelling to me. By that same logic would one store dollar amounts as VARCHAR so that you can include the commas and dollar signs that are commonly used to display them? Or maybe it is possible to maintain "full control of the representation" on the application layer without resorting to storing *everything* as a string?

ssharkins
ssharkins

The discussion's wide open -- I don't have a link on me :) -- but will see what I can find. In the meantime, there's plenty of time for a discussion on the subject here.

Tony Hopkinson
Tony Hopkinson

Do you store with leading zeros or any other persentation related modifiers, or do you present the raw data with suitable ones. The answer is of course, depends. There are a lot of advantages to chars as opposed to varchars, my rule is to use char if I can but never to try to get away with using one. If it's char(5), it should be five characters, not less, do that trick and you run into all sorts of funnies. Using varchar just in case though, means you need to beef up your spec.

ssharkins
ssharkins

I agree with your final statement -- data entry should be as easy as possible. Unfortunately, you can't cater to each individual operator. Sometimes it's easier, and proper, to retrain people than to try to accommodate everyone. It takes only a few records for someone to catch on and stop entering a hyphen or ()'s. The truth is, entering them is inefficient -- they are wasted keystrokes. But that only really matters if they're processing tons of records. It's a question of balance more than one solution for all. As long as the form tells you what's wrong with the original entry so you can fix it -- well, I think it's Okay. What frustrates me is a form that doesn't really give me a hint to what's wrong! The separate fields and auto tab is an interesting solution. Does that work well for you? Access has an interesting attribute called the Input Mask, that displays these literal characters, so the user doesn't enter them. However, they're not stored with the actual entry.

Tony Hopkinson
Tony Hopkinson

You return an error number say 6756544 and then trap it. Then you show a message box along the lines of Something 'as gone wroong (6756544)! Deliberate spelling mistake, because all us coder types are interested in that the message box popped up. Nice to see someone else who's been at the sharp end chipping in. Shockingly this sort of drivel goes on a lot, the only one you get more often is the business type enhancement leave out the error number, it will confuse people. Backend validations should only be used to check that the program isn't violating data integrity. If it's user input, check it before you send it to the database, otherwise you'll end up in a convoluted unmaintainable mess. You do realise someone is going suggest raising message numbers in the DB to communicate back error states to the UI as the 'easy' solution, don't you?

ssharkins
ssharkins

Succinct and lucid -- you can't imagine how much I appreciate your comment. Thank you!

admin
admin

But you have to create a SQLCLR function to access the .NET regular expression classes to do it. The LIKE expression syntax does not allow very complex validations like email addresses.

admin
admin

Regular expressions are the easiest/most complete way to check for valid email addresses. This requires using SQLCLR UDF's to access the .NET regular expression classes. Email address validation is not simple because there are a lot of different types of email addresses that are valid, like these: fish@ocean.com john.q.smith@unknown.people.org harold1234@192.168.10.3 lone_ranger@hiyo.silver.tv Etc. Accounting for all these variations with the LIKE predicate would be a pretty complex undertaking. Regular expressions allow grouping, quantifiers, backtracking, etc., that allows you to handle any variant pretty easily. LIKE predicate expressions allow only simple pattern matching.

ssharkins
ssharkins

I'm not sure CHECK is the right way to go with this validation. You could use a string function to make sure that @ and .com are in the address. How do you guys validate email address entries?

michael.adel
michael.adel

Who doesn't know what "cross site scripting" is? A brief description would be useful. mike

admin
admin

I usually store constraint patterns (either LIKE predicate expressions or regular expressions) in a table, keyed off of the locale. If I know that an address I am storing is Canadian (user chose Canada at input time), then I can compare the postal code to the pattern for Canadian postal codes ("A0A 0A0"). If the address is U.S. then I can compare to the pattern for U.S. postal codes ("00000" or "00000-0000"), etc. You can do the same for other types of data as well.

ssharkins
ssharkins

The example is just a means to an end -- a valid string comprised of numeric characters. You're right though -- we're seldom lucky enough to get a column that's this rigid and postal codes are a good example. Even here in the US we have the 5 of 9 digit problem. Opening a postal code column to an international market totally changes the rules.

Jaqui
Jaqui

with using something like that as an example is the number of people who would ignore the differences based on county and code their application as in the example. While vetting user input is definitely something to be promoted, the track record of a lot of developers, specially for web based apps, says we should be really cautious about picking examples, since they will most likely be used as is by someone.

Tony Hopkinson
Tony Hopkinson

Generally you find leading zeros in intelligent numbers, most of which are f'ing dumb. I'd rather have a function to convert to and from, as long as it's not going to get hammered, in say a join query with data from another system or some such.

Tony Hopkinson
Tony Hopkinson

is a boolean property of the component. If you can't do it from access they must have cabbaged it somewhere.

ssharkins
ssharkins

Tony, you do rant so well... :) I have a rather unique problem. When presenting a solution, I have to be succinct and focused. More often than not, I can work most error handling into the actual code -- you know what I mean, I'm sure. On the other hand, if I spend a lot of time writing and explaining an error handling routine to go along with the procedure, the article loses its focus and often confuses the reader -- many of my readers aren't professional developers, but people just trying to get their work done. If I don't include a basic error handler, I get ugly email -- "how unprofessional..." I agree, an error routine that simply regurgitates the system's information isn't helpful in a polished product, but for my purposes (articles) -- it seems a good compromise. I always include a disclaimer -- "...be sure to test this thoroughly on your system and accommodate errors..." Also, I think during the training process basic error routines work well to keep the would-be developer informed without interfering too much. It's hard for someone without the right expertise to plan for and handle errors before they even occur. Having said all that, I do agree with you -- a finished product should handle errors, not just share system information.

admin
admin

I can't respond to your last post ("message limit reached"), so I'm responding here. Based on your post, I assume you don't take advantage of MySQL's regular expression functionality (REGEXP and RLIKE). These items are platform-specific extensions, and "tie you" to the platform. I'd bet a shiny new nickel that I could look at a few hundred lines of code from your shop and point out plenty of platform-specific functionality you use on either platform. I don't think it's so much a question of whether or not you use platform-specific functionality, I think it's a matter of where do you draw the line. I'd be interested to see a platform-independent method of validating email addresses using only standard ISO SQL.

ssharkins
ssharkins

Personally, I think it would be easier to just confirm the email than to try to come up with a validating routine that works 100% of the time. Of course, that only works if the application is Internet based and many aren't -- nothing I work on is.

harveyjohn06
harveyjohn06

what is that SQLCLR UDF? is it a function called in SQL server or in the .NET framework? can you show me how it is done? i really want to learn more about these things.. thanks

Tony Hopkinson
Tony Hopkinson

You just moved to MS SQL 2005 onwards and the OS's that will run it, introduced a potential framework dependancy, a performance deficit and tied yourself to MS. If none of this matters enough go right ahead. The single biggest advantage of the using .net for a validation like this, is you can use the same one in all layers of the system. So you could do it clientsside web or in a straight gui, and server side with one piece of code. Course to do that the function should take a string, return a boolean and do nothing but return the result of the match.

The Ref
The Ref

XSS, or cross site scripting, is a way to get a web page to execute malicious code on a client browser. A good description is at http://en.wikipedia.org/wiki/Cross-site_scripting If you put the following into a database field, and this field gets displayed in a browser, you will pop up an alert. With a bit of thought you could make a popup that prompts for username and password. As the user is in a trusted application they may send you their login details. (I am unsure how this will appear when posted) alert("hello")Hello

ssharkins
ssharkins

So, you use the country to determine the postal code's constraint? Great solution -- and so simple.

admin
admin

Thank you for a very good article. Sometimes it's easy to get stuck on the examples and lose the overall picture, like thinking your article is about ZIP Code validation when it's really about integrity constraints and the LIKE predicate. The only way I could think of to avoid the peripheral internationalization issues discussion is with examples from an internationally-defined standard like EANs. You don't have to worry about a Canadian EAN having a different format from a British EAN or U.S. EAN. But then someone would be bound to bring up the issue of check digits :)

Tony Hopkinson
Tony Hopkinson

A lot of people think you flip the locale and you are done, so not true. It's not even true always if you do it right. Beware of auto conversions in variant types under windows for instance., it would appear that MS didn't get internationalisation correct in their own iternal code. Mind you I always consider anyone who uses variants by choice as mentally subnormal anyway.

Tony Hopkinson
Tony Hopkinson

which dbms you use should come into it at all. As long as DateTime is the number of days since?.Number of seconds since midnight and all data goes into the db in an uambiguous format, ISO standard as Admin@ mentioned. All is golden. It would be really nice if ? was the in the f'ing standard as well. It's when you present that date in some format as a string and then use that in downstream systems that all the problems come in. Personally I'd like to see all DBMS refuse any date,(bugger it any value) in a non ISO format, that would make a start. Of course it wipe out about 95% of web db applications on the planet and about 94% of straight GUI DB Client/Server ones as well. Oh and stop using dumbass platform specific representations like MySQL's Time.

admin
admin

Based on our Canadian counterpart's initial statement, my assumption was that we were talking about developers targeting multiple regions from the jump. If we're talking about retrofitting internationalization into a region-specific application, then you're absolutely right. Retrofitting is several orders of magnitude more difficult than building it in from the start. I've had projects where we basically started over from scratch rather than try to retrofit internationalization into it, because of the complexities involved. As I recall, most of the complexities arose from trying to maintain backwards-compatibility with the existing region-specific code.

Tony Hopkinson
Tony Hopkinson

I don't, nor except for those who plan to deploy across multiple regions does anyone else. As you say it's a straight resources argument, internationalisation costs and big style. You ever done any UI translations? They are great. You do this tight well lined up form with all your stuff layed out real nice. Then translate the labels and content to say, german. A casual inspection of the result reveals the word dummkopf, if you thought the tool you bought to do it was going to be OK then it's arschloch. :p

Tony Hopkinson
Tony Hopkinson

I worked for a US company that tried to break in to the UK market (CCC) it was hilarious watching developers on both sides of the pond bump into their ignorance. Well at least until their European adventure got cancelled anyway. You need three currency figures? 1,000.65 EUR is not a valid number? Huh? It takes a great deal of thought, effort and planning, bits of it jump out and bite your ass at the most inconvenient moment, so if you design for the U Of K it will work (fingers crossed), across either pond not a chance. They had no plans to do that at the time, so they didn't want to invest valuable resources in it as a future. I've banged my head against this particular wall, I got concussion and made redundant. Not hack at my US colleagues by the way, you should see the UK only stuff I work on now. :p

ssharkins
ssharkins

I see the internationalism discussion a lot and it is valid, but in a limited way. I realize that TR is a global organization, but I can't write to the global audience anymore than the average developer can develop for that audience. Many of my European colleagues disagree with me -- everything should be written for the global audience. Unfortunately, it's a question of economics, efficiency, and balance. Why would I spend a client's money to internationalize a custom application that's never going to be used outside that client's business, let alone in an international market? I wouldn't. Articles and examples, for better or worse, are going to hit this same wall. I certainly don't want to make anyone angry, but I can't consider all the possible international nuances -- that's one of the reasons I keep my examples as simple and generic as possible. My hope is that when an international reader wants more information, they can start a thread that might prove helpful to that audience.

ssharkins
ssharkins

Thank you admin -- you seem to have a keen understanding of the issues I must balance when writing an article. The only change I would make to the article would be a very stronger disclaimer about the article not attempting to solve global postal code issues at large. I would never knowingly misread a reader. I do appreciate the global postal code issue and I hope you guys continue to discuss possible solutions. We might actually help someone! :)

admin
admin

But as you mentioned, it's all in there. A lot of internationalization stuff is available through the O/S and various standards bodies. If you want to internationalize dates then use the ISO standard "2008-02-03T09:32:46.845" format to avoid any ambiguity regardless of regional settings. Want to know the decimal point separator in use in a given region? Look it up in the standard, or for locally installed apps use the local O/S settings. The fact that developers don't pay attention to the standards on the first go-round only means they'll be "service-packing" the copies they actually sell from now until they go out of business.

Jaqui
Jaqui

is that currently, the most common internationalization db powered apps are website scripts, and those "designing" them mostly shouldn't be coding. The article should have [i]mentioned[/i] the internationalization issues, since it's not just postal / zip codes that are different. That way those screwups who don't have a clue would stat to get one. or, as the author mentioned in her first reply to me, discussing the different issues for internationalization for a db powered app in the article discussion is a good thing.

Tony Hopkinson
Tony Hopkinson

But internationalisation, with a new codebase is a class one nightmare. Doing it to an existing one, well don't even go there. All the stuff is there, it just doesn't get used as a matter of course. If MyDouble.Isvalid() return MyDouble.ToString(); else return "0.0"; oops it's "0,0" in mainland europe... So is it the 3rd of Feb or the 2nd of March today? It's a good job everybody's regional setting are correct and consistent otherwwise we'd never get anywhere. :( :( :(

admin
admin

That any developer who writes an application that stores region-specific data, like Canadian (or German, or British, etc.) addresses pretty much knows they have to account for regional data constraints. Anyone who doesn't know that either 1) has no business writing applications for the international market or 2) will learn particularly quickly once they try to demo the app for their first international client. The article is very good as it stands, and the author made the right choice in giving a narrow example. Trying to expand the example to cover the 200+ countries defined by ISO 3166 would only confuse the issue. After all, the U.S. and Canada are not the only two countries in the world. I find the argument that most programmers cannot recognize and modify code samples to their own regions to be a little over the top.

Dan Aquinas
Dan Aquinas

Sorry to have posted my reply twice. Hopefully someday web technology will prevent such things from happening.

Dan Aquinas
Dan Aquinas

I found your use of vulgarity to lessen the professionalism of your reply. It's not needed and just gets in the way of understanding your otherwise fine input.

Dan Aquinas
Dan Aquinas

I found your use of vulgarity to lessen the professionalism of your reply. It's not needed and just gets in the way of understanding your otherwise fine input.

ssharkins
ssharkins

I didn't think you were fussing at me, and we agree on the principle -- in a working application, error handling needs to share information with the user and handle the error. I'm always straddling the fence between what readers need to understand the solution I'm providing and what some developer's going to complain about because I set off his/her pet peeve alarm. :) Truth be told, I actually prefer to provide code without any error handling at all because those generic things can actually mask problems if the code's not perfect, and well... I'm not perfect. :)

Tony Hopkinson
Tony Hopkinson

Here's an even simpler scenario Customer's to Orders. Of course you have a primary key on both and a foreign key to link the two. In your UI though, you'll almost certainly have some thing along the lines of a Customer pick list, so you can never have an order without a customer, or the same customer twice. Why are the DB rules necessary?. Because you might mess up the code. Now hands up who thinks this is a suitable error to the user. Primary Key Violation on PK_Order, Damn good one for developers and QA though. You keep up the what, you can rely on us for when and why. PS it would be interesting to see some this is how you would do the same thing without MS written all over it.

Tony Hopkinson
Tony Hopkinson

So would I ! Why isn't there one? Why doesn't the standard change so something like Regex can be called with a standard signature? A more cynical b'stard than I might suggest that if it did supporting multiple databases would be less problematic, reducing the apparent benefit of vendor dependance. But hey, I'm sure nothing like that's happening after all look how often we get a new standard. :p FYI because it would mean I'd have to do the same thing at least three times to do validation on the backend. We don't, front end once, writes to a varchar. Crap, but cost effective, and yes a nasty admin could make an arse of it, but it's their data so that's their option.

Tony Hopkinson
Tony Hopkinson

you have no real incentive to stop being one, and you are aware of the issues surrounding an upgrade to SQL2005, then I completely agree. It's a very elegant solution. It's not one I can currently directly employ in the applications I'm working in because I have to implement for MySQL too, and not necessarily MySQL under windows. As soon as you have to support multiple platforms, you end up with lowest common denominator, or multiple DB layers to make use of whatever is there and present a common interface to the rest of the system. There's never a right answer to that quandary, but constraining yourself to a specific piece of the market (all be it a large one at the at the moment) is not generally an acceptable ssolution, for the guys who pay me the bucks. I should point out this isn't a dig at .NET which I like, or MS whom I don't. It's the same argument for any vendor dependancy, there's a benefit and a cost, forgeting either side of that equation leaves 'you' looking like an arse.

admin
admin

...is everywhere. Even in vendor implementations of standard SQL, not all vendors implement the same option. End result: instant platform dependency. Throw in the non-standard vendor extensions and the platform dependency grows. I'm simply pointing out that platform dependency is not unique to SQLCLR nor SQL Server in general. In this instance, I think SQLCLR is an excellent choice to solve this particular problem. It can expose regular expression functionality, similar to that provided by Oracle, which in fact could help portability between platforms. After all, it's a lot easier to port code that uses platform-specific features if you have similar or equivalent functionality on the target DBMS.

Tony Hopkinson
Tony Hopkinson

I'd never recommend you do validation just in one layer, I'm a big believer in doing it everywhere you can. Whether that's through static typing, asseerts , plain old if statements, check constraints, using SPs.... All I was saying is that using the tech you suggested will make you dependant. It might be a wise, practical or even good decision, but it's not one that should be made in ignorance. There's too much of that going around already.

admin
admin

...And use their proprietary built-in Regular Expression functionality, tying yourself to Oracle, introducing a potential platform dependency, and a performance deficit. At some point it helps to ask yourself how hardcore you are about avoiding platform dependencies. If you want to avoid all non-ISO 9075 features then you need to code in plain-jane ISO SQL, which means all those PL/SQL, T-SQL, and non-ISO procedural (and other) features that you use every day are out of the picture. It also means you can't use platform-specific features that create dependencies and tie you to a specific platform, like Indexes (which are not defined by the ISO standard). Of course you could use proprietary client-side procedural functionality, and create humongous WHILE loops, etc., that break an email address up and check it's constituent components one at a time. You can always do these validations client-side, so long as data integrity at the database level is not imperative. That is to say, you can guarantee that no sysadmin will ever load a bunch of data directly into the tables, bypassing the UI. Or you can say that you explicitly trust (100%) any and all data the sysadmins load into the database when they bypass your UI. It's far too big an assumption for me to make, although everyone has their own point of view.

admin
admin

As long as you can guarantee that your application will provide a country. Most internationalized applications that I work with do guarantee that a country code will be passed back from the UI. If no country code is passed you still have some options: you can either accept everything the user keys in, or apply rules for a default country.

Editor's Picks