Tuesday, April 10, 2007

Use of Regular Expressions in JavaScript

You need to define the regular expressions that the script will match against your visitors' input:

var fnameRegxp = /^([a-zA-Z]+)$/;

This statement checks that only upper- or lowercase letters, repeated one or more times, pass the validation test, which, unless you’re hoping to send your newsletter to C3PO, should be the case. Remember when I mentioned that regular expressions can still return true if there are incorrect characters present, provided that the correct pattern of characters is somewhere within the string? Putting the circumflex (^) and dollar sign ($) at the beginning and end of the regular expression anchors the pattern to the whole string, so this does not happen, and the string is only valid if it contains just what you’re asking for.
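The effect of the anchors can be seen by testing the same character class with and without them. This is a minimal sketch; the names unanchored, anchored, and anchorResults are illustrative and not part of the original script.

```javascript
// Same character class, with and without the ^ and $ anchors.
var unanchored = /[a-zA-Z]+/;
var anchored = /^([a-zA-Z]+)$/; // same pattern as fnameRegxp

var anchorResults = {
  // true: the letter "C" alone is enough to satisfy the unanchored pattern
  unanchoredMixed: unanchored.test("C3PO"),
  // false: the digit means the whole string is not letters only
  anchoredMixed: anchored.test("C3PO"),
  // true: every character is a letter
  anchoredLetters: anchored.test("Luke")
};

console.log(anchorResults);
```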

var lnameRegxp = /^([a-zA-Z]+)$/;
var houseRegxp = /^([0-9A-Za-z]+)$/;

These then check that the surname entered also consists of upper- or lowercase letters repeated one or more times, and that the house name consists of just numbers and letters. You could have shortened this to /^([\w]+)$/ using the shorthand escape code for "any word character" (letters, digits, and the underscore), but that would allow underscores to be used, which rarely feature in property names.
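The difference between the explicit character class and the \w shorthand can be checked directly. In this sketch, wordRegxp and the sample house names are made up for illustration. Note that neither pattern accepts a space, so a two-word house name would fail both.

```javascript
var houseRegxp = /^([0-9A-Za-z]+)$/; // explicit letters and digits
var wordRegxp = /^([\w]+)$/;         // shorthand: letters, digits, underscore

var houseResults = {
  explicitPlain: houseRegxp.test("22b"),               // true
  shorthandPlain: wordRegxp.test("22b"),               // true
  explicitUnderscore: houseRegxp.test("Rose_Cottage"), // false
  shorthandUnderscore: wordRegxp.test("Rose_Cottage"), // true
  explicitSpace: houseRegxp.test("Rose Cottage")       // false: spaces are not allowed
};

console.log(houseResults);
```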

var pcodeRegxp = /^([A-Za-z]{1,2})([0-9]{2,3})([A-Za-z]{2})$/;
var telnoRegxp = /^([0-9]{11})$/;

I’ve used local examples for the post code (the UK version of a zip code) and telephone regular expressions. The pattern treats a UK postcode as one or two letters, followed by two or three digits (depending on the area), followed again by two letters, entered without a space. It should be easy to see how you could change this to match your own local form of postal or zip code and telephone number formats. The phone number check simply ensures that the correct number of digits is present. Following these come the most complex of the regular expressions, those that check for valid email addresses and URLs:
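A few sample inputs show what these two patterns accept and reject. The sample values and the stripSpaces helper are hypothetical; stripping spaces before testing is one possible way to tolerate input typed with a space, not something the original script does.

```javascript
var pcodeRegxp = /^([A-Za-z]{1,2})([0-9]{2,3})([A-Za-z]{2})$/;
var telnoRegxp = /^([0-9]{11})$/;

var pcodeResults = {
  oneLetter: pcodeRegxp.test("M11AE"),       // true: 1 letter, 2 digits, 2 letters
  twoLetters: pcodeRegxp.test("AB101XY"),    // true: 2 letters, 3 digits, 2 letters
  withSpace: pcodeRegxp.test("M1 1AE"),      // false: the pattern has no room for a space
  letterInside: pcodeRegxp.test("SW1A1AA")   // false: a letter interrupts the digits
};

var telnoResults = {
  elevenDigits: telnoRegxp.test("01632960123"), // true
  tenDigits: telnoRegxp.test("0163296012"),     // false
  withSpace: telnoRegxp.test("01632 960123")    // false
};

// Hypothetical helper: remove all whitespace before validating.
function stripSpaces(value) {
  return value.replace(/\s+/g, "");
}

var pcodeAfterStrip = pcodeRegxp.test(stripSpaces("M1 1AE"));       // true
var telnoAfterStrip = telnoRegxp.test(stripSpaces("01632 960123")); // true

console.log(pcodeResults, telnoResults, pcodeAfterStrip, telnoAfterStrip);
```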

// The dots are escaped (\.) so that they match a literal dot rather than any character.
var emailRegxp = /^([\w]+)(\.[\w]+)*@([\w]+)(\.[\w]{2,3}){1,2}$/;
var urlRegxp = /^(http:\/\/www\.|https:\/\/www\.|ftp:\/\/www\.|www\.){1}([\w]+)(\.[\w]+)/;

The part of the address before the @ sign may contain any number of dot-separated words. The first of these expressions says that one or more word characters can be followed, zero or more times, by a dot and one or more word characters. Then comes the @ symbol, followed by one or more word characters, followed by a group consisting of a dot and two or three word characters, which must appear at least once but no more than twice. Email addresses ending in .com or .co.uk will therefore pass, whereas .co.uk.com would fail. Similarly, the URL must begin with http://www. or https://www. or ftp://www. or just www., followed by one or more word characters, followed by a dot and one or more further word characters. Because the URL pattern has no closing dollar sign, anything may follow that, such as further domain parts or a path.
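One detail worth checking in patterns like these is whether the dots are escaped: an unescaped dot matches any character, whereas \. matches only a literal dot. The sketch below compares the two forms on a few made-up addresses. The names looseEmail, strictEmail, and strictUrl are illustrative.

```javascript
// Variant with unescaped dots: each "." matches any single character.
var looseEmail = /^([\w]+)(.[\w]+)*@([\w]+)(.[\w]{2,3}){1,2}$/;
// Variant with escaped dots: each "\." matches only a literal dot.
var strictEmail = /^([\w]+)(\.[\w]+)*@([\w]+)(\.[\w]{2,3}){1,2}$/;
var strictUrl = /^(http:\/\/www\.|https:\/\/www\.|ftp:\/\/www\.|www\.){1}([\w]+)(\.[\w]+)/;

var emailResults = {
  looseNoDot: looseEmail.test("jane@examplecom"),          // true: no dot needed at all
  strictNoDot: strictEmail.test("jane@examplecom"),        // false
  strictCom: strictEmail.test("jane.doe@example.com"),     // true
  strictCoUk: strictEmail.test("jane@example.co.uk"),      // true
  strictTooLong: strictEmail.test("jane@example.co.uk.com") // false: three suffix groups
};

var urlResults = {
  full: strictUrl.test("http://www.example.com"),     // true
  withPath: strictUrl.test("www.example.co.uk/page"), // true: no $ anchor, so extra text is allowed
  noWww: strictUrl.test("http://example.com"),        // false: every alternative requires www.
  noDots: strictUrl.test("wwwXexampleYcom")           // false: literal dots are required
};

console.log(emailResults, urlResults);
```

One consequence visible here is that the two-or-three-character suffix rule rejects longer top-level domains, so the pattern may need loosening for real-world addresses.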
Finally, the date of birth check allows dates in the format dd/mm/yyyy or dd-mm-yyyy, both formats being equally popular:

var dobRegxp = /^([0-9]){2}(\/|-){1}([0-9]){2}(\/|-)([0-9]){4}$/;
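This pattern checks only the shape of the input, so a string such as 31/02/1980 passes even though no such date exists. If that matters, a calendar check can follow the format check. The sketch below is one possible approach using the built-in Date object; isRealDate is a hypothetical helper, not part of the original script.

```javascript
var dobRegxp = /^([0-9]){2}(\/|-){1}([0-9]){2}(\/|-)([0-9]){4}$/;

// Hypothetical helper: returns true only when the string passes the
// format check and also names a real calendar date (day first).
function isRealDate(dob) {
  if (!dobRegxp.test(dob)) {
    return false;
  }
  var parts = dob.split(/[\/-]/);
  var day = parseInt(parts[0], 10);
  var month = parseInt(parts[1], 10);
  var year = parseInt(parts[2], 10);
  // Date rolls invalid values over (31 February becomes early March),
  // so the date is real only if all three parts survive unchanged.
  var date = new Date(year, month - 1, day);
  return date.getFullYear() === year &&
         date.getMonth() === month - 1 &&
         date.getDate() === day;
}

var dobResults = {
  formatOnly: dobRegxp.test("31/02/1980"), // true: the shape is right
  realSlash: isRealDate("10/04/2007"),     // true
  realDash: isRealDate("10-04-2007"),      // true
  impossible: isRealDate("31/02/1980"),    // false: February has no 31st
  nonsense: isRealDate("99/99/9999")       // false
};

console.log(dobResults);
```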

We then need to test that each of the submitted values is in the correct format. I accomplished this by using a series of if statements and alerts. This method is useful for demonstration and testing purposes, although in reality a loop over the fields would probably be tidier, and some kind of color highlighting scheme that flags the erroneous input values in a different color, say red, on the form itself would be more professional. The example if statements and alerts, however, can be constructed as follows:

// Each check runs only if the previous ones passed, so the first
// problem found is the one that is reported.
if (!fnameRegxp.test(fname))
    alert("First name appears to be incorrect");
else if (!lnameRegxp.test(lname))
    alert("Last name appears to be incorrect");
else if (!houseRegxp.test(house))
    alert("Address 1 appears to be incorrect");
else if (!pcodeRegxp.test(pcode))
    alert("Address 2 appears to be incorrect");
else if (!telnoRegxp.test(telno))
    alert("Telephone number appears to be incorrect");
else if (!emailRegxp.test(email))
    alert("Email address appears to be incorrect");
else if (email != verEmail)
    alert("Email addresses do not match");
else if (!urlRegxp.test(url))
    alert("URL appears to be incorrect");
else if (!dobRegxp.test(dob))
    alert("Date of Birth appears to be incorrect");

Notice that instead of using a regular expression to check that the second email address entered (that should be the same as the first for verification purposes) is correct, we simply check that its value is exactly the same as the first email address entered.
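The same checks could also be driven by a loop over a table of fields, which keeps each pattern next to its message and reports every problem at once. This is a rough sketch under assumed names: collectErrors, the checks table, and the shape of the values object are all hypothetical, and only a few of the fields are included to keep it short.

```javascript
// Hypothetical helper: tests each field against its pattern and returns
// a list of error messages (an empty list means everything passed).
function collectErrors(values) {
  var checks = [
    { name: "fname", pattern: /^([a-zA-Z]+)$/, message: "First name appears to be incorrect" },
    { name: "lname", pattern: /^([a-zA-Z]+)$/, message: "Last name appears to be incorrect" },
    { name: "house", pattern: /^([0-9A-Za-z]+)$/, message: "Address 1 appears to be incorrect" },
    { name: "telno", pattern: /^([0-9]{11})$/, message: "Telephone number appears to be incorrect" }
  ];
  var errors = [];
  for (var i = 0; i < checks.length; i++) {
    // Treat a missing field as an empty string; otherwise test() would
    // convert undefined to the text "undefined", which is all letters.
    var value = values[checks[i].name] || "";
    if (!checks[i].pattern.test(value)) {
      errors.push(checks[i].message);
    }
  }
  // The verification address is compared directly, not pattern-matched.
  if (values.email !== values.verEmail) {
    errors.push("Email addresses do not match");
  }
  return errors;
}

var goodErrors = collectErrors({
  fname: "Ada", lname: "Lovelace", house: "22b", telno: "01632960123",
  email: "ada@example.com", verEmail: "ada@example.com"
});
var badErrors = collectErrors({
  fname: "C3PO", lname: "Lovelace", house: "22b", telno: "123",
  email: "ada@example.com", verEmail: "ada@example.org"
});

console.log(goodErrors.length, badErrors);
```

The returned list could then be shown in a single alert or used to highlight the offending fields.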
Finally, if all of the data is in the correct format, we need the script to output a "Data Correct" alert and set the action property of the form to the name of the CGI script that will process the data. Once again, against a professional backdrop, the alert would probably be removed in favor of a fresh page thanking the visitor for their time, but this is useful for demonstration and testing purposes:

else {
    alert("Data Correct");
    // action is a string property of the form, so assign to it directly
    document.myForm.action = "process.cgi";
}
}

Another benefit of using regular expressions to validate user input is that any field checked against one of these regular expressions fails if it is left blank, because every pattern requires at least one character, so you don’t have to write any separate omission checking functions.
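This behaviour follows from the quantifiers used: + and {n} both demand at least one character, so an empty string can never match. A quick check, using a handful of the patterns above plus a made-up optionalRegxp that uses * instead of + for contrast:

```javascript
var patterns = {
  fname: /^([a-zA-Z]+)$/,
  house: /^([0-9A-Za-z]+)$/,
  telno: /^([0-9]{11})$/,
  dob: /^([0-9]){2}(\/|-){1}([0-9]){2}(\/|-)([0-9]){4}$/
};

// Test each pattern against an empty string; every result is false.
var blankResults = {};
for (var key in patterns) {
  blankResults[key] = patterns[key].test("");
}

// Hypothetical optional field: * allows zero characters, so blank passes.
var optionalRegxp = /^([a-zA-Z]*)$/;
var optionalBlank = optionalRegxp.test(""); // true

console.log(blankResults, optionalBlank);
```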
Explaining regular expressions is almost as difficult as coding them. It will be far easier for you to see what I mean by writing and playing around with them yourself. Unfortunately, it is not possible to be 100 percent certain that all information entered is correct, but with regular expressions you can at least be sure that the data is in the expected format.
Although this may seem like a cumbersome amount of code, it would be considerably longer without regular expressions. Client-side validation is not the most secure way to validate your forms, since it runs in the visitor's browser and can be bypassed, but it may save processing time on your server by catching most incorrect data before it is submitted. Beyond that, JavaScript is both quick and easy to implement, and its regular expression syntax is simpler than that of some of the more powerful Web programming languages, so for sites that don’t need maximum security, it is certainly worth considering.
