Saturday, May 30, 2009
Validating email addresses in Delphi
Problem/Question/Abstract:
Is an email address valid?
Answer:
Nowadays it is very common for programs to store email addresses in databases as part of the data of personnel, customers, providers, etc. When prompting the user for an email address, how do we know whether the entered value is formally correct? In this article I'll show you how to validate email addresses using a variation of RFC 822.
RFC 822 defines the "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES".
According to this standard, the following are valid email addresses:
John Doe johndoe@server.com
John Doe <johndoe@server.com>
"John Doe" johndoe@server.com
"John Doe" <johndoe@server.com>
The purpose of my code is not to validate such forms, but strictly the part needed to reach a single recipient (like "johndoe@server.com"), which the specification calls an "addr-spec" and which has the form:
local-part@domain
local-part = one "word" or more, separated by periods
domain = one "sub-domain" or more, separated by periods
A "word" can be an "atom" or a "quoted-string":
atom = one or more chars in the range #33..#126 except ()<>@,;:\/".[]
quoted-string = a text enclosed in double quotes that can contain zero or more characters (#0..#127) except '"' and #13. A backslash ('\') quotes the next character.
A "sub-domain" can be a "domain-ref" (an "atom") or a "domain-literal":
domain-literal = a text enclosed in brackets that can contain zero or more characters (#0..#127) except '[', ']' and #13. A backslash ('\') quotes the next character.
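As a small illustration of the "atom" rule above, the check can be sketched as a simple set-membership test. The function name IsAtom is hypothetical (it is not part of the article's code), and the sketch follows the strict #33..#126 range of the definition rather than the extended range used later in the article.

```pascal
// Hypothetical helper (not from the original article): returns True if s is
// a non-empty "atom" as defined above, i.e. printable ASCII characters
// (#33..#126) excluding the listed special characters.
function IsAtom(const s: AnsiString): Boolean;
const
  AtomChars = [#33..#126] - ['(', ')', '<', '>', '@', ',', ';', ':',
                             '\', '/', '"', '.', '[', ']'];
var
  i: Integer;
begin
  Result := Length(s) > 0;  // an atom needs at least one character
  for i := 1 to Length(s) do
    if not (s[i] in AtomChars) then
    begin
      Result := False;      // found a space, control or special character
      Exit;
    end;
end;
```

With this sketch, IsAtom('johndoe') is True, while IsAtom('john.doe') is False because the period separates two atoms rather than belonging to one.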
According to RFC 822, extended characters (#128..#255) cannot be part of an email address. However, many mail servers accept them and people use them, so I'm going to take them into account.
RFC 822 is very permissive about domain names. For a real Internet email address we should probably restrict the domain part. You can read more about domain names in RFC 1034 and RFC 1035.
According to RFC 1034 and RFC 1035, a domain name is formed by "sub-domains" separated by periods. Each subdomain starts with a letter ('a'..'z', 'A'..'Z'), may be followed by letters, digits and hyphens, and cannot end with a hyphen. In addition, we are going to require that a valid domain has at least two "sub-domains" (like "host.com").
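The domain rules above can be sketched on their own, before they are folded into the full validator. The function name IsValidDomain and its variable names are hypothetical; this is only one possible way to express the rules, scanning the string once and closing a subdomain at each period and at the end of the string.

```pascal
// Hypothetical helper (not from the original article): checks a domain name
// against the rules described above.
function IsValidDomain(const domain: AnsiString): Boolean;
const
  Letters = ['A'..'Z', 'a'..'z'];
  LettersDigits = ['0'..'9', 'A'..'Z', 'a'..'z'];
var
  i, partStart, count: Integer;
  c: AnsiChar;
begin
  Result := False;
  count := 0;       // number of complete subdomains seen so far
  partStart := 1;   // index where the current subdomain begins
  // Walk one position past the end so the last subdomain is closed
  // in the same way as the ones followed by a period.
  for i := 1 to Length(domain) + 1 do
  begin
    if (i > Length(domain)) or (domain[i] = '.') then
    begin
      // A subdomain must be non-empty and must not end with a hyphen
      if (i = partStart) or (domain[i - 1] = '-') then
        Exit;
      Inc(count);
      partStart := i + 1;
    end
    else
    begin
      c := domain[i];
      if i = partStart then
      begin
        if not (c in Letters) then
          Exit;     // the first character must be a letter
      end
      else if not ((c in LettersDigits) or (c = '-')) then
        Exit;       // only letters, digits and hyphens may follow
    end;
  end;
  Result := count >= 2;  // require at least two subdomains, e.g. "host.com"
end;
```

Under these assumptions, IsValidDomain('host.com') and IsValidDomain('my-host.com') are True, while 'localhost' (one subdomain), 'host-.com' (trailing hyphen) and '1host.com' (leading digit) are rejected.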
Now that the rules are clear, let's get to work. The algorithm for the function is a state machine. The characters of the string are processed in a loop. For each character we first determine which state the machine is in and then process the character accordingly, to decide whether the machine should stay in that state, switch to a different state or report an error (breaking out of the loop). This kind of algorithm is covered extensively in algorithm textbooks, so let's get right to the code:
function ValidEmail(const email: string): Boolean;
// Returns True if the email address is valid
// Author: Ernesto D'Spirito
const
  // Valid characters in an "atom" (extended characters #128..#255 allowed)
  atom_chars = [#33..#255] - ['(', ')', '<', '>', '@', ',', ';', ':',
    '\', '/', '"', '.', '[', ']', #127];
  // Valid characters in a "quoted-string"
  quoted_string_chars = [#0..#255] - ['"', #13, '\'];
  // Valid characters in a subdomain
  letters = ['A'..'Z', 'a'..'z'];
  letters_digits = ['0'..'9', 'A'..'Z', 'a'..'z'];
type
  States = (STATE_BEGIN, STATE_ATOM, STATE_QTEXT, STATE_QCHAR,
    STATE_QUOTE, STATE_LOCAL_PERIOD, STATE_EXPECTING_SUBDOMAIN,
    STATE_SUBDOMAIN, STATE_HYPHEN);
var
  State: States;
  i, n, subdomains: Integer;
  c: Char;
begin
  State := STATE_BEGIN;
  n := Length(email);
  i := 1;
  subdomains := 1;
  while (i <= n) do
  begin
    c := email[i];
    case State of
      // Start of the local part: expect an atom or a quoted-string
      STATE_BEGIN:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      // Inside an atom of the local part
      STATE_ATOM:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else if not (c in atom_chars) then
          break;
      // Inside a quoted-string
      STATE_QTEXT:
        if c = '\' then
          State := STATE_QCHAR
        else if c = '"' then
          State := STATE_QUOTE
        else if not (c in quoted_string_chars) then
          break;
      // Character quoted by a backslash: accept anything
      STATE_QCHAR:
        State := STATE_QTEXT;
      // Just after the closing quote of a quoted-string
      STATE_QUOTE:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else
          break;
      // After a period in the local part: a new word must start
      STATE_LOCAL_PERIOD:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      // Start of a subdomain: it must begin with a letter
      STATE_EXPECTING_SUBDOMAIN:
        if c in letters then
          State := STATE_SUBDOMAIN
        else
          break;
      // Inside a subdomain, last character was a letter or digit
      STATE_SUBDOMAIN:
        if c = '.' then
        begin
          inc(subdomains);
          State := STATE_EXPECTING_SUBDOMAIN
        end
        else if c = '-' then
          State := STATE_HYPHEN
        else if not (c in letters_digits) then
          break;
      // Inside a subdomain, last character was a hyphen
      STATE_HYPHEN:
        if c in letters_digits then
          State := STATE_SUBDOMAIN
        else if c <> '-' then
          break;
    end;
    inc(i);
  end;
  if i <= n then
    // The loop was left early: an invalid character was found
    Result := False
  else
    // The address must end inside a subdomain and have at least two of them
    Result := (State = STATE_SUBDOMAIN) and (subdomains >= 2);
end;
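The state-machine pattern used by ValidEmail can also be seen in isolation on a smaller problem. The following is a minimal sketch, not part of the original code, that recognizes a single "quoted-string" with backslash quoting. The function name IsQuotedString and the state names are hypothetical.

```pascal
// Hypothetical helper (not from the original article): returns True if s is
// exactly one quoted-string, e.g. "John Doe" including the double quotes.
function IsQuotedString(const s: AnsiString): Boolean;
type
  TState = (stStart, stText, stEscaped, stClosed);
var
  State: TState;
  i: Integer;
  c: AnsiChar;
begin
  Result := False;
  State := stStart;
  for i := 1 to Length(s) do
  begin
    c := s[i];
    case State of
      // The string must open with a double quote
      stStart:
        if c = '"' then
          State := stText
        else
          Exit;
      // Inside the quotes: watch for a backslash, the closing quote and #13
      stText:
        if c = '\' then
          State := stEscaped
        else if c = '"' then
          State := stClosed
        else if c = #13 then
          Exit;
      // The character after a backslash is accepted as it is
      stEscaped:
        State := stText;
      // Nothing may follow the closing quote
      stClosed:
        Exit;
    end;
  end;
  // Valid only if the input ended right after the closing quote
  Result := State = stClosed;
end;
```

As in ValidEmail, each state only decides what the next character may be, and the final test checks that the input ended in an accepting state. An unterminated string such as '"open' ends in stText and is therefore rejected.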
Any contribution to improve this function is welcome.
Copyright (c) 2001 Ernesto De Spirito
Visit: http://www.latiumsoftware.com/delphi-newsletter.php