2009. május 30., szombat

Validating email addresses in Delphi


Problem/Question/Abstract:

Is an email address valid?

Answer:

Nowadays it's very common that our programs store email addresses in databases as part of the data of personnel, customers, providers, etc. When prompting the user for an email address, how do we know if the entered value is formally correct? In this article I'll show you how to validate email addresses using a variation of the RFC #822.

The RFC #822 rules the "STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES".

According to this rule, the following are valid email addresses:

  John Doe johndoe@server.com
  John Doe
  "John Doe" johndoe@server.com
  "John Doe"

The purpose of my code is not to validate such things, but strictly what is necessary to reach a single recipient (like "johndoe@server.com"), that in the specification is referred as an "addr-spec", which has the form:

  local-part@domain

  local-part = one "word" or more, separated by periods
  domain = one "sub-domain" or more, separated by periods

A "word" can be an "atom" or a "quoted-string":

  atom = one or more chars in the range #33..#126 except ()<>@,;:\/".[]
  quoted-string = A text enclosed in double quotes that can contain 0 or
    more characters (#0..#127) except '"' and #13. A backslash ('\')
    quotes the next character.

A "sub-domain" can be a "domain-ref" (an "atom") or a "domain-literal":

  domain-literal = A text enclosed in brackets that can contain 0 or
   more characters (#0..#127) except '[', ']' and #13. A backslash ('\')
   quotes the next character.

According to the RFC 822, extended characters (#128..#255) cannot be part of an email address, however many mail servers accept them and people use them, so I'm going to take them into account.

The RFC 822 is very open about domain names. For a real Internet email address maybe we should restrict the domain part. You can read more about domain names in the RFC #1034 and RFC #1035.
  
For the RFC 1034 and the RFC 1035, a domain name is formed by "sub-domains" separated by periods, and each subdomain starts with a letter ('a'..'z', 'A'..'Z') and should be followed by zero or more letters, digits and hyphens, but cannot end with a hyphen. We are going to consider that a valid domain should have at least two "sub-domains" (like "host.com").

Now that we have the rules clear, let's get to the work. The algorithm for the function resembles a states-transition machine. Characters of the string are processed in a loop, and for each character first we determine in which state the machine is and then we process the character accordingly, to determine if the machine should continue in that state, switch to a different state or produce an error (breaking the loop). These kind of algorithms are extensively treated in programming-algorithms textbooks, so let's get right to the code:

function ValidEmail(email: string): boolean;
// Returns True if the email address is valid
// Author: Ernesto D'Spirito
const
  // Valid characters in an "atom"
  atom_chars = [#33..#255] - ['(', ')', '<', '>', '@', ',', ';', ':',
    '\', '/', '"', '.', '[', ']', #127];
  // Valid characters in a "quoted-string"
  quoted_string_chars = [#0..#255] - ['"', #13, '\'];
  // Valid characters in a subdomain
  letters = ['A'..'Z', 'a'..'z'];
  letters_digits = ['0'..'9', 'A'..'Z', 'a'..'z'];
  subdomain_chars = ['-', '0'..'9', 'A'..'Z', 'a'..'z'];
type
  States = (STATE_BEGIN, STATE_ATOM, STATE_QTEXT, STATE_QCHAR,
    STATE_QUOTE, STATE_LOCAL_PERIOD, STATE_EXPECTING_SUBDOMAIN,
    STATE_SUBDOMAIN, STATE_HYPHEN);
var
  State: States;
  i, n, subdomains: integer;
  c: char;
begin
  State := STATE_BEGIN;
  n := Length(email);
  i := 1;
  subdomains := 1;
  while (i <= n) do
  begin
    c := email[i];
    case State of
      STATE_BEGIN:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      STATE_ATOM:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else if not (c in atom_chars) then
          break;
      STATE_QTEXT:
        if c = '\' then
          State := STATE_QCHAR
        else if c = '"' then
          State := STATE_QUOTE
        else if not (c in quoted_string_chars) then
          break;
      STATE_QCHAR:
        State := STATE_QTEXT;
      STATE_QUOTE:
        if c = '@' then
          State := STATE_EXPECTING_SUBDOMAIN
        else if c = '.' then
          State := STATE_LOCAL_PERIOD
        else
          break;
      STATE_LOCAL_PERIOD:
        if c in atom_chars then
          State := STATE_ATOM
        else if c = '"' then
          State := STATE_QTEXT
        else
          break;
      STATE_EXPECTING_SUBDOMAIN:
        if c in letters then
          State := STATE_SUBDOMAIN
        else
          break;
      STATE_SUBDOMAIN:
        if c = '.' then
        begin
          inc(subdomains);
          State := STATE_EXPECTING_SUBDOMAIN
        end
        else if c = '-' then
          State := STATE_HYPHEN
        else if not (c in letters_digits) then
          break;
      STATE_HYPHEN:
        if c in letters_digits then
          State := STATE_SUBDOMAIN
        else if c <> '-' then
          break;
    end;
    inc(i);
  end;
  if i <= n then
    Result := False
  else
    Result := (State = STATE_SUBDOMAIN) and (subdomains >= 2);
end;

Any collaboration to improve this function will be welcome.


Copyright (c) 2001 Ernesto De Spirito
Visit: http://www.latiumsoftware.com/delphi-newsletter.php

Nincsenek megjegyzések:

Megjegyzés küldése