2006. január 20., péntek

Towards a more accurate sort order


Problem/Question/Abstract:

Sorting Addresses is a pain at the best of times, especially when a client supplies bad data (You may define clear fields in your DB, but when the data comes in, does it fit easily??)
This attempts to resolve this issue

Answer:

unit AddrSortOrder;

{The custom sort order is used to deal with the fact that the
house and flat numbers are sorted as strings. They are stored as
strings to allow things like '150-175' as a house number, or '3a',
or perhaps even simply a flat 'A'.
The need for a custom sort order is caused by the fact that with an
ordinary ASCII sort order '4' will appear after '30'. This is not
desirable behaviour.

This approach to fix this problem is to look for the first number
in the string (if there is one) and then use this as some kind of
primary sort order. The rest of the sorting will then be done on
the remaining characters (with preceding and trailing spaces
stripped out), based on the ASCII value of their upper-
case varients. Potential problems caused by this approach include
(but are not limited to) the use of accented characters will
possibly cause strange orderings and furthermore, if there is a block
of flats with three floors A, B, C for example then supposing the
flats on those floors are A1, A2, A3, B1, B2, B3 then the ordering
of records will not be ideal - this approach will sort them as
A1, B1, A2, B2, A3, B3. This behaviour is regrettable, but
acceptable - we cannot tell that it is not flat A on floor 1 for
example. It's unlikely that we will be able to find a sort order
that always produces ideal results.
Some examples of sorted lists (not all ideal):
EXAMPLE 1       EXAMPLE 2        EXAMPLE 3
  Flat 1          1                 A
  Flat 2          -2                B
  3               2-4               C
  3B              3a                1
  Flat 3A         5                 2
}

interface

uses SysUtils;

function CalcSortIndex(NumStr: string): double;

implementation

function CalcSortIndex(NumStr: string): double;
var
  strlength, i, j, tmp: integer;
  found: boolean;
  numpart, strpart, divisor: double;
  choppedstr: string;
begin
  //This function will return the sort index value for the string passed

  strlength := length(NumStr);
  if strlength = 0 then
  begin
    result := 0;
    exit;
  end;

  found := false;

  //split the string into a 'number' and a 'string' part..

  //initialise
  choppedstr := numstr;
  numpart := 0;

  //Locate the first digit (if there)
  for i := 1 to strlength do
  begin
    if numstr[i] in ['0'..'9'] then
    begin
      found := true; //First digit found!!
      break;
    end;
  end; //for i..

  if found then
  begin
    //now get the to the end of the digits..
    found := false;
    for j := i to strlength do
    begin
      if not (numstr[j] in ['0'..'9']) then
      begin
        found := true; //end of digits found
        break;
      end;
    end; //for j..

    //Separate out the string parts
    if found then
    begin
      //Number was embedded..
      val(copy(numstr, i, j - i), numpart, tmp);
      Delete(choppedstr, i, j - i);
    end
    else
    begin
      //Number went to the end of the string
      val(copy(numstr, i, strlength), numpart, tmp);
      Delete(choppedstr, i, strlength);
    end;
  end;

  choppedstr := Uppercase(trim(choppedstr));
  strlength := length(choppedstr);

  //evaluate a number for the remaining part of the string
  strpart := 0;
  divisor := 1;

  for i := 1 to strlength do
  begin
    divisor := divisor / 256;
    //convert from Char to single using a variant conversion
    strpart := strpart + (ord(choppedstr[i]) * divisor);
  end;

  //All done, return the value
  result := numpart + strpart;
end;

end.

NB a version of this Algorithm for MSSQL7 is also posted (Title "Towards a more accurate sort order in MSSQL7")

Nincsenek megjegyzések:

Megjegyzés küldése