2010. október 14., csütörtök

How to sort a TStringList by numerical value using the Heapsort algorithm


Problem/Question/Abstract:

I cannot use the Sort method in TStringList as I would like to sort by Integer. My TStringList is filled with numbers such as:

20, 12, 1, 23, 54, 32

Of course, they're converted to string before being added to TStringList. What is a fast algorithm to achieve this sort? I normally have less than 50 items in my TStringList, if that is a factor.

Answer:

You'd end up doing a lot of conversions using StrToInt, which is wasteful, so I would recommend that you create a

type
PInteger = ^Integer type

store all of the StrToInt values in the TStringList.Objects array, and then when you use the sort, do your comparisons based on

PInteger(SL.Objects[Idx])^

The quicksort that TStringList uses (see CLASSES.PAS) uses a very simple partition function, which is completely unaware of the data it's sorting. It's using the midpoint index to begin to decide where to start partitioning, which is just as reliable as picking a random number when deciding how to sort. If, for example, you had a big list of items that was already sorted in the reverse direction, and you used this quicksort on it, and would call itself recursively once for every element in the list! Now, when you take into account that you're pushing a few items on the stack (the return address as well as the parameters as well as the registers you are saving) it might not take too long for your 16K of stack space to get eaten up (16,384 bytes divided by about maybe 32 bytes (and that's being pretty optimistic!) is about 2048 items before you run the risk of killing the stack!). The MaxListSize in CLASSES is 16380 (65520 div sizeof (Pointer)), so it's certainly possible to cause this problem.

Remember that TStringList.Sort is declared as virtual, so if you wanted to override it, you certainly could in a class derived from TStringList.

Also mind that the odds of anyone having to sort this much data (2000 items) seems pretty remote (correct me, anyone, if you've ever sorted more than 2000 strings in an application). The most reliable sort with the same running time as QuickSort is a HeapSort. They both run in O(N lg N) time, whereas sorts like the InsertionSort (which someone mentioned) and BubbleSort (which someone else mentioned) run in O(N^2) time, on the average.

The biggest differences between HeapSort and QuickSort, in terms of their run time and storage are:

HeapSort only calls itself recursively at most lg N times, where as QuickSort could call itself recursively N times (big difference, like 10 vs 1024, or 32 vs 2^32);
The worst case upper bound time on HeapSort is only O(N lg N), whereas in the worst case for QuickSort, the running time is O(N^2).


program H;

uses
  WinCrt, SysUtils;

const
  min = 10;
  max = 13;
  maxHeap = 1 shl max;

type
  heap = array[1..maxHeap] of integer;
  heapBase = ^heap;

var
  currentSize, heapSize: integer;
  A: heapBase;

procedure SwapInts(var a, b: integer);
var
  t: integer;
begin
  t := a;
  a := b;
  b := t
end;

procedure InitHeap(size: integer);
var
  i: integer;
begin
  heapSize := size;
  currentSize := size;
  Randomize;
  for i := 1 to size do
    A^[i] := Random(size) + 1;
end;

procedure Heapify(i: integer);
var
  left, right, largest: integer;
begin
  largest := i;
  left := 2 * i;
  right := left + 1;
  if left <= heapSize then
    if A^[left] > A^[i] then
      largest := left;
  if right <= heapSize then
    if A^[right] > A^[largest] then
      largest := right;
  if largest <> i then
  begin
    SwapInts(A^[largest], A^[i]);
    Heapify(largest)
  end;
end;

procedure BuildHeap;
var
  i: integer;
begin
  for i := heapSize div 2 downto 1 do
    Heapify(i)
end;

procedure HeapSort;
var
  i: integer;
begin
  BuildHeap;
  for i := currentSize downto 2 do
  begin
    SwapInts(A^[i], A^[1]);
    dec(heapSize);
    Heapify(1)
  end;
end;

type
  TAvgTimes = array[min..max] of TDateTime;
var
  sTime, eTime, tTime: TDateTime;
  i, idx, size: integer;
  avgTimes: TAvgTimes;
begin
  tTime := 0;
  i := min;
  size := 1 shl min;
  new(A);
  while i <= max do
  begin
    for idx := 1 to 10 do
    begin
      InitHeap(size);
      sTime := Time;
      HeapSort;
      eTime := Time;
      tTime := tTime + (eTime - sTime)
    end;
    avgTimes[i] := tTime / 10.0;
    inc(i);
    size := size shl 1;
  end;
end.

Nincsenek megjegyzések:

Megjegyzés küldése