Thursday, August 30, 2007

How to split a string when the substrings are separated by more than one space character


Problem/Question/Abstract:

I have a lot of the following lines:

02-07-01 12:05:30  XXX     AAAAAA 100 BBBBB   666666        300
3700     -555.00     4.00

The only way to tell the items in the string apart is that they are separated by two or more spaces; there can actually be 3 to 5 spaces as well. If there is only one space between two strings, it is probably part of a bigger item, as in 'AAAAAA 100 BBBBB' above. What would be the simplest way to split this string? I looked at DelimitedText, but I am not sure whether it is going to help me.

Answer:

Solution 1:

See the routine below. Is your data in a fixed-width column format produced by another program? If so, you cannot count on fields being separated by more than one space! In fact, you may have cases where there is no space at all between two fields because a value fills the whole field width. For such files you have to use a different strategy to parse the lines.

{SplitDataString:
Dissect a string of items separated by more than one space character
Param S contains the string to split, param list takes the items obtained from S
Precondition: list <> nil

Description:
An item cannot start or end with a space, but it may contain single space characters
flanked by non-space characters. As implemented now, the routine does not support
multibyte character sets.

Created 28.7.2002 by P. Below}

procedure SplitDataString(S: string; list: TStrings);
var
  startindex: Integer;

  function HasNextItem: Boolean;
  begin
    {Skip the separator run; an item starting with a space is not supported}
    while (startindex <= Length(S)) and (S[startindex] = #32) do
      Inc(startindex);
    Result := startindex <= Length(S);
  end;

  function GetNextItem: string;
  var
    endindex: Integer;
  begin
    for endindex := startindex + 1 to Length(S) do
    begin
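      {S was trimmed, so it cannot end with a space and S[endindex + 1]
       is always a valid index when S[endindex] is a space}
      if S[endindex] = ' ' then
        if S[endindex + 1] = ' ' then
        begin
          {found the end of an item}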
          Result := Copy(S, startindex, endindex - startindex);
          startindex := endindex + 2;
          Exit;
        end;
    end;
    {If we get here, the item is the last one in S}
    Result := Copy(S, startindex, maxint);
    startindex := Length(S) + 1;
  end;

begin
  Assert(Assigned(list));
  {remove whitespace from start and end of string}
  S := Trim(S);
  startindex := 1;
  while HasNextItem do
    list.Add(GetNextItem);
end;

Example of use:

procedure TForm1.Button1Click(Sender: TObject);
begin
  memo1.clear;
  SplitDataString('02-07-01 12:05:30  XXX     AAAAAA 100 BBBBB   666666        300                                            ' + '3700      -555.00     4.00', memo1.lines);
end;
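
For the fixed-width case mentioned at the start of this answer, splitting on runs of spaces is not reliable, and one possible strategy is to cut each line at known column positions instead. The following is only a minimal sketch under assumptions: the routine name SplitFixedWidth, the sample line, and the column widths (8, 6, 8) are all hypothetical, and the real widths would have to come from the layout used by the program that writes the file.

```pascal
program FixedWidthDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

{Cut S into fields of the given widths and add the trimmed fields to list.
 The widths are an assumption about the file layout; the routine cannot
 detect them by itself.}
procedure SplitFixedWidth(const S: string; const Widths: array of Integer;
  list: TStrings);
var
  i, startindex: Integer;
begin
  Assert(Assigned(list));
  startindex := 1;
  for i := Low(Widths) to High(Widths) do
  begin
    {Copy returns fewer characters (or '') if S is shorter than expected}
    list.Add(Trim(Copy(S, startindex, Widths[i])));
    Inc(startindex, Widths[i]);
  end;
end;

var
  items: TStringList;
  i: Integer;
begin
  items := TStringList.Create;
  try
    {Hypothetical layout: 8-char code, 6-char quantity, 8-char amount.
     The first two values fill their columns completely, so there is
     no space between them at all.}
    SplitFixedWidth('ABCDEFGH123456  -55.00', [8, 6, 8], items);
    for i := 0 to items.Count - 1 do
      Writeln('[', items[i], ']');
  finally
    items.Free;
  end;
end.
```

With these assumed widths the three fields come out as ABCDEFGH, 123456 and -55.00, even though the first two are not separated by any space.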


Solution 2:

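{SepSpacedOutStr:
Returns s with leading and trailing whitespace removed and every run of two or
more spaces replaced by ', '. Single spaces inside an item are kept as they are.}
function SepSpacedOutStr(s: string): string;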
var
  i, x: integer;
begin
  s := SysUtils.Trim(s);
  if s <> '' then
  begin
    SetLength(result, Length(s));
    x := 0;
    i := 1;
    while i <= Length(s) do
    begin
      if (s[i] <> #32) or ((i < Length(s)) and (s[i + 1] <> #32)) then
      begin
        Inc(x);
        result[x] := s[i];
      end
      else
      begin
        if (i < Length(s)) and (s[i + 1] = #32) then
        begin
          Inc(x);
          result[x] := ',';
          Inc(x);
          result[x] := #32;
          while (i < Length(s)) and (s[i + 1] = #32) do
            Inc(i);
        end;
      end;
      Inc(i);
    end;
    SetLength(result, x);
  end
  else
    result := '';
end;
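
SepSpacedOutStr returns one string rather than a list, with ', ' standing where each run of spaces used to be. To get the individual items, one possibility is to split that string on ', ' again. The sketch below is only an illustration: SplitOnCommaSpace is a hypothetical helper, not part of the original answer, and it assumes that no item contains ', ' itself. To stay self-contained it works on a literal that equals what SepSpacedOutStr returns for the first part of the sample line.

```pascal
program SplitCommaDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

{Hypothetical helper: add the parts of S separated by ', ' to list.
 It cannot tell a separator from a ', ' that was already in the data.}
procedure SplitOnCommaSpace(S: string; list: TStrings);
const
  Sep = ', ';
var
  p: Integer;
begin
  Assert(Assigned(list));
  p := Pos(Sep, S);
  while p > 0 do
  begin
    list.Add(Copy(S, 1, p - 1));
    {remove the item just added together with its separator}
    Delete(S, 1, p + Length(Sep) - 1);
    p := Pos(Sep, S);
  end;
  {whatever is left is the last item}
  if S <> '' then
    list.Add(S);
end;

var
  items: TStringList;
  i: Integer;
begin
  items := TStringList.Create;
  try
    {This literal is what SepSpacedOutStr returns for
     '02-07-01 12:05:30  XXX     AAAAAA 100 BBBBB   666666'}
    SplitOnCommaSpace('02-07-01 12:05:30, XXX, AAAAAA 100 BBBBB, 666666', items);
    for i := 0 to items.Count - 1 do
      Writeln(items[i]);
  finally
    items.Free;
  end;
end.
```

Assigning the result to DelimitedText may look simpler, but as far as I know, without StrictDelimiter the single spaces inside an item would be treated as delimiters too, which is why this sketch splits by hand.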
