Monday, January 30, 2006

Detect UNIX textfiles


Problem/Question/Abstract:

How to detect if an ASCII textfile uses UNIX or Windows linebreaks?
This function detects whether a textfile is a Windows or a UNIX textfile. While we're at it, let's look at two versions of the same function, one beautiful and one less so.

Answer:

First of all, the background: Windows uses CRLF ($0D $0A or #13 #10) as the linebreak in textfiles, whereas UNIX/Linux uses just LF ($0A or #10).

This matters because the Readln procedure will not work on UNIX files: it cannot detect the linebreak. Rather than watching your application go crazy, it is nicer to detect in advance whether a file is a UNIX file, and then offer to convert it if necessary.
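For the conversion step mentioned above, here is a minimal sketch that works on text already loaded into a string. The name UnixToWindowsText is made up for this illustration and is not part of the original tip. It assumes the whole file fits comfortably in memory and simply inserts a CR in front of every LF that does not already have one.

```pascal
// Hypothetical helper, not from the original tip: converts LF linebreaks
// to CRLF and leaves linebreaks that are already CRLF untouched.
function UnixToWindowsText(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    // add the missing CR only when this LF is not already preceded by one
    if (S[I] = #10) and ((I = 1) or (S[I - 1] <> #13)) then
      Result := Result + #13;
    Result := Result + S[I];
  end;
end;
```

Building the result one character at a time is slow for big files, so treat this as an illustration of the idea rather than a tuned routine.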

The way to detect if it's a UNIX or Windows file is to spot the difference, i.e. to see if a CR char precedes the LF char.

Here is a go at it:

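function IsFileUNIX(Filename: string): boolean;
var
  StopRead: boolean;
  F: file of Byte;
  CurB, PrevB: Byte;
  OldFileMode: Integer;
begin
  StopRead := False;
  PrevB := 0;
  Result := True; // a file without any LF is reported as UNIX

  AssignFile(F, Filename);
  OldFileMode := FileMode;
  FileMode := 0; // read only
  try
    Reset(F);
  finally
    FileMode := OldFileMode; // restore the global setting
  end;

  try
    while (not Eof(F)) and (not StopRead) do
    begin
      Read(F, CurB);

      // the first $0A decides: check if $0D precedes it
      if CurB = $0A then
      begin
        Result := PrevB <> $0D;
        StopRead := True;
      end;

      PrevB := CurB;
    end;
  finally
    CloseFile(F); // always release the file handle
  end;
end;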

Well, this function did what I wanted. However, I thought it looked kind of ugly, so I began to think about how I could use the same principle with fewer statements and make the function a little more beautiful.

Simply replacing the while loop with a repeat loop worked wonders. Here's the second go at it:

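function IsFileUNIX2(Filename: string): boolean;
var
  F: file of Byte;
  CurB, PrevB: Byte;
  OldFileMode: Integer;
begin
  Result := True; // a file without any LF is reported as UNIX
  CurB := 0;      // so PrevB is defined on the first pass

  AssignFile(F, Filename);
  OldFileMode := FileMode;
  FileMode := 0; // read only
  try
    Reset(F);
  finally
    FileMode := OldFileMode; // restore the global setting
  end;

  try
    if Eof(F) then
      Exit; // empty file: nothing to read

    repeat
      PrevB := CurB;
      Read(F, CurB);
    until (CurB = $0A) or Eof(F);

    // Windows only if an $0A was found and $0D precedes it
    Result := not ((CurB = $0A) and (PrevB = $0D));
  finally
    CloseFile(F); // always release the file handle
  end;
end;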
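If the text is already in memory, the same CR-before-LF principle can be applied without any file handling at all. The following is only a sketch: the name IsTextUNIX is made up for this illustration, and like the functions above it looks at the first LF only and treats text without any LF as UNIX.

```pascal
// Hypothetical helper, not from the original tip: applies the same
// CR-before-LF check to a string instead of a file.
function IsTextUNIX(const S: string): Boolean;
var
  P: Integer;
begin
  P := Pos(#10, S); // position of the first LF, 0 if there is none
  // UNIX when there is no LF, when the LF is the very first character,
  // or when the character before the LF is not a CR
  Result := (P <= 1) or (S[P - 1] <> #13);
end;
```

This relies on Delphi's default short-circuit boolean evaluation, so S[P - 1] is only touched when P is greater than 1.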
