Sunday, January 11, 2009

Count the lines of text contained in a text file


Problem/Question/Abstract:

How to count the lines of text contained in a text file

Answer:

Solve 1:

The fastest way would be to count the #13#10 (CR/LF) pairs yourself. You need to be careful, however, because #13 and #10 could easily be swapped to give #10#13 instead, which makes this kind of counting more difficult. It is far easier to count the instances of just one of them, and this has the bonus of being more compatible with non-Windows (i.e. non-CR/LF) files: not all operating systems bother with both #13 and #10 (Unix-like systems, for example, end each line with #10 alone). The following is a basic implementation:

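{ Requires SysUtils in the uses clause (FileOpen, FileRead, FileClose, RaiseLastOSError) }
function CountLines(const FileName: string): Integer;
const
  BufferSize = 1024;
  SearchByte = 10; { LF }
var
  FileHandle: THandle;
  BytesRead, Index: Integer;
  Buffer: array[1..BufferSize] of Byte;
  LastByte: Byte;
begin
  Result := 0;
  LastByte := SearchByte; { stays LF for an empty file, so it counts as 0 lines }
  FileHandle := FileOpen(FileName, fmOpenRead or fmShareDenyWrite);
  if FileHandle = THandle(-1) then
    RaiseLastOSError;
  try
    repeat
      BytesRead := FileRead(FileHandle, Buffer[1], BufferSize);
      { Count every LF in the part of the buffer that was actually filled }
      for Index := 1 to BytesRead do
        if Buffer[Index] = SearchByte then
          Inc(Result);
      if BytesRead > 0 then
        LastByte := Buffer[BytesRead];
    until
      BytesRead <= 0;
    { A last line that does not end with LF still counts as a line }
    if LastByte <> SearchByte then
      Inc(Result);
  finally
    FileClose(FileHandle);
  end;
end;

This code searches the file for #10 (LF) bytes and treats each one as a line delimiter. An empty file has 0 lines, and a last line that is not terminated by #10 (for example, a file that contains no #10 at all) is still counted as one line, so a file that does end with #10 is not overcounted. You can easily change the search byte and/or the buffer size.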
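As a quick illustration of why counting only #10 works for both Windows and Unix-style text, here is a minimal in-memory sketch of the same idea. CountLinesInText is a hypothetical helper, not part of the original tip; it counts #10 characters and adds one for a final line that is not terminated by #10.

```pascal
program CountLFDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helper: counts lines by counting LF (#10) characters,
  plus one for a last line that has no trailing LF. }
function CountLinesInText(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if S[I] = #10 then
      Inc(Result);
  if (Length(S) > 0) and (S[Length(S)] <> #10) then
    Inc(Result);
end;

begin
  WriteLn(CountLinesInText('one'#13#10'two'#13#10)); { Windows CR/LF: prints 2 }
  WriteLn(CountLinesInText('one'#10'two'#10));       { Unix LF: prints 2 }
  WriteLn(CountLinesInText('one'#10'two'));          { unterminated last line: prints 2 }
  WriteLn(CountLinesInText(''));                     { empty text: prints 0 }
end.
```

The CR in a CR/LF pair is simply ignored, so both styles give the same count. Note that a file using #13 alone (as classic Mac OS did) would be counted as a single line by this rule.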


Solve 2:

If it is a smaller file (< 1 MB), load it into a TStringList and read the string list's Count property. If it is larger, you need to read through the whole file and count the lines as you go. A simple loop would be this:

function CountLines(const FileName: string): Integer;
var
  Buffer: array[0..4095] of Char;
  F: TextFile;
begin
  Result := 0;
  AssignFile(F, FileName);
  Reset(F);
  try
    { Replace the default 128-byte text buffer; allowed right after Reset }
    SetTextBuf(F, Buffer, SizeOf(Buffer));
    while not Eof(F) do
    begin
      ReadLn(F); { skip one line without storing it }
      Inc(Result);
    end;
  finally
    CloseFile(F);
  end;
end;

Using a buffer larger than the default 128 bytes speeds up the reading somewhat.
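For the small-file case, the TStringList approach mentioned above might look like the following sketch. CountLinesSmallFile, WriteSampleFile and the scratch file name lines_test.txt are hypothetical names used only for illustration. The whole file is loaded into memory, which is why this approach only suits smaller files.

```pascal
program StringListLineCount;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  SampleFile = 'lines_test.txt'; { hypothetical scratch file in the current directory }

{ Loads the entire file into a string list and returns the number of lines }
function CountLinesSmallFile(const FileName: string): Integer;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName);
    Result := Lines.Count;
  finally
    Lines.Free;
  end;
end;

{ Writes three lines; the last one deliberately has no line break }
procedure WriteSampleFile(const FileName: string);
var
  F: TextFile;
begin
  AssignFile(F, FileName);
  Rewrite(F);
  try
    WriteLn(F, 'first');
    WriteLn(F, 'second');
    Write(F, 'third');
  finally
    CloseFile(F);
  end;
end;

begin
  WriteSampleFile(SampleFile);
  try
    WriteLn(CountLinesSmallFile(SampleFile)); { prints 3 }
  finally
    DeleteFile(SampleFile);
  end;
end.
```

The code is shorter than either loop, but memory use grows with the file size, while the loops above only ever hold one buffer.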


Solve 3:

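Reading the file in large blocks (1 MB here) through a TFileStream can help quite a bit. This version also treats CR, LF and CR/LF each as a single line break: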

{ Requires SysUtils and Classes in the uses clause }
function TextLineCount_BufferedStream(const FileName: TFileName): Integer;
const
  MAX_BUFFER = 1024 * 1024;
var
  oStream: TFileStream;
  sBuffer: AnsiString; {one AnsiChar per byte, also under Unicode Delphi}
  iBufferSize: Integer;
  iSeek: Integer;
  bCarry: Boolean;
  cLast: AnsiChar;
begin
  Result := 0;
  bCarry := False;
  cLast := #10; {an empty file counts as 0 lines}
  oStream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(sBuffer, MAX_BUFFER);
    repeat
      iBufferSize := oStream.Read(sBuffer[1], MAX_BUFFER);
      if iBufferSize <= 0 then
        Break;
      {Skip an LF that follows a CR, even if they fall in separate buffers}
      iSeek := 1;
      if bCarry and (sBuffer[1] = #10) then
        Inc(iSeek);
      while iSeek <= iBufferSize do
      begin
        case sBuffer[iSeek] of
          #10:
            Inc(Result);
          #13:
            begin
              {CR, LF and CR/LF each end exactly one line}
              Inc(Result);
              if (iSeek < iBufferSize) and (sBuffer[iSeek + 1] = #10) then
                Inc(iSeek);
            end;
        end;
        Inc(iSeek);
      end;
      {Remember the last character for the carry flag and the final-line check}
      cLast := sBuffer[iBufferSize];
      bCarry := (cLast = #13);
    until
      iBufferSize < MAX_BUFFER;
    {A last line without a line break still counts as a line}
    if (cLast <> #10) and (cLast <> #13) then
      Inc(Result);
  finally
    FreeAndNil(oStream);
  end;
end;
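To see the line-break rule of Solve 3 without the buffering and carry logic, here is a minimal in-memory sketch. CountMixedLines is a hypothetical helper, not part of the original tip; it applies the same CR / LF / CR-LF rule, and counting a final line that has no line break is an assumption of this sketch.

```pascal
program MixedEndingsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

{ Hypothetical helper: CR, LF and the CR/LF pair each end exactly one line.
  A last line without any line break is also counted (an assumption here). }
function CountMixedLines(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    case S[I] of
      #10:
        Inc(Result);
      #13:
        begin
          Inc(Result);
          { treat CR/LF as a single line break }
          if (I < Length(S)) and (S[I + 1] = #10) then
            Inc(I);
        end;
    end;
    Inc(I);
  end;
  if (Length(S) > 0) and not (S[Length(S)] in [#10, #13]) then
    Inc(Result);
end;

begin
  WriteLn(CountMixedLines('a'#13#10'b'#13#10)); { Windows: prints 2 }
  WriteLn(CountMixedLines('a'#10'b'#10));       { Unix: prints 2 }
  WriteLn(CountMixedLines('a'#13'b'#13));       { classic Mac OS: prints 2 }
  WriteLn(CountMixedLines('a'#13#10'b'#10'c')); { mixed, unterminated: prints 3 }
end.
```

Unlike counting #10 alone, this rule also handles files that use #13 by itself, at the cost of a little more logic per character.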
