Tuesday, August 30, 2005

Convert a Unicode string to a normal string


Problem/Question/Abstract:

I have an application that reads data from a server via winsock. The data is sent in Unicode format, and I need to parse out the constituent strings and display them in a ListView. The strings are sent as C strings, so the data looks like this: array of chars#0array of chars#0array of chars#0#0. Since each 'array of chars' is actually an array of widechars, it also contains #0 bytes in the high-order byte of each character. I tried StringReplace(Intext, #0, '', [rfReplaceAll]); but it does not convert the data. Maybe it cannot go past the first #0 in the input string?

Answer:

Yes. Stripping every #0 byte would not help anyway, because it would also remove the terminators that separate the strings. What you need to do here is work with PWideChars. It would have helped, of course, to post more specific information, e.g. what the type of Intext is. Anyway, all you need is a way to get the address of the first widechar in the data. Assuming intext is a String (a single-byte AnsiString used as a raw buffer, even though it contains widechars), the process would look like this:

procedure SplitServerWidecharList(const intext: string; list: TStrings);
var
  p: PWideChar;
begin
  Assert(Assigned(list));
  list.Clear;
  if intext <> '' then
  begin
    p := PWideChar(@intext[1]); {points to the first widechar}
    {An empty widestring (a #0 right at the start) marks the end of the list.
     This relies on the data really ending with a double widechar #0.}
    while p^ <> #0 do
    begin
      {Convert this widestring to Ansi and store it}
      list.Add(WideCharToString(p));
      {Find the end of this widestring}
      while p^ <> #0 do
        Inc(p);
      {Hop over the terminator to the start of the next one}
      Inc(p);
    end;
  end;
end;
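
The procedure above trusts the server to end the data with a double widechar #0. If that cannot be guaranteed, a bounds-checked variant is one option. The following is only a sketch under stated assumptions: the name SplitWideListChecked and the test buffer are hypothetical, the buffer is taken to be an AnsiString holding little-endian widechars, and an unterminated trailing fragment is simply ignored.

```pascal
program SplitWideListDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

{ Hypothetical bounds-checked variant: it never reads past the end of Buf,
  even if the closing double #0 is missing. Assumes little-endian widechars. }
procedure SplitWideListChecked(const Buf: AnsiString; List: TStrings);
var
  P: PWideChar;
  Count, Start, I: Integer;
begin
  Assert(Assigned(List));
  List.Clear;
  Count := Length(Buf) div SizeOf(WideChar); { whole widechars in the buffer }
  if Count = 0 then
    Exit;
  P := PWideChar(PAnsiChar(Buf));
  Start := 0;
  for I := 0 to Count - 1 do
    if P[I] = #0 then
    begin
      if I = Start then
        Break; { empty string: this is the end-of-list marker }
      { Convert the widechars Start..I-1 and store them }
      List.Add(WideCharLenToString(@P[Start], I - Start));
      Start := I + 1;
    end;
  { Any widechars after the last terminator are incomplete and are dropped }
end;

var
  Raw: AnsiString;
  Items: TStringList;
  I: Integer;
begin
  { Test data: the strings 'AB' and 'C', each #0-terminated, followed by
    the extra widechar #0 that ends the list }
  Raw := 'A'#0'B'#0#0#0'C'#0#0#0#0#0;
  Items := TStringList.Create;
  try
    SplitWideListChecked(Raw, Items);
    for I := 0 to Items.Count - 1 do
      Writeln(Items[I]);
  finally
    Items.Free;
  end;
end.
```

Indexing the buffer by position instead of walking a bare pointer keeps every read inside Length(Buf), at the cost of one extra integer comparison per character.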

Can you be sure of the byte order of the received Unicode characters? The code above assumes little-endian byte order. If the data arrives in big-endian byte order, you have to swap the two bytes of every widechar before you can process it as above.
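
The byte swap could look roughly like this. It is a minimal sketch, not a tested part of the original answer: the name SwapWideCharBytes and the sample data are made up for illustration, and the buffer is again assumed to be an AnsiString holding the raw bytes.

```pascal
program SwapWideBytesDemo;
{$APPTYPE CONSOLE}

{ Hypothetical helper: swaps the two bytes of every 16-bit unit in Buf,
  in place. A stray odd byte at the end is left untouched. }
procedure SwapWideCharBytes(var Buf: AnsiString);
var
  I: Integer;
  Tmp: AnsiChar;
begin
  I := 1;
  while I < Length(Buf) do
  begin
    Tmp := Buf[I];
    Buf[I] := Buf[I + 1];
    Buf[I + 1] := Tmp;
    Inc(I, 2);
  end;
end;

var
  Raw: AnsiString;
begin
  { Test data: 'AB' in big-endian byte order, followed by a double widechar #0 }
  Raw := #0'A'#0'B'#0#0#0#0;
  SwapWideCharBytes(Raw);
  { After the swap the buffer is little-endian and can be read as widechars }
  Writeln(WideCharToString(PWideChar(PAnsiChar(Raw))));
end.
```

Run the swap once on the whole received buffer, before splitting it into strings, so that the #0 terminators are found on the correct widechar boundaries.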
