Sunday, May 1, 2011

Sets to Strings, and Back

Problem/Question/Abstract:

Employing Sets, Strings, and Run-time Type Information

Answer:

A frequently asked question is: "How can I store a font style in the registry?" You have a number of choices; one approach is to convert the style to a string and store the string. You'll then need to convert the string back to a font style when loading the information from the registry. It's easy to hard-wire a couple of functions to do this for font styles, but it would be more useful to write general-purpose subroutines that work for any set. This article shows you how to take advantage of Delphi's Run-time Type Information (RTTI) to write functions that do just that: convert any set to a string and back again.

About RTTI

RTTI lies at the heart of Delphi's integrated development environment. The difference between a public property, method, or field and a published one is that published declarations have RTTI that is available at run time. The Object Inspector uses RTTI for published properties to help it get and set property values. The form designer uses RTTI for published fields to define a field for each component you drop on a form.

RTTI also lets Delphi manage the lifetime of strings, dynamic arrays, interfaces, and variants. Even if you declare a dynamic array of strings in a record, for example, Delphi can navigate RTTI for the record, array, and string types to decide when it must initialize and finalize the strings and the dynamic array.

A set's RTTI contains a pointer to RTTI for the type that makes up the components of the set - usually an enumerated type, but it can also be an integer or character type. This article examines in depth RTTI for a set type. If you have the Professional or Enterprise Edition, you can read the source code in \Source\Vcl\TypInfo.pas; if you have the Standard Edition, read the interface in \Doc\TypInfo.int. Either file contains the information you need to make sense of RTTI.

A type's RTTI has two parts: type information and type data. The type information is stored in a TTypeInfo record, which contains a type kind and the type's name. The type kind is an enumeration of the various kinds of types in Delphi: tkInteger, tkChar, tkEnumeration, tkSet, and so on. The type kinds are self-explanatory. The built-in TypeInfo function returns a pointer to a type's TTypeInfo record.
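As a rough illustration of the two TTypeInfo members, the following sketch builds a short description from a type's name and kind. It assumes Delphi's TypInfo unit. TStyle, TStyles, and DescribeType are hypothetical names used only for this example; TStyle stands in for TFontStyle so the fragment doesn't depend on the Graphics unit.

```pascal
uses
  SysUtils, TypInfo;

type
  // Hypothetical stand-ins for TFontStyle and TFontStyles.
  TStyle = (stBold, stItalic, stUnderline, stStrikeOut);
  TStyles = set of TStyle;

// Describe a type as "<name>: <type kind>".
function DescribeType(Info: PTypeInfo): string;
begin
  // Name is a short string; Kind is a TTypeKind value, which
  // GetEnumName can turn back into its literal (tkSet, etc.).
  Result := string(Info^.Name) + ': ' +
    GetEnumName(TypeInfo(TTypeKind), Ord(Info^.Kind));
end;
```

Under these assumptions, DescribeType(TypeInfo(TStyles)) yields 'TStyles: tkSet', and DescribeType(TypeInfo(TStyle)) yields 'TStyle: tkEnumeration'.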

TTypeData stores the type data in a variant record. Each type kind refers to different members of the record. For example, tkSet has a pointer to a PTypeInfo for the component type. Ordinal types have the MinValue and MaxValue members to store the range of values defined for the type. See the declaration in TypInfo.pas (or TypInfo.int) for the details of other types' data. If an ordinal type is declared as a subrange of another type, the type data's BaseType member points to the PTypeInfo for the base enumerated type.

Note that the pointers in TypInfo all use two levels of indirection. For example, a TTypeData member might have the type PPTypeInfo instead of PTypeInfo (that is, pointer to pointer to TTypeInfo instead of a simple pointer to TTypeInfo). If you use packages, the type information for different types can reside in different packages. The extra level of pointers makes it easy for Windows to load the package DLL at any starting address and fix up RTTI pointers.
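The extra dereference shows up as soon as you follow a set's CompType member. Here is a minimal sketch, assuming Delphi's TypInfo declarations (where CompType is a PPTypeInfo); TStyle, TStyles, and DescribeComponent are hypothetical names for this example.

```pascal
uses
  SysUtils, TypInfo;

type
  // Hypothetical stand-ins for TFontStyle and TFontStyles.
  TStyle = (stBold, stItalic, stUnderline, stStrikeOut);
  TStyles = set of TStyle;

// Report a set's component type and its ordinal range.
function DescribeComponent(SetInfo: PTypeInfo): string;
var
  CompInfo: PTypeInfo;
  CompData: PTypeData;
begin
  // CompType points to a pointer to the component's TTypeInfo,
  // so one more ^ is needed to reach the PTypeInfo itself.
  CompInfo := GetTypeData(SetInfo)^.CompType^;
  CompData := GetTypeData(CompInfo);
  Result := Format('%s = %d..%d', [string(CompInfo^.Name),
    CompData^.MinValue, CompData^.MaxValue]);
end;
```

With these declarations, DescribeComponent(TypeInfo(TStyles)) yields 'TStyle = 0..3'.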

Sets in Delphi

Delphi stores a set value as a bit mask. A set can contain up to 256 elements, which requires a bitmask with 256 bits. At eight bits per byte, that means the largest possible set occupies 32 bytes. Delphi uses only as many bytes as it needs, so a set of up to eight elements usually fits in a single byte. For example, TFontStyles is a set with up to four members, occupying bits 0 through 3. Delphi ignores the most significant four bits in the byte because they aren't needed. Figure 1 illustrates the arrangement of bits in a byte.


Figure 1: Bit representation of TFontStyles.

I wrote "usually" in the previous paragraph because sets of integers or characters aren't always aligned on byte boundaries. Delphi stores sets so that a member's bit position is always the same for a given ordinal value. In other words, ordinal value 10 always sits at the third bit from the right in a byte. If the set type is, say, "set of 7..10", Delphi stores that set in two bytes, even though it fits in only four bits. The last bit of the first byte stores member 7, and the first three bits of the next byte store members 8, 9, and 10, as shown in Figure 2. This representation of sets results in compact storage and code, and makes it easy to assign set values and test set membership.

Figure 2: Bit representation of an integer set.
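The storage rules described above can be sketched as two small helper functions. The names SetByteCount and SetBitIndex are hypothetical; the arithmetic only restates the layout from the text (a member's bit within its byte is always its ordinal value mod 8, and leading empty bytes are not stored). The compiler may pad some sizes, so treat SetByteCount as the number of bytes this layout implies rather than a guaranteed SizeOf.

```pascal
const
  BitsPerByte = 8;

// Bytes implied by the layout for a set whose component type
// spans MinValue..MaxValue.
function SetByteCount(MinValue, MaxValue: Integer): Integer;
begin
  Result := (MaxValue div BitsPerByte) -
    (MinValue div BitsPerByte) + 1;
end;

// Zero-based bit index of Element, counted from the start of
// the first stored byte.
function SetBitIndex(Element, MinValue: Integer): Integer;
begin
  Result := Element - (MinValue div BitsPerByte) * BitsPerByte;
end;
```

For "set of 7..10", SetByteCount(7, 10) gives 2, SetBitIndex(7, 7) gives 7 (the last bit of the first byte), and SetBitIndex(8, 7) gives 8 (the first bit of the second byte).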

This representation for sets also makes it possible to write general-purpose subroutines that can handle any set. The subroutines need RTTI for the set type. The set type's component type specifies the MinValue and MaxValue for the ordinal type, and those values dictate the bitwise representation for the set. Delphi restricts the members of a set to ordinal values in the range 0..255, which means you can treat any set as a set of integers where the integers are a subrange of 0..255. For example, TFontStyles is equivalent to a set of 0..3, where 0 is the ordinal value of fsBold, 1 is fsItalic, and so on. The functions described in this article rely on this trick: Every set is treated as a set of integers, using Run-time Type Information to learn the true enumerated literal for each set member.
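The integer-set trick can be tried in isolation. The sketch below overlays a set of 0..255 on an untyped parameter and tests membership by ordinal value. TStyle, TStyles, TByteSet, HasOrdinal, and StyleOrdinals are hypothetical names for this example, and the code assumes the set's component type starts below 8, so that no leading bytes are omitted.

```pascal
uses
  SysUtils;

type
  // Hypothetical stand-ins for TFontStyle and TFontStyles.
  TStyle = (stBold, stItalic, stUnderline, stStrikeOut);
  TStyles = set of TStyle;
  TByteSet = set of 0..255;

// Test membership by ordinal value, ignoring the declared type.
function HasOrdinal(const Value; OrdValue: Byte): Boolean;
var
  Bits: TByteSet absolute Value;
begin
  Result := OrdValue in Bits;
end;

// List the ordinal values present in a TStyles value.
function StyleOrdinals(Styles: TStyles): string;
var
  I: Integer;
begin
  Result := '';
  for I := Ord(Low(TStyle)) to Ord(High(TStyle)) do
    if HasOrdinal(Styles, I) then
      Result := Result + IntToStr(I) + ' ';
  Result := Trim(Result);
end;
```

StyleOrdinals([stBold, stUnderline]) yields '0 2': the function sees only integers, and RTTI is what would turn 0 and 2 back into enumerated literals.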

SetToString Function

The first - and easier - task is to convert a set to a string. The SetToString function requires the PTypeInfo pointer for the set type, and the set value. A subroutine that takes any set as an argument must use an untyped parameter to get around Delphi's strict type checking. The function's header is as follows:

function SetToString(Info: PTypeInfo; const Value): string;

To give the caller control over the formatting, overloaded functions take additional parameters for prefix, separator, and suffix strings. (Note that SetToString uses overloaded functions instead of default parameter values. When you have string parameters, you should use overloaded subroutines. Otherwise, every use of the default string parameter results in a separate copy of the string. Using overloaded subroutines avoids the additional overhead of multiple copies of the same string.)

SetToString is a fairly simple function; it iterates over all possible members of the set, tests the bit for that member, and adds the member's literal representation to the Result string if the member is present in the set. Figure 3 shows a first draft of this overloaded function.

const
  MaxSet = 255; // Largest ordinal value in a Delphi set.
  BitsPerByte = 8;
type
  TSet = set of 0..MaxSet;

function SetToString(Info: PTypeInfo; const Value):
  string; overload;
begin
  Result := SetToString(Info, Value, ',');
end;

function SetToString(Info: PTypeInfo; const Value;
  const Separator: string): string; overload;
begin
  Result := SetToString(Info, Value, Separator, '[', ']');
end;

function SetToString(Info: PTypeInfo; const Value;
  const Separator, Prefix, Suffix: string):
  string; overload;
var
  CompInfo: PTypeInfo;
  CompData: PTypeData;
  SetValue: TSet absolute Value;
  Element: 0..MaxSet;
begin
  CompInfo := GetTypeData(Info)^.CompType^;
  CompData := GetTypeData(CompInfo);
  Result := '';
  for Element := CompData.MinValue to CompData.MaxValue do
  begin
    if Element in SetValue then
      if Result = '' then
        Result := Prefix + GetEnumName(CompInfo, Element)
      else
        Result := Result + Separator +
          GetEnumName(CompInfo, Element);
  end;
  if Result = '' then
    Result := Prefix + Suffix
  else
    Result := Result + Suffix;
end;
Figure 3: First, simple implementation of SetToString.

The GetEnumName function is one of Delphi's standard functions in the TypInfo unit. It takes an ordinal value and returns a string. If the type is an enumerated type, GetEnumName returns the enumerated literal. If the type is an integer type, the function calls IntToStr to convert the number to a string.

If the set type is a set of characters, though, GetEnumName doesn't work correctly. It doesn't test the type kind, so the result is unpredictable - usually an access violation. To solve this problem, write a function that does the same thing, but supports characters, integers, and enumerations. Figure 4 shows the OrdToString function.

// Convert an ordinal value to a string. The ordinal value
// can be an integer, enumerated value, or a character.

function OrdToString(Info: PTypeInfo; Value: Integer):
  string;
resourcestring
  sCvtError =
    'OrdToString: type kind must be ordinal, not %s';
const
  AsciiChars = [32..126]; // Printable ASCII characters.
begin
  case Info.Kind of
    tkInteger:
      Result := IntToStr(Value);
    tkChar, tkWChar:
      if Value in AsciiChars then
        // AnsiQuotedStr doubles an embedded quote, so the quote
        // character becomes '''', the form StringToCharSet parses.
        Result := AnsiQuotedStr(Chr(Value), '''')
      else
        Result := Format('#%d', [Value]);
    tkEnumeration:
      Result := GetEnumName(Info, Value);
  else
    raise EConvertError.CreateFmt(sCvtError,
      [GetEnumName(TypeInfo(TTypeKind), Ord(Info.Kind))]);
  end;
end;
Figure 4: OrdToString converts any ordinal value to a string.

You may have noticed another problem with SetToString. Consider an integer or character set whose first member has an ordinal value of 8 or more. Because the first member falls after a byte boundary, Delphi saves space and doesn't store the initial empty bytes, so the first draft tests the wrong bit in the set. To handle this case, the function must subtract the omitted bits to find the correct bit position.

Finally, the function checks the type info to make sure the type is a set type (type kind of tkSet). If not, it raises an exception. Figure 5 lists the final version of SetToString.

resourcestring
  sNotASet = 'SetToString: argument must be a ' +
    'set type; %s not allowed';
const
  // Mask to force the minimum set value to be
  // a set element on a byte boundary.
  ByteBoundaryMask = not (BitsPerByte - 1);

function SetToString(Info: PTypeInfo; const Value;
  const Separator, Prefix, Suffix: string): string;
var
  CompInfo: PTypeInfo;
  CompData: PTypeData;
  SetValue: TSet absolute Value;
  Element: 0..MaxSet;
  MinElement: 0..MaxSet;
begin
  if Info.Kind <> tkSet then
    raise EConvertError.CreateFmt(sNotASet,
      [GetEnumName(TypeInfo(TTypeKind), Ord(Info.Kind))]);
  CompInfo := GetTypeData(Info)^.CompType^;
  CompData := GetTypeData(CompInfo);
  Result := '';
  MinElement := CompData.MinValue and ByteBoundaryMask;
  for Element := CompData.MinValue to CompData.MaxValue do
  begin
    if (Element - MinElement) in SetValue then
      if Result = '' then
        Result := Prefix + OrdToString(CompInfo, Element)
      else
        Result := Result + Separator +
          OrdToString(CompInfo, Element);
  end;
  if Result = '' then
    Result := Prefix + Suffix
  else
    Result := Result + Suffix;
end;
Figure 5: Final version of SetToString.

StringToSet Function

More difficult is the task of converting a string to a set. The string might use any prefix, suffix, and separator characters; it might contain space characters. The string might not be correctly formed. A string can contain enumerated literals, integers, or characters. Characters have multiple representations as well: quotes ('x'), ordinal value (#13), or control character (^M).

Characters are sufficiently different from integers and enumerations that two functions are needed. StringToSet calls StringToEnumSet or StringToCharSet, depending on the type kind of the set's component type. Before it converts the string, though, it must initialize the set to empty. An empty set is all zero bits regardless of the component type, so StringToSet clears the set by calling FillChar. The number of bytes to clear depends on the limits of the component type, rounded out to byte boundaries. Figure 6 shows the StringToSet function.

procedure StringToSet(const Str: string;
  Info: PTypeInfo; var Value);
var
  CompInfo: PTypeInfo;
  CompData: PTypeData;
  SetValue: TSet absolute Value;
  MinValue, MaxValue: Integer;
begin
  if Info.Kind <> tkSet then
    raise EConvertError.CreateFmt(sNotASet,
      [GetEnumName(TypeInfo(TTypeKind), Ord(Info.Kind))]);
  CompInfo := GetTypeData(Info)^.CompType^;
  // Initialize SetValue to an empty set. Only initialize
  // as many bytes as are present in the set.
  CompData := GetTypeData(CompInfo);
  MinValue := CompData.MinValue and ByteBoundaryMask;
  // One past the highest bit, rounded up to a byte boundary.
  // (Adding BitsPerByte - 1 would drop the last byte whenever
  // CompData.MaxValue is a multiple of 8.)
  MaxValue := (CompData.MaxValue + BitsPerByte) and
    ByteBoundaryMask;
  FillChar(SetValue, (MaxValue - MinValue) div BitsPerByte, 0);
  if CompInfo.Kind in [tkChar, tkWChar] then
    StringToCharSet(Str, CompData, SetValue)
  else
    StringToEnumSet(Str, CompInfo, CompData, SetValue);
end;
Figure 6: StringToSet divides its work between StringToEnumSet and StringToCharSet.

Starting with StringToEnumSet (because it's simpler), the first task is to skip over leading white space characters, then look for a prefix character. Any non-alphanumeric character is allowed as a prefix, but it must be only one character. SetToString allows any string, but StringToSet must be a little more restrictive to keep it manageable. Then, the function skips more white space and collects an alphanumeric token.

The TypInfo unit has the GetEnumValue function to convert an enumerated literal to its ordinal value. Like GetEnumName, it doesn't handle character types. StringToEnumSet doesn't handle sets of characters either, so StringToEnumSet can call GetEnumValue. GetEnumValue raises an EConvertError exception if the type is an integer type and the string is not a valid integer. It returns -1 for an enumerated type when the name is not valid for the type. StringToEnumSet checks for a negative value indicating an error from GetEnumValue. It also checks to ensure the ordinal value is in range for the set's type. For any error, it raises an EConvertError exception.
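To see the -1 convention on its own, here is a small sketch that wraps GetEnumValue for a single enumerated type. TStyle and StyleFromName are hypothetical names for this example, and the error message is invented for illustration.

```pascal
uses
  SysUtils, TypInfo;

type
  // Hypothetical stand-in for TFontStyle.
  TStyle = (stBold, stItalic, stUnderline, stStrikeOut);

// Look up a TStyle by its literal name; raise if unknown.
function StyleFromName(const Name: string): TStyle;
var
  OrdValue: Integer;
begin
  OrdValue := GetEnumValue(TypeInfo(TStyle), Name);
  // GetEnumValue signals an unknown name with -1.
  if OrdValue < 0 then
    raise EConvertError.CreateFmt(
      '%s is not a TStyle literal', [Name]);
  Result := TStyle(OrdValue);
end;
```

StyleFromName('stItalic') returns stItalic, while a misspelled name raises EConvertError instead of silently producing a bad ordinal value.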

StringToEnumSet must find the correct bit position in the set, in the same manner as SetToString. Once it finds that position, it sets the bit to one, and continues its loop. The next time through the loop, a non-alphanumeric character can be a separator between set elements. The final loop looks for a trailing non-alphanumeric character as the suffix. If the string cannot be parsed, the function raises an exception. Figure 7 lists StringToEnumSet.

const
  WhiteSpace = [#0..' '];
  Alphabetic = ['a'..'z', 'A'..'Z', '_'];
  Digits = ['0'..'9'];
  AlphaNumeric = Alphabetic + Digits;
resourcestring
  sInvalidSetString =
    'StringToSet: %s not a valid literal for the set type';
  sOutOfRange =
    'StringToSet: %0:d is out of range [%1:d..%2:d]';

procedure SkipWhiteSpace(const Str: string;
  var I: Integer);
begin
  while (I <= Length(Str)) and (Str[I] in WhiteSpace) do
    Inc(I);
end;

procedure StringToEnumSet(const Str: string;
  CompInfo: PTypeInfo; CompData: PTypeData;
  var Value: TSet);
var
  ElementName: string;
  Element: Integer;
  MinElement: Integer;
  Start: Integer;
  I: Integer;
begin
  MinElement := CompData.MinValue and ByteBoundaryMask;
  I := 1;
  while I <= Length(Str) do
  begin
    SkipWhiteSpace(Str, I);
    // Skip the prefix, separator, or suffix.
    if (I <= Length(Str)) and
      not (Str[I] in AlphaNumeric) then
      Inc(I);
    SkipWhiteSpace(Str, I);
    // Remember the start of the set element,
    // and collect the entire element name.
    Start := I;
    while (I <= Length(Str)) and (Str[I] in AlphaNumeric) do
      Inc(I);
    // No name, so skip to the next element.
    if I = Start then
      Continue;
    ElementName := Copy(Str, Start, I - Start);
    Element := GetEnumValue(CompInfo, ElementName);
    if Element < 0 then
      raise EConvertError.CreateFmt(sInvalidSetString,
        [AnsiQuotedStr(ElementName, '''')]);
    if (Element < CompData.MinValue) or
      (Element > CompData.MaxValue) then
      raise EConvertError.CreateFmt(sOutOfRange,
        [Element, CompData.MinValue, CompData.MaxValue]);
    Include(Value, Element - MinElement);
  end;
end;
Figure 7: The StringToEnumSet function.

Converting a string to a character set is harder, because parsing the characters is more involved. The basic structure of StringToCharSet is the same as StringToEnumSet, except that each set element must be a character. StringToCharSet supports quoted characters and ordinal values after a number sign (#127), which are the same formats used by SetToString. Note that Delphi supports one other way to specify characters: A caret followed by a character can be used for control characters (^M). However, SetToString doesn't use that format, so StringToCharSet doesn't either.

A quoted character can be a repeated quote (''''). An ordinal value can be decimal or hexadecimal (starting with a dollar sign, as per Delphi conventions). The details of parsing characters aren't relevant to this article, so check out Listing One for the full story.
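The ordinal-value format is simple enough to sketch separately. The hypothetical helper below (CharCodeFromLiteral is not part of the article's code, and its error message is invented) converts a complete literal such as #13 or #$0D, relying on StrToInt to accept a leading dollar sign as a hexadecimal prefix.

```pascal
uses
  SysUtils;

// Convert a literal such as '#13' or '#$0D' to its ordinal value.
function CharCodeFromLiteral(const S: string): Integer;
begin
  if (Length(S) < 2) or (S[1] <> '#') then
    raise EConvertError.CreateFmt(
      '%s is not a #-style character literal', [S]);
  // Everything after the number sign is the number itself;
  // StrToInt raises EConvertError if it is malformed.
  Result := StrToInt(Copy(S, 2, MaxInt));
end;
```

Both CharCodeFromLiteral('#13') and CharCodeFromLiteral('#$0D') return 13. Unlike this helper, StringToCharSet has to find where the number ends inside a longer string, which is why it scans digit by digit before calling StrToInt.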

Putting It All Together

Now that you have the SetToString and StringToSet functions, you need to put them to good use. For example, if you want to store font information in the registry, you can store the font name as a string, the size as an integer, and convert the style to a string, as shown in Figure 8. Because the Value parameter is untyped, you cannot pass a property directly to SetToString or StringToSet, so you must use a temporary variable. If the font style is, say, fsBold and fsItalic, the registry would contain the string [fsBold,fsItalic].

procedure SaveFont(Font: TFont; Reg: TRegistry);
var
  Style: TFontStyles;
begin
  Reg.WriteString('Name', Font.Name);
  Reg.WriteInteger('Size', Font.Size);
  // Copy the property to a variable; an untyped parameter
  // cannot take a property directly.
  Style := Font.Style;
  Reg.WriteString('Style',
    SetToString(TypeInfo(TFontStyles), Style));
end;

procedure LoadFont(Font: TFont; Reg: TRegistry);
var
  Style: TFontStyles;
begin
  Font.Name := Reg.ReadString('Name');
  Font.Size := Reg.ReadInteger('Size');
  StringToSet(Reg.ReadString('Style'),
    TypeInfo(TFontStyles), Style);
  Font.Style := Style;
end;
Figure 8: Saving a font in the registry.

The Object Inspector already uses a function similar to SetToString to display the value of a set-type property. You can write your own property editor for set-type properties and call SetToString. You can even let the user type a new set value and call StringToSet - something Delphi doesn't currently allow.

Conclusion

The SetToString and StringToSet functions work with any set of any type, thanks to the wonders of Run-time Type Information. How you use these functions is up to you.

The project referenced in this article is available for download: http://www.baltsoft.com/files/dkb/attachment/setstring.zip

Begin Listing One - StringToCharSet
const
  Digits = ['0'..'9'];
  HexDigits = ['a'..'f', 'A'..'F'] + Digits;
  CharBegin = ['#', ''''];
  AsciiChars = [' '..'~']; // Printable ASCII characters.
resourcestring
  sNotAChar =
    'StringToSet: Not a valid character (%.10s)';
  sCharOutOfRange =
    'StringToSet: Character #%0:d is ' +
    'out of range [#%1:d..#%2:d]';

// Convert a string to a set of character elements.

procedure StringToCharSet(const Str: string;
  CompData: PTypeData; var Value: TSet);
var
  ElementName: string;
  Element: Integer;
  MinElement: Integer;
  Start: Integer;
  I: Integer;
begin
  MinElement := CompData.MinValue and ByteBoundaryMask;
  I := 1;
  while I <= Length(Str) do
  begin
    SkipWhiteSpace(Str, I);
    // Skip over one character, which might be the prefix,
    // a separator, or suffix.
    if (I <= Length(Str)) and not (Str[I] in CharBegin) then
      Inc(I);
    SkipWhiteSpace(Str, I);
    if I > Length(Str) then
      Break;
    case Str[I] of
      '#':
        begin
          // Character is specified by ordinal value,
          // e.g. #31 or #$A2.
          Inc(I);
          Start := I;
          if (I < Length(Str)) and (Str[I] = '$') then
          begin
            Inc(I);
            while (I <= Length(Str)) and
              (Str[I] in HexDigits) do
              Inc(I);
          end
          else
          begin
            while (I <= Length(Str)) and
              (Str[I] in Digits) do
              Inc(I);
          end;
          ElementName := Copy(Str, Start, I - Start);
          Element := StrToInt(ElementName);
        end;
      '''':
        begin
          // Character is enclosed in quotes, e.g. 'A'.
          Start := I; // Save position for error messages.
          Inc(I);
          if (I <= Length(Str)) then
          begin
            Element := Ord(Str[I]);
            if Str[I] = '''' then
              // Skip over a repeated quote character.
              Inc(I);
            // Skip to the closing quote.
            Inc(I);
          end;
          if (I <= Length(Str)) and (Str[I] = '''') then
            Inc(I)
          else
            raise EConvertError.CreateFmt(sNotAChar,
              [Copy(Str, Start, I - Start)]);
        end;
    else
      // The unknown character might be the suffix. Try
      // skipping it and subsequent white space. Save the
      // original index in case the suffix-test fails.
      Start := I;
      Inc(I);
      SkipWhiteSpace(Str, I);
      if I <= Length(Str) then
        raise EConvertError.CreateFmt(sNotAChar,
          [Copy(Str, Start, I - Start)])
      else
        Exit;
    end;
    if (Element < CompData.MinValue) or
      (Element > CompData.MaxValue) then
      raise EConvertError.CreateFmt(sCharOutOfRange,
        [Element, CompData.MinValue, CompData.MaxValue]);
    Include(Value, Element - MinElement);
  end;
end;
End Listing One
