Monday, February 7, 2011

Virtual Methods and Polymorphism Part 2


Problem/Question/Abstract:

Smart Linking

Answer:

In Part I we explored the magic of polymorphism and its Object Pascal implementation, the virtual method. We discovered that the indicator of which virtual method to invoke on the instance data is stored in the instance data itself.

In this installment, we conclude our exploration with a discussion of abstract interfaces and how virtual methods can defeat and enhance "smart linking."

Abstract Interfaces

An abstract interface is a class type that contains no implementation and no data - only abstract virtual methods. Abstract interfaces allow you to completely separate the user of the interface from the implementation of the interface.

And I do mean completely separate; with abstract interfaces, you can have an object implemented in a DLL and used by routines in an .EXE, just as if the object were implemented in the .EXE itself. Abstract interfaces can bridge:

conceptual barriers within an application,
logistical barriers between an application and a DLL,
language barriers between applications written in different programming languages, and
address space barriers that separate Win32 processes.

In all cases, the client application uses the interface class just as it would any class it implemented itself.
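
As a minimal sketch of the idea (the names TShape, TSquare, and DescribeArea are hypothetical and do not come from the article), an abstract interface class and a client routine written purely against it might look like this. The {$MODE DELPHI} line is only there so the same source also builds with Free Pascal.

```pascal
program AbstractInterfaceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Abstract interface: no data, no implementation }
  TShape = class
    function Area: Integer; virtual; abstract;
    function Kind: string; virtual; abstract;
  end;

  { One possible implementation; the client routine never names this type }
  TSquare = class(TShape)
  private
    FSide: Integer;
  public
    constructor Create(ASide: Integer);
    function Area: Integer; override;
    function Kind: string; override;
  end;

constructor TSquare.Create(ASide: Integer);
begin
  inherited Create;
  FSide := ASide;
end;

function TSquare.Area: Integer;
begin
  Result := FSide * FSide;
end;

function TSquare.Kind: string;
begin
  Result := 'square';
end;

{ Client code: knows only the abstract interface }
procedure DescribeArea(AShape: TShape);
begin
  WriteLn(AShape.Kind, ' area = ', AShape.Area);
end;

var
  S: TShape;
begin
  S := TSquare.Create(3);
  try
    DescribeArea(S);  { prints: square area = 9 }
  finally
    S.Free;
  end;
end.
```

DescribeArea would compile unchanged if TSquare were replaced by any other descendant of TShape, which is the separation the text describes.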

Let's now take a closer look at how an abstract interface class can bridge the gap between an application and a DLL. (By the way, abstract interfaces are the foundation of OLE programming.)

Importing Objects from DLLs: The Hard Way. If you want an application to use a function in a DLL, you must create a "fake" function declaration that tells the compiler what it needs to know about the parameter list and result type of the function. Instead of a method body, this fake function declaration contains a reference to a DLL and function name. The compiler sees these and knows what code to generate to call the proper address in the DLL at run time.
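
The "fake" declaration described above might look like the following sketch. It imports GetTickCount, a real Win32 function exported by kernel32.dll, so it builds and runs on Windows only. The program name is made up for illustration.

```pascal
program ImportFunctionSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

{ No method body here: just the parameter list, the result type,
  the DLL name, and the exported function name. }
function GetTickCount: LongWord; stdcall;
  external 'kernel32.dll' name 'GetTickCount';

begin
  { The call site looks like any ordinary function call }
  WriteLn('Milliseconds since system start: ', GetTickCount);
end.
```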

To have an application use an object that's implemented in a DLL, you could do essentially the same thing, declaring a separate function for each object method in the DLL. As the number of methods in the DLL object increases, however, keeping track of all those functions will become a chore. To make things a little easier to manage, you could set up the DLL to give you (the client application) an array of function pointers that you would use to call any of the DLL functions associated with a particular DLL class type.

You can see where this is headed. A Virtual Method Table is precisely an array of function pointers (we discussed the VMT last month). Why do things the hard way when the compiler can do the dirty work for you?
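
For comparison, the hand-rolled table of function pointers might be sketched as follows. Everything here is hypothetical (TCounterTable, GetCounterTable, and the counter routines are invented for illustration), and the "DLL side" lives in the same program so the example stays self-contained.

```pascal
program FunctionTableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Procedural types standing in for the "methods" of a DLL object }
  TCounterAdd = procedure(var State: Integer; Amount: Integer);
  TCounterGet = function(var State: Integer): Integer;

  { The hand-maintained table; the slot order is a contract that
    both sides must keep in sync manually }
  PCounterTable = ^TCounterTable;
  TCounterTable = record
    Add: TCounterAdd;
    Get: TCounterGet;
  end;

{ --- "DLL side": in a real project these would live in the library --- }
procedure CounterAdd(var State: Integer; Amount: Integer);
begin
  Inc(State, Amount);
end;

function CounterGet(var State: Integer): Integer;
begin
  Result := State;
end;

var
  CounterTable: TCounterTable;

{ The single routine a real DLL would export }
function GetCounterTable: PCounterTable;
begin
  CounterTable.Add := CounterAdd;
  CounterTable.Get := CounterGet;
  Result := @CounterTable;
end;

{ --- "Application side": every call goes through the table --- }
var
  Table: PCounterTable;
  State: Integer;
begin
  Table := GetCounterTable;
  State := 0;
  Table^.Add(State, 5);
  Table^.Add(State, 2);
  WriteLn(Table^.Get(State));  { prints 7 }
end.
```

Note that the caller has to pass the state explicitly on every call. A virtual method call does the same thing, except that the compiler supplies the instance and the table lookup for you.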

Importing Objects from DLLs: The Smart Way. The client module (the application) requires a class declaration that will make the compiler "visualize" a VMT that matches the desired DLL's array of function pointers. Enter the abstract interface class. The class contains a hoard of virtual; abstract; method declarations in the same order as the functions in the DLL's array of function pointers. Of course, the abstract method declarations need parameter lists that match the DLL's functions exactly.

Now you can fetch the array of function pointers from the DLL and typecast a pointer to that array into your application's abstract interface class type. (Okay; it actually needs to be a pointer to a pointer to an array of function addresses. The first pointer simulates the object instance, the second pointer simulates the VMT pointer embedded in the instance data, but who's counting?)

With this typecast in place, the compiler will think you have an instance of that class type. When the compiler sees a method call on that typecast pointer, it will generate code to push the parameters on the stack, then look up the nth virtual method address in the "instance's VMT" (the pointer to the function table provided by the DLL), and call that address. Voilà! Your application is using an "object" that lives in a DLL as easily as one of its own classes.

Exporting Objects from DLLs. Now for the flip side. Where does the DLL get that array of function pointers? From the compiler, of course! On the DLL side, create a class type with virtual methods with the same order and parameter lists as defined by the "red-herring" array of function pointers, and implement those methods to perform the tasks of that class. Then implement and export a simple function from the DLL that creates an instance of the DLL's class and returns a pointer to it. Again, Voilà! Your DLL is exporting an object that can be used by any application that can handle pointers to arrays of function addresses. Also known as objects!

Abstract Interfaces Link User and Implementor. Here's the clincher. How do you guarantee that the order and parameter lists of the methods in the application's abstract interface class exactly match the methods implemented in the DLL?

Simple. Declare the DLL class as a descendant of the abstract interface class used by the application, and override all the abstract virtual methods. The abstract interface is shared between the application and the DLL; the implementation is contained entirely within the DLL.
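
A minimal sketch of that arrangement follows. The names TGadgetIntf, TGadgetImpl, and CreateGadget are hypothetical. To keep the example runnable as one file, both "sides" are in a single program; in a real project the abstract class would sit in a shared unit, and CreateGadget would be the one routine listed in the DLL's exports clause.

```pascal
program SharedInterfaceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Shared declaration: the only thing the application knows about }
  TGadgetIntf = class
    procedure Whirr; virtual; abstract;
    function WhirrCount: Integer; virtual; abstract;
  end;

  { "DLL side" only: descends from the shared class, so the method
    order and parameter lists cannot drift out of step }
  TGadgetImpl = class(TGadgetIntf)
  private
    FWhirrs: Integer;
  public
    procedure Whirr; override;
    function WhirrCount: Integer; override;
  end;

procedure TGadgetImpl.Whirr;
begin
  Inc(FWhirrs);
end;

function TGadgetImpl.WhirrCount: Integer;
begin
  Result := FWhirrs;
end;

{ The simple function the DLL would export: it constructs the
  implementation but hands back the abstract type }
function CreateGadget: TGadgetIntf;
begin
  Result := TGadgetImpl.Create;
end;

{ "Application side": never mentions TGadgetImpl }
var
  G: TGadgetIntf;
begin
  G := CreateGadget;
  try
    G.Whirr;
    G.Whirr;
    WriteLn('Whirr count: ', G.WhirrCount);  { prints: Whirr count: 2 }
  finally
    G.Free;
  end;
end.
```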

Abstract Interfaces Cross Language Boundaries. This can also be done between modules written in different languages. The Microsoft Component Object Model (COM) is a language-independent specification that allows different programming languages to share objects as just described. At its core, COM is simply a specification for how an array of function pointers should be arranged and used. COM is the foundation of OLE.

Since Delphi's native class type implementation conforms to COM specifications, there is no conversion required for Delphi applications to use COM objects, nor any conversion required for Delphi applications to expose COM objects for other modules to use.

Of course, when dealing with multiple languages, you won't have the luxury of sharing the abstract interface class between the modules. You'll have to translate the abstract interface class into each language, but this is a small price to pay for the ability to share the implementation.

The Delphi IDE is built entirely upon abstract interfaces, allowing the IDE main module to communicate with the editor and debugger kernel DLLs (implemented in BC++), and with the multitude of component design-time tools that live in the component library (CMPLIB32.DCL) and installable expert modules.

Virtuals Defeat Smart Linking

When the Delphi compiler/linker produces an .EXE, the procedures, variables, and static methods that are not referenced by "live" code (code that is actually used) will be left out of the .EXE file. This process is called smart linking, and is a great improvement over normal linkers that merely copy all code into the .EXE regardless of whether it's actually needed. The result of smart linking is a smaller .EXE on disk that requires less memory to run.

Smart Linking Rule for Virtuals. If the type information of a class is touched (for example, by constructing an instance) by live code, all the virtual methods of the class and its ancestors will be linked into the .EXE, regardless of whether the program actually uses the virtual methods.

For the compiler, keeping track of whether an individual procedure is ever used in a program is relatively simple; figuring out whether a virtual method is used requires a great deal more analysis of the descendants and ancestors of the class. It's not impossible to devise a scheme to determine if a particular virtual method is never used in any descendants of a class type, but such a scheme would certainly require a lot more CPU cycles than normal smart linking, and the resulting reduction in code size would rarely be dramatic. For these reasons (lots of work, greatly reduced compile/link speed, and diminishing returns), adding smart linking of virtual methods to the Delphi linker has not been a high priority for Borland.

If your class has a number of utility methods that you don't expect to use all the time, leaving them static will allow the smart linker to omit them from the final .EXE if they are not used by your program.

Note that including virtual methods involves more than just the bytes of code in the method bodies. Anything that a virtual method uses or calls (including static methods) must also be linked into the .EXE, as well as anything those routines use, etc. Through this cascade effect, one method could potentially drag hundreds of other routines into the .EXE, sometimes at a cost of hundreds of thousands of bytes of additional code and data. If most of these support routines are used only by your unused virtual method, you have a lot of deadwood in your .EXE.

The best general strategy to keep unused virtual methods - and their associated deadwood - under control is to declare virtual methods sparingly. It's easier to promote an existing static method to virtual when a clear need arises than to demote virtual methods to statics at some late stage of your development cycle.

Virtuals Enhance Smart Linking

Smart linking of virtuals is a two-edged sword: What is so often cursed for bloating executables with unused code can also be exploited to greatly reduce the amount of code in an executable in certain circumstances - even beyond what smart linking could normally achieve with ordinary static methods and procedures. The key is to turn the smart linking rule for virtuals inside out:

Inverse Smart Linking Rule for Virtuals. If the type information of a class is not touched by live code, then none of that class' virtual methods will be linked into the executable. Even if those virtual methods are called polymorphically by live code!

In a virtual method call, the compiler emits machine code to grab the VMT pointer from the instance data, and to call an address stored at a particular offset in the VMT. The compiler can't know exactly which method body will be called at run time, so the act of calling a virtual method does not cause the smart linker to pull any method bodies corresponding to that virtual method identifier into the final executable.
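
The VMT pointer mentioned above can be observed directly. The sketch below assumes the usual layout in which the first pointer-sized slot of an instance holds the VMT pointer and a class reference is that same pointer. PRawPointer and VmtPointerOf are hypothetical helper names, and the gadget classes are borrowed from the figures in this article.

```pascal
program VmtPointerSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure Whirr; virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override;
  end;

  PRawPointer = ^Pointer;

procedure TBaseGadget.Whirr;
begin
  WriteLn('TBaseGadget.Whirr');
end;

procedure TKitchenGadget.Whirr;
begin
  WriteLn('TKitchenGadget.Whirr');
end;

{ Reads the VMT pointer stored in the first slot of the instance data }
function VmtPointerOf(AObject: TObject): Pointer;
begin
  Result := PRawPointer(AObject)^;
end;

var
  X: TBaseGadget;
begin
  X := TKitchenGadget.Create;
  try
    { The stored pointer reflects the constructed class, not the
      declared type of the variable }
    WriteLn(VmtPointerOf(X) = Pointer(TKitchenGadget));  { TRUE }
    WriteLn(VmtPointerOf(X) = Pointer(TBaseGadget));     { FALSE }
    X.Whirr;  { dispatched through that pointer: TKitchenGadget.Whirr }
  finally
    X.Free;
  end;
end.
```

Nothing at the call site X.Whirr names TKitchenGadget.Whirr, which is why the call alone gives the linker no reason to pull in any particular method body.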

The same is true for dynamic methods. The act of constructing an instance of the class is what cues the linker to pull in the virtual methods of that particular class and its ancestors. This saves the program from the painful death that would surely result from calling virtual methods that were not linked into the program. After all, how could you possibly call a virtual method of an object instance defined and implemented in your program if you did not first construct said instance? The answer is: you can't. If you obtained the object instance from some external source, e.g. a DLL, then the virtual methods of that instance are in the DLL, not your program.

So, if you have code that calls virtual methods of a class that is never constructed by routines used in the current project, none of the code associated with those virtual methods will be linked into the final executable.

The code in Figure 1 will cause the linker to pull in all the virtual methods of TKitchenGadget and TOfficeManager, because those classes are constructed in live code (the main program block), and all the virtual methods of TBaseGadget, because it's the ancestor of TKitchenGadget.

type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual; { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: NO }
    procedure Buzz; { Linked in: NO }
    procedure Pop; virtual; { Linked in: NO }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget; { Linked in: NO }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

  { ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin { Dead code, never called }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  X.Free;
  M.Free;
end.
Figure 1: Inverse virtual smart linking: TOfficeGadget.Whirr will not be linked into this program, although Whirr is touched by the live method TOfficeManager.Operate.

Because TOfficeManager.Operate is virtual, its method body is all live code (even though Operate is never called). Therefore, the call to AGadget.Whirr is a live reference to the virtual method Whirr. However, TOfficeGadget is not constructed in live code in this example; TOfficeManager.InstantiateGadget is never used. Nothing of TOfficeGadget will be linked into this program, even though a live routine contains a call to Whirr through a variable of type TOfficeGadget.

Variations on a Theme. Let's see how the scenario changes with a few slight code modifications. The code in Figure 2 adds a call to AGadget.Buzz in the TOfficeManager.Operate method. Notice that the body of TOfficeGadget.Buzz is now linked in, but TOfficeGadget.Whirr is still not. Buzz is a static method, so any live reference to it will link in the corresponding code, even if the class is never constructed.

type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual; { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: NO }
    procedure Buzz; { Linked in: YES }
    procedure Pop; virtual; { Linked in: NO }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget; { Linked in: NO }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

  { ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin { Dead code, never called }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
  AGadget.Buzz; { This touches the static method body }
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  X.Free;
  M.Free;
end.
Figure 2: Notice how the addition of a call to the static Buzz method affects its linked-in status. TOfficeGadget.Whirr is still not included.

The code in Figure 3 adds a call to the static method TOfficeManager.InstantiateGadget. This brings the construction of the TOfficeGadget class into the live code of the program, which brings in all the virtual methods of TOfficeGadget, including TOfficeGadget.Whirr (which is called by live code) and TOfficeGadget.Pop (which isn't). If you deleted the call to AGadget.Buzz, the TOfficeGadget.Buzz method would become dead code again. Static methods are linked in only if they are used in live code, regardless of whether their class type is used.

type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual; { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: YES }
    procedure Buzz; { Linked in: YES }
    procedure Pop; virtual; { Linked in: YES }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override; { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget; { Linked in: YES }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

  { ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin { Live code }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
  AGadget.Buzz; { This touches the static method body }
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  M.InstantiateGadget;

  X.Free;
  M.Free;
end.
Figure 3: With a call to InstantiateGadget, the construction of TOfficeGadget becomes live and all of TOfficeGadget's virtual methods are linked.

Life in the Real World. Let's examine a slightly more complex (and more interesting) example of this virtual smart linking technique inside the VCL.

The Delphi streaming system has two parts: TReader and TWriter, which descend from a common ancestor, TFiler:

TReader contains all the code needed to load components from a stream.
TWriter contains everything needed to write components to a stream.

These classes were split because many Delphi applications never need to write components to a stream - most applications only read forms from resource streams at program startup. If the streaming system were implemented in one class, all your applications would wind up carrying around all the stream output code, although many don't need it.

So, splitting the streaming system into two classes improved smart linking. End of story? Not quite.

In a careful examination of the code linked into a typical Delphi application, the Delphi R&D team noticed that bits of TWriter were being linked into the .EXE. This seemed odd, because TWriter was definitely never instantiated in the test program. Some of those TWriter bits touched a lot of other bits that piled up rather quickly into a lot of unused code. Let's backtrack a little to see what led to this code getting into the .EXE, and its surprising solution.

Delphi's TComponent class defines virtual methods that are responsible for reading and writing the component's state in a stream, using TReader and TWriter classes. Because TComponent is the ancestor of just about everything of importance in Delphi, TComponent is almost always linked into your Delphi programs, along with all the virtual methods of TComponent.

Some of TComponent's virtual methods use TWriter methods to write the component's properties to a stream. Those TWriter methods were static methods.

Therefore, TComponent virtual methods are always included in Delphi form-based applications, and some of those virtual methods (e.g. TComponent.WriteState) call static methods of TWriter (e.g. TWriter.WriteData). Thus, those static method bodies of TWriter were being linked into the .EXE. TWriter.WriteData is the kingpin method that drives the entire stream output system, so when it is linked in, almost all the rest of TWriter tags along (everything, ironically, except TWriter.Create).

The solution to this code bloat (caused indirectly by the TComponent.WriteState virtual method) may throw you for a loop: To eliminate the unneeded TWriter code, make more methods of TWriter (e.g. WriteData) virtual!

The all-or-none clumping of virtual methods that we curse for working against the smart linker can be used to our advantage, so that TWriter methods that must be called by live code are not actually included unless TWriter itself is instantiated in the program. Because methods such as TWriter.WriteData are always used when you use a TWriter, and TWriter is a mule class (no descendants), there is no appreciable cost to making TWriter.WriteData virtual.
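
The shape of the fix can be sketched with two stand-in classes. TMiniWriter and TMiniComponent are hypothetical and far simpler than the real VCL classes. The comments about what gets linked restate the rules given in this article for Delphi's linker; other compilers may behave differently.

```pascal
program WriterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Stand-in for TWriter; never constructed in this program }
  TMiniWriter = class
    { Virtual: per the inverse rule, the body is pulled in only if
      TMiniWriter is constructed in live code. Declared static, the
      live reference in WriteState would link it in. }
    procedure WriteData(const AText: string); virtual;
  end;

  { Stand-in for TComponent }
  TMiniComponent = class
    { Virtual method of a constructed class: always live code }
    procedure WriteState(AWriter: TMiniWriter); virtual;
  end;

procedure TMiniWriter.WriteData(const AText: string);
begin
  WriteLn('writing: ', AText);
end;

procedure TMiniComponent.WriteState(AWriter: TMiniWriter);
begin
  { A call through the VMT, not a direct reference to the body }
  AWriter.WriteData('component state');
end;

var
  C: TMiniComponent;
begin
  C := TMiniComponent.Create;  { brings in TMiniComponent's virtuals }
  try
    WriteLn('component created; no writer was ever constructed');
  finally
    C.Free;
  end;
end.
```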

The benefits, however, are appreciable: Making TWriter.WriteData virtual shaved nearly 10KB off the size of a typical Delphi 2 .EXE. Thanks to this and other code trimming tricks, Delphi 2 packs more standard features (e.g. form inheritance and form linking) into smaller .EXEs than Delphi 1.

What's Really in Your Executables? The simplest way to find out if a particular routine is linked into a particular project is to set a breakpoint in the body of that routine and run the program in the debugger. If the routine is not linked into the .EXE, the debugger will complain that you have set an invalid breakpoint.

To get a complete picture of what's in your .EXE or DLL, configure the linker options to emit a detailed map file. From Delphi's main menu, select Project | Options to display the Project Options dialog box. Select the Linker tab. In the Map File group box, select Detailed. Now recompile your project. The map file will contain a list of the names of all the routines (from units compiled with $D+ debug information) that were linked into the .EXE.

Because the 32-bit Delphi Compiled Unit (.DCU) file has none of the capacity limitations associated with earlier, 16-bit versions of the Borland Pascal product line, there is little reason to ever turn off debug symbol information storage in the .DCU. Leave the $D, $L, and $Y compiler switches enabled at all times so the information is available when you need it in the integrated debugger, map file, or object browser. (If hard disk space is a problem, collect the loose change beneath the cushions of your sofa and buy a new 1GB hard drive.)
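
If you prefer to state these switches in source rather than rely on the project options, a fragment along the following lines could go near the top of a unit. This is a sketch of the directive syntax only, not a complete unit.

```pascal
{$D+}  { debug information }
{$L+}  { local symbol information }
{$Y+}  { symbol reference information }
```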

Novelty of Inverse Virtual Smart Linking. This technique of using virtual methods to improve smart linking is not unique to Delphi, but because Delphi's smart linker has a much finer granularity than other compiler products, this technique is much more effective in Delphi than in other products.

Most compilers produce intermediate code and limited symbol information in an .OBJ format, and most linkers' atom of granularity for smart linking is the .OBJ file. If you touch something inside a library of routines stored in one .OBJ module, the entire .OBJ module is linked into the .EXE. Thus, C and C++ libraries are often broken into swarms of little .OBJ modules in the hope of minimizing dead code in the .EXE.

Delphi's linker granularity is much finer - down to individual variables, procedures, and classes. If you touch one routine in a Delphi unit that contains lots of routines, only the thing you touch (and whatever it uses) is linked into the .EXE. Thus, there is no penalty for creating large libraries of topically-related routines in one Delphi unit. What you don't use will be left out of the .EXE.

Developing clever techniques to avoid touching individual routines or classes is generally more rewarding in Delphi than in most other compiled languages. In other products, the routines you so carefully avoided will probably be linked into the .EXE anyway because you are still using one of the other routines in the same module. Measuring with a micrometer is futile when your only cutting tool is a chainsaw.

Conclusion

Virtual methods are often maligned for bloating applications with unnecessary code. While it's true that virtuals can drag in code that your application doesn't need, this series has shown that careful and controlled use of virtual methods can achieve greater smart linking efficiency than would be possible with static methods alone.
