Thursday, August 3, 2006

Virtual Methods and Polymorphism Part 1


Problem/Question/Abstract:

Virtual Methods, Inside Out

Answer:

Polymorphism is perhaps the cornerstone of object-oriented programming (OOP). Without it, OOP would have only encapsulation and inheritance - data buckets and hierarchical families of data buckets - but no way to uniformly manipulate related objects.

Polymorphism is the key to leveraging your programming investments to enable a relatively small amount of code to drive a wide variety of behaviors, without requiring carnal knowledge of the implementation details of those behaviors. However, before you can extend existing Delphi components, or design new, extensible component classes, you must have a firm understanding of how polymorphism works and the opportunities it provides.

True to its name, polymorphism allows objects to have "many forms" in Delphi, and a component writer typically uses a mix of all these forms to implement a new component. In this article, we'll closely review the implementation and use of one of Delphi's polymorphism providers, the virtual method, and some of its more peculiar sand traps and exotic applications, e.g. its part in making .EXEs smaller. (Dynamic methods, message methods, and class reference types are Delphi's other polymorphism providers, but are outside the scope of this article.)

This article assumes you are familiar with Delphi class declaration syntax and general OOP principles. If you're a bit rusty with these concepts, you should first refer to the Delphi Language Reference. Also note that this article uses "virtual" in two senses: as the general term that applies to all forms of virtual methods (i.e. methods declared with virtual, dynamic, or override), and as the specific term that refers only to methods declared with the virtual directive. For example, most polymorphism concepts and issues apply to all virtual methods, but there are a few noteworthy items that apply only to methods declared with the virtual directive.

Review: Syntax of Virtual Methods

Here's a review of the two kinds of virtual methods and four language directives used to declare them:

Virtual methods come in two flavors: virtual and dynamic. The only difference between them is their internal implementations; that is, they use different techniques to achieve the same results.
Calls to virtual methods are dispatched more quickly than calls to dynamic methods.
Seldom-overridden virtual methods require much more storage space for their compiler-generated tables than dynamic methods.
The keywords, virtual and dynamic, always introduce a new method name into a class' name space.
The override directive redefines the implementation of an existing virtual method (virtual or dynamic) that a class inherits from an ancestor.
The override method uses the same dispatch mechanism (virtual or dynamic) as the inherited virtual method it replaces.
The abstract directive indicates that no method body is associated with that virtual method declaration. Abstract declarations are useful for defining a purely conceptual interface, which is in turn useful for maintaining absolute separation between the user of a class and its implementation.
The abstract directive can only be used in the declaration of new virtual (virtual or dynamic) methods; you can't make an implemented method abstract after the fact.
A class type that contains one or more abstract methods is an abstract class.
A class type that contains nothing but abstract methods (no static methods, no implemented virtual methods, no data fields) is called an abstract interface (or, in C++ circles, a pure virtual interface).
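The four directives reviewed above can be seen side by side in a small sketch. TShape, TTriangle, and TBlob are hypothetical classes invented for this illustration; they are not part of the article's gadget hierarchy, and the strings they return are arbitrary.

```pascal
program DirectivesSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  { Hypothetical classes, used only to show the directives together. }
  TShape = class
    function Kind: string; virtual; abstract;  { new virtual method, no body }
    function Sides: Integer; virtual;          { new virtual method with a default body }
    function Describe: string; dynamic;        { new dynamic method }
  end;

  TTriangle = class(TShape)
    function Kind: string; override;           { supplies the missing body }
    function Sides: Integer; override;         { replaces the default }
  end;

  TBlob = class(TShape)
    function Kind: string; override;
    function Describe: string; override;       { overrides a dynamic method }
  end;

function TShape.Sides: Integer;
begin
  Result := 0;
end;

function TShape.Describe: string;
begin
  Result := Kind + ' with ' + IntToStr(Sides) + ' sides';
end;

function TTriangle.Kind: string;
begin
  Result := 'triangle';
end;

function TTriangle.Sides: Integer;
begin
  Result := 3;
end;

function TBlob.Kind: string;
begin
  Result := 'blob';
end;

function TBlob.Describe: string;
begin
  Result := Kind + ' (no sides to count)';
end;

{ Calls Describe through an ancestor-typed parameter, then frees the instance. }
function DescribeAndFree(S: TShape): string;
begin
  try
    Result := S.Describe;
  finally
    S.Free;
  end;
end;

begin
  WriteLn(DescribeAndFree(TTriangle.Create));  { triangle with 3 sides }
  WriteLn(DescribeAndFree(TBlob.Create));      { blob (no sides to count) }
end.
```

Note that override is written the same way whether the inherited method was introduced with virtual or dynamic; the dispatch mechanism is inherited along with the method.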

Polymorphism in Action

What do virtual methods do? In general, they allow a method call to be directed, at run time, to the appropriate piece of code, appropriate for the type of the object instance used to make the call. For this to be interesting, you must have more than one class type, and the class types must be related by inheritance from a common ancestor.

Figure 1 shows three classes we'll use to explore the execution characteristics of polymorphism: a simple base class named TBaseGadget that defines a static method named NotVirtual and a virtual method, ThisIsVirtual; and two descendant classes, TKitchenGadget and TOfficeGadget, that override the ThisIsVirtual method they inherit from TBaseGadget. TOfficeGadget also introduces a new static method named NotVirtual and a new virtual method named NewMethod.

type
  TBaseGadget = class
    procedure NotVirtual(X: Integer);
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure NotVirtual(X, Y, Z: Integer);
    procedure ThisIsVirtual(Y: Integer); override;
  end;
Figure 1: Three classes to explore polymorphism.

Identical names in different classes aren't related. Declaring a static method in a descendant that happens to have the same name as a static method in an ancestor is not a true override. Other than same-name similarity, no relationship exists between static methods declared in a descendant and static methods declared in an ancestor class. Your brain makes an association, but the compiler does not. For instance, TBaseGadget has a NotVirtual method, and TOfficeGadget has a disparate method, also named NotVirtual.

If we start with a variable P of type TBaseGadget, we can assign to it an instance of a TBaseGadget; or an instance of one of its descendants, such as a TKitchenGadget or TOfficeGadget. Recall that Delphi object instance variables are pointers to the instance data allocated from the global heap, and that pointers of a class type are type compatible with all descendants of that type. We can then call methods using the instance variable P:

var
  P: TBaseGadget;
begin
  P := TBaseGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5); { Call TBaseGadget.ThisIsVirtual }
  P.Free;
end;

(In the interest of brevity, I'll fold the execution traces into comments in the source code. You can step through the sample code to verify the execution trace.)

If P refers to an instance of TKitchenGadget, the execution trace would resemble the code in Figure 2. Nothing remarkable here; we have one call to a static method going to the version defined in the ancestor type, and one call to a virtual method going to the version of the method associated with the object instance type.

var
  P: TBaseGadget;
begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5); { Call TKitchenGadget.ThisIsVirtual }
  P.Free;
end;
Figure 2: Execution with an instance of TKitchenGadget.

You may deduce that the inherited static method, NotVirtual, is called because TKitchenGadget doesn't override it. This observation is correct, but the explanation is flawed, as Figure 3 shows. If P refers to an instance of TOfficeGadget, you may be a little puzzled by the result.

var
  P: TBaseGadget;
begin
  P := TOfficeGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  { The compiler will not allow the following two lines:
   P.NotVirtual(1,2,3);   "Too many parameters"
   P.NewMethod;           "Method identifier expected" }
  P.ThisIsVirtual(5); { Call TOfficeGadget.ThisIsVirtual }
  P.Free;
end;
Figure 3: Execution with an instance of TOfficeGadget.

Static method calls are resolved by variable type. Although TOfficeGadget has its own NotVirtual method, and P refers to an instance of TOfficeGadget, why does TBaseGadget.NotVirtual get called instead? This occurs because static (non-virtual) method calls are resolved at compile time according to the type of the variable used to make the call. For static methods, what the variable refers to is immaterial. In this case, P's type is TBaseGadget, meaning the NotVirtual method associated with P's declared type is TBaseGadget.NotVirtual.

Notice that NewMethod defined in TOfficeGadget is out of reach of a TBaseGadget variable. P can only access fields and methods defined in its TBaseGadget object type.

New names obscure inherited names. Let's say P is declared as a variable of type TOfficeGadget. The following method call would be allowed:

P.NotVirtual(1, 2, 3)

However, this method call:

P.NotVirtual(1)

would not be allowed, because TOfficeGadget.NotVirtual requires three parameters.

TOfficeGadget.NotVirtual obscures the TBaseGadget.NotVirtual method name in all instances and descendants of TOfficeGadget. The inherited method is still a part of TOfficeGadget (proven by the code in Figure 3); you just can't get to it directly from TOfficeGadget and descendant types.

To get past this, you must typecast the instance variable:

TBaseGadget(P).NotVirtual(1)

If P were declared as a TOfficeGadget variable, P.NewMethod would also be allowed, because the compiler can "see" NewMethod in a TOfficeGadget variable.
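For readers who want to step through the name-hiding behavior, here is a minimal runnable sketch. The class and method names follow Figure 1, but the method bodies and the LastCall trace variable are assumptions added only so the result can be observed.

```pascal
program HidingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  TBaseGadget = class
    procedure NotVirtual(X: Integer);
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure NotVirtual(X, Y, Z: Integer);  { hides the inherited name }
  end;

var
  LastCall: string;  { hypothetical trace variable, not part of the article }

procedure TBaseGadget.NotVirtual(X: Integer);
begin
  LastCall := 'TBaseGadget.NotVirtual';
end;

procedure TOfficeGadget.NotVirtual(X, Y, Z: Integer);
begin
  LastCall := 'TOfficeGadget.NotVirtual';
end;

var
  P: TOfficeGadget;
begin
  P := TOfficeGadget.Create;
  try
    P.NotVirtual(1, 2, 3);         { resolved against TOfficeGadget }
    WriteLn(LastCall);             { TOfficeGadget.NotVirtual }

    { P.NotVirtual(1); would be rejected: the one-parameter
      version is obscured by TOfficeGadget.NotVirtual. }

    TBaseGadget(P).NotVirtual(1);  { the typecast reaches the inherited method }
    WriteLn(LastCall);             { TBaseGadget.NotVirtual }
  finally
    P.Free;
  end;
end.
```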

Descendant >= ancestor. An instance of a descendant type could be greater than its ancestor type in both services and data. However, the descendant-type instance can never be less than what its ancestors define. This makes it possible for you to use a variable of an ancestral type (e.g. TBaseGadget) to refer to an instance of a descendant type without loss of information.

Inheritance is a one-way street. With a variable of a particular class type, you can access any public symbol (field, property, or method) defined in any of that class' ancestors. You can assign an instance of a descendant class into that variable, but cannot access any new fields or methods defined by the descendant class. The fields of the descendant class are certainly in the instance data that the variable refers to, yet the compiler has no way of knowing that run-time situation at compile time.

There are two ways around this "nearsightedness" of ancestral class types:

Typecasting - The programmer assumes a lot and forces the compiler to treat the variable as a descendant type.
Virtual methods - The magic of virtual will call the method appropriate to the type of the associated instance, determined at run time.
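The two routes can be compared in a short sketch. The class names follow Figure 1, but here the methods are written as functions returning strings purely so the trace can be printed; the return values, the TryNewMethod helper, and its -1 marker are assumptions for illustration.

```pascal
program NearsightedSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  TBaseGadget = class
    function ThisIsVirtual(Y: Integer): string; virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    function ThisIsVirtual(Y: Integer): string; override;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    function ThisIsVirtual(Y: Integer): string; override;
  end;

function TBaseGadget.ThisIsVirtual(Y: Integer): string;
begin
  Result := 'TBaseGadget.ThisIsVirtual';
end;

function TKitchenGadget.ThisIsVirtual(Y: Integer): string;
begin
  Result := 'TKitchenGadget.ThisIsVirtual';
end;

function TOfficeGadget.NewMethod: Longint;
begin
  Result := 42;  { arbitrary value for the sketch }
end;

function TOfficeGadget.ThisIsVirtual(Y: Integer): string;
begin
  Result := 'TOfficeGadget.ThisIsVirtual';
end;

{ Route 1: typecasting. The "is" test guards the cast, so the
  programmer no longer has to assume the instance type. }
function TryNewMethod(P: TBaseGadget): Longint;
begin
  if P is TOfficeGadget then
    Result := TOfficeGadget(P).NewMethod
  else
    Result := -1;  { hypothetical "not available" marker }
end;

var
  Gadgets: array[0..1] of TBaseGadget;
  I: Integer;
begin
  Gadgets[0] := TKitchenGadget.Create;
  Gadgets[1] := TOfficeGadget.Create;
  for I := 0 to 1 do
  begin
    { Route 2: virtual method. No cast; the instance selects the code. }
    WriteLn(Gadgets[I].ThisIsVirtual(5), ' / NewMethod: ',
      TryNewMethod(Gadgets[I]));
    Gadgets[I].Free;
  end;
end.
```

The typecast route needs one test per descendant the caller cares about, while the virtual call stays the same no matter how many descendants are added later.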

Ancestors set the standard. Why do we care about the nearsightedness of ancestral classes? Why not simply use the matching variable type when you create or manipulate an object instance? Sometimes this is the simplest thing to do. However, this "simplest" solution falls apart when you begin talking about manipulating multiple classes that do almost the same things.

Ancestral class types set the minimum interface standard through which we can access a set of related objects. Polymorphism is the use of virtual methods to make one verb (method name) produce one of many possible actions depending on the context (the instance). To have multiple, possible actions, you must have multiple class types (e.g. TKitchenGadget and TOfficeGadget) each potentially defining a different implementation of a particular method.

To be able to make one call that could cover those multiple class types, the method must be defined in a class from which all the multiple class types descend - in an ancestral class such as TBaseGadget. The ancestral class, then, is the least common denominator for behavior across a set of related classes.

For polymorphism to work, all the actions common to the group of classes need to at least be named in a common ancestor. If every descendant is required to override the ancestor's method, the ancestral method doesn't need to do anything at all; it can be declared abstract.

If there is a behavior that is common to most of the classes in the group, the ancestor class can pick up that default behavior and leave the descendants to override the defaults only when necessary. This consolidates code higher in the class hierarchy, for greater code reuse and smaller total code size. However, providing default behaviors in an ancestor class can also complicate the design issues of creating flexible, extensible classes, since what is done by ancestors usually cannot be entirely undone.

Polymorphism lets ancestors reach into descendants. Another aspect of polymorphism doesn't appear to involve instance pointer types at all - at least not explicitly.

Consider the code fragment in Figure 4. The TBaseGadget.NotVirtual method contains an unqualified call to ThisIsVirtual. When P refers to an instance of TKitchenGadget, P.NotVirtual will call TBaseGadget.NotVirtual. Nothing new, so far. However, when that code calls ThisIsVirtual, it will execute TKitchenGadget.ThisIsVirtual. Surprise! Even within the depths of TBaseGadget.NotVirtual, a non-virtual method, a virtual method call is directed to the appropriate code.

procedure TBaseGadget.NotVirtual(X: Integer);
begin
  ThisIsVirtual(17);
end;

var
  P: TBaseGadget;

begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.Free;
end.
Figure 4: Polymorphism allows ancestors to call into descendants.

How can this be? The resolution of virtual method calls depends on the object instance associated with the call. A pointer to the object instance is secretly passed into all method calls, surfacing inside methods as the Self identifier. Inside TBaseGadget.NotVirtual, a call to ThisIsVirtual is actually a call to Self.ThisIsVirtual. Self, in this context, operates like a variable of type TBaseGadget that refers to an instance of type TKitchenGadget. Thus, when the instance type is TKitchenGadget, the virtual method call resolves, at run time, to TKitchenGadget.ThisIsVirtual.

How is this useful? An ancestral method - virtual or not - can call a sequence of virtual methods. The descendants can determine the specific behavior of one or more of those virtual methods. The ancestor determines the sequence in which the methods are called, plus miscellaneous setup and cleanup code. The ancestor, however, does not completely determine the final behavior of the descendants. The descendants inherit the sequence logic from the ancestor, and can override one or more of the steps in that sequence. But, the descendants don't have to reproduce the entire sequence logic. This is one of the ways OOP promotes code reuse.
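A minimal sketch of that idea, with an ancestor owning the sequence and descendants replacing individual steps, might look like the following. TBeverage, TTea, and TIcedTea are hypothetical classes invented for this illustration.

```pascal
program SequenceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  { Hypothetical classes, not part of the article's gadget hierarchy. }
  TBeverage = class
    function Prepare: string;                        { static: owns the sequence }
    function BoilWater: string; virtual;             { step with a default }
    function AddIngredient: string; virtual; abstract;  { step every descendant must supply }
    function Pour: string; virtual;                  { step with a default }
  end;

  TTea = class(TBeverage)
    function AddIngredient: string; override;
  end;

  TIcedTea = class(TTea)
    function Pour: string; override;
  end;

function TBeverage.Prepare: string;
begin
  { Each unqualified call below is really Self.Step, so it is
    dispatched according to the instance type at run time. }
  Result := BoilWater + ', ' + AddIngredient + ', ' + Pour;
end;

function TBeverage.BoilWater: string;
begin
  Result := 'boil water';
end;

function TBeverage.Pour: string;
begin
  Result := 'pour into cup';
end;

function TTea.AddIngredient: string;
begin
  Result := 'steep tea';
end;

function TIcedTea.Pour: string;
begin
  Result := 'pour over ice';
end;

{ Runs the inherited sequence through an ancestor-typed parameter. }
function PrepareAndFree(B: TBeverage): string;
begin
  try
    Result := B.Prepare;
  finally
    B.Free;
  end;
end;

begin
  WriteLn(PrepareAndFree(TTea.Create));      { boil water, steep tea, pour into cup }
  WriteLn(PrepareAndFree(TIcedTea.Create));  { boil water, steep tea, pour over ice }
end.
```

Neither descendant repeats the sequence logic in Prepare; TIcedTea changes the outcome by overriding a single step.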

Fully-qualified method calls are reduced to static calls. As a footnote, consider what happens if TBaseGadget.NotVirtual contains a qualified call to TBaseGadget.ThisIsVirtual:

procedure TBaseGadget.NotVirtual(X: Integer);
begin
  TBaseGadget.ThisIsVirtual(17);
end;

Although ThisIsVirtual is a virtual method, a fully-qualified method call will compile down to a regular static method call. You've specified that you want only the TBaseGadget.ThisIsVirtual method called, so the compiler does exactly what you tell it to do. Dispatching this as a virtual method call may call some other version of that method, which would violate your explicit instructions. Except in special circumstances, you don't want this in your code because it defeats the whole purpose of making ThisIsVirtual virtual.

The Virtual Method Table

A Virtual Method Table (VMT) is an array of pointers to all the virtual methods defined in a class and all the virtual methods the class inherits from its ancestors. A VMT is created by the compiler for every class type, because all classes descend from TObject and TObject has a virtual destructor named Destroy. In Delphi, VMTs are stored in the program's code space. Only one VMT exists per class type; multiple instances of the same class type refer to the same VMT. At run time, the VMT is a read-only lookup table.

Structure of the VMT. The first four bytes of data in an object instance are a pointer to that class type's VMT. The VMT pointer points to the first entry in the VMT's list of four-byte pointers to the entry points of the class' virtual methods. Since methods can never be deleted in descendant classes, the location of a virtual method in the VMT is the same throughout all descendant classes. Thus, the compiler can view a virtual method simply as a unique entry in the class' VMT. As we'll see shortly, this is exactly how virtual method calls are dispatched. Thinking of virtual methods as indexes into an array of code pointers will also help us visualize how method name conflicts are resolved by the compiler.
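The "one VMT per class, pointer at offset 0" layout can be observed from ordinary code. The sketch below reads the first pointer-sized value of each instance; VmtOf is a hypothetical helper, and the comparison with ClassType relies on a compiler implementation detail (a class reference being the VMT address) that I am assuming holds for the compiler in use.

```pascal
program VmtPointerSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  TBaseGadget = class
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

procedure TBaseGadget.ThisIsVirtual(Y: Integer);
begin
  { empty: only the class layout matters here }
end;

procedure TKitchenGadget.ThisIsVirtual(Y: Integer);
begin
  { empty: only the class layout matters here }
end;

{ Hypothetical helper: returns the pointer stored at offset 0
  of the instance data, which is the instance's VMT pointer. }
function VmtOf(Obj: TObject): Pointer;
begin
  Result := PPointer(Obj)^;
end;

var
  A, B, K: TBaseGadget;
begin
  A := TBaseGadget.Create;
  B := TBaseGadget.Create;
  K := TKitchenGadget.Create;
  try
    { Each line is expected to report TRUE. }
    WriteLn('Same class shares one VMT:      ', VmtOf(A) = VmtOf(B));
    WriteLn('Descendant has its own VMT:     ', VmtOf(A) <> VmtOf(K));
    WriteLn('ClassType is the same address:  ', VmtOf(K) = Pointer(K.ClassType));
  finally
    K.Free;
    B.Free;
    A.Free;
  end;
end.
```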

The VMT does not contain information indicating how many virtual methods are stored in it or where the VMT ends. The VMT is constructed by the compiler and accessed by compiler-generated code, so it doesn't need to make notes to itself about size or number of entries. (This does, however, make it difficult for BASM code to call virtual methods.)

Optimization note. A descendant of a class with virtual methods gets a new copy of the ancestor's VMT. The descendant can then add new virtual methods or override inherited virtual methods without affecting the ancestor's VMT. For example, if the ancestor has a 12-entry VMT, the descendant has at least a 12-entry VMT. Every descendant class type of that ancestor, and all descendants of those descendants, will have at least 12 entries in their individual VMTs.

All these VMTs occupy memory. For most programs, this won't be a problem, but extraordinarily large class types with thousands of virtual methods and/or thousands of descendants could consume quite a bit of memory, both in RAM and .EXE file size; dynamic methods are much more space efficient, but incur a slight execution speed penalty.

Now let's examine the mechanics behind the magic of virtual method calls.

Inside a virtual method call. When the compiler is compiling your source code and encounters a call to a virtual method identifier, it generates a special sequence of machine instructions that will unravel the appropriate call destination at run time. The following machine code snippets assume compiler optimizations are enabled, and stack frames are disabled:

// Machine code for statement P.SomeVirtualMethod;

{ Move instance data address (P^) into EAX }
MOV EAX, [EBP + 4]
{ Move instance's VMT address into ECX }
MOV ECX, [EAX]
{ Call address stored at VMT index 2 }
CALL [ECX + 08]

The VMT pointer is always stored at offset 0 (zero) in the instance data. In this example, the method being called is the third virtual method of a class, including inherited virtual methods. The first virtual method is at offset 0, the second at offset 4, and the third at offset 8.
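To make the dispatch sequence concrete without relying on any particular compiler's real VMT layout, the mechanism can be modeled by hand with records and function pointers. This is a simulation under stated assumptions, not how you would call virtual methods in practice: every name below (TFakeInstance, TFakeVmt, CallSlot, and so on) is hypothetical, and the tables have only two slots.

```pascal
program FakeVmtSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

type
  { Hand-built model of an instance and its method table. }
  PFakeInstance = ^TFakeInstance;
  TSlot = function(Inst: PFakeInstance; Y: Integer): string;
  TFakeVmt = array[0..1] of TSlot;
  PFakeVmt = ^TFakeVmt;
  TFakeInstance = record
    Vmt: PFakeVmt;   { offset 0, like the real VMT pointer }
    Data: Integer;   { instance data follows the table pointer }
  end;

function BaseFirst(Inst: PFakeInstance; Y: Integer): string;
begin
  Result := 'Base.First(' + IntToStr(Y) + ')';
end;

function BaseSecond(Inst: PFakeInstance; Y: Integer): string;
begin
  Result := 'Base.Second(' + IntToStr(Y) + ')';
end;

function KitchenSecond(Inst: PFakeInstance; Y: Integer): string;
begin
  Result := 'Kitchen.Second(' + IntToStr(Y) + ')';
end;

var
  BaseVmt, KitchenVmt: TFakeVmt;

procedure BuildTables;
begin
  BaseVmt[0] := BaseFirst;
  BaseVmt[1] := BaseSecond;
  KitchenVmt := BaseVmt;           { the descendant starts with a copy }
  KitchenVmt[1] := KitchenSecond;  { an override replaces one slot }
end;

{ The three steps of the machine code: take the instance address,
  load the table pointer stored in it, call through a fixed slot. }
function CallSlot(Inst: PFakeInstance; Index, Y: Integer): string;
begin
  Result := Inst^.Vmt^[Index](Inst, Y);
end;

var
  B, K: TFakeInstance;
begin
  BuildTables;
  B.Vmt := @BaseVmt;
  B.Data := 0;
  K.Vmt := @KitchenVmt;
  K.Data := 0;
  WriteLn(CallSlot(@B, 1, 5));  { Base.Second(5) }
  WriteLn(CallSlot(@K, 1, 5));  { Kitchen.Second(5) }
  WriteLn(CallSlot(@K, 0, 5));  { Base.First(5): inherited slot, unchanged }
end.
```

The caller always uses the same slot index; only the table pointer stored in the instance differs, which is why the same call site can reach different code.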

Conclusion

That's it - all the magic of virtual methods and polymorphism boils down to this: the indicator of which virtual method to invoke on the instance data is stored in the instance data itself.

In Part II, we'll conclude our series with a discussion of abstract interfaces and how virtual methods can defeat and enhance "smart linking." See you then.
