Sunday, October 17, 2004

Inside Delphi's Classes and Interfaces Part I


Problem/Question/Abstract:

You've probably used classes and interfaces more than once in your Delphi programs. Did you ever stop to think how Delphi implements these creatures?

Answer:

A few words before we start:

First, everything in this article is derived from studying the disassembled output of Delphi 5. Hence what is written here is guaranteed to be valid only for Delphi 5 and might differ in other versions.
Second, in order to fully understand this article, you'll have to dive into some assembler code. I'll explain what the assembler code does, but be prepared, it might get messy.

And now to the real stuff. In Delphi a class instance is a simple pointer. That might seem odd, since you've used instances in Delphi many a time and never had to treat them like pointers. That is correct, but only because Borland was kind enough to wrap these pointers up nicely.
These pointers actually point to a complicated structure in memory, which we'll try to understand. First we'll look at a simple class definition:

TBoo1 = class
  FDataA, FDataB: Integer;
end;

var
  Boo1: TBoo1;
begin
  Boo1 := TBoo1.Create;
end;

Now let's look at what Boo1 points to (Boo1 is a pointer, remember?):

(Boo1 points to the following values, each 4 bytes long)
a pointer to TBoo1's VMT (Virtual Method Table)
FDataA
FDataB

Now let's examine a descendant of TBoo1:

TBoo2 = class(TBoo1)
  FDataC, FDataD: Integer;
end;

var
  Boo2: TBoo2;
begin
  Boo2 := TBoo2.Create;
end;

Boo2 will point to the following values in memory:
a pointer to TBoo2's VMT
FDataA
FDataB
FDataC
FDataD

Notice that the values Boo2 points to include the fields that Boo1 points to (only the VMT pointer differs). That's easy to explain: TBoo2 inherits from TBoo1, therefore it must include all of the fields that TBoo1 has.

As a general case, we could state that each class instance points to the following values :

a pointer to the Class' VMT
a list of the Class' parent's fields
a list of the Class' fields
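
The layout described above can be checked with a little pointer arithmetic. The following is only a minimal sketch, not something taken from the disassembly: the helper 'OffsetOf' and the type 'PPtr' are names made up for this illustration, and it assumes a compiler that lays out instances the way this article describes.

```pascal
program LayoutDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PPtr = ^Pointer; // hypothetical helper type: a pointer to a pointer

  TBoo1 = class
    FDataA, FDataB: Integer;
  end;

  TBoo2 = class(TBoo1)
    FDataC, FDataD: Integer;
  end;

// Hypothetical helper: distance in bytes from the start of the instance
// to the given address inside it.
function OffsetOf(Instance: TObject; Address: Pointer): Integer;
begin
  Result := PAnsiChar(Address) - PAnsiChar(Instance);
end;

var
  Boo2: TBoo2;
begin
  Boo2 := TBoo2.Create;
  try
    // The first value of the instance is the pointer to the class' VMT.
    Writeln(PPtr(Boo2)^ = Pointer(TBoo2));
    // The inherited fields come first, then the fields TBoo2 adds.
    Writeln('FDataA at offset ', OffsetOf(Boo2, @Boo2.FDataA));
    Writeln('FDataB at offset ', OffsetOf(Boo2, @Boo2.FDataB));
    Writeln('FDataC at offset ', OffsetOf(Boo2, @Boo2.FDataC));
    Writeln('FDataD at offset ', OffsetOf(Boo2, @Boo2.FDataD));
  finally
    Boo2.Free;
  end;
end.
```

On a 32-bit compiler such as Delphi 5 this should report TRUE followed by the offsets 4, 8, 12 and 16.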

Now it's time to investigate interfaces. Before we can fully understand interfaces, we must understand how Delphi calls a method on a class instance. What Delphi actually does is call a function with one more parameter than was declared, and that parameter is the instance itself. Let's look at an example:

TMoo = class
  FData: Integer;
  procedure Act(Value: Integer);
end;

procedure TMoo.Act(Value: Integer);
begin
  if FData = Value then
    FData := FData + 1
  else
    FData := Value;
end;

var
  Moo: TMoo;
begin
  Moo := TMoo.Create;
  Moo.Act(15);
end;

How does Delphi implement this? Simple: 'TMoo.Act' is actually compiled into a procedure that accepts two(!) parameters. One is the declared parameter, 'Value' of type Integer. The other is an instance of class TMoo. Every time Delphi calls 'Moo.Act' it does some preprocessing beforehand, that is, it passes along the instance of TMoo on which the call is made. Basically, any call to a method of an object is translated to a regular call to a function / procedure that accepts the object as a hidden parameter.
In the previous example, 'TMoo.Act' is actually compiled to something like this:

procedure TMoo_Act(Self: TMoo; Value: Integer);
begin
  if Self.FData = Value then
    Self.FData := Self.FData + 1
  else
    Self.FData := Value;
end;
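
The hidden parameter can be made visible by calling the compiled method through a plain procedure pointer. This is only a sketch under assumptions: 'TActProc' is a type invented for this illustration, and the trick relies on the default (register) calling convention, where the instance is passed as the first parameter.

```pascal
program SelfDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TMoo = class
    FData: Integer;
    procedure Act(Value: Integer);
  end;

  // Hypothetical type: the signature of TMoo.Act with the hidden
  // instance parameter written out explicitly.
  TActProc = procedure(Instance: TMoo; Value: Integer);

procedure TMoo.Act(Value: Integer);
begin
  if FData = Value then
    FData := FData + 1
  else
    FData := Value;
end;

var
  Moo: TMoo;
  ActProc: TActProc;
begin
  Moo := TMoo.Create;
  try
    // Take the address of the method's code and treat it as a plain procedure.
    @ActProc := @TMoo.Act;
    Moo.Act(15);        // ordinary call: FData goes from 0 to 15
    ActProc(Moo, 15);   // same code, instance passed by hand: FData becomes 16
    Writeln(Moo.FData);
  finally
    Moo.Free;
  end;
end.
```

Both calls run the same code; the only difference is who supplies the instance. The program should print 16.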
  
It's time to go back to interfaces. Consider the following code :

IKoo = interface
  function Calculate(Value: Integer): Double;
end;

function Evaluate(Koo: IKoo; Value: Integer): Double;
begin
  Result := Koo.Calculate(Value);
end;

TKooA = class(TInterfacedObject, IKoo)
  function Calculate(Value: Integer): Double;
end;

TKooB = class(TInterfacedObject, IKoo)
  procedure DoNothing;
  function Calculate(Value: Integer): Double;
end;

An instance of any class that supports IKoo can be passed to the function 'Evaluate'. When we pass an instance of TKooA to 'Evaluate' we need to call the first method of TKooA, but when we pass an instance of TKooB, we need to call the second method of TKooB! How will Delphi know which function to call each time?!

In order to understand the answer, we must review what an interface really is (and how it is implemented in Delphi). An interface is simply a list of methods that a class declares it implements. That is, each method in the interface is implemented in the class. Delphi implements this as follows:

Each interface a class supports is actually a list of pointers to methods. Therefore, each time a method call is made through an interface, the call is diverted through one of these pointers, giving the object that really implements the method the chance to act. I'll explain this via the 'Koo' example above:

Each time the function 'Evaluate' gets a parameter of type IKoo, it really gets a list of pointers to methods (with 4 items: the three methods IKoo inherits from IUnknown, namely QueryInterface, _AddRef and _Release, followed by Calculate). If it got an IKoo interface that was implemented by TKooA, the 4th item in the list leads to 'TKooA.Calculate'. Otherwise it leads to 'TKooB.Calculate'. Therefore, when a call is made to 'IKoo.Calculate', what actually gets called is whatever the 4th item points to (either 'TKooA.Calculate' or 'TKooB.Calculate'). That is how Delphi implements interfaces.
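
To see the dispatch at work, here is a complete version of the 'Koo' example. The bodies of the two 'Calculate' methods are made up for this illustration (the original listing does not define them); everything else follows the declarations above.

```pascal
program KooDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  IKoo = interface
    function Calculate(Value: Integer): Double;
  end;

  TKooA = class(TInterfacedObject, IKoo)
    function Calculate(Value: Integer): Double;
  end;

  TKooB = class(TInterfacedObject, IKoo)
    procedure DoNothing;
    function Calculate(Value: Integer): Double;
  end;

function TKooA.Calculate(Value: Integer): Double;
begin
  Result := Value * 2; // hypothetical body: doubles the value
end;

procedure TKooB.DoNothing;
begin
end;

function TKooB.Calculate(Value: Integer): Double;
begin
  Result := Value / 2; // hypothetical body: halves the value
end;

// Evaluate only sees an IKoo; it does not know which class is behind it.
function Evaluate(Koo: IKoo; Value: Integer): Double;
begin
  Result := Koo.Calculate(Value);
end;

begin
  Writeln(Evaluate(TKooA.Create, 10):0:1); // reaches TKooA.Calculate
  Writeln(Evaluate(TKooB.Create, 10):0:1); // reaches TKooB.Calculate
end.
```

With these made-up bodies the program should print 20.0 and then 5.0, even though 'Evaluate' contains a single call that never mentions TKooA or TKooB.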

And now to how Delphi stores interfaces in memory. For each instance of a class that supports 'N' interfaces, we need 'N' different lists of pointers to methods (one list per interface). But these lists are the same for every instance of a given class; therefore, in order to save memory, each instance holds only 'N' pointers to these lists (instead of the lists themselves).

Consider the following code :

ILooA = interface
end;

ILooB = interface
end;

TLoo = class(TInterfacedObject, ILooA, ILooB)
  FLooA, FLooB: Integer;
end;

This is how an instance of TLoo would look in memory:

a pointer to TLoo's VMT
FRefCount (inherited from TInterfacedObject)
IUnknown (inherited from TInterfacedObject)
FLooA
FLooB
ILooB
ILooA

In general, any class instance would look like this:

a pointer to the class' VMT
the structure of the class' parent (except for the pointer to the VMT)
first data member of the class
.
.
last data member of the class
last interface in the class' interface list
.
.
first interface in the class' interface list
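
One way to inspect this layout is to compare addresses: an interface reference obtained from an instance is the address of the matching interface entry inside that instance. The sketch below relies on that; 'OffsetOf' is a helper invented for this illustration, and the numbers it reports depend on the compiler.

```pascal
program LooLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  ILooA = interface
  end;

  ILooB = interface
  end;

  TLoo = class(TInterfacedObject, ILooA, ILooB)
    FLooA, FLooB: Integer;
  end;

// Hypothetical helper: distance in bytes from the start of the instance
// to the given address inside it.
function OffsetOf(Instance: TObject; Address: Pointer): Integer;
begin
  Result := PAnsiChar(Address) - PAnsiChar(Instance);
end;

var
  Loo: TLoo;
  LooA: ILooA;
  LooB: ILooB;
begin
  Loo := TLoo.Create;
  LooA := Loo; // now holds the address of the ILooA entry inside Loo
  LooB := Loo; // now holds the address of the ILooB entry inside Loo
  Writeln('FLooA at offset ', OffsetOf(Loo, @Loo.FLooA));
  Writeln('FLooB at offset ', OffsetOf(Loo, @Loo.FLooB));
  Writeln('ILooB at offset ', OffsetOf(Loo, Pointer(LooB)));
  Writeln('ILooA at offset ', OffsetOf(Loo, Pointer(LooA)));
  // No Loo.Free here: reference counting frees the instance once
  // LooA and LooB are released at program exit.
end.
```

If the layout above holds on your compiler, the interface entries should be reported after the data members, with ILooB before ILooA.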

As I said at the beginning of this article, in order to really grasp the way Delphi implements classes and interfaces, we must look at the assembler code Delphi produces.
First we'll learn a bit of assembler in order to understand the code that will follow. In assembler there is a thing called a 'register'. A register is a storage location on the CPU that can hold a 32-bit value. On a Pentium CPU there are 8 main registers (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP). Most actions in assembler are performed on registers. Here are a few assembler commands:

(Moves the value into the register)
MOV Register, Value

(Moves the value in Register2 into Register1)
MOV Register1, Register2

(Moves the value that Register2 points to into Register1. This is the same as the following code: 'Register1 := Register2^;')
MOV Register1, [Register2]

(Moves the value that Register2 + Value points to into Register1. The same as:
'Register1 := Pointer(Integer(Register2) + Value)^;')
MOV Register1, [Register2 + Value]
  
Examples:

MOV EAX, 10
MOV EBX, EAX
MOV EAX, [EBX + 6]

EBX will hold the value 10 and EAX will hold the 32-bit value stored at address 16 (10 + 6, which is $10 in hexadecimal).
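
For readers more at home in Pascal than in assembler, the two memory-reading forms of MOV can be imitated on an ordinary array. This is only an analogy: 'LoadAt' and 'PInt' are names invented for this sketch, and the array stands in for memory.

```pascal
program MovDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PInt = ^Integer; // hypothetical helper type

// Hypothetical helper: does what 'MOV Result, [Base + Offset]' does,
// that is, reads the 32-bit value stored Offset bytes after Base.
function LoadAt(Base: Pointer; Offset: Integer): Integer;
begin
  Result := PInt(PAnsiChar(Base) + Offset)^;
end;

var
  Values: array[0..2] of Integer;
begin
  Values[0] := 100;
  Values[1] := 200;
  Values[2] := 300;
  // Like 'MOV EAX, [EBX]' with EBX holding the address of Values
  Writeln(LoadAt(@Values[0], 0));
  // Like 'MOV EAX, [EBX + 4]': an Integer is 4 bytes, so this is Values[1]
  Writeln(LoadAt(@Values[0], 4));
end.
```

The program should print 100 and then 200.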

Just to make sure that you understood this part, I'll give an example of how Delphi assigns a value to an instance's data member.

TGoo = class
  FDataA, FDataB: Integer;
end;

var
  Goo: TGoo;
begin
  Goo := TGoo.Create;
  Goo.FDataA := 5;
  Goo.FDataB := 7;
end;

If you open Delphi's disassembler, you'll see code like the following:

//Goo.FDataA := 5;
mov eax, [ebp - $08]
mov dword ptr [eax + $04], $00000005
//Goo.FDataB := 7;
mov eax, [ebp - $08]
mov dword ptr [eax + $08], $00000007

Why move the value pointed to by 'ebp - $08'? Simple: that's where the variable Goo is stored. Notice that accessing FDataA is the same as accessing the address 'eax + $04' and that accessing FDataB is the same as accessing the address 'eax + $08'. That's because the first value at the address held in 'eax' is the pointer to TGoo's VMT, and (as I mentioned before) the values that follow it in memory are the data members of TGoo.
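
The same stores can be written by hand in Pascal, which makes the offsets explicit. This is a sketch for illustration only ('PInt' is a made-up helper type), and it uses SizeOf(Pointer) instead of a literal 4 so that the offsets match the size of the VMT pointer on the compiler at hand.

```pascal
program FieldOffsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PInt = ^Integer; // hypothetical helper type

  TGoo = class
    FDataA, FDataB: Integer;
  end;

var
  Goo: TGoo;
  Base: PAnsiChar;
begin
  Goo := TGoo.Create;
  try
    // Like 'mov eax, [ebp - $08]': take the address the variable Goo holds.
    Base := PAnsiChar(Goo);
    // Like 'mov dword ptr [eax + $04], 5' on a 32-bit compiler:
    // skip the VMT pointer and store into the first data member.
    PInt(Base + SizeOf(Pointer))^ := 5;
    // Like 'mov dword ptr [eax + $08], 7': the second data member follows.
    PInt(Base + SizeOf(Pointer) + SizeOf(Integer))^ := 7;
    Writeln(Goo.FDataA);
    Writeln(Goo.FDataB);
  finally
    Goo.Free;
  end;
end.
```

The program should print 5 and then 7, showing that the raw stores landed in FDataA and FDataB.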

Let's go back to interfaces. Look at the following code :

IRoo = interface
end;

TRoo = class(TInterfacedObject, IRoo)
end;

var
  Roo: TRoo;
  RooIntf: IRoo;
begin
  Roo := TRoo.Create;
  RooIntf := Roo;
  RooIntf._AddRef;
end;

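The following assembler code isn't exactly what Delphi produces (among other things, it glosses over selecting the slot of '_AddRef' within the method list and over the return address that 'call' itself pushes onto the stack), but it makes the same point: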

// RooIntf := Roo;
// eax holds the value returned by TRoo.Create, that is, the variable Roo
// ecx holds the value that should later be assigned to RooIntf
mov ecx, eax
// This is the same as : 'ecx := ecx + $0C';
add ecx, $0C
// RooIntf._AddRef
// Push 'ecx' onto the CPU's stack
push ecx
mov ecx, [ecx]
// 'call' tells the CPU to jump to the address stored as a value in 'ecx'
call ecx

Let's look at the code that 'call ecx' brought us to:

// Pop the value we pushed onto the stack into 'ecx'
pop ecx
// Same as: 'ecx := ecx - $0C;'
sub ecx, $0C
// Call the method '_AddRef' with 'ecx' as a parameter.
call TInterfacedObject._AddRef(ecx)

A little explanation is due. Why did Delphi add '$0C' to 'ecx'? Remember how Roo is stored in memory (a pointer to the VMT, FRefCount (of TInterfacedObject), IUnknown (of TInterfacedObject), IRoo). IRoo is the fourth value in the list that 'ecx' points to. Each value is 4 bytes long, so three values precede IRoo and it sits 12 (3*4) bytes after the start of the instance; '$0C' is 12 in hexadecimal notation. So basically, adding '$0C' to 'ecx' just made 'ecx' point to the right value, that is, to the IRoo entry of Roo (an instance of TRoo).
  
Why do we push 'ecx' onto the stack? Because we'll need it later, when calling the real '_AddRef' method. Remember, 'ecx' holds the address Roo + 12.
After that, we move into 'ecx' the value that 'ecx' pointed to. Remember when I said that instead of holding the lists of pointers to methods, Delphi stores only pointers to them (to save memory)? That's why 'ecx' was actually a pointer, but now it holds the value it pointed to before.

The next command calls the method whose address 'ecx' holds. Now we'll look at that method. It's very short. The only thing it does is modify the value of 'ecx' (after popping it from the stack) so that it is equal to the value of Roo (that is, it points to the instance again). Then the method 'TInterfacedObject._AddRef' is called with 'ecx' (Roo) as a parameter. This is the same thing I described earlier: Delphi actually compiles a class' method into a regular function / procedure that accepts one extra parameter, the instance of the class.
What was that good for? We added a value to a pointer, jumped around in memory, then subtracted the same value from the pointer and called the function! Why bother? We could simply call the function without adding and subtracting values!
  
This is where the power of indirection comes into the game. Notice that the call to 'RooIntf._AddRef' didn't know that RooIntf actually referred to an instance of TRoo. It just called the method that was there to call. The implementation of this method is where the pointer's value was readjusted. That is, only the implementation that RooIntf points to (IRoo of TRoo) knew how much had been added to the pointer pushed onto the stack, and therefore how much to subtract from it. If we had another variable of type TRoo2, a class that also implemented IRoo, made the assignment 'RooIntf := variable of type TRoo2', and then called the method 'RooIntf._AddRef', a different value would be subtracted from the value on the stack, making the method call go to the right place in the TRoo2 class.
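
The "different value" can be observed directly. In the sketch below, TRoo2 is given two extra data members purely for illustration (the article does not define TRoo2), and 'IntfOffset' is a made-up helper that reports how far the IRoo entry sits from the start of the instance, which is the amount the stub has to subtract.

```pascal
program RooOffsets;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  IRoo = interface
  end;

  TRoo = class(TInterfacedObject, IRoo)
  end;

  // Hypothetical second implementor, with extra data members so that
  // its IRoo entry ends up at a different offset.
  TRoo2 = class(TInterfacedObject, IRoo)
    FExtraA, FExtraB: Integer;
  end;

// Hypothetical helper: distance in bytes between the start of the
// instance and the interface entry that Intf refers to.
function IntfOffset(Instance: TObject; const Intf: IRoo): Integer;
begin
  Result := PAnsiChar(Pointer(Intf)) - PAnsiChar(Instance);
end;

var
  Roo: TRoo;
  Roo2: TRoo2;
  RooIntf: IRoo;
begin
  Roo := TRoo.Create;
  RooIntf := Roo;
  Writeln('IRoo inside TRoo:  ', IntfOffset(Roo, RooIntf));

  Roo2 := TRoo2.Create;
  RooIntf := Roo2; // releases (and thereby frees) the TRoo instance
  Writeln('IRoo inside TRoo2: ', IntfOffset(Roo2, RooIntf));
end.
```

If the layout described in this article holds, a 32-bit compiler should report 12 for TRoo and a larger offset for TRoo2, yet code that only holds 'RooIntf' never needs to know either number.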
