2007-03-28

Implementing Polymorphism in C

A few months ago I was doing some work on Linux /proc fs. As you may already know, procfs (just like any other filesystem in Linux) implements the VFS API. The VFS API groups together common concepts between filesystems (e.g. retrieving an i-node from a pathname, assigning permissions to an i-node etc.). This allows for some polymorphism in the kernel code: a module requiring a VFS operation upon a file, does not need to know the actual type or inner-workings of the filesystem hosting this file.

In order to apply Objected Oriented Programming concepts to C code, one could map classes to structs, attributes to struct members and methods to function pointers. Having said this, there exists a small issue here that can be quite disturbing (other than the OOP->C mapping :-). A function pointer restricts (via the function prototype) the type of parameters passed to the function it is pointing to.

One can use void pointer parameters to allow for varying parameter types, but this decreases the code readability (i.e. function prototypes include void pointer arguments instead of meaningful types) and disables parameter type checking.

Another approach is to introduce a common structure, which will be passed as parameter to the polymorphic function, hiding all type-dependant data in a special field of this structure. This special field is usually a void pointer or a union and is used as private storage space for each implementation of the polymorphic function. This is common practice and can be found in many TCP/IP stack implementations. Should a void pointer technique be used, there would be some code readability issues regarding the structure definition. The union approach, although more verbose in nature, ties the definition of the special field with all possible types of private data and in this way, forces an update on the struct definition each time a new type is introduced.

So, how can we keep the function prototype as generic as possible and at the same time have descriptive parameter types without changing too much code?

Instead of employing a one-level inheritance scheme, where a base struct contains a special field for use by "sub-classes", we can reverse this model to allow for arbitrary inheritance levels (and get multiple inheritance as a bonus!):

"struct mydriver" is a struct used by an imaginary device driver. This driver adheres to some driver API and as such inherits "struct driver_API" from this API. The driver makes use of a character device and hence inherits some utility functions from "struct char_driver". Thus, "struct mydriver" would look something like this:

struct mydriver {
struct driver_API d_api;
struct char_driver char_dev;
time_t timeout;
int fd;
};

struct members "timeout" and "fd" are mydriver-specific. Now, let's say that we wish to override a function found in the "struct driver_API" with a mydriver-specific one which makes use of the "timeout" field. The function pointer "update_pci_timeout" yields this function's prototype as follows:

struct driver_API {
...
time_t (*update_pci_timeout)(struct driver_API *d_api,
time_t new_timeout);
...
};


The mydriver implementation of this function would look something like this:

time_t mydriver_update_pci_timeout(struct driver_API *d_api,
time_t new_timeout)
{
struct mydriver *mydriver_data;
...
mydriver_data = get_mydriver_ref(d_api);
if ((mydriver_data->timeout) < 1900){
...
}
...
}


Hmmm, what is this magical "get_mydriver_ref"? It returns a pointer to the child (outer "struct mydriver") given a pointer to the parent (inner "struct driver_API"). Of course this only works if "struct driver_API" was previously instantiated as a member of a "struct mydriver", but this is a safe assumption since the scope of our code is limited to the "mydriver" driver.

What follows is the code for "get_mydriver_ref". It uses a bit of GCC magic (macro "container_of") to get the address of the outer struct.

#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})

struct mydriver *get_mydriver_ref(struct driver_API *api){
return container_of(api, struct mydriver, d_api);
}


"container_of" is defined in <linux/kernel.h> of the kernel sources. You can see an example of its use with PROC_I (returns procfs-specific data for an i-node) in linux/fs/proc/inode.c

6 comments:

vvas said...

There is a simpler approach, though less flexible I guess. In C, when the first field of a struct is also a struct, the language guarantees that the inner struct will reside in the same base address as the outer (enclosing) one. Therefore one can achieve up-/downcasting with simple typecasts. Glib/GTK+, amongst others, use this approach.

modal_echoes said...

Using the standard C casting operator to do the OO casting work is surely intriguing (speed+simplicity), BUT you can't use this if you have multiple inheritance (i.e. you can't jump from a random member to the enclosing struct).

Anonymous said...

Too much drugs!!!!!
Speedwise, how does this compare to C++?

modal_echoes said...

Since you asked...

For the scheme vvas described, getting hold of a child struct of a given parent struct is equivalent to a simple assignment of the form:

a = (struct other *) b;

This uses the same amount of instructions irrespective of the number of "child" structs we have to traverse (in theory) to reach our destination struct.

The same number of instructions are used in the void pointer case since we are merely casting the pointer to the desired type. This scheme considers all types flat and has no notion of structural hierarchy.

In the case of the common structure with the special field, you would have to first dereference the special field and then access the private data by means of a cast. This case allows for a limited (1-hop) hierarchy.

The "container_of" case allows for traversal when even multiple inheritance exists, but the traversing code must know the structure of the hierarchy in order to reach a higher node. At each hop of the hierarchy the traversing code must re-evaluate the address of the enclosing struct. Thus, the desired child struct will be finally reached in O(n) time (where n is the number of traversed hops).

vvas said...

Thanasi, C++ does similar things to what I describe; it just does them under the hood and you don't have to worry about them. So the performance should be quite similar (for the single inheritance case only obviously). The only exception I can think of offhand is calling the polymorphic functions: In the C way of doing things, you have a function pointer directly inside the structure, so it's just an indirect function call. C++ OTOH only has a single pointer per object, pointing to its Virtual Method Table, which in turn contains the virtual function pointers. So C++ is slightly slower there, because there's an extra level of indirection, but the objects can potentially be much smaller this way. The usual size-speed tradeoff.

nuclear said...

Very interesting idea. Most of the time, when I implement OOP in C, I use the simple method vvas described. I haven't thought of using offsetof to go to the parent class from the child class. However, I also haven't had the urge to use multiple inheritance either, so the simple approach works fine :)