Clicky

Thursday 12 August 2010

Structs and Classes


.Net provides two basic object types that can be used to construct more complex objects: Structs and Classes. These object types are made up of similar features, and can be used to fulfil many of the same tasks as one another. There are, however, some important distinctions between what you can do with Structs and with Classes, and between how similar features are implemented differently by each object type. This article describes those similarities and differences, and provides some suggestions for when and how to use each type.



The Similarities


Structs and Classes are both capable of utilising Fields, Properties, Constants, Methods, Events, Handlers and certain types of Constructor. They are both capable of being instantiated, and they are each able to have variables declared as being of the type they define. Both can model data objects, using much the same syntax in each case, and indeed you can often convert a Struct to a Class and back simply by changing the word struct in its definition to class, or vice-versa. Both types of entity may also implement Interfaces.
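As a minimal sketch of this overlap, the two types below declare identical members and differ only in the struct/class keyword. The names PointStruct, PointClass and IDescribable are hypothetical, invented for this illustration, and appear nowhere else in this article.

```csharp
using System;

// Hypothetical interface, for illustration only.
public interface IDescribable
{
    string Describe();
}

public struct PointStruct : IDescribable
{
    public const int Dimensions = 2;            // Constant
    public int X;                               // Fields
    public int Y;
    public int Sum { get { return X + Y; } }    // Property

    public PointStruct(int x, int y)            // Constructor taking parameters
    {
        X = x;
        Y = y;
    }

    public string Describe()                    // Method implementing the Interface
    {
        return "(" + X + ", " + Y + ")";
    }
}

// Identical body; only the keyword (and the type name) has changed.
public class PointClass : IDescribable
{
    public const int Dimensions = 2;
    public int X;
    public int Y;
    public int Sum { get { return X + Y; } }

    public PointClass(int x, int y)
    {
        X = x;
        Y = y;
    }

    public string Describe()
    {
        return "(" + X + ", " + Y + ")";
    }
}

public static class SimilaritiesDemo
{
    public static void Main()
    {
        IDescribable a = new PointStruct(1, 2);
        IDescribable b = new PointClass(1, 2);
        Console.WriteLine(a.Describe());   // (1, 2)
        Console.WriteLine(b.Describe());   // (1, 2)
    }
}
```

Both types can be consumed through the same Interface, which is what makes it easy to swap one for the other when the differences described in the rest of this article don't matter.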


The Differences


The most fundamental distinction between Structs and Classes is rooted in a .Net concept known as the Common Type System. When you compile a program written in any of the languages that target the .Net Framework, the compiler for that high-level language (e.g., C#, VB or C++) first converts the code into Microsoft Intermediate Language (also known as the Common Intermediate Language, and usually abbreviated to ‘IL’). The IL representation of a .Net program is then converted into platform-specific machine code instructions by the .Net runtime execution environment, known as the Common Language Runtime (CLR). [NB: For Java developers, .Net’s IL is analogous to the Bytecode that Java compilers produce from Java source code, to be run within a Java Virtual Machine.]

Part of what makes the process of converting any of the languages that target the .Net Framework into IL possible, is the fact that each of those different languages uses the Common Type System (CTS) mentioned above. The CTS defines the basic set of types (boolean, integer, etc) that IL understands and knows how to work with. [Actually, the set of types that IL understands is far larger than any of the languages that presently target the .Net Framework is capable of modelling, and not all of the languages that target the .Net Framework implement exactly the same subset of those IL types, leading to the potential for subtle language interoperability issues, but that’s a discussion for another day.]

The key point of relevance to the subject under discussion is that there are two core kinds of type defined within the CTS, known as Value Types and Reference Types respectively. Every type in any .Net program falls into one or other of these two kinds, and which one it falls into dictates how it will be stored in memory, how it may be inherited, and how objects of that type will behave.

Getting back to Structs and Classes, it’s the case that every Struct defined in a C# or VB program is considered to be a Value Type by the CTS, and every Class is considered to be a Reference Type at the very root of its inheritance chain. This difference in base type inheritance is what accounts for many of the differences between Structs and Classes, which are listed below:


Differences in Implementation Inheritance


As noted above, both Structs and Classes may implement Interfaces. However, only Classes may overtly use Implementation Inheritance to inherit from another object. Classes, unlike Structs, may also be inherited by another object.

At their root, Structs derive from a CTS type called System.ValueType, which in turn inherits from System.Object. Classes with no overtly-defined base inherit directly from System.Object (and, for Classes that don't inherit directly from System.Object, there may be any number of Classes in between System.Object and themselves).

Concentrating on Structs, you may find it surprising to know that most of the built-in types in C# and VB (e.g., integer, double, decimal, etc) also derive from System.ValueType, and are in fact implemented as Structs. If you mouse over the keyword of any data type implemented in this way within C# or VB in Visual Studio, this fact becomes readily apparent: the tooltip for int, for example, reads “struct System.Int32”.


System.ValueType doesn’t add any functionality to Structs that is not also available to Classes through their common inheritance from System.Object, but it does override some of System.Object’s virtual methods (Equals, for example) with implementations that are more appropriate for Structs.
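The inheritance chain described above can be inspected with reflection. The sketch below is illustrative only: StructPair and ClassPair are hypothetical types made up for the demonstration. It also shows one of the overridden behaviours, since the Equals method inherited from System.ValueType compares Field values, whereas the one a Class inherits from System.Object compares references.

```csharp
using System;

// Hypothetical types, for illustration only.
public struct StructPair { public int A; public int B; }
public class ClassPair { public int A; public int B; }

public static class InheritanceDemo
{
    // Two separate Struct values holding the same data.
    public static bool StructsEqual()
    {
        StructPair x; x.A = 1; x.B = 2;
        StructPair y; y.A = 1; y.B = 2;
        return x.Equals(y);   // ValueType.Equals compares the Fields
    }

    // Two separate Class instances holding the same data.
    public static bool ClassesEqual()
    {
        ClassPair x = new ClassPair { A = 1, B = 2 };
        ClassPair y = new ClassPair { A = 1, B = 2 };
        return x.Equals(y);   // Object.Equals compares the references
    }

    public static void Main()
    {
        Console.WriteLine(typeof(int).BaseType);          // System.ValueType
        Console.WriteLine(typeof(StructPair).BaseType);   // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);    // System.Object
        Console.WriteLine(typeof(ClassPair).BaseType);    // System.Object

        Console.WriteLine(StructsEqual());   // True
        Console.WriteLine(ClassesEqual());   // False
    }
}
```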

Structs, in short, are intended for tasks that require small, fixed-size chunks of data that can be accessed and changed rapidly, without incurring an undue performance hit in managing the memory allocated to them. That’s why they’re used to implement the basic data types such as integer: when a developer declares an integer variable, they expect to be able to store numbers within the permitted range for that type immediately, without having to allocate space in memory with the new operator as a separate activity every time, and without checking whether any change their application makes to the stored value has implications for the amount of memory required. They also want to read the value directly, without the underlying implementation having to follow a pointer to an address in memory and then work out what has been stored there, as a two-stage process. Structs give developers this rapidity of declaration and access to fixed-size chunks of memory, but the trade-off is that they can’t be used for true Implementation Inheritance, which would require managing an area of memory of variable size.

So, whilst Structs are easy to consume and fast to access, if you need to use Implementation Inheritance, that’s a clear sign you need to use a Class instead.   


Structs can be consumed without explicit instantiation, Classes cannot


When you create a variable of a type defined by a Class, you need to use the new operator to instantiate that object before you can access the Fields or Properties of the object the variable represents. With Structs, this is not the case. Take the following simple example, using a struct:

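using System;

public class StructsAndClassesDemo 
{
    static void Main()
    {
        // Declared, but never instantiated with new
        MyStruct myVariable;
        // Assigning its only Field leaves the Struct fully initialised
        myVariable.SomeInteger = 1;
        Console.WriteLine(myVariable.SomeInteger);
    }
}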

public class MyClass 
{
    public int SomeInteger;
}

public struct MyStruct 
{
    public int SomeInteger;
}

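This will compile and run correctly, outputting “1” in the Console window. If, on the other hand, we change the declaration in the Main() Method from MyStruct myVariable; to MyClass myVariable; and leave the rest of the Method unchanged, a compilation error will occur (“Use of unassigned local variable”), since we have tried to use an uninstantiated variable of a Reference Type.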

NB: You still need to instantiate an object of type struct (or assign every one of its Fields) before you may access Properties (as distinct from Fields) of that object.
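To illustrate the point about Properties, here is a small sketch using a hypothetical Temperature struct (not part of the examples above) whose data is only reachable through a Property. Because its backing Field is private, calling code has no way to assign it directly, so the variable has to be instantiated with new before the Property can be touched.

```csharp
using System;

// Hypothetical struct, for illustration only.
public struct Temperature
{
    private double celsius;

    public double Celsius
    {
        get { return celsius; }
        set { celsius = value; }
    }
}

public static class PropertyDemo
{
    public static double SetAndRead()
    {
        // Temperature t;
        // t.Celsius = 21.5;   // would not compile: use of unassigned local variable

        Temperature t = new Temperature();   // all Fields set to their defaults
        t.Celsius = 21.5;                    // the Property may now be used
        return t.Celsius;
    }

    public static void Main()
    {
        Console.WriteLine(SetAndRead());
    }
}
```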


Variable Assignment works differently for Classes than for Structs


Because Structs are Value Type objects, variables of types defined by Structs behave somewhat differently than variables of types defined by Classes. Consider the following simple program:

public class StructsAndClassesDemo 
{
    static void Main()
    {
        MyClass myVariable1 = new MyClass();
        MyClass myVariable2;

        myVariable1.SomeInteger = 1;
        // Copies the reference: both variables now refer to one object
        myVariable2 = myVariable1;
        myVariable1.SomeInteger = 2;

        Console.WriteLine(myVariable2.SomeInteger);
        Console.ReadLine();
    }
}

The above example would output “2” in the console window.  If we instead exchange the usage of a Class type object for a Struct, and keep the variable assignment logic exactly the same:

public class StructsAndClassesDemo 
{
    static void Main()
    {
        MyStruct myVariable1;
        MyStruct myVariable2;

        myVariable1.SomeInteger = 1;
        // Copies the data: the two variables remain independent
        myVariable2 = myVariable1;
        myVariable1.SomeInteger = 2;

        Console.WriteLine(myVariable2.SomeInteger);
        Console.ReadLine();
    }
}

The revised program will output a “1” in the console window instead.

This highlights a key difference in the way that Value Type and Reference Type objects are treated in memory. A variable of a Reference Type (in this case, a Class) actually holds a pointer to an address in a part of the computer’s memory known as the Heap. When one such variable is assigned to another, as happens in the line myVariable2 = myVariable1, it is the pointer to the relevant address on the Heap that gets copied to the variable being assigned to, not the data stored at that address. An upshot of this is that when the SomeInteger field is subsequently set to “2” through myVariable1, myVariable2 in effect sees the change as well, since both variables point at the same object in memory, and it is that object that has been modified.

Values associated with variables of Value Types such as Structs are stored in a different place in memory, known as the Stack (at least when, as here, they are local variables). In the case of Value Types, assigning one variable to another does copy the actual data from the object being assigned from to the object being assigned to. Because of this, when the object that has been assigned from is subsequently updated, the effect of this change is not felt by any variable that had previously been assigned to from it. This is why the value “1” is output in the Console window in the second example above, in contrast to what happens when the types being considered are Reference Type variables based on custom Classes.
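The same copy-versus-reference behaviour applies when a variable is passed to a Method as an ordinary (by-value) parameter, which is worth seeing alongside the assignment examples above. The sketch below uses hypothetical CounterClass and CounterStruct types invented for the purpose. The Method receives a copy of the reference in the Class case, and a copy of the data in the Struct case.

```csharp
using System;

// Hypothetical types, for illustration only.
public class CounterClass { public int Value; }
public struct CounterStruct { public int Value; }

public static class PassingDemo
{
    // Receives a copy of the reference, so it modifies the caller's object.
    static void Increment(CounterClass counter) { counter.Value++; }

    // Receives a copy of the data, so the caller's Struct is untouched.
    static void Increment(CounterStruct counter) { counter.Value++; }

    public static int IncrementClass()
    {
        CounterClass c = new CounterClass();
        Increment(c);
        return c.Value;
    }

    public static int IncrementStruct()
    {
        CounterStruct s = new CounterStruct();
        Increment(s);
        return s.Value;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementClass());    // 1
        Console.WriteLine(IncrementStruct());   // 0
    }
}
```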


Classes can have overt Parameterless Constructors, but Structs cannot


As was mentioned earlier, you can use a Struct without explicitly instantiating it using the new operator. You may also use the new operator if you wish (which has the effect of initialising the whole object for use, including any Fields the Struct may contain, by calling an intrinsic parameterless Constructor for the Struct). However, when you’re designing a Struct, it is an error to overtly define a parameterless Constructor for it, since one is supplied behind the scenes for you. (This restriction was lifted in C# 10; the discussion here applies to earlier versions of the language.) You may still create additional Constructors that do take parameters. So, whilst the Constructor:

public class MyClass 
{
    public MyClass()
    {
        SomeInteger = 99;
    }

    public int SomeInteger;
}

is fine in a class, a similar Constructor for a struct would result in a compilation error. You may only provide a parameterised Constructor for a Struct, like so:

public struct MyStruct 
{
    public MyStruct(int initialIntegerValue)
    {
        SomeInteger = initialIntegerValue;
    }

    public int SomeInteger;
}

The reason for disallowing overtly-defined parameterless Constructors in the case of Structs is that the CLR, as part of that trade-off of rapidity of access against completeness of memory management mentioned earlier, does not guarantee to call such Constructors for Value Types, and so any code you placed in one would not be guaranteed to run. When you create an array of Structs, for example, the runtime simply zeroes the memory for every element rather than calling a Constructor for each one. This is down to the same phenomenon that makes it possible for the Fields of Structs to be accessed and set without initialisation: if you only ever access the Fields of a variable of a type defined in a Struct, the compiler will not bother generating the IL to call a Constructor for that Struct, and will instead rely on your own assignments to give those Fields meaningful values. Curiously, if creating the IL directly yourself, it would be perfectly possible to create a parameterless Constructor and call it explicitly, but then if you had a task that required a Constructor to run every time, that’d be a pretty clear sign that what you really needed to use was a Class. For now, in the trade-off of best speed against true object orientation in the case of Structs, the designers of the VB and C# languages have decided to take the simplest route, and have prevented parameterless Constructors from being defined for Structs altogether.
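A short sketch of the situations in which a Struct comes into existence without any user-written Constructor running is given below. Reading is a hypothetical struct introduced just for this illustration. In each of the first three cases the Field simply holds its zeroed default, and only the explicit call to the parameterised Constructor runs any of our own code.

```csharp
using System;

// Hypothetical struct, for illustration only.
public struct Reading
{
    public int Value;

    public Reading(int value)
    {
        Value = value;
    }
}

public static class DefaultValueDemo
{
    // Array elements are zeroed; no Constructor is called for each element.
    public static int FirstArrayElement()
    {
        Reading[] readings = new Reading[3];
        return readings[0].Value;
    }

    // default(T) also yields a zeroed Struct.
    public static int DefaultValue()
    {
        return default(Reading).Value;
    }

    // The intrinsic parameterless Constructor sets every Field to its default.
    public static int ParameterlessNew()
    {
        return new Reading().Value;
    }

    // Only here does code written in the Struct's definition actually run.
    public static int ParameterisedNew()
    {
        return new Reading(5).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstArrayElement());   // 0
        Console.WriteLine(DefaultValue());        // 0
        Console.WriteLine(ParameterlessNew());    // 0
        Console.WriteLine(ParameterisedNew());    // 5
    }
}
```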

You cannot initialise a field within the definition of a Struct


Both Structs and Classes can have Fields. In the case of Classes, it is possible to give a Field an initial value within the definition of the Class itself, like so:

public class MyClass 
{
    public int SomeInteger = 99;
}

Trying to do the same within the definition of a Struct, however, will produce a compilation error (in versions of C# before 10), since instance Field initialisers are not permitted in Structs.


The reasons behind this difference in the way that Structs and Classes are permitted to operate cut right to the heart of the different ways these distinct types of object are treated in memory. For Classes, because their values are stored on the Managed Heap, and because the CLR manages the allocation and Garbage Collection of resources on the Heap, the CLR can guarantee to have cleared the area in which an object defined by a Class will be stored before placing the new object there. The CLR can therefore be sure that the Properties and Fields of any object defined by a Class will either hold their default values (null, zero and so on), or hold values that have been deliberately assigned to them. The CLR is therefore happy to have Class-type objects that have a mixture of assigned and as-yet-unassigned Fields and Properties.

The Stack, on the other hand, where Structs are stored, is much less closely managed (and, as a trade-off, it is generally much quicker to access and modify resources placed there). Because there is no Garbage Collection process going on in the background to manage that area of memory, a Struct will simply be given whichever area of the Stack comes next, big enough to contain the Struct, and the language cannot assume that any values already in that area were deliberately assigned to the Struct that now resides there, rather than being data left behind by some other object that was cleared from the Stack at some prior point.

In an effort to mitigate the uncertainty about whether parts of a Struct have been deliberately set or are merely a throwback from prior use of the same location on the Stack by some other unrelated object, Structs are restricted to having their Fields initialised in a controlled way that can be efficiently checked by the compiler. If you use the implicitly-defined Constructor, new MyStruct(), to create a new instance of a Struct, this has the effect of setting all of the Struct’s Fields to their default values (so, 0 for int-type Fields, etc). If you define your own parameterised Constructor for a Struct, the compiler will (in versions of C# before 11) report an error if you fail to initialise any of the Struct’s Fields within that custom Constructor.


As mentioned earlier, you don’t need to instantiate a Struct at all before you begin to assign values to its Fields. If you do use a Struct-type object in this way, and you fail to assign a value to one of the Fields of that Struct variable before assigning the value of that variable to another variable of the same type, you will also get a compilation error, because the variable is still considered unassigned.


Removing the possibility that Fields may be set within the definition of the Struct itself allows the compiler to focus efficiently on the more straightforward places (i.e., within a Constructor, if one is used, or within the scope of a variable of the type the Struct declares, if no Constructor is used) to determine whether all of the Fields of a Struct-type object have been fully initialised before those Fields can be accessed by or assigned to any other object.
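The two permitted routes to a fully initialised Struct can be sketched as follows, using a hypothetical two-Field Pair struct invented for this illustration. The commented-out line marks the point at which the compiler would reject an assignment, because one Field has not yet been given a value.

```csharp
using System;

// Hypothetical struct, for illustration only.
public struct Pair
{
    public int First;
    public int Second;

    // A parameterised Constructor assigns every Field.
    public Pair(int first, int second)
    {
        First = first;
        Second = second;
    }
}

public static class DefiniteAssignmentDemo
{
    // Route 1: no Constructor; every Field is assigned by hand.
    public static Pair BuildFieldByField()
    {
        Pair a;
        a.First = 1;
        // Pair tooEarly = a;   // would not compile: a.Second is still unassigned
        a.Second = 2;

        Pair b = a;             // fine: every Field of 'a' now has a value
        return b;
    }

    // Route 2: the Constructor takes care of every Field.
    public static Pair BuildWithConstructor()
    {
        return new Pair(1, 2);
    }

    public static void Main()
    {
        Pair p = BuildFieldByField();
        Pair q = BuildWithConstructor();
        Console.WriteLine(p.First + p.Second);   // 3
        Console.WriteLine(q.First + q.Second);   // 3
    }
}
```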

Structs may not have a Destructor


Because instances of Structs exist on the Stack and not on the managed Heap, the concept of Destructors (i.e., code that the Garbage Collection mechanism runs when it reclaims an object, allowing the developer to specify how to release any resources the object holds) clearly does not apply to them. Attempting to declare one in a Struct produces a compilation error.





So, in summary, the differences between Classes and Structs are:

·        Classes are Reference types, whilst Structs are Value Types.

·        Classes are physically stored on the Managed Heap, whilst Structs are stored on the Stack.

·        The effect of variable assignment works differently for Classes and Structs.

·        Both types of object can implement Interfaces, but only Classes may overtly employ Implementation Inheritance.

·        Structs cannot have overt parameterless Constructors, Classes can.

·        When you define your own parameterised Constructor for a Struct, the compiler enforces that you must explicitly set the value of any Fields within that Constructor to avoid said Fields containing spurious values. For Classes, the compiler entrusts the Garbage Collection process with this task.

·        You must instantiate a variable based on a Class, whilst you may instantiate an object based on a Struct (though you don’t have to).

·        Structs may not have default values for Fields, Classes may.

·        Where you choose not to instantiate a Struct-type variable, the compiler will enforce that you must set each Field of that instance of the Struct before you can use the variable for assignment.

·        Structs may not utilise Destructors.


Because of these different characteristics, Structs should be used most often for small structures, where the values contained therein will not be frequently re-assigned between variables (which would create a lot of duplicate data in memory), and where there is no need to use Implementation Inheritance.

Classes should be used where a more complex object that will be frequently passed around between parts of the program needs to be modelled, and in instances where it doesn’t matter that assignment between variables copies a reference to the object being assigned from, rather than the underlying data of the object itself being assigned.