Coding for performance: Struct vs Class


Instances of a class are always allocated on the heap and accessed via a pointer dereference. Passing them around is cheap because it is just a copy of the pointer (4 or 8 bytes). However, an object also has some fixed overhead: 8 bytes for 32-bit processes and 16 bytes for 64-bit processes. This overhead includes the pointer to the method table and a sync block field that is used for multiple purposes. However, if you examined an object that had no fields in the debugger, you would see that the size is reported as 12 bytes (32-bit) or 24 bytes (64-bit). Why is that? .NET will align all objects in memory and these are the effective minimum object sizes.

A struct has no overhead at all and its memory usage is the sum of the sizes of all its fields (plus any alignment padding). If a struct is declared as a local variable in a method, then the struct is allocated on the stack. If the struct is declared as part of a class, then the struct’s memory will be part of that class’s memory layout (and thus exist on the heap). When you pass a struct to a method it is copied byte for byte. Because a local struct is not on the heap, allocating one will never cause a garbage collection.
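The difference between copying a value and copying a pointer can be seen in a small sketch. The type and method names here (PointStruct, PointClass, CopySemanticsDemo, Move) are made up for illustration and do not come from the original article.

```csharp
using System;

struct PointStruct
{
    public int X;
    public int Y;
}

class PointClass
{
    public int X;
    public int Y;
}

static class CopySemanticsDemo
{
    // Receives a byte-for-byte copy; the caller's value is untouched.
    public static void Move(PointStruct p) { p.X = 100; }

    // Receives a copy of the reference; caller and callee share one heap object.
    public static void Move(PointClass p) { p.X = 100; }

    public static void Main()
    {
        var s = new PointStruct { X = 1, Y = 2 };
        var c = new PointClass { X = 1, Y = 2 };

        Move(s);
        Move(c);

        Console.WriteLine(s.X); // 1   (only the copy was modified)
        Console.WriteLine(c.X); // 100 (the shared object was modified)
    }
}
```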

There is thus a tradeoff here. You can find various pieces of advice about the maximum recommended size of a struct, but I would not get caught up on the exact number. In most cases you will want to keep struct sizes very small, especially if they are passed around, but you can also pass structs by reference so the size may not be an important issue to you. The only way to know for sure whether it benefits you is to consider your usage pattern and do your own profiling.
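As a rough sketch of passing structs by reference, the example below uses the ref and in modifiers (in assumes C# 7.2 or later). The Sample type and the ByRefDemo methods are hypothetical names chosen for illustration.

```csharp
using System;

struct Sample
{
    public long A, B, C, D; // 32 bytes of data
}

static class ByRefDemo
{
    // 'in' passes a read-only reference, so the 32 bytes are not copied.
    public static long Sum(in Sample s) => s.A + s.B + s.C + s.D;

    // 'ref' passes a writable reference; the caller's value is modified.
    public static void Reset(ref Sample s) { s = default; }

    public static void Main()
    {
        var sample = new Sample { A = 1, B = 2, C = 3, D = 4 };
        Console.WriteLine(Sum(in sample)); // 10

        Reset(ref sample);
        Console.WriteLine(Sum(in sample)); // 0
    }
}
```

Whether this is actually faster than passing by value depends on the struct size and the call pattern, so it is worth profiling before adopting it widely.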

There is a huge difference in efficiency in some cases. While the overhead of an object might not seem like very much, consider an array of objects and compare it to an array of structs. Assume the data structure contains 16 bytes of data, the array length is 1,000,000, and this is a 32-bit system.

For an array of objects the total space usage is:

8 bytes array overhead +
(4 bytes pointer size × 1,000,000) +
((8 bytes overhead + 16 bytes data) × 1,000,000)
= 28 MB

For an array of structs, the results are dramatically different:

8 bytes array overhead +
(16 bytes data × 1,000,000)
= 16 MB

With a 64-bit process, the object array takes over 40 MB while the struct array still requires only 16 MB.

As you can see, in an array of structs, the same amount of data takes less memory. With the overhead of objects, you are also inviting a higher rate of garbage collections just from the added memory pressure.
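The arithmetic above can be reproduced with a small calculator. This is only a sketch of the article's estimate, not a measurement of real heap usage. The names are hypothetical, and the 16-byte array overhead used for the 64-bit case is an assumption (the article only says the result is "over 40 MB").

```csharp
using System;

static class ArraySizeEstimate
{
    // Array header + one pointer per element + one heap object per element.
    public static long ObjectArrayBytes(long count, int dataBytes, int pointerBytes,
                                        int objectOverheadBytes, int arrayOverheadBytes)
        => arrayOverheadBytes
           + (long)pointerBytes * count
           + (long)(objectOverheadBytes + dataBytes) * count;

    // Array header + the data stored inline, with no per-element overhead.
    public static long StructArrayBytes(long count, int dataBytes, int arrayOverheadBytes)
        => arrayOverheadBytes + (long)dataBytes * count;

    public static void Main()
    {
        const long count = 1_000_000;

        // 32-bit figures from the text.
        Console.WriteLine(ObjectArrayBytes(count, 16, 4, 8, 8)); // 28000008
        Console.WriteLine(StructArrayBytes(count, 16, 8));       // 16000008

        // 64-bit: 8-byte pointers, 16-byte object overhead,
        // and an ASSUMED 16-byte array overhead.
        Console.WriteLine(ObjectArrayBytes(count, 16, 8, 16, 16)); // 40000016
    }
}
```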

Aside from space, there is also the matter of CPU efficiency. CPUs have multiple levels of caches. Those closest to the processor are very small, but extremely fast and optimized for sequential access.

An array of structs has many sequential values in memory. Accessing an item in the struct array is very simple. Once the correct entry is found, the right value is there already. This can mean a huge difference in access times when iterating over a large array. If the value is already in the CPU’s cache, it can be accessed an order of magnitude faster than if it were in RAM.

Accessing an item in the object array requires an access into the array’s memory, then a dereference of that pointer to reach the item elsewhere in the heap. Iterating over object arrays dereferences an extra pointer, jumps around in the heap, and evicts entries from the CPU’s cache more often, potentially discarding data that would have been useful.

This lack of overhead for both CPU and memory is a prime reason to favor structs in many circumstances—it can buy you significant performance gains when used intelligently because of the improved memory locality.
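One way to observe the locality effect is to time a simple loop over each kind of array. The sketch below is not a rigorous benchmark: it has no warm-up or repeated runs, and objects allocated back to back on a fresh heap tend to sit next to each other, which can hide much of the difference. All names are hypothetical.

```csharp
using System;
using System.Diagnostics;

struct ValueItem { public long A, B; }
class RefItem { public long A, B; }

static class LocalityDemo
{
    // Values are stored inline in the array, so the loop walks memory sequentially.
    public static long SumStructs(ValueItem[] items)
    {
        long total = 0;
        for (int i = 0; i < items.Length; i++) total += items[i].A;
        return total;
    }

    // Each element is a reference that must be followed to the object on the heap.
    public static long SumObjects(RefItem[] items)
    {
        long total = 0;
        for (int i = 0; i < items.Length; i++) total += items[i].A;
        return total;
    }

    public static void Main()
    {
        const int count = 1_000_000;
        var structs = new ValueItem[count];
        var objects = new RefItem[count];
        for (int i = 0; i < count; i++)
        {
            structs[i].A = i;
            objects[i] = new RefItem { A = i };
        }

        var sw = Stopwatch.StartNew();
        long structSum = SumStructs(structs);
        sw.Stop();
        Console.WriteLine($"structs: sum={structSum}, {sw.Elapsed.TotalMilliseconds} ms");

        sw.Restart();
        long objectSum = SumObjects(objects);
        sw.Stop();
        Console.WriteLine($"objects: sum={objectSum}, {sw.Elapsed.TotalMilliseconds} ms");
    }
}
```

Timings vary by machine, runtime, and heap state, so treat the numbers as an indication only and profile your own workload.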

Because structs are always copied by value, you can create some interesting situations for yourself if you are not careful. For example, see this buggy code which will not compile:

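using System.Collections.Generic;

struct Point
{
    public int x;
    public int y;
}

class Program
{
    public static void Main()
    {
        List<Point> points = new List<Point>();
        points.Add(new Point() { x = 1, y = 2 });
        points[0].x = 3; // compile error: points[0] is a copy, not a variable
    }
}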

The problem is the last line, which attempts to modify the existing Point in the list. This is not possible because calling points[0] returns a copy of the original value, which is not stored anywhere permanent. The correct way to modify the Point is:

Point p = points[0];
p.x = 3;
points[0] = p;

However, it may be wise to adopt an even more stringent policy: make your structs immutable. Once created, they can never change value. This removes the above situation from even being a possibility and generally simplifies struct usage.
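A minimal sketch of an immutable struct follows, assuming C# 7.2 or later for the readonly struct modifier. ImmutablePoint and its WithX helper are hypothetical names, not part of the original example.

```csharp
using System;
using System.Collections.Generic;

readonly struct ImmutablePoint
{
    public int X { get; }
    public int Y { get; }

    public ImmutablePoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    // "Changing" a value means producing a new one.
    public ImmutablePoint WithX(int x) => new ImmutablePoint(x, Y);
}

static class ImmutableDemo
{
    public static void Main()
    {
        var points = new List<ImmutablePoint> { new ImmutablePoint(1, 2) };

        // There is no setter to call by mistake; replace the whole element instead.
        points[0] = points[0].WithX(3);

        Console.WriteLine(points[0].X); // 3
        Console.WriteLine(points[0].Y); // 2
    }
}
```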

I mentioned earlier that structs should be kept small to avoid spending significant time copying them, but there are occasional uses for large structs. Consider an object that tracks a lot of details of some commercial process, such as a lot of time stamps.

class Order
{
    public DateTime ReceivedTime { get; set; }
    public DateTime AcknowledgeTime { get; set; }
    public DateTime ProcessBeginTime { get; set; }
    public DateTime WarehouseReceiveTime { get; set; }
    public DateTime WarehouseRunnerReceiveTime { get; set; }
    public DateTime WarehouseRunnerCompletionTime { get; set; }
    public DateTime PackingBeginTime { get; set; }
    public DateTime PackingEndTime { get; set; }
    public DateTime LabelPrintTime { get; set; }
    public DateTime CarrierNotifyTime { get; set; }
    public DateTime ProcessEndTime { get; set; }
    public DateTime EmailSentToCustomerTime { get; set; }
    public DateTime CarrierPickupTime { get; set; }
    // lots of other data ...
}

To simplify your code, it would be nice to segregate all of those times into their own sub-structure, still accessible through the Order class with code like this:

Order order = new Order();
order.Times.ReceivedTime = DateTime.UtcNow;

You could put all of them into their own class.

class OrderTimes
{
    public DateTime ReceivedTime { get; set; }
    public DateTime AcknowledgeTime { get; set; }
    public DateTime ProcessBeginTime { get; set; }
    public DateTime WarehouseReceiveTime { get; set; }
    public DateTime WarehouseRunnerReceiveTime { get; set; }
    public DateTime WarehouseRunnerCompletionTime { get; set; }
    public DateTime PackingBeginTime { get; set; }
    public DateTime PackingEndTime { get; set; }
    public DateTime LabelPrintTime { get; set; }
    public DateTime CarrierNotifyTime { get; set; }
    public DateTime ProcessEndTime { get; set; }
    public DateTime EmailSentToCustomerTime { get; set; }
    public DateTime CarrierPickupTime { get; set; }
}
class Order
{
    // As a class, OrderTimes must be allocated separately or Times would be null.
    public OrderTimes Times = new OrderTimes();
}

However, this does introduce an additional 12 or 24 bytes of overhead for every Order object (the header of the OrderTimes object plus the reference to it stored in Order). If you need to pass the OrderTimes object as a whole to various methods, maybe this makes sense, but why not just pass the reference to the entire Order object itself? If you have thousands of Order objects being processed simultaneously, the extra allocations can induce more garbage collections. Each access also costs an extra memory dereference.

Instead, change OrderTimes to be a struct. Because Times is a field on Order, accessing the individual properties of the OrderTimes struct through it (e.g., order.Times.ReceivedTime) will not result in a copy of the struct. (If Times were a property instead, assigning through it would fail to compile for the same reason as points[0].x above.) This way, the OrderTimes struct becomes essentially part of the memory layout of the Order class, almost exactly as if there were no substructure, and you get better-looking code as well.

This technique does violate the principle of immutable structs, but the trick here is to treat the fields of the OrderTimes struct just as if they were fields on the Order object. You do not need to pass around the OrderTimes struct as an entity in and of itself—it is just an organization mechanism.
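Here is a shortened sketch of the struct version. Only two of the time stamps are shown, and the OrderDemo class is a hypothetical driver added for illustration.

```csharp
using System;

struct OrderTimes
{
    public DateTime ReceivedTime { get; set; }
    public DateTime AcknowledgeTime { get; set; }
    // ... remaining time stamps omitted for brevity
}

class Order
{
    // A field, not a property: the struct is stored inline in each Order,
    // so order.Times.X = ... writes directly into the Order's memory.
    public OrderTimes Times;
}

static class OrderDemo
{
    public static void Main()
    {
        Order order = new Order();
        order.Times.ReceivedTime = DateTime.UtcNow;
        order.Times.AcknowledgeTime = order.Times.ReceivedTime.AddSeconds(5);

        Console.WriteLine(order.Times.AcknowledgeTime > order.Times.ReceivedTime); // True
    }
}
```

No separate OrderTimes allocation happens here, so there is no extra object header and no extra pointer to follow.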

Source: Performance Considerations of Class Design and Gen
