How to create a deep copy of an object
Introduction
There might be situations where we want to create an exact copy of a complex object graph. Like many other programming languages, C# doesn't provide an out-of-the-box solution for this, so we have to come up with our own.
When it comes to cloning, we first have to understand the difference between a value type and a reference type. We also need to distinguish between a shallow copy and a deep copy.
Value & reference types
I won't go into the details of which are stored on the stack and which on the heap, as I think it's more conducive to think of the differences between value types and reference types in terms of their semantics rather than their implementation details.
A variable of a value type directly holds a value of its type. If you assign it to another variable, the value is copied directly and both variables work independently. That is, value types are always copied by value.
A variable of a reference type, in contrast, doesn't store its value directly, but a reference (pointer/address) to the location where the value is stored. Because such a variable holds the address of the object rather than the data itself, assigning it to another variable doesn't copy the object it points to. Instead, it creates a second copy of the reference, which points to the same object as the original.
So to copy a value type, it's enough to assign it to a new variable of the same type. To copy a reference type, however, we have to allocate memory (read: the runtime will have to), copy the content and take care of any further references the object may hold. That's why the term object graph is used.
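To make the difference tangible, here is a minimal sketch. The types PointValue and PointReference are made up for this illustration and are not part of the example graph used later in this post.

```csharp
using System;

// Hypothetical types, for illustration only.
public struct PointValue
{
    public int X;
}

public class PointReference
{
    public int X;
}

public static class AssignmentDemo
{
    // Assigns a struct to a second variable and changes the second one.
    // Returns X of the first variable, which stays untouched.
    public static int MutateStructCopy()
    {
        var a = new PointValue { X = 1 };
        var b = a;   // copies the value itself
        b.X = 2;
        return a.X;
    }

    // Same steps with a class: both variables refer to one object.
    public static int MutateClassCopy()
    {
        var a = new PointReference { X = 1 };
        var b = a;   // copies only the reference
        b.X = 2;
        return a.X;
    }

    public static void Main()
    {
        Console.WriteLine(MutateStructCopy()); // 1
        Console.WriteLine(MutateClassCopy());  // 2
    }
}
```

The struct variable keeps its own value, while the change made through the second class variable is visible through the first one.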
For the sake of completeness here is a list of the built-in types:
Value types | Reference types | Comments |
---|---|---|
bool | class | |
byte, char, int, long | interface | |
sbyte, short | delegate | |
uint, ulong, ushort | dynamic | |
decimal, double, float | object | |
enum | string | string behaves like a value type |
struct | array | arrays are reference types, even if their elements are value types |
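The two comments in the table can be checked with a short sketch. The class and method names below are my own and only serve this example.

```csharp
using System;

public static class BuiltInTypeDemo
{
    // An array is a reference type, even if it holds value types.
    public static int ArrayAssignment()
    {
        int[] first = { 1, 2, 3 };
        int[] second = first;   // copies the reference, not the elements
        second[0] = 99;
        return first[0];
    }

    // string is a reference type, but it is immutable:
    // "changing" a string always produces a new string object.
    public static string StringAssignment()
    {
        string first = "Baker";
        string second = first;        // both refer to the same string
        second = second + " Street";  // second now refers to a new string
        return first;
    }

    public static void Main()
    {
        Console.WriteLine(ArrayAssignment());  // 99
        Console.WriteLine(StringAssignment()); // Baker
    }
}
```

Because a string can never be modified in place, sharing a reference to it is harmless, which is why it feels like a value type.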
Shallow & deep copy
When creating a shallow copy of an object graph, the value and reference fields are copied to a new object. As a reference is a memory address, only the address is copied, not the actual data it points to. This implies that a change to an object behind one of these reference fields is visible through the original and through all of its (shallow) copies.
A deep copy, in contrast, is independent of the original object. All reference types contained in the object graph are copied as well and therefore point to their own memory locations holding a real copy of the original data, which can be altered without affecting the original.
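Written by hand for a tiny graph, the two kinds of copy could look like the following sketch. Person, City and the two copy methods are hypothetical names chosen for this illustration.

```csharp
using System;

// Hypothetical types, for illustration only.
public class City
{
    public string Name { get; set; } = "";
}

public class Person
{
    public int Age { get; set; }
    public City City { get; set; } = new City();

    // Copies the fields one by one; the City reference is shared.
    public Person ShallowCopy() => new Person { Age = Age, City = City };

    // Also creates a new City, so nothing is shared with the original.
    public Person DeepCopy() => new Person { Age = Age, City = new City { Name = City.Name } };
}

public static class CopyDemo
{
    public static string RenameAfterShallowCopy()
    {
        var original = new Person { Age = 30, City = new City { Name = "Berlin" } };
        var copy = original.ShallowCopy();
        original.City.Name = "Hamburg";
        return copy.City.Name;
    }

    public static string RenameAfterDeepCopy()
    {
        var original = new Person { Age = 30, City = new City { Name = "Berlin" } };
        var copy = original.DeepCopy();
        original.City.Name = "Hamburg";
        return copy.City.Name;
    }

    public static void Main()
    {
        Console.WriteLine(RenameAfterShallowCopy()); // Hamburg
        Console.WriteLine(RenameAfterDeepCopy());    // Berlin
    }
}
```

Hand-written copy methods like these get tedious quickly for larger graphs, which is the motivation for the generic approaches that follow.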
I hope I haven't bored the hell out of you so far. Now let's have a look at some possible solutions.
Example object graph
In the spirit of the DRY principle, this is the example object graph I will refer to throughout this post. I left out a complete list of all value types, like enums, as they behave the same in this context.
using System;

public delegate void Handler(string m);

[Serializable]
public class Course
{
    public Course(string title)
    {
        this.Title = title;
    }

    public string Title { get; set; }
}

[Serializable]
public class Address
{
    public int Housenumber { get; set; }
    public string Street { get; set; }
}

[Serializable]
class Student
{
    private readonly double Pi = 3.14159;
    public const int UniversalAnswer = 42;

    public Handler Handler { get; set; }
    public Action<string> OutputHandler { get; set; }
    public Course[] Subscriptions { get; set; }
    public double[] Grades { get; set; }
    public int Age { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }

    // Note: the action parameter is not used; output goes through OutputHandler.
    public void PrintSummary(Action<string> action)
    {
        var message = $"Name: {this.Name} Age: {this.Age}";
        this.OutputHandler?.Invoke(message);
    }
}
Serialize and deserialize your instance
Consider the following generic extension method, which uses the BinaryFormatter class. It serializes and deserializes an object, or an entire graph of connected objects, in binary format.
The class provides two relevant methods:
- public void Serialize (System.IO.Stream serializationStream, object graph);
- public object Deserialize (System.IO.Stream serializationStream);
As you can see from the signatures, both methods require an instance of type System.IO.Stream, and Serialize additionally takes the object we would like to copy. For our purpose the MemoryStream class comes in handy. It inherits from System.IO.Stream and creates a stream whose backing store is in memory. This is just perfect, as we don't want to write to disk or anywhere else. In order to deserialize from the stream right after serializing, we need to set the stream's position back to the beginning. This is done by memoryStream.Position = 0.
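using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public static class DeepCloneExtensions
{
    public static T DeepCloneByStream<T>(this T obj)
    {
        using (var memoryStream = new MemoryStream())
        {
            var formatter = new BinaryFormatter();
            formatter.Serialize(memoryStream, obj);

            // Rewind, otherwise Deserialize would start reading at the end of the stream.
            memoryStream.Position = 0;
            return (T)formatter.Deserialize(memoryStream);
        }
    }
}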
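Why the position has to be reset can be seen in isolation, without any serialization involved. The following is a small sketch with made-up names that only writes and reads raw bytes.

```csharp
using System;
using System.IO;

public static class StreamPositionDemo
{
    // Writes three bytes and reads right away, without rewinding.
    public static int ReadWithoutRewind()
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
            var buffer = new byte[3];
            // The position sits behind the last written byte, so nothing is left to read.
            return stream.Read(buffer, 0, buffer.Length);
        }
    }

    // Same as above, but rewinds the stream before reading.
    public static int ReadAfterRewind()
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
            stream.Position = 0;
            var buffer = new byte[3];
            return stream.Read(buffer, 0, buffer.Length);
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadWithoutRewind()); // 0
        Console.WriteLine(ReadAfterRewind());   // 3
    }
}
```

Writing advances the position, and reading continues from wherever the position currently is. The formatter is subject to the same rule.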
Now let's have a look at a usage example.
using System;
using System.Diagnostics;
using FluentAssertions;
using Xunit;

[Fact]
public void Should_Clone_By_Stream()
{
    // Arrange
    var student = new Student
    {
        Age = 30,
        Name = "Bommelmaier",
        Address = new Address { Street = "Baker", Housenumber = 12 },
        Grades = new[] { 1.0, 2.0 },
        Subscriptions = new[] { new Course("Mathematics") },
        OutputHandler = OutputToDebug
    };

    // Act
    var copycat = student.DeepCloneByStream();
    student.Age = 40;
    student.Name = "Smith";
    student.Address.Housenumber = 99;
    student.Grades[0] = 3.0;
    student.Subscriptions[0] = new Course("Ethics");
    student.OutputHandler = OutputToConsole;

    // Assert
    copycat.Age.Should().NotBe(student.Age);
    copycat.Name.Should().NotBe(student.Name);
    copycat.Address.Should().NotBeSameAs(student.Address);
    copycat.Grades.Should().NotBeSameAs(student.Grades);
    copycat.Subscriptions.Should().NotBeSameAs(student.Subscriptions);
    copycat.OutputHandler.Should().NotBeSameAs(student.OutputHandler);
}

private void OutputToDebug(string m)
{
    Debug.WriteLine(m);
}

private void OutputToConsole(string m)
{
    Console.WriteLine(m);
}
This actually works, but has a few drawbacks...
First, it requires every class in the object graph to be marked as [Serializable]. Only marking the topmost class as such isn't enough and will result in a System.Runtime.Serialization.SerializationException. This is no big deal in this simple example, but it becomes impractical, if not impossible, if you want to clone a reference type which is out of your control.
Secondly, the serialization of our delegate property named Handler won't work in a project targeting .NET Core and will result in a SerializationException saying "Serializing delegates is not supported on this platform." The same applies to the generic Action<string> OutputHandler delegate.
On top of that, BinaryFormatter has been marked as obsolete since .NET 5 because of its security problems, so it is a poor fit for new code.
That would have been too easy. So let's look at other possible techniques.
Using reflection and recursion
Now let's have a look at a more complex approach. The following extension method uses reflection and recursion to walk through an object graph. Remember when I said that semantics are more important than implementation details? This is especially true for System.String. It is a reference type but behaves like a value type. That's why we can handle all value types and string the same way. However, we need to take special care of delegates and arrays, where I am using ICloneable.Clone() to create copies.
using System;

public static class DeepCloneExtensions
{
    public static T DeepCloneByReflection<T>(this T source)
    {
        // Requires a public parameterless constructor on the runtime type.
        var type = source.GetType();
        var target = Activator.CreateInstance(type);

        // Only public properties are visited; fields are not copied.
        foreach (var propertyInfo in type.GetProperties())
        {
            // Handle value types and string
            if (propertyInfo.PropertyType.IsValueType ||
                propertyInfo.PropertyType == typeof(string))
            {
                propertyInfo.SetValue(target, propertyInfo.GetValue(source));
            }
            // Handle delegates
            else if (propertyInfo.PropertyType.IsSubclassOf(typeof(Delegate)))
            {
                var value = (Delegate)propertyInfo.GetValue(source);
                if (value != null)
                {
                    propertyInfo.SetValue(target, value.Clone());
                }
            }
            // Handle arrays
            else if (propertyInfo.PropertyType.IsSubclassOf(typeof(Array)))
            {
                var value = (Array)propertyInfo.GetValue(source);
                if (value != null)
                {
                    propertyInfo.SetValue(target, value.Clone());
                }
            }
            // Handle objects
            else
            {
                var value = propertyInfo.GetValue(source);
                if (value != null)
                {
                    propertyInfo.SetValue(target, value.DeepCloneByReflection());
                }
            }
        }

        return (T)target;
    }
}
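One detail of the array branch is worth checking: Array.Clone() creates a shallow copy. The array itself is new, but for arrays of reference types, such as Course[], the elements are still shared with the original. The sketch below shows this with a hypothetical Lecture class standing in for Course.

```csharp
using System;

// Hypothetical stand-in for a reference type stored in an array.
public class Lecture
{
    public string Title { get; set; } = "";
}

public static class ArrayCloneDemo
{
    // The clone is a different array instance than the original.
    public static bool ClonedArrayIsNewInstance()
    {
        var original = new[] { new Lecture { Title = "Mathematics" } };
        var clone = (Lecture[])original.Clone();
        return !ReferenceEquals(original, clone);
    }

    // A change to an element of the original shows up in the clone,
    // because both arrays hold a reference to the same Lecture.
    public static string TitleSeenThroughClone()
    {
        var original = new[] { new Lecture { Title = "Mathematics" } };
        var clone = (Lecture[])original.Clone();
        original[0].Title = "Ethics";
        return clone[0].Title;
    }

    public static void Main()
    {
        Console.WriteLine(ClonedArrayIsNewInstance()); // True
        Console.WriteLine(TitleSeenThroughClone());    // Ethics
    }
}
```

For arrays of value types like double[] this makes no difference. For arrays of reference types, a full deep copy would additionally have to clone every element.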
This seems to be the best approach so far, as we don't have to declare all our classes as Serializable. We are also able to handle (multicast) delegates.
But as always, there is still room for improvement. The recursion never terminates if our source graph contains a circular reference, and will end in a StackOverflowException. Secondly, the method can't properly handle variables of type dynamic.
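One possible way to deal with circular references is to remember which objects have already been cloned. The following is only a sketch under simplifying assumptions: it handles plain classes with a public parameterless constructor and readable, writable properties, and it leaves out the delegate and array branches from above. Node, ReferenceComparer and DeepCloneTracked are names I made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical node type that can form a cycle.
public class Node
{
    public string Name { get; set; } = "";
    public Node Next { get; set; }
}

// Compares keys by reference, ignoring any overridden Equals/GetHashCode.
public sealed class ReferenceComparer : IEqualityComparer<object>
{
    public new bool Equals(object x, object y) => ReferenceEquals(x, y);
    public int GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class CycleSafeCloneExtensions
{
    public static T DeepCloneTracked<T>(this T source) where T : class
    {
        var visited = new Dictionary<object, object>(new ReferenceComparer());
        return (T)CloneObject(source, visited);
    }

    private static object CloneObject(object source, Dictionary<object, object> visited)
    {
        if (source == null) return null;

        // Already cloned: reuse that clone instead of recursing again.
        if (visited.TryGetValue(source, out var known)) return known;

        var type = source.GetType();
        var target = Activator.CreateInstance(type);
        visited[source] = target; // register before descending into the properties

        foreach (var property in type.GetProperties())
        {
            if (!property.CanRead || !property.CanWrite) continue;
            if (property.GetIndexParameters().Length > 0) continue;

            var value = property.GetValue(source);
            var propertyType = property.PropertyType;

            if (value == null || propertyType.IsValueType || propertyType == typeof(string))
                property.SetValue(target, value);
            else
                property.SetValue(target, CloneObject(value, visited));
        }

        return target;
    }
}

public static class Program
{
    public static void Main()
    {
        var first = new Node { Name = "first" };
        var second = new Node { Name = "second", Next = first };
        first.Next = second; // circular reference

        var clone = first.DeepCloneTracked();
        Console.WriteLine(ReferenceEquals(clone, first));           // False
        Console.WriteLine(ReferenceEquals(clone.Next.Next, clone)); // True
    }
}
```

The important part is that the clone is registered in the dictionary before its properties are visited. That way a back reference finds the clone that is still being built, and the cycle is reproduced in the copy instead of being followed forever.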
I hope this was somewhat helpful. I will leave a look into MemberwiseClone() and the ICloneable interface for another post.
Best, Matthias