Code annotations for making parallel programming painless


For many years CPUs gained performance through ever higher clock rates, until they hit the frequency ceiling of semiconductor devices. With no good alternative in sight, vendors chose another strategy: freeze the frequency, but put several cores into a single CPU. Today a phone with a dual-core CPU no longer surprises anyone. In theory, increasing the number of cores can increase performance proportionally, but achieving that depends entirely on us, the software developers. To take advantage of multiple cores, we need to design software so that it executes in parallel and uses all available cores effectively.

What does this mean in practice? If you are a web developer or a developer of a multi-user service, you are lucky: all (or almost all) of the work of processing requests in parallel is handled by the web server and the database management system. But if you process large amounts of data yourself, or you write those web servers and DBMSs, the problem of parallelism will concern you a great deal.


Fortunately, progress happens not only in the world of hardware but in the world of software as well, and Microsoft pays a lot of attention to the problem of parallelism. .NET Framework 4.0, the latest release at the time of writing, introduces two significant innovations in this area: Parallel LINQ and the Task Parallel Library. Moreover, C# 5.0, which is now cooking in Microsoft’s kitchen, will give us asynchronous features at the compiler level. All these features make running code in parallel much simpler.
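To give a feeling for how little code these libraries require, here is a minimal sketch. The class ParallelSketch and the Square helper are hypothetical names made up for this illustration; AsParallel and Parallel.For are the PLINQ and Task Parallel Library entry points from .NET 4.0.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical CPU-bound work item, used only for illustration.
    public static long Square(int n)
    {
        return (long)n * n;
    }

    // PLINQ: AsParallel() lets the runtime partition the query across cores.
    public static long SumOfSquaresPlinq(int count)
    {
        return Enumerable.Range(1, count)
                         .AsParallel()
                         .Select(n => Square(n))
                         .Sum();
    }

    // TPL: Parallel.For runs the loop body on several threads. Each index
    // writes only to its own array slot, so no locking is needed here.
    public static long[] SquaresParallelFor(int count)
    {
        long[] results = new long[count];
        Parallel.For(0, count, i =>
        {
            results[i] = Square(i + 1);
        });
        return results;
    }
}
```

Both methods are safe only because the work items do not share mutable state. As soon as the loop body touches a shared object, the libraries offer no protection.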
But the main problem is still not resolved: how can we ensure that calling some code in parallel will not create side effects? How can we prevent and avoid race conditions? At the moment there is no answer. When you write code that runs in parallel, experience and intuition are the two main things you can rely on. Don’t you think that intuition is not the best foundation, especially in an area of industry as progressive as software development? Moreover, you will probably agree with me that bugs in parallel code are among the hardest bugs to find.
How can we detect potential sources of bugs in parallel code? We need to gather additional information about the method we are going to call in parallel. But enough words for now, let’s jump to some sample code.
I’ve created a Person class to demonstrate my idea in practice:

using System.Collections.Generic;
using System.Threading;

public class Person
{
    private readonly HashSet<Person> _friends = new HashSet<Person>();
   
    /// <summary>
    /// Add friend for current person
    /// </summary>
    public void AddFriend(Person person)
    {
        _friends.Add(person);
    }

    /// <summary>
    /// Get friends of current person
    /// </summary>
    public HashSet<Person> GetFriends()
    {
        return new HashSet<Person>(_friends);
    }

    /// <summary>
    /// Add friends of specified person to friends of current person.
    /// Your friends are my friends.
    /// </summary>
    public void AddFriendsOf(Person person)
    {
        foreach (Person f in person.GetFriends())
        {
            Person friend = f; // Local copy, so the lambda does not capture the loop variable ("modified closure")
            ThreadPool.QueueUserWorkItem(o => AddFriend(friend));
        }
    }

    /// <summary>
    /// Get friends of friends of current person
    /// </summary>
    public HashSet<Person> GetFriendsOfFriends()
    {
        HashSet<Person> friendsOfFriends = new HashSet<Person>();
        foreach (Person f in GetFriends())
        {
            Person friend = f; // Local copy, so the lambda does not capture the loop variable ("modified closure")
            ThreadPool.QueueUserWorkItem(o => friend.AddFriendsToCollection(friendsOfFriends));
        }
        return friendsOfFriends;
    }

    /// <summary>
    /// Add friends of current person
    /// </summary>
    private void AddFriendsToCollection(HashSet<Person> collection)
    {
        collection.UnionWith(GetFriends());
    }
}


Look carefully at this example (actually, don’t be too careful, because that would make the rest less interesting). Do you see any issues with multithreading? They are in both of the methods that use parallelism: AddFriendsOf and GetFriendsOfFriends. In this example both issues are caused by multithreaded code working with a non-thread-safe collection of type HashSet<T>. If we used a DBMS, we could eliminate those problems, but for now we need them for the demonstration.
So, let’s find each of those issues. We start with the AddFriendsOf method. As you can see, it iterates through the friends of the specified person and asynchronously adds them to the friends of the current person using the AddFriend method. Note that if we need to use the foreach variable in a closure (an anonymous method or a lambda), we need to copy it to a local variable. Then we look at the AddFriend implementation and see that it adds people to the _friends collection using the HashSet<T>.Add method. And MSDN says about HashSet<T> that “Any instance members are not guaranteed to be thread safe”.
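The closure remark deserves a small illustration. The sketch below uses a for loop, where all lambdas share a single loop variable in every C# version; the class ClosureSketch and its method names are hypothetical. Starting with C# 5.0 the compiler creates a fresh foreach variable per iteration, so the extra copy is strictly required only for foreach in older compilers and for for loops.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureSketch
{
    // Every lambda captures the same loop variable i, so all of them
    // observe its final value when they are invoked after the loop.
    public static List<int> CaptureLoopVariable(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            actions.Add(() => i);
        }
        return InvokeAll(actions);
    }

    // Each iteration declares a fresh local, so every lambda gets its own copy.
    public static List<int> CaptureLocalCopy(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i;
            actions.Add(() => copy);
        }
        return InvokeAll(actions);
    }

    private static List<int> InvokeAll(List<Func<int>> actions)
    {
        var results = new List<int>();
        foreach (Func<int> action in actions)
        {
            results.Add(action());
        }
        return results;
    }
}
```

With count equal to 3, CaptureLoopVariable yields 3, 3, 3 while CaptureLocalCopy yields 0, 1, 2.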
As for the GetFriendsOfFriends method, it iterates through the person’s friends and collects the friends of each of them. To gather the friends of all persons into a single collection, we use the AddFriendsToCollection method and pass the collection to it. This method again is not thread-safe, because it modifies a parameter shared by all threads in a non-thread-safe way. The reason why modifying a shared parameter of type HashSet<T> is not thread-safe is the same as for AddFriend. But I think you may have already forgotten what the issue with AddFriend was. That is exactly the problem!
When you work with some code, you need to keep all the information about its thread safety in your head. The idea is to give developers a mechanism for declaring whether a piece of code is thread-safe or not. With that in place, when you create a new method you can review your code together with the annotations of the methods it uses, and then annotate your new method accordingly. Ideally, this process can almost always be automated by tools like JetBrains ReSharper.
Speaking of ReSharper, we should remember the great code annotation system it uses, JetBrains Code Annotations, which provides two ways of annotating code:
· Custom attributes in the JetBrains.Annotations namespace. The best way to annotate your new code.
· A set of external XML files. Useful if you want to annotate existing code, e.g. the .NET Framework or third-party libraries.
The annotation framework allows you to declare facts such as whether a parameter or return value of a method can be null, whether a string is used as a format string, etc.
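As a rough illustration of the attribute flavour, the sketch below uses three attribute names from JetBrains.Annotations: NotNull, CanBeNull and StringFormatMethod. To keep the sample self-contained, it declares minimal local stand-ins for them instead of referencing the real library, and the Log class is a hypothetical example.

```csharp
using System;
using JetBrains.Annotations;

namespace JetBrains.Annotations
{
    // Minimal local stand-ins so the sample compiles on its own.
    // In a real project these come from the JetBrains annotations library.
    [AttributeUsage(AttributeTargets.All)]
    public sealed class NotNullAttribute : Attribute { }

    [AttributeUsage(AttributeTargets.All)]
    public sealed class CanBeNullAttribute : Attribute { }

    [AttributeUsage(AttributeTargets.Method | AttributeTargets.Constructor)]
    public sealed class StringFormatMethodAttribute : Attribute
    {
        public StringFormatMethodAttribute(string formatParameterName)
        {
            FormatParameterName = formatParameterName;
        }

        public string FormatParameterName { get; private set; }
    }
}

// Hypothetical helper used only to show where the attributes go.
public static class Log
{
    // Declares that "format" is a composite format string, that it must not
    // be null, and that the method never returns null.
    [StringFormatMethod("format")]
    [NotNull]
    public static string Format([NotNull] string format, [CanBeNull] params object[] args)
    {
        if (format == null) throw new ArgumentNullException("format");
        return args == null ? format : string.Format(format, args);
    }
}
```

The attributes change nothing at run time. They only record facts that a tool can check at every call site.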
We can define our own annotation attributes in the same way. We will clearly need attributes like these:

/// <summary>
/// Method/property is thread-safe
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property)]
public class ThreadSafeAttribute : Attribute
{ 
}

/// <summary>
/// Method/property is not thread-safe
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property)]
public class NotThreadSafeAttribute : Attribute
{ 
}

/// <summary>
/// Method/property does not modify the object's data.
/// Parameter is not modified by method.
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Parameter)]
public class ReadOnlyAttribute : Attribute
{
}

/// <summary>
/// Method/property modifies the object's data.
/// Parameter is modified by method.
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property | AttributeTargets.Parameter)]
public class ModifiedAttribute : Attribute
{
}

Let’s add these annotations to our Person class:

public class Person
{
    private readonly HashSet<Person> _friends = new HashSet<Person>();

    /// <summary>
    /// Add friend for current person
    /// </summary>
    [Modified]
    [NotThreadSafe]
    public void AddFriend([ReadOnly] Person person)
    {
        _friends.Add(person);
    }

    /// <summary>
    /// Get friends of current person
    /// </summary>
    [ReadOnly]
    [NotThreadSafe]
    public HashSet<Person> GetFriends()
    {
        return new HashSet<Person>(_friends);
    }

    /// <summary>
    /// Add friends of specified person to friends of current person (Your friends are my friends)
    /// </summary>
    [Modified]
    [NotThreadSafe]
    public void AddFriendsOf([ReadOnly] Person person)
    {
        foreach (Person f in person.GetFriends())
        {
            Person friend = f; // Local copy, so the lambda does not capture the loop variable ("modified closure")
            ThreadPool.QueueUserWorkItem(o => AddFriend(friend));
        }
    }

    /// <summary>
    /// Get friends of friends of current person
    /// </summary>
    [ReadOnly]
    [NotThreadSafe]
    public HashSet<Person> GetFriendsOfFriends()
    {
        HashSet<Person> friendsOfFriends = new HashSet<Person>();
        foreach (Person f in GetFriends())
        {
            Person friend = f; // Local copy, so the lambda does not capture the loop variable ("modified closure")
            ThreadPool.QueueUserWorkItem(o => friend.AddFriendsToCollection(friendsOfFriends));
        }
        return friendsOfFriends;
    }

    /// <summary>
    /// Add friends of current person
    /// </summary>
    [ReadOnly]
    [NotThreadSafe]
    private void AddFriendsToCollection([Modified] HashSet<Person> collection)
    {
        collection.UnionWith(GetFriends());
    }
}

Now I’ll explain each attribute on each method in detail. AddFriend does not modify its person parameter, so that parameter can be marked ReadOnly. At the same time, AddFriend modifies the containing person, so it is marked Modified, and because it calls a non-thread-safe method of HashSet, it has to be marked NotThreadSafe as well.
GetFriends creates a copy of the HashSet in a non-thread-safe way, so it is NotThreadSafe; it does not modify anything, so it can be marked ReadOnly.
AddFriendsOf modifies the containing object, so we mark it Modified; the person parameter passed to it is not modified, so it is ReadOnly. The method uses GetFriends, so it is not thread-safe either.
GetFriendsOfFriends does not change the state of the current person in any way, so it is ReadOnly. The method uses GetFriends, so it is not thread-safe either.
We will talk about the thread safety of AddFriendsOf and GetFriendsOfFriends a little later.
AddFriendsToCollection does not change the state of the current object, so it is ReadOnly, but it modifies its collection parameter in a non-thread-safe way.
Now look again at the same code, but with the annotations added. It carries much more information about thread safety, doesn’t it? Specifically, you can see that both asynchronous operations launched through ThreadPool.QueueUserWorkItem use thread-unsafe methods! With this information available, it is not so hard to implement this kind of check automatically and produce warnings during compilation.
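To show that the annotations are machine-readable, here is a minimal sketch that looks them up with reflection at run time. It is not the compile-time analysis mentioned above, which would have to walk the source code or the call graph; the Counter class and the ThreadSafetyChecker helper are hypothetical, and the two attributes are repeated so that the sample stands on its own.

```csharp
using System;
using System.Reflection;

[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property)]
public class ThreadSafeAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Method | AttributeTargets.Property)]
public class NotThreadSafeAttribute : Attribute { }

// Hypothetical annotated class, used only for illustration.
public class Counter
{
    private readonly object _sync = new object();
    private int _value;

    [NotThreadSafe]
    public void IncrementUnsafe()
    {
        _value++;
    }

    [ThreadSafe]
    public void IncrementSafe()
    {
        lock (_sync)
        {
            _value++;
        }
    }
}

public static class ThreadSafetyChecker
{
    // Returns true only if the public method is explicitly marked [ThreadSafe].
    // Unannotated methods are treated as unsafe, which is a conservative
    // assumption made for this sketch.
    public static bool IsMarkedThreadSafe(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        if (method == null)
        {
            throw new ArgumentException("Method not found: " + methodName);
        }
        return Attribute.IsDefined(method, typeof(ThreadSafeAttribute));
    }
}
```

A tool could call something like IsMarkedThreadSafe for every method reachable from a delegate passed to ThreadPool.QueueUserWorkItem and warn when the answer is false.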
Now it is time to fix the issues we found. I’ll provide the most obvious solutions using the lock statement, but you can think about better-performing algorithms.
First of all, we will fix the AddFriend method, which uses the non-thread-safe Add method of HashSet:
[Modified]
[ThreadSafe]
public void AddFriend([ReadOnly] Person person)
{
    lock (_friends)
    {
        _friends.Add(person);
    }
}

That was not hard. The next target is the GetFriends method, which returns a copy of the _friends HashSet:
[ReadOnly]
[ThreadSafe]
public HashSet<Person> GetFriends()
{
    lock (_friends)
    {
        return new HashSet<Person>(_friends);
    }
}

Now each call to GetFriends copies the collection of friends in a thread-safe manner, and since nobody else receives the same set, you can be sure nobody will modify your copy.
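The effect of returning a copy can be checked with a few lines. This is a minimal sketch that repeats the two fixed methods without annotations; the CopyDemo class is a hypothetical helper added for the illustration.

```csharp
using System.Collections.Generic;

public class Person
{
    private readonly HashSet<Person> _friends = new HashSet<Person>();

    public void AddFriend(Person person)
    {
        lock (_friends)
        {
            _friends.Add(person);
        }
    }

    public HashSet<Person> GetFriends()
    {
        lock (_friends)
        {
            // The copy is taken under the lock, so it is a consistent snapshot.
            return new HashSet<Person>(_friends);
        }
    }
}

public static class CopyDemo
{
    // Returns how many friends the person still has after the caller
    // clears the set it received from GetFriends.
    public static int FriendsLeftAfterClearingCopy()
    {
        var alice = new Person();
        alice.AddFriend(new Person());
        alice.AddFriend(new Person());

        HashSet<Person> copy = alice.GetFriends();
        copy.Clear(); // affects only the caller's private copy

        return alice.GetFriends().Count;
    }
}
```

FriendsLeftAfterClearingCopy returns 2, because clearing the snapshot leaves the private _friends set untouched.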
AddFriendsToCollection is the last of the easy changes:
[ReadOnly]
[ThreadSafe]
private void AddFriendsToCollection([Modified] HashSet<Person> collection)
{
    lock (collection)
    {
        collection.UnionWith(GetFriends());
    }
}

AddFriendsOf now appears to be thread-safe, since every method it calls is thread-safe. The same holds for GetFriendsOfFriends.
Here I need to make an important note: if all method calls within a method are thread-safe, it does not mean that the method itself is thread-safe. The easiest example to prove this is incrementing a value from several threads simultaneously. Reading the value into a register, increasing it and writing it back are each thread-safe operations, but as you well know, their combination is not. Conversely, if a method calls non-thread-safe methods without the necessary synchronization, that method is not thread-safe either.
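The increment example can be sketched as follows. IncrementSketch is a hypothetical class; Parallel.For and Interlocked.Increment are standard .NET 4.0 APIs. The unsynchronized version may lose updates, so its result is not predictable, while the other two always count every iteration.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class IncrementSketch
{
    // counter++ is a read, an add and a write. Two threads can read the same
    // old value, so the result may be lower than the number of iterations.
    public static int CountUnsafe(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i =>
        {
            counter++;
        });
        return counter;
    }

    // Interlocked.Increment performs the whole read-modify-write atomically.
    public static int CountInterlocked(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i =>
        {
            Interlocked.Increment(ref counter);
        });
        return counter;
    }

    // A lock makes the three steps one critical section, which is the same
    // approach the Person class uses.
    public static int CountWithLock(int iterations)
    {
        int counter = 0;
        object sync = new object();
        Parallel.For(0, iterations, i =>
        {
            lock (sync)
            {
                counter++;
            }
        });
        return counter;
    }
}
```

The same reasoning applies to any sequence of individually safe calls: if the sequence must be atomic as a whole, the whole sequence needs synchronization.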
I should note that a correct asynchronous implementation of AddFriendsOf and GetFriendsOfFriends is possible here because we use a HashSet to store the set of friends, and a HashSet does not preserve order. If we used a List and the order were significant, we would need to run the calculations in separate threads, then join the threads and add the results in the correct order. Honestly, when I started writing this article I used a List, but then I realized this problem and switched to HashSet to simplify the example. But let’s not mix things up: the correct implementation of an asynchronous algorithm and the thread safety of that algorithm (the possibility of running several such algorithms simultaneously) are different things.
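For the List case, the "compute in parallel, join, then add in order" approach could look roughly like the sketch below. OrderedResultsSketch and MapInOrder are hypothetical names; the sketch uses tasks from the Task Parallel Library rather than raw threads, which is a substitution made here for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OrderedResultsSketch
{
    // Runs compute for every item in parallel, waits for all tasks (the
    // "join" step), and only then collects the results in input order.
    public static List<TResult> MapInOrder<TSource, TResult>(
        IList<TSource> items, Func<TSource, TResult> compute)
    {
        var tasks = new Task<TResult>[items.Count];
        for (int i = 0; i < items.Count; i++)
        {
            TSource item = items[i]; // local copy for the closure
            tasks[i] = Task.Factory.StartNew<TResult>(() => compute(item));
        }

        Task.WaitAll(tasks);

        // The result list is filled by a single thread, so a plain List<T>
        // needs no synchronization here.
        var results = new List<TResult>(tasks.Length);
        foreach (Task<TResult> task in tasks)
        {
            results.Add(task.Result);
        }
        return results;
    }
}
```

Because each task keeps its own result and the merge happens on one thread after the join, the output order matches the input order regardless of which task finishes first.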
After reading this, I hope you see the idea and how it can simplify parallel programming. Although annotating code does not add functionality, it can significantly simplify your life and the lives of the people who will maintain the code later. You can think of it as perfective maintenance, like refactoring or adding unit tests. You will have to fight management if they don’t approve work that does not add functionality, and, of course, your own laziness. But this is the price we pay for better-quality code.
By the way, C# already has one annotation-like keyword, readonly, which is enforced mainly through checks at compile time. I hope that some time later JetBrains or Microsoft will implement parallelism annotations, so that they become a de facto standard and make parallel programming less painful. For now, we can only use home-made annotations to make our code a little more self-describing.
