Monday, November 25, 2013

StampedLock.tryOptimisticRead() and Invariants

In this post we are going to talk about the dangers of using optimistic read-locks, namely, how sensitive they are to invariants.

One notable example of an optimistic read-lock is the StampedLock in JDK 8 when used through the tryOptimisticRead()/validate() API. If you don't know what we're talking about, take a look at the JDK documentation, or at the presentation by Heinz Kabutz at the links below:
http://download.java.net/jdk8/docs/api/java/util/concurrent/locks/StampedLock.html
http://vimeo.com/74553130  (the relevant part for this post starts at minute 28)

IMO, locks with optimistic reading have three major characteristics (issues?) that anyone using them should be aware of:
  1. Atomicity
  2. Memory Management
  3. Invariants
... actually there is one extra thing to be aware of, which is Barriers (or Fences, if you prefer), but only the developers implementing these kind of locks need to worry about them, and not the end users. If you're interested, you can check out more on a presentation by Hans Boehm here http://concurrencyfreaks.blogspot.fr/2013/10/hans-boehm-on-reader-writer-locks.html


Atomicity

Any variable that is read within a tryOptimisticRead()/validate() block should be read atomically; this is particularly important for references. In Java and Scala this guarantee is given by the JVM itself, but in C11 or C++1x it means that every variable read within an optimistic read-lock block must be of type atomic<>, and to take full advantage of the lock it is better to avoid barriers by reading those variables with atomic<>.load(std::memory_order_relaxed)
http://en.cppreference.com/w/cpp/atomic/memory_order
I know what some of you are thinking: the JVM doesn't give atomicity guarantees for long or double. Officially that's true, but in practice most JVMs do provide them, and in any case the atomicity itself is not the biggest problem. Check the Invariants section below.
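To make the Java side of this concrete, here is a small made-up sketch (the class and field names are our own, not from any real API): long and double are the one case where the JVM does not promise atomic reads and writes (JLS 17.7), so a field read inside an optimistic read block should be declared volatile.

```java
// Hypothetical example: a plain long field read inside a
// tryOptimisticRead() block may, in theory, observe a torn value;
// declaring it volatile guarantees atomic reads and writes.
class Counters {
    volatile long total;   // reads/writes guaranteed atomic
    long plainTotal;       // 64-bit accesses may tear (JLS 17.7)
}
```

In practice most 64-bit JVMs never tear a long, but the volatile keyword is what makes the guarantee official.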


Memory Management

Again, the JVM takes care of this problem thanks to its automatic Garbage Collector, which means this is not an issue in Java or Scala.
In other languages like C11 or C++1x, however, this is mostly a deal breaker for the usage of optimistic read locks.
And why is that, you may ask?
Well, suppose we are freeing/deleting an object while holding the write lock of the rw-lock, and at the same time another thread doing an optimistic read tries to access (read-only) that same object... boom!!! Your app just gave a core dump.

How do you fix this?
You either never clean up any object that may be accessed in any way through an optimistic read lock, or you have to start using some memory management technique like "Hazard Pointers" or "Drop the Anchor". In C++1x you can probably get away with a shared_ptr.
If you're using an optimistic locking mechanism, it means you're interested in high performance, so using one of these solutions will defeat the purpose (even the shared_ptr?).


Invariants

The first two issues are fairly obvious, but this one has some macabre subtleties. So what do we mean by "Invariants"?
The code you protect with the write-lock and with the read-lock is easy to reason about when using regular (pessimistic) read-locks, because it behaves the same way as single-threaded code, which means it behaves the way you expect it to behave. The problem with an optimistic read-lock is that another thread may be holding the write-lock and modifying some of the variables at the same time the optimistic reader is reading them. And what's wrong with that, you may ask?
Let's look at an example:
class Boooom {
    int x = 0;
    int y = 1;
}


where one thread is doing this:
while (true) {
    long stamp = stampedLock.writeLock();
    x++;
    y++;
    stampedLock.unlockWrite(stamp);
}


and another thread is doing this:
do {
    stamp = stampedLock.tryOptimisticRead();
    z = 1/(x-y);
} while (!stampedLock.validate(stamp));


So what's wrong with the code above?
If we were using a regular (pessimistic) read-lock this would be fine, because x would always be seen as y-1. With an optimistic read-lock this is no longer true, and x and y can be completely unrelated: y may be incremented several times after x is read, or the compiler may reorder the reads of x and y... pretty much anything can happen. In fact, one of the most likely outcomes is for x to be equal to y, which means that z = 1/0, which throws an exception in Java, and in other languages just blows up in your face.

This is just one example of code where some invariant no longer holds when using optimistic read locks.
So how do you fix this?
Hummm, I guess the old "How do porcupines mate?" joke applies here. You have to be extra careful about which code you place inside the write-lock and which code goes in the optimistic read-lock. There are so many things that can go wrong that it's not even possible to give a "rule of thumb" for it :/
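That said, for the example above there is a well-known safe baseline, following the retry pattern shown in the StampedLock javadoc (the class below is a made-up sketch of ours): copy the shared fields into locals, only use them after validate() succeeds, and fall back to the pessimistic read lock if a writer intervened.

```java
import java.util.concurrent.locks.StampedLock;

// Made-up sketch following the retry pattern from the StampedLock javadoc.
class SafeReader {
    private final StampedLock sl = new StampedLock();
    private int x = 0;
    private int y = 1;

    void increment() {
        long stamp = sl.writeLock();
        try {
            x++;
            y++;
        } finally {
            sl.unlockWrite(stamp);
        }
    }

    // Copy the fields into locals and only use them after validate()
    // confirms no writer intervened; otherwise re-read them under the
    // pessimistic read lock.
    int diff() {
        long stamp = sl.tryOptimisticRead();
        int curX = x, curY = y;
        if (!sl.validate(stamp)) {
            stamp = sl.readLock();
            try {
                curX = x;
                curY = y;
            } finally {
                sl.unlockRead(stamp);
            }
        }
        return 1 / (curX - curY);  // the invariant x == y - 1 now holds
    }
}
```

The crucial difference from the broken example is that the division happens on local copies that are only used after a successful validate(), so the invariant is guaranteed to hold when the computation runs.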



So now what?

Well, this is where Double Instance Locking comes to the rescue: http://concurrencyfreaks.blogspot.fr/2013/11/double-instance-locking.html

Double Instance Locking is a pessimistic technique, and just like with a regular read-lock, you don't have to worry about Atomicity, Memory Management, or Invariants. On top of that, it is lock-free for read operations, which is one of the big theoretical advantages of optimistic read-locks. In practice, optimistic read-locks can easily starve even with low contention from Writers, something Double Instance Locking will almost never do (it depends on the rw-lock it uses).

Let's do a summary table to compare both techniques then:


So what's the conclusion?
That depends on what you need; take a look at the table above and see which one matches your needs the closest.
If I had to give a rule of thumb it would be: if you want to play it safe, or need low latency for read operations, go with the "Double Instance Lock". If you need raw read performance, then neither of the other two will beat the "Optimistic Read" approach. If you have a lot of Writers, then go for a regular Reader-Writer lock (or even just a mutual exclusion lock).









Friday, November 22, 2013

Double Instance Locking

Did you know you could make any read-only operation of any single-threaded data structure be lock-free in a multi-threaded program? And not just any data structure, but any object that can be replicated?

... yes, of course you knew that: you can just use the Copy-On-Write (COW) pattern to make a new copy of the object or data structure, thus making any read operation wait-free. This is in fact the main "trick" behind the CopyOnWriteArrayList in java.util.concurrent.

There are two main issues with the COW pattern though:
1st, it only works on a system with a Garbage Collector; otherwise you have to keep track of how many instances there are and which threads are accessing which instances, and doing that is most certainly not wait-free (for example, google for "Hazard Pointers" or "Drop The Anchor"). This means it is mostly impractical in C/C++.
2nd, it can be very slow for Readers (threads doing a read-only operation), because every time a modification occurs a new copy has to be made of the entire data structure, which causes a lot of cache misses for the Readers. See for example this blog post: http://concurrencyfreaks.blogspot.fr/2013/10/immutable-data-structures-are-not-as.html
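To make the COW trick concrete, here is a minimal sketch of the pattern (the class name is made up): readers just dereference a volatile reference, which is wait-free, and writers build and publish a fresh copy of the whole structure.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal made-up sketch of the Copy-On-Write pattern: readers see an
// immutable snapshot through a single volatile read, while writers
// copy the entire structure, mutate the copy, and publish it.
class CowBox {
    private volatile List<Integer> list = new ArrayList<>();

    // Reader: wait-free, just one volatile read of the reference.
    List<Integer> snapshot() { return list; }

    // Writer: serialized; copies the whole list on every mutation,
    // which is exactly the cache-miss cost described above.
    synchronized void add(int v) {
        List<Integer> copy = new ArrayList<>(list);
        copy.add(v);
        list = copy;
    }
}
```

Notice that a snapshot taken before a write keeps seeing the old contents, which is what makes the reads safe without any locking.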

An alternative to COW is Double Instance Locking, which is a name I made up because, as far as we could tell, Andreia and I invented this technique, so we get to name it... It could be that this technique already exists, so if you've heard of it before, please leave a comment below with a reference to it so that we can acknowledge the true authors; otherwise, we'll keep using whatever crazy name we came up with ;)

What's in a Double Instance Locking pattern?

Double Instance Locking requires only two replicas (instances) of the data structure you want to use, and it is perfectly safe to use even on systems without a GC. This means it is trivial to use (and implement) in C and C++.
The components of this concurrency control pattern are:
  1. A mutual exclusion lock that serializes access by the mutable operations (Writers);
  2. Two exact instances of the object or data structure that is being protected;
  3. Two Reader-Writer locks (one protecting each instance of the data structure) that support tryReadLock() and should have Writer preference;

Simple huh?


How does it work?

Each Writer must work in exclusivity, and for that it must first acquire the writer's mutex. Then it acquires the write-lock of the rw-lock on the first instance and does its mutable operation on that instance. It then releases the rw-lock of the first instance, acquires the write-lock of the rw-lock of the second instance, applies exactly the same mutable operation there, releases the second rw-lock, and finally releases the writer's mutex.
Notice that the Writer must release the first rw-lock before acquiring the second; otherwise there could be an instant in time where both locks are held, which would "block" the Readers, leaving them unable to acquire either of the two rw-locks.

The Reader starts by trying to acquire the read-only lock of the rw-lock of the first instance, and if it fails, tries the second rw-lock, and if it also fails, tries the first one again and then the second, and so on, until one is successfully acquired. Notice that the Reader only "reads" one of the instances, the one to which it successfully acquires the read-lock.

Here are the operations in more detail:

Write operation:
  1. Any thread wishing to do a mutable operation must first acquire the writersMutex;
  2. Acquire the write lock on the rw-lock of the first instance;
  3. Execute the mutable operation on the first instance;
  4. Release the write lock on the rw-lock of the first instance;
  5. Acquire the write lock on the rw-lock of the second instance;
  6. Execute exactly the same mutable operation;
  7. Release the write lock on the rw-lock of the second instance;
  8. Release the writersMutex;

Read operation:
  1. Do a tryReadLock on the rw-lock of the first instance; if it fails, try the same on the second instance; if that fails too, try the first instance again, and so on, until one of the tryReadLock calls succeeds;
  2. Execute the read-only operation on the instance corresponding to the lock that was acquired;
  3. Release the lock
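The two operation sequences above can be sketched in Java as follows. This is a rough sketch with made-up names, not the code from the links below; note also that ReentrantReadWriteLock does not give true Writer preference, so a production version would want a rw-lock that does.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;
import java.util.function.Function;

// Made-up sketch of the Double Instance Locking pattern.
class DoubleInstanceGuard<E> {
    private final ReentrantLock writersMutex = new ReentrantLock();
    private final ReadWriteLock rwLock1 = new ReentrantReadWriteLock();
    private final ReadWriteLock rwLock2 = new ReentrantReadWriteLock();
    private final E instance1, instance2;

    DoubleInstanceGuard(E instance1, E instance2) {
        this.instance1 = instance1;
        this.instance2 = instance2;
    }

    // Write: serialize writers, then apply the same mutation to each
    // instance, holding only one rw-lock at a time so that Readers
    // always have at least one instance available.
    void write(Consumer<E> mutation) {
        writersMutex.lock();
        try {
            rwLock1.writeLock().lock();
            try { mutation.accept(instance1); }
            finally { rwLock1.writeLock().unlock(); }
            rwLock2.writeLock().lock();
            try { mutation.accept(instance2); }
            finally { rwLock2.writeLock().unlock(); }
        } finally {
            writersMutex.unlock();
        }
    }

    // Read: spin on tryLock, alternating between the two instances,
    // and read whichever one we manage to read-lock.
    <R> R read(Function<E, R> readOnlyOp) {
        while (true) {
            if (rwLock1.readLock().tryLock()) {
                try { return readOnlyOp.apply(instance1); }
                finally { rwLock1.readLock().unlock(); }
            }
            if (rwLock2.readLock().tryLock()) {
                try { return readOnlyOp.apply(instance2); }
                finally { rwLock2.readLock().unlock(); }
            }
        }
    }
}
```

A Writer can hold at most one of the two rw-locks at any instant, so a Reader's alternating tryLock loop can only spin while a write is actually in flight on both instances in succession, which is what makes the read side lock-free.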


What is Double Instance Locking?

- It is a pessimistic concurrency control pattern;
- It is blocking for mutable operations (Writers) and lock-free for read-only operations (Readers);
- It does not need a GC, which means it can easily be implemented and used in C and C++ (and Java, Scala, Python, C#, whatever);
- Needs two identical instances of whatever is going to be protected;


In what scenarios should this pattern be used?

As usual, there are no "silver bullets", the Double Instance Locking is just another tool to help you solve problems in Concurrency.
Below is a table to help us compare a few different techniques:
The "optimistic rw-lock" technique in the table is the one used by Java's StampedLock tryOptimisticRead()/validate().


To sum up this table, if you're using C/C++ then you're pretty much stuck with a "regular" Reader-Writer Lock or with the Double Instance Locking.
If you're using Java/Scala then you have a little bit more choice, but keep in mind that the Copy-On-Write may be significantly slower than the other techniques, and that the "RW-Lock with Optimistic Reads" technique requires some careful usage as can be seen in the talk given by Hans Boehm here http://concurrencyfreaks.blogspot.fr/2013/10/hans-boehm-on-reader-writer-locks.html

The main advantage of Double Instance Locking is that it is similar to a Reader-Writer lock but allows Readers to run at the same time as a Writer (each on a different instance).
The trade-off is in memory consumption (twice the size of the data structure, though not of the user data) and twice the work for write operations. Neither of these should be an issue, because memory is cheap and most data structures have a write-few-read-many access pattern, so the number of write operations should be low anyway.


How come I've never heard of this pattern?

Well, this is something new. We also have an even better technique, one that is not just lock-free but wait-free population oblivious, and it's actually fast. We named that technique "Left-Right" and the paper should be out soon, so stay tuned ;)


Where can I get more info and source code?

There is a powerpoint presentation at the link below:
https://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/DoubleInstanceLocking.pptx

Java code:
http://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/DoubleInstanceLockGuard.java
http://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/DITreeSet.java

C (pthreads) code:
http://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/di_rwlock.c
http://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/di_rwlock.h
http://sourceforge.net/projects/ccfreaks/files/papers/DoubleInstance/di_rwlock_example.c

Friday, November 8, 2013

Compiler Confidential - GoingNative2013

Here is another talk from Eric Brumer, also given at GoingNative 2013:
http://channel9.msdn.com/Events/GoingNative/2013/Compiler-Confidential

Topics include:
Modern CPUs, compiler optimizations, instruction pipelines, code generation, vectorization.

If you want to know how a compiler works, this is definitely one of the talks you have to watch.