Monday, February 27, 2017

BitNext - A Lock-Free queue

This post is about BitNext, a new lock-free queue that Andreia and I designed, based on the lock-free linked list by Tim Harris.

Before we start, if you don't know what Tim Harris's lock-free linked list is, then go and read his paper:

Ok, so BitNext is an MPMC queue, with lock-free progress for enqueue() and dequeue(), linearizable consistency, and memory unbounded (node based).
It is easy to add Hazard Pointers to it and the C++ code we're going to show has them embedded.
Here is the link to our paper on BitNext:

If you're a Java coder then sorry, but this post won't be of much use to you, not that BitNext is not implementable in Java, it's just that similarly to Harris's linked list, it requires some modifications to the algorithm to get it working in systems with a GC (like Java, Scala, D, etc). Such modifications are based on RTTI or AtomicMarkableReference and would take an entire post on itself, but if you're interested, you can take a look at this post

Before removing a node/key from Harris's list, we need to "mark" the next pointer by setting a bit in it. Touching bits in references is a no-no in GC systems and that's why this is a "native-only" approach.
The main trick in the Harris list is that setting this pointer will cause an insertion on the following node to have its CAS fail.
Harris list allows insertions to occurs anywhere on the list, and in a lock-free way. This is better than the Michael-Scott list which only allows insertions at the end (tail) of the list, thus creating a lot of contention on the tail node and tail pointer.
The thing about the Michael-Scott algorithm is that it was designed to be a queue, so we started wondering whether it would be possible to make a queue with Harris list as well. Obviously, it is possible, by always inserting a new node at the tail of the list, but the real question was:

"Can we insert new nodes not after the last node and still maintain FIFO order with linearizable consistency"?
and the answer is, yes, using the BitNext algorithm.

In BitNext, dequeuers start by advancing head only if its next is marked. If next is not marked, the dequeuer will attempt to mark the bit with a CAS.
A node with a marked bit is said to be logically dequeued. If the CAS to set the marked bit on fails, it means another dequeuer was successful, or an enqueuer inserted a new node immediately after node.
Here is what the dequeue() looks like in C++:

For enqueuers there are two scenarios, both starting with the enqueuer reading the current tail into the local variable ltail.
When the ltail is the last node on the list, is null, then, similarly to the Michael-Scott queue, the enqueuer will attempt to insert its node by doing a CAS on from null to its node.
When the ltail is not the last node on the list, the enqueuer will advance the tail to the next node,, and then attempt to insert its node in with a CAS.
This attempt will be done at most maxThreads-1 times, after which, either the CAS is successful, or the ltail node has been logically dequeued, thus disallowing an insertion in If the enqueuer were to insert a node in even if it had a marked bit, this would provide linearizable behavior as long as the next node had not been logically dequeued. Given that we can't provide such a guarantee, we forbid the enqueuer from inserting a node after a logically dequeued node, unless is null, in which case there is no node after it.
Here is what the enqueue() looks like in C++:

Let's walk through the enqueue() code to understand a bit better.
We start by protecting the node we read as tail with an hazard pointer (lines 5 and 6).
Then, we advance the tail by one node if it is not the last node (lines 8 and 9). Notice that we advance by a single node even though the tail may be lagging behind multiple nodes (at most maxThreads-1 nodes behind).
If tail is already the last node, then we use an algorithm similar to Michael-Scott queue insertion (lines 12 and 21).
Then, it's a matter of inserting our node after the node we saw as tail (line 29). Notice that ltail may not be the last node on the list, but that's ok because if this node was already dequeued then it would have its "next" with a marked bit, and the insertion would fail. Because the only way to observer order on the queue is to dequeue items from it, and because the dequeue always marks the bit on next to dequeue the item/node, we know that that an insertion in the middle of the list is only successful if the node's next is not marked, and therefore, it's ok to insert in the middle.
This is however not a sufficient condition to have linearizability: we must guarantee that calls to enqueue() which do not overlap in time (i.e. have an happens-before sequence) have a corresponding global order, and that is why each enqueuer reads the tail and advances it by one (if it is not already the last).

It's a funny trick but it works really well.
The end result is that we get better throughput and better (lower) latency in the tail of the distribution when the queue is under high contention, mostly because BitNext has a natural tendency to spread contention over multiple nodes, instead of all threads hammering on the last node attempting to insert their node.

One the next post we will look at a variant of BitNext which relaxes the advance of the head, improving the dequeue() as well.
For now, here is the C++ code for BitNext:

No comments:

Post a Comment