Saturday, November 19, 2016

Log2ArrayQueue - MPMC lock-free queue (part 2 of 4)

On the previous post we talked about the first of four linked-list-of-arrays-based queue, and today we're going to talk about the second variant, named Log2ArrayQueue, and you can get the Java source code here:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Java/com/concurrencyfreaks/queues/array/Log2ArrayQueue.java
and the C++ source code here (with memory reclamation):
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/CPP/queues/array/Log2ArrayQueue.hpp

Log2ArrayQueue is very similar to LinearArrayQueue, the only difference is that in LinearArrayQueue the enqueuers/dequeuers start from the entry zero in the array searching for a null/non-taken entry, while on Log2ArrayQueue they do a binary search on the array.
That's right, this is an array, and we know that there is one entry in the array that separates the available from the unavailable items, so what better than to do than a binary search?
A binary search will take at most log2 steps to complete, which is short even for a large array. For example, searching in an array with one million entries will require 20 atomic loads (std::memory_order_acq). This may seem fast (and it is) but unless the whole array is already in cache, it's going to give something close to 20 cache-misses, which is slow, therefore, it's better not to use a very big array for that reason.


A non-obvious detail is that the items that are available/unavailable are changing on the fly, concurrently with the binary search.
This may seem tricky to get right, but there is no problem because, once an entry changes from nullptr to something non-null, it will be so forever, and similarly for changing from something to "taken".
Enqueuers care only where is the first non-null entry. If they are "behind" the first non-null it's fine, they'll catch up doing a linear search from that one.


Here is the code for the binary search used by the enqueuers:
    long findFirstNull(Node* node) {
        if (node->items[0].load() == nullptr) return 0;
        long minPos = 0;
        long maxPos = BUFFER_SIZE-1;
        while (true) {
            long pos = (maxPos-minPos)/2 + minPos;
            if (node->items[getIndex(pos)].load() == nullptr) {
                maxPos = pos;
            } else {
                minPos = pos;
            }
            if (maxPos-minPos <= 3) return minPos;
        }
    }


Dequeuers care only where is the first non-taken entry. If they are "behind" the first non-taken it's fine, they'll catch up doing a linear search from the first non-taken. If they find a null entry, it means the queue is empty and they return nullptr to indicate so.
Schematic 2


Here is the code for the binary search used by the dequeuers:
    long findLastTaken(Node* node) {
        if (node->items[BUFFER_SIZE-1].load() == taken) return BUFFER_SIZE-1;
        long minPos = 0;
        long maxPos = BUFFER_SIZE-1;
        while (true) {
            long pos = (maxPos-minPos)/2 + minPos;
            if (node->items[getIndex(pos)].load() == taken) {
                minPos = pos;
            } else {
                maxPos = pos;
            }
            if (maxPos-minPos <= 3) return minPos;
        }
    }

   
Here are some microbenchmarks comparing the Log2ArrayQueue lock-free queue using different array sizes:





 


On the next post we will look at the third variant of these array queues.

2 comments:

  1. In enqueue(), it is possible that more than one thread that detects a full node will new up a new Node but only one of them wins, therefore the others are wasted (more GC?). I've been experimenting with MPSC Linked-Array Queue myself (https://gist.github.com/akarnokd/0699387c4ed43f5d4386) some time ago and I received suggestions from Nitsan Wakart (JCTools) that the code could spin on a special "allocating" marker in "next". (In fact, the idea of linked "islands" in an SPSC queue came from Nitsan and I tried expanding that into MPSC with unsatisfactory results.)

    ReplyDelete
    Replies
    1. Yes the GC will have to do more work to cleanup the unused node. One alternative is to insert it one the next node, of if that fails, on the next-to-next node, and so on. I'll have to try that as well to see if it makes a difference in the benchmarks.
      Thanks for pointing it out!

      I would assume that if the code is "spinning" on a marker, then it means it's blocking, not lock-free, and we're only interested in lock-free or wait-free queues.

      Delete