Sunday, June 8, 2014

CLH Reader-Writer Lock


Still on the topic of locks, we decided to adapt the idea behind the CLH lock to create a Reader-Writer Lock.
We named this algorithm the "CLH Reader-Writer Lock" (not very original, I know). We believe this is a new algorithm because we don't know of any similar approach, but if you find anything like it in the existing literature or on the internet, please send us a link or mention it in the comments section.
PS: As Dave Dice has pointed out (thanks Dave!), this algorithm is the combination of the CLH lock on a previous post, with the C-RW-NP described in the paper NUMA-Aware Reader-Writer Locks.

So how does it work?

Similarly to the CLH mutex, each node contains only one field, named succ_must_wait:

typedef struct clh_rwlock_node_ clh_rwlock_node_t;

struct clh_rwlock_node_
{
    _Atomic char succ_must_wait;
};


The lock instance itself contains one extra field, named readers_counter, and some extra padding to reduce false sharing between the three members of clh_rwlock_t:

typedef struct
{
    clh_rwlock_node_t * mynode;
    char padding1[64]; 
    _Atomic (clh_rwlock_node_t *) tail;
    char padding2[64]; 
    _Atomic long readers_counter;
} clh_rwlock_t;
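The node allocator clh_rwlock_create_node() (used in the lock functions below) and the lock constructor are not shown in this post; they are in the source code linked at the end. A minimal sketch of what they could look like, assuming malloc-based allocation and a tail that starts with an unlocked sentinel node (the types are repeated here so the sketch compiles on its own):

#include <stdatomic.h>
#include <stdlib.h>

typedef struct clh_rwlock_node_
{
    _Atomic char succ_must_wait;
} clh_rwlock_node_t;

typedef struct
{
    clh_rwlock_node_t * mynode;
    char padding1[64];
    _Atomic (clh_rwlock_node_t *) tail;
    char padding2[64];
    _Atomic long readers_counter;
} clh_rwlock_t;

// Allocates a node with succ_must_wait set to the given value.
// This is a sketch; the actual helper lives in clh_rwlock.c on github.
clh_rwlock_node_t * clh_rwlock_create_node(char islocked)
{
    clh_rwlock_node_t *node = malloc(sizeof(clh_rwlock_node_t));
    atomic_store(&node->succ_must_wait, islocked);
    return node;
}

// Initializes the lock: the tail starts with an unlocked sentinel node,
// so the first thread to arrive acquires the lock immediately.
void clh_rwlock_init(clh_rwlock_t * self)
{
    self->mynode = NULL;
    atomic_store(&self->tail, clh_rwlock_create_node(0));
    atomic_store(&self->readers_counter, 0);
}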


The basic idea is that each thread does an atomic_exchange() on the tail, regardless of whether it is a Reader or a Writer.
To allow multiple Readers to run simultaneously, when a Reader acquires the read-lock it immediately sets succ_must_wait=0, but before doing so it increments readers_counter by one. When a Reader exits, it decrements readers_counter by one.
A Writer waits for its turn to acquire the write-lock by spinning on the previous node's succ_must_wait, and once that reaches zero, it spins again waiting for readers_counter to reach zero. This allows multiple Readers to execute concurrently, while preventing a Writer from entering before its turn.
The following PowerPoint presentation shows an animation of how this works:

https://github.com/pramalhe/ConcurrencyFreaks/blob/master/Presentations/CLH-RWLock.pptx



The code for the Readers looks like this:

void clh_rwlock_readlock(clh_rwlock_t * self)
{
    // Create the new node locked by default, setting succ_must_wait=1
    clh_rwlock_node_t *mynode = clh_rwlock_create_node(1);
    clh_rwlock_node_t *prev = atomic_exchange(&self->tail, mynode);

    // This thread's node is now in the queue, so wait until it is its turn
    char prev_islocked = atomic_load_explicit(&prev->succ_must_wait, memory_order_relaxed);
    if (prev_islocked) {
        while (prev_islocked) {
            sched_yield();
            prev_islocked = atomic_load(&prev->succ_must_wait);
        }
    }

    // Incrementing the readers_counter will prevent a Writer from going in
    atomic_fetch_add(&self->readers_counter, 1);
    // This will allow the next thread to go in, but only if it is a Reader
    atomic_store(&mynode->succ_must_wait, 0);
    // This thread has acquired the lock and it is now safe to
    // cleanup the memory of the previous node.
    free(prev);
}

void clh_rwlock_readunlock(clh_rwlock_t * self)
{
    atomic_fetch_add(&self->readers_counter, -1);
}
Notice that the Reader must first increment readers_counter and only afterwards set succ_must_wait to zero. Otherwise, a Writer could see succ_must_wait at zero, immediately check readers_counter, see that it is also zero, and enter the critical section, immediately followed by a Reader that had just incremented readers_counter, which would be incorrect. Also notice that the Reader free()s the memory of the previous node in the lock() function.


The code for the Writers looks like this:

void clh_rwlock_writelock(clh_rwlock_t * self)
{
    // Create the new node locked by default, setting succ_must_wait=1
    clh_rwlock_node_t *mynode = clh_rwlock_create_node(1);
    clh_rwlock_node_t *prev = atomic_exchange(&self->tail, mynode);

    // This thread's node is now in the queue, so wait until it is its turn
    char prev_islocked = atomic_load_explicit(&prev->succ_must_wait, memory_order_relaxed);
    if (prev_islocked) {
        while (prev_islocked) {
            sched_yield(); 
            prev_islocked = atomic_load(&prev->succ_must_wait);
        }
    }

    // Even though succ_must_wait is 0, there may be unfinished Readers, so spin/wait
    // until they're over.
    long readers_counter = atomic_load_explicit(&self->readers_counter, memory_order_relaxed);
    if (readers_counter != 0) {
        while (readers_counter != 0) {
            sched_yield();
            readers_counter = atomic_load(&self->readers_counter);
        }
    }
    // This thread has acquired the lock

    self->mynode = mynode;
    free(prev);
}

void clh_rwlock_writeunlock(clh_rwlock_t * self)
{
    atomic_store(&self->mynode->succ_must_wait, 0);
}

As can be seen from the code above, each Writer may have to spin twice: first to wait for its turn, spinning until the succ_must_wait of the previous node becomes zero, and a second time waiting for any unfinished Readers to complete.
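Putting the pieces together, here is a minimal single-threaded sanity check. It repeats the definitions above (with the spin loops condensed to plain while loops, and an assumed clh_rwlock_create_node()/clh_rwlock_init() pair, since the post does not show them) so that it compiles on its own; a real test would of course exercise the lock from multiple threads:

#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct clh_rwlock_node_ { _Atomic char succ_must_wait; } clh_rwlock_node_t;

typedef struct {
    clh_rwlock_node_t * mynode;
    char padding1[64];
    _Atomic (clh_rwlock_node_t *) tail;
    char padding2[64];
    _Atomic long readers_counter;
} clh_rwlock_t;

static clh_rwlock_node_t * clh_rwlock_create_node(char islocked) {
    clh_rwlock_node_t *node = malloc(sizeof(clh_rwlock_node_t));
    atomic_store(&node->succ_must_wait, islocked);
    return node;
}

// The tail starts with an unlocked sentinel node
void clh_rwlock_init(clh_rwlock_t * self) {
    self->mynode = NULL;
    atomic_store(&self->tail, clh_rwlock_create_node(0));
    atomic_store(&self->readers_counter, 0);
}

void clh_rwlock_readlock(clh_rwlock_t * self) {
    clh_rwlock_node_t *mynode = clh_rwlock_create_node(1);
    clh_rwlock_node_t *prev = atomic_exchange(&self->tail, mynode);
    while (atomic_load(&prev->succ_must_wait)) sched_yield();
    atomic_fetch_add(&self->readers_counter, 1);  // block Writers first...
    atomic_store(&mynode->succ_must_wait, 0);     // ...then release the successor
    free(prev);
}

void clh_rwlock_readunlock(clh_rwlock_t * self) {
    atomic_fetch_add(&self->readers_counter, -1);
}

void clh_rwlock_writelock(clh_rwlock_t * self) {
    clh_rwlock_node_t *mynode = clh_rwlock_create_node(1);
    clh_rwlock_node_t *prev = atomic_exchange(&self->tail, mynode);
    while (atomic_load(&prev->succ_must_wait)) sched_yield();        // wait for our turn
    while (atomic_load(&self->readers_counter) != 0) sched_yield();  // wait for Readers
    self->mynode = mynode;
    free(prev);
}

void clh_rwlock_writeunlock(clh_rwlock_t * self) {
    atomic_store(&self->mynode->succ_must_wait, 0);
}

long clh_rwlock_smoke_test(void) {
    clh_rwlock_t lk;
    long shared = 0;
    clh_rwlock_init(&lk);
    clh_rwlock_readlock(&lk);      // first Reader gets in right away
    clh_rwlock_readunlock(&lk);
    clh_rwlock_writelock(&lk);     // Writer waits for readers_counter == 0
    shared++;
    clh_rwlock_writeunlock(&lk);
    free(atomic_load(&lk.tail));   // reclaim the last node left in the queue
    return shared;
}

int main(void) {
    printf("shared = %ld\n", clh_rwlock_smoke_test());
    return 0;
}

which should print "shared = 1".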

One very important point is that this Reader-Writer Lock is strictly fair and starvation-free.
Unlike simple spin-lock based RW-Locks, a Writer will never be forced to wait for a late-arriving Reader (or another Writer). Once a Writer or Reader has executed the atomic_exchange() on the tail, it only has to wait for threads that executed the atomic_exchange() before it, never for threads that arrive afterwards.

Also, notice that we make use of relaxed atomics in several places, which shows, once again, that relaxed atomics are useful for creating practical implementations of synchronization mechanisms.


Link to C11 source code on github:
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_rwlock.h
https://github.com/pramalhe/ConcurrencyFreaks/blob/master/C11/locks/clh_rwlock.c


It is important to emphasize that the clh_rwlock might not scale well for Readers. We could replace readers_counter with a scalable counter, like the Distributed Cache Line Counter, LongAdder, a SNZI, or some other kind of readIndicator mechanism, which should increase the performance for Readers, but there would still be a bottleneck when the Readers do the atomic_exchange() on the tail. Maybe there is a way to overcome this obstacle, but I haven't figured it out yet.
If you really need a lock that is scalable for Readers, then it is better to go with one of the variants of the Scalable RW-Lock (named C-RW-WP in this paper), which we covered in previous posts.
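For illustration, here is one very simple way a striped readIndicator could look; this is just a hypothetical sketch (not the Distributed Cache Line Counter itself), where each stripe sits on its own cache line so Readers hashing to different stripes don't invalidate each other's lines:

#include <stdatomic.h>

#define NUM_STRIPES  8   // ideally at least the number of cores
#define CACHE_LINE   64

typedef struct {
    struct {
        _Atomic long count;
        char pad[CACHE_LINE - sizeof(_Atomic long)];
    } stripe[NUM_STRIPES];
} striped_counter_t;

// A Reader picks a stripe from its thread id and increments/decrements it
void striped_counter_add(striped_counter_t * self, unsigned tid, long delta)
{
    atomic_fetch_add(&self->stripe[tid % NUM_STRIPES].count, delta);
}

// Sum over all stripes; this is what the Writer would spin on until it
// sees zero, instead of spinning on a single readers_counter
long striped_counter_sum(striped_counter_t * self)
{
    long sum = 0;
    for (int i = 0; i < NUM_STRIPES; i++) {
        sum += atomic_load(&self->stripe[i].count);
    }
    return sum;
}

The Writer's scan over the stripes is more expensive than a single load, which is the usual trade-off of readIndicators: cheaper arrivals for Readers in exchange for a costlier check on the Writer's side.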
