Writing a Concurrent Stack

The stack is one of the most basic data structures and also one of the most widely used. In this post, I am going to take a stab at writing a concurrent stack – one that may be mutated simultaneously by more than one thread. I have provided a sample C++ implementation at my GitHub page .

The nodes of a stack can be represented by the following struct:

template<typename T>

struct Node
{
T value;
Node<T>* next;
};

The three main operations performed on a stack are push, pop and top (to peek at the top of the stack).

A typical push operation involves:

  1. Creating a new node.
  2. Setting the next pointer of the new node to the top of the stack
  3. Modifying the pointer to the top of the stack to the new node.

In a concurrent environment, we have a race condition between steps 2.) and 3.)

Consider a thread T1 which has run till step 2. At this point, another thread, say T2 is scheduled and it runs all the three steps above. This will cause the pointer to the top of the stack to mutate and hold the new value inserted by T2. T2 unwinds and T1 is resumed. T1 has the stale address of the stack top. It runs step 3. After this, we have lost the node inserted by T2. Apart from causing undefined behavior, this will also cause a memory leak since the memory allocated for the node by T2 will never be freed.

There are multiple ways to solve this issue. The simplest and the most obvious is to serialize the threads when executing steps 2 and 3 by putting them inside a lock (or some other critical section primitive). However, in a highly concurrent environment, this can impact performance due to high contention.

We can achieve higher performance by using non-blocking synchronization using the CAS pattern. In Win32, we have the Interlocked family of functions which implement this.

What we need to do is, before executing step 3, we need to check whether the stack top was modified. If yes, we loop back and execute step 2 again. If it was not modified, we can safely execute step 3. This is achieved using the following:

do
{
newNode->next = stackTop;
} while (InterlockedCompareExchangePointer((volatile PVOID*)&stackTop, newNode, newNode->next) != newNode->next);

A typical pop operation involves:

  1. Caching the pointer to the stack top.
  2. Advancing the stack top to the next node.
  3. Deleting the cached pointer.

Again, there is a race condition between steps 2 and 3. It can cause a double deletion of the same memory address if two threads read in the same value for the stack top. Another subtle issue is that there is also a race condition between steps 1 and 2 when the stack has just one element. Consider two threads, say T1 and T2. T1 has executed step 1. At this point, T2 comes in and executes all the steps. At a future instant, the CPU reschedules T1. T1 tries to execute step 2 and it causes a null reference exception (the stack top has been set to null by T2).

We can again use InterlockedCompareExchange to solve the race conditions. We also need to perform a null check between steps 1 and 2:

Node<T>* topElem;
do
{
topElem = stackTop;
if (NULL == topElem)
return;
} while (InterlockedCompareExchangePointer((volatile PVOID*) &stackTop, stackTop->next, topElem) != topElem);
delete topElem;

The implementation provided in the GitHub page has a few other functions which you can try out.

I will soon be writing another post to show the race conditions by putting the code inside the debugger.

NOTE: There is a problem named as ‘ABA’ which is present in most lock-free implementations of a linked list in an unmanaged language like C or C++. Since this is a simple introduction to the world of non-blocking synchronization, I have not attempted to solve the ‘ABA’ problem. There is a ton of material out there which you can Bing/Google to find out more about it and how to solve it.

Hope you liked this post!

4 thoughts on “Writing a Concurrent Stack

  1. Girish says:

    In the pop routine I see a race condition for single element stack. If two threads sample the single element into top element variable and then before t1 executes the swap/delete logic, if t2 goes first then stack top will be set to null and t1 will dereference null stack top leading to a crash. What do you think?

    Like

Leave a comment