Wednesday, 22 June 2011

Dissecting the Disruptor: What's so special about a ring buffer?

Recently we open sourced the LMAX Disruptor, the key to what makes our exchange so fast.  Why did we open source it?  Well, we've realised that conventional wisdom around high performance programming is... a bit wrong. We've come up with a better, faster way to share data between threads, and it would be selfish not to share it with the world.  Plus it makes us look dead clever.

On the site you can download a technical article explaining what the Disruptor is and why it's so clever and fast.  I even get a writing credit on it, which is gratifying when all I really did is insert commas and re-phrase sentences I didn't understand.

However I find the whole thing a bit much to digest all at once, so I'm going to explain it in smaller pieces, as suits my NADD audience.

First up - the ring buffer.  Initially I was under the impression the Disruptor was just the ring buffer.  But I've come to realise that while this data structure is at the heart of the pattern, the clever bit about the Disruptor is controlling access to it.

What on earth is a ring buffer?
Well, it does what it says on the tin - it's a ring (it's circular and wraps), and you use it as a buffer to pass stuff from one context (one thread) to another:


(OK, I drew it in Paint.  I'm experimenting with sketch styles and hoping my OCD doesn't kick in and demand perfect circles and straight lines at precise angles).

So basically it's an array with a pointer to the next available slot.


As you keep filling up the buffer (and presumably reading from it too), the sequence keeps incrementing, wrapping around the ring:

To find the slot in the array that the current sequence points to you use a mod operation:
sequence mod array length = array index
So for the above ring buffer (using Java mod syntax): 12 % 10 = 2. Easy.

Actually it was a total accident that the picture had ten slots.  Powers of two work better because computers think in binary - when the ring size is a power of two, the relatively expensive mod can be replaced with a cheap bitwise AND.
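To make that concrete, here's a tiny sketch in Java (the Disruptor's own language) of mapping a sequence number to an array index - the names are mine, not the real API:

```java
public class SequenceToIndex {
    public static void main(String[] args) {
        int ringSize = 8;   // a power of two
        long sequence = 12;

        // The general mapping: sequence mod array length = array index
        int index = (int) (sequence % ringSize);

        // When the size is a power of two, the mod can be replaced
        // with a cheap bitwise AND against (size - 1)
        int fastIndex = (int) (sequence & (ringSize - 1));

        System.out.println(index + " " + fastIndex);  // prints "4 4"
    }
}
```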

So what?
If you look at Wikipedia's entry on Circular Buffers, you'll see one major difference from the way we've implemented ours - we don't have a pointer to the end.  We only have the next available sequence number.  This is deliberate - the original reason we chose a ring buffer was so we could support reliable messaging.  We needed a store of the messages the service had sent, so that when another service sent a nak to say it hadn't received some messages, we would be able to resend them.

The ring buffer seems ideal for this.  It stores the sequence to show where the end of the buffer is, and if it gets a nak it can replay everything from that point to the current sequence:


The difference between the ring buffer as we've implemented it, and the queues we had traditionally been using, is that we don't consume the items in the buffer - they stay there until they get over-written.  Which is why we don't need the "end" pointer you see in the Wikipedia version.  Deciding whether it's OK to wrap or not is managed outside of the data structure itself (this is part of the producer and consumer behaviour - if you can't wait for me to get round to blogging about it, check out the Disruptor site).
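Putting publishing and nak-based replay together, a minimal sketch might look like this (plain Java with made-up names, not the real Disruptor code; it assumes the nak'd entries haven't been over-written yet - managing that is exactly the producer and consumer behaviour that lives outside the data structure):

```java
import java.util.ArrayList;
import java.util.List;

// A minimal replay sketch: entries stay in the ring until over-written,
// so everything from a nak'd sequence to the current one can be re-read.
public class ReplayableRing {
    private final String[] entries;
    private long nextSequence = 0;  // the next available sequence number

    public ReplayableRing(int size) {
        entries = new String[size];
    }

    public void publish(String message) {
        entries[(int) (nextSequence % entries.length)] = message;
        nextSequence++;
    }

    // Replay everything from the nak'd sequence up to the current one.
    // Assumes those entries haven't wrapped - preventing an unwanted
    // wrap is handled outside the data structure.
    public List<String> replayFrom(long nakSequence) {
        List<String> replayed = new ArrayList<>();
        for (long seq = nakSequence; seq < nextSequence; seq++) {
            replayed.add(entries[(int) (seq % entries.length)]);
        }
        return replayed;
    }
}
```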

And it's so great because...?
So we use this data structure because it gives us some nice behaviour for reliable messaging.  It turns out though that it has some other nice characteristics.

Firstly, it's faster than something like a linked list because it's an array, and has a predictable pattern of access.  This is nice and CPU-cache-friendly - at the hardware level the entries can be pre-loaded, so the machine is not constantly going back to main memory to load the next item in the ring.

Secondly, it's an array and you can pre-allocate it up front, making the objects effectively immortal.  This means the garbage collector has pretty much nothing to do here.  Again, this is unlike a linked list, which creates an object for every item added to the list - these then all need to be cleaned up when they're no longer in the list.
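The pre-allocation idea can be sketched like this (my own illustration, not the Disruptor's actual code) - every entry is created once, up front, and then mutated in place on each publish:

```java
public class PreallocatedRing {
    // A mutable entry, allocated once and re-used forever
    static final class Entry {
        long value;
    }

    private final Entry[] entries;
    private long nextSequence = 0;

    public PreallocatedRing(int size) {
        // Allocate every entry up front - after this point no new
        // objects are created, so the garbage collector stays idle
        entries = new Entry[size];
        for (int i = 0; i < size; i++) {
            entries[i] = new Entry();
        }
    }

    public void publish(long value) {
        // Overwrite the existing entry rather than allocating a new one
        entries[(int) (nextSequence % entries.length)].value = value;
        nextSequence++;
    }

    public long get(long sequence) {
        return entries[(int) (sequence % entries.length)].value;
    }
}
```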

The missing pieces
I haven't talked about how to prevent the ring wrapping, or specifics around how to write stuff to and read things from the ring buffer.  You'll also notice I've been comparing it to a data structure like a linked list, which I don't think anyone believes is the answer to the world's problems.

The interesting part comes when you compare the Disruptor with an implementation like a queue.  Queues usually take care of all the stuff like the start and end of the queue, adding and consuming items, and so forth.  All the stuff I haven't really touched on with the ring buffer.  That's because the ring buffer itself isn't responsible for these things, we've moved these concerns outside of the data structure.

For more details you're just going to have to read the paper or check out the code.  Or watch Mike and Martin at QCon San Francisco last year.  Or wait for me to have a spare five minutes to get my head around the rest of it.

20 comments:

  1. If you don't consume elements from your ring buffer then you're keeping them reachable and preventing them from being deallocated. This can obviously have an adverse effect on the throughput and latency of the garbage collector.

    Writing references into different locations in your ring buffer incurs the write barrier, which can also adversely affect throughput and latency.

    I wonder what the trade-offs are concerning these disadvantages and when they come into play.

  2. With regards to use of memory, no real trade-offs are made by the Disruptor. Unlike a queue, you have a choice about how to make use of memory. If the solution is a soft real-time system, reducing GC pauses is paramount. Therefore you can re-use the entries in the ring buffer, e.g. copying byte arrays to and from network I/O buffers in and out of the ring buffer (our most common usage pattern). As the amount of memory used by the system remains static, this reduces the frequency of garbage collection.

    It is also possible to implement an Entry that contains a reference to an immutable object. However in that situation it may be necessary for the consumer to null out the message object to reduce the amount of memory that needs to be promoted from Eden. So a little more effort is required from the programmer to build the most appropriate solution. We believe that the flexibility provided justifies this small bit of extra effort.

    Considering the write barrier, the primary goal of the Disruptor is to pass messages between threads. We make no trade-offs regarding ordering or consistency, therefore it is necessary to use memory barriers in the appropriate places. We've done our utmost to keep this to a minimum. However, we are many times faster than the popular alternatives, as most of them use locks to provide consistency.

    Mike.

  3. How does this approach compare to the Pool approach and other approaches used here: http://cacm.acm.org/magazines/2011/3/105308-data-structures-in-the-multicore-age/fulltext

    Why not use a Pool instead of a queue? Is the LIFO requirement essential?

  4. Unfortunately I can't read that article because I don't have an account at that site.

    FIFO (not LIFO) is absolutely essential - our exchange depends upon predictable ordering, and if you play the same events into it you will always get the same outcome. The Disruptor ensures this ordering without taking the performance penalties usually associated with FIFO structures.

  5. @Flying Frog Consultancy said "If you don't consume elements from your ring buffer then you're keeping them reachable and preventing them from being deallocated. This can obviously have an adverse effect on the throughput and latency of the garbage collector."

    The whole point is *not* to invoke the garbage collector. The Disruptor pattern allows data to be passed between CPUs at pretty much the theoretical maximum of the hardware - it's been well thought out.

  6. I'm new to Disruptor pattern. I have a very basic question. How do I add messages to the Ring Buffer from a Multi Threaded Producer. Should the add calls to the Ring Buffer be synchronized?

    Thank you.

  7. Generally the aim is not to run anything multi-threaded. Producers and consumers should be single-threaded. But you can have more than one producer: http://mechanitis.blogspot.com/2011/07/dissecting-disruptor-writing-to-ring.html - this is a slightly out of date post, the naming conventions have changed and the producer barrier is now managed by the ring buffer, but I think this might be a good place to start to think about how to solve your problem.

  8. Hi Trisha,

    Thanks for the interesting article.
    I am not sure if I understand, but the concept of holding on to memory and reusing already allocated objects to avoid GC pauses does not seem to be new. How is the ring buffer different from an object pool?

    Thanks!

  9. Hi Sourabh,

    Avoiding GC is not the main aim of the RingBuffer, although it does help towards the speed of the Disruptor. The interesting characteristics of the RingBuffer are that it's FIFO, and it enables some really nice batching when you read from it. The RingBuffer is not the secret sauce in the Disruptor's performance, in fact in the current version of the Disruptor you don't need it at all.

    It's worth noting that there's nothing new in the Disruptor at all, in fact many of the ideas have been around for years. But I don't think there are any other frameworks in Java that pull together these concepts in this way to give the kind of performance we see when using the Disruptor.

  10. Hi Trisha,
    A few days ago I discovered the LMAX architecture and the Disruptor too. It's not so clear to me how exactly the consumers extract the messages from the RingBuffer, and how exactly a consumer, for example C1, knows which messages are for it and not for another consumer, C2?
    Thanks
    Sorin.

    1. Hi Sorin,

      Actually the messages are for both consumers. The default behaviour is that all consumers (or EventHandlers as they are now) read all messages in the RingBuffer. If you have different types of events that are handled by different consumers, then it's up to the consumer to decide whether to ignore the event or not. So, if C1 handles all blue messages and C2 handles all red ones (over simplification of course) then C1 needs to check it's a blue message before proceeding.

      In terms of extracting the messages - you don't. Messages live on the ring buffer to be read by (and processed by) all consumers, until every consumer has done what it needs to do with it (i.e. every consumer has incremented its sequence number to at least that message's number) then it will get over-written when the ring wraps. If you want to do something with that message, then you simply read it and do whatever you want with it, even if that's passing it on to another Disruptor or another part of the system.
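      To make the colour example concrete, a consumer's filtering might look something like this sketch (plain Java with made-up names - the real interface these days is EventHandler, but the idea is the same):

```java
import java.util.ArrayList;
import java.util.List;

// A sketch of a consumer that handles only "blue" messages. Every
// consumer sees every message; it's up to each one to ignore events
// that aren't relevant to it (the colour field is made up).
public class BlueConsumer {
    static final class Message {
        final String colour;
        final String payload;

        Message(String colour, String payload) {
            this.colour = colour;
            this.payload = payload;
        }
    }

    private final List<String> handled = new ArrayList<>();

    public void onEvent(Message message) {
        if (!"blue".equals(message.colour)) {
            return;  // not ours - skip it, but leave it in the ring
        }
        handled.add(message.payload);
    }

    public List<String> handled() {
        return handled;
    }
}
```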

  11. Hi,

    Great Work.

    Could you tell me how the RingBuffer handles overflow?

    Thanks

    1. By overflow, you mean what happens when the RingBuffer fills up? Well, the Producer (EventPublisher), the thing that writes into the RingBuffer, has to block until there is an available slot in the RingBuffer. See my (now very old) post on writing to the buffer: http://mechanitis.blogspot.co.uk/2011/07/dissecting-disruptor-writing-to-ring.html.

      This is done very deliberately as you want to create back pressure in your system rather than running out of memory: http://mechanical-sympathy.blogspot.co.uk/2012/05/apply-back-pressure-when-overloaded.html

  12. This comment has been removed by the author.

  13. Hi Trisha,
    Thanks for this and other presentations.

    I have a question regarding the disruptor which is rather basic.

    The consumers (event processors) don't implement any of the Callable or Runnable interfaces - they implement EventHandler. So how can they run in parallel? For example, I have a disruptor implementation with a diamond pattern like this:



    P1 - [c1,c2,c3] - c4 - c5



    Where c1 to c3 can work in parallel after p1, and C4 and C5 work after them.

    So conventionally I'd have something like this (with P1 and C1-C5 being runnables/callables)

    p1.start();
    p1.join();

    c1.start();
    c2.start();
    c3.start();
    c1.join();
    c2.join();
    c3.join();

    c4.start();
    c4.join();
    c5.start();
    c5.join();

    But in case of the Disruptor none of my event handlers implement Runnable or Callable, so how'd the disruptor framework end up running them in parallel?

    Take the following scenario:

    My consumer C2 needs to make a web service call to annotate the Event. In SEDA I can start up 10 threads for 10 such C2 requests [for pulling the message off the queue, making the web service call and updating the next SEDA queue], and that makes sure I don't wait sequentially for a web service response for each of the 10 requests, whereas in this case my event processor C2 (if it's a single instance) would wait sequentially for the 10 C2 requests.

    http://stackoverflow.com/questions/17019203/disruptor-are-the-consumers-multithreaded

    -Regards
    Nikhil

  14. Hi Trisha,

    In Java, creating an array of Java objects does not allocate memory for the objects. It only allocates memory for references to the objects. How does an array of object references help improve CPU caching efficiency because the actual objects are still scattered in the heap?

    Thanks,
    Richard

    1. You're absolutely right, which is why in the LMAX case we have an array of byte arrays, not an array of objects - at least for the high performance instances of the disruptor. An array of object references is still valuable in a lot of cases, but as you say it doesn't necessarily give you the cache line affinity.

      This has come up several times in the Google Group discussions (https://groups.google.com/forum/#!forum/lmax-disruptor), I think you'll find more detailed discussion there.

  15. Hi Trisha;

    I've just spent a good part of my day going through your disruptor, and I can see from your example the hundreds of millions of ops per second, and also from the presentation the 6 million trades per second. I've just written an example with the producer retrieving the operation request from a web service, with 2 consumers - one for marshalling and the other for business logic - and my throughput is just over 1000 ops per second (source: https://github.com/ejosiah/activemq-vs-distruptor).

    My question: do any of your metrics include I/O operations from other consumers, such as those for journalling, replication, serialization, etc.?

    1. No, the metrics quoted are for the business logic only, not for I/O etc. I think if you check the Google Group history you'll find more specific information about what was measured and how, this question has definitely come up before:

      https://groups.google.com/forum/#!forum/lmax-disruptor
