about writing reading

Writing a more efficient orderbook

It's been just over a year since I last wrote a simple order-book[1] and recently, I've decided it would be a good idea to make another attempt at writing one over a weekend to see how my general knowledge of C and data-structures have improved. My original motivation for this project was to get a grasp of some basic C++ syntax while exploring some of the core mechanics behind market-making systems. While I believe matching-engine patterns are mostly a solved problem in software (where your edge is more likely to come from hardware rather than software), building one encourages you to think critically of your data-structure choices while being considerate of the memory-footprint around such structures. The need for efficiency is due to the fact that matching-engines must handle order-matching sequentially (handling 100,000's of messages per second) and as such, can quickly become a bottleneck for an exchange.

What does an efficient orderbook look like?

Before evaluating my original implementation along with the changes made in my second attempt, I'll quickly explain some of the requirements expected of an order-book[2]. We should aim to implement order operations such as add, cancel and execute in O(1) time. Notably, wkselph[2] raises an interesting point, stating that your most common operations are going to be adds and cancels, with executions coming a distant third. This will be an important point to consider later on. With regards to limit operations, it's important to ensure that the top of the book (min-asks/max-bids) are tracked in O(1) time—allowing new orders crossing the order-book to be matched with their opposition (min-ask or max-bid) as quickly as possible—and limit volumes should be tracked so that order-book volume can be discovered efficiently. Both individual limits and orders should be accessible in O(1) time respectively.

My attempts

For my first attempt, I managed to cobble together a very basic matching-engine that handled FIFO order execution using limit and market orders. Despite learning a lot from this attempt, I struggled to yield an efficient implementation and my choice of data-structures had quite a negative impact on performance. For example, limits were stored using a hash-map, with each limit being keyed by it's price point (allowing limit and order adds in O(1) time). A benefit of using hash-maps to track price-points is that we can accommodate large gaps between limits, so sparsely populated books will consume less memory. This allows times of price-volatility to be handled conveniently too, however, all of this comes at the cost of dynamic allocation. While limits were tracked using a map, orders were allocated at runtime and stored in a doubly-linked-list under each limit. This made tracking orders difficult, where removing an order would take O(n) time (n representing the total number of orders and m being total limits). From an implementation perspective, using these data-structures kept the code very simple, however, the penance for taking the easier route here was that most (if not all) of the critical objects involved in this engine were heap-allocated.

This brings me to my second attempt that I quickly hacked together over the weekend. Being a year since my last approach, I've had some time to reflect on how I can make a better attempt at the original implementation. Something I would like to focus on with this attempt is to try and keep all object initialisation static, improving the runtime complexity for locating orders and to provide a more optimal way of sorting ask/bid limits. All of the above was written in C during this attempt, which mean't that many of the required data-structures needed to be built from scratch. To achieve this, orders and limits are now initialised in pre-defined arenas (arrays) which are fixed in size. Limits are keyed into the arena using their price-point as the key—this comes with the disadvantage of being less flexible in times of excessive price-volatility as the price-point range is now fixed rather than dynamic—and allows limits and orders to be accessed (sequentially if we're lucky) in O(1) time respectively. Limits are stored in priority-queues[3] using their price-points only, which means that we maintain a sorted order of active bid and ask limits in O(logm) time. This also allows us to retain our top-of-book access times at O(1) too!

Free orders are stored in a stack (free_orders) using the order's array index as the key. This allows efficient re-use of recently "freed" orders, where more recently used orders appear toward the top of the stack. What's nice about this approach is that more recently used orders have a higher likelihood of remaining within the (hot) cache and should therefore allow for more efficient access times because they have not yet been ejected from the cache yet (leveraging spatial and temporal locality). Similar to the limit arena, we are limited to the number of orders we can support for a given ticker-symbol, but this can be resized on a per-symbol-basis based on expected volume. Finally, all structs have been packed to reduce memory slop[4] [5].

Looking ahead

Ultimately, the final result came out quite well for a short weekend project and served as a nice intro into arena allocation, structure-packing and priority queues. If I were to expand upon this project further, I would like to make a simple event system where executed trades are written to lockfree buffer and later handled by another thread accordingly, but that can be a task for another day. The order-book code can be found on github for anyone interested and feel free to reach out if you see any mistakes or improvements.

[1] Simple order-book

[2] How to build a fast limit order-book

[3] Priority queues

[4] What every programmer should know about memory

[5] The lost art of structure packing