about writing reading

Figuring out threadpools

I have found myself crossing-paths with threadpools quite a lot over the last few weeks. For my first encounter, I found myself pouring over the fast-api docs in an attempt to better understand how it dispatches functions to it's external threadpool (outside the main event-loop). Not long afterwards (during my orderbook rewrite[1]), I began to think about matching-engines again, specifically, high-throughput message handling once an order has been executed (@ 100k's messages p/s). As a matching-engine operates on orders sequentially—where resource allocation is focused on processing incoming orders as fast as possible—post-trade execution requires a pragmatic approach to handling those trade messages reliably outside of the core matching-engine logic (at least that's how I've always interpreted it). This is where threadpools began to show their relevance once more (frequency illusion at it's finest). What this post entails is a brief overview of threadpools, how I went about implementing one and some of the interesting things I learned along the way 🕳️🐇.

What are threadpools and why use them?

Using multiple threads (of execution) allows for multiple tasks to be scheduled within a single process in such a way that they can be handled concurrently. What this means is that rather than waiting for a single function to execute sequentially before moving to the next, a process can try to optimize it's pipeline by scheduling tasks in such a way as to minimize idle time between instructions (e.g. executing instructions in another function while waiting for an asynchronous call to complete). An important cavaet here is that instantiating a thread is an expensive endeavour. A thread runtime requires it's own stack along with copying over any relevant state from the parent thread before beginning execution, all of which requires kernel involvement for scheduler and memory allocation.

T5
T4
T1
T3
T2

Threadpool overview: Each task (T) is enqueued before being executed by a thread
(note the fixed number of threads in the pool)

This is where threadpools come into play. Threadpools can offer an efficient implementation of the traditional multi-threaded model by providing a (fixed) series of long-running threads of execution (pool), rather than going through unnecessary and repeated creation and teardown of threads as work arises. Limiting the number of threads being created can also help mitigate potential resource contention, consumption and excessive context-switching. The important thing to note when using threadpools is that they are rather inelastic, by that I mean, we want to have a relatively good idea of our average workload before deciding to use a threadpool. Afterall, this inelasticity is what can make a threadpool more efficient than spawning threads on demand.

Threadpool considerations

I'll try not to get into the details of threadpool implementations too much as I believe the design considerations offer a better insight into how they operate. The program was written in C and turned out to be much more compact and simple than I anticipated (which I'm super happy about). For those curious about the technical details, I'll provide a link to the source code of thool below[2]. For now, here are some of the considerations and common patterns that I found interesting while writing thool.

These were some of the main considerations when putting thool together. What I found nice about discovering the above is that it distilled the fundamental concepts of a threadpool quite succinctly and gave me enough guidance to go out and implement a very simple version of one!

Further thoughts

While I'm happy with the finished product, I think there are some interesting areas that I would like to explore with thool. I would like to consider adding in a force_add option when queueing tasks onto the threadpool. While this would require a malloc it would hopefully be logarithmic. Finally, I think it would be nice to expand the thool library such that it could be used on Windows machines. This would require writing some kind of wrapper around the pthread module as that is not available on Windows runtimes. Anyways, I hope someone found this useful in some way, I really enjoyed the process 🤙.

[1] Orderbook rewrite

[2] Thool: A lightweight threadpool library

[3] John's blog