How we built a queue on top of Kafka

Hey everyone! :wave:

Just sharing a new engineering blog post: "How we built a queue on top of Kafka".

Quickly, a little bit about me. I’m Constantijn. I joined Monzo in 2017 and have been part of our Backend engineering discipline ever since. I’ve worked on many of your (hopefully) favorite features in that time, but that is for a different discussion. :see_no_evil:

Back to the blog post - Monzo introduced Kafka to its technology stack around the time that I joined the company. By pure coincidence I was on the first product squad (one of only two!) to use Kafka for real. I felt first-hand the pain of trying to use the library we’d crafted to use Kafka as a queue, though we didn’t consciously realise that’s what we were doing at the time. At first I made a few small changes for my own benefit, and before I knew it I was a primary maintainer together with Kieran Gorman, who had done the same thing. Fast-forward to now and in many ways the two of us still are. This blog post is something I’ve wanted to share for years, and it covers core parts of the journey we embarked on when we both naively raised our first “small change” pull requests.

I’m excited to hear what you think and answer any questions you might have.

Getting through to support is kafkaesque indeed, so you chose the platform well.

Sounds great! Must admit I’m surprised that RabbitMQ wasn’t used.

I decided to do some archaeology off the back of this comment. It turns out the very first version of Monzo did use RabbitMQ for synchronous inter-service communication. The original team started there because that was what they were familiar with. They ultimately removed it because getting it to run in cluster mode was challenging. Here is a quote I got from one of our founding engineers:

Anyone who has run rabbitmq at scale probably won’t want to do that a second time :joy: Their clustering page literally used to say that clustering mode only worked on reliable networks…

A lot may have changed since then, but I hope this sheds some light on why we didn’t reconsider it when looking for an NSQ alternative.

Valid point, no networks are stable :wink:

I do believe, however, that it has got a lot better.

@constantijn Just curious, any plans to move to Apache Pulsar?