Monzo Staff Weekly Q&A - Oliver Beattie (Head of Engineering) - Reliability Report Special!

:wave: Hi everyone, excited to be doing Q&A for the first time. Reading that I’ve been here since 2015 above was a little mind-blowing, even for me. Time flies! :older_man:


We have a vast amount of internal knowledge. It’s important that it’s accessible to everyone at the company, especially since we’re growing so quickly. An internal wiki is a way many companies will address this, but we’ve found most wiki software very cumbersome to the point that people avoid using it.

Notion has recently started to fill this gap for us nicely – and people seem to love using it!


Teaching humans to do the things you can gives you enormous leverage, just like writing software.


All of them :wink: But more seriously, and perhaps unsurprisingly, I think our Mastercard systems have been the most tricky. Card schemes are huge, global networks with a lot of inherent complexity and a lot of features that have evolved over decades. In many cases, it can be hard to find anyone who understands how aspects of them work! We have some fantastic people working on these who I would now consider to be world experts, but we’re still finding new edge cases and better ways of doing things.


Pineapple on pizza isn’t right. Sorry Jonas :see_no_evil:


Our choice of a microservices architecture wasn’t really driven by performance considerations: a network RPC call is never going to be as fast as a local function call. It really excels for us because it lets independent teams work on different areas of the codebase without a lot of co-ordination overhead. It certainly comes with costs, but we are really reaping the benefits now we have a team of engineers more than 100-strong.


As Monzo continues to mature, I get more and more confident that we can handle Really Bad™️ situations.

We’ve made a really deliberate effort to improve reliability over the last 12 months, and I think the data speak for themselves that it’s working. We’ve put a lot of effort into making issues less frequent, but just as importantly we’ve improved our ability to deal with issues swiftly and effectively when they (inevitably) do happen. I’m really proud of all the people who are making this happen. :heart:


I hear you. Our communications during outages need to get clearer, and we’re working hard to make that happen :muscle:

There are some situations where we can’t share as much information as we’d like to, and they frustrate us too. However, there is a lot of room for improvement within the bounds of what we can do. We’re looking at putting information in the most relevant places within our apps when specific features are having problems, and at sharing whether a problem is likely to affect just Monzo or whether other banks might be impacted as well.

I also hope that by sharing our internal data over the last year, and committing to do this on an ongoing basis, we help people see that our reliability really is improving and is comparable to that of other banks.


Connecting to more and more local payment networks around the world will be a lot of work and will put a lot of new demands on our platform.

I’ve also recently started working with COps; I’m learning there are a lot of really fascinating technical challenges that come with delivering the world’s best customer support at a large scale. :policewoman:


I haven’t had a chance to look at it a great deal yet: for the most part our services are fairly agnostic of the cloud infrastructure they’re running on already. There does seem to be a fair amount of buzz around the library though, and I found this Tweetstorm by Rob Pike interesting:

https://twitter.com/rob_pike/status/1021913256204460032
