Monzo Staff Weekly Q&A - Oliver Beattie (Head of Engineering) - Reliability Report Special!


(Simon B) #1

Note : All Monzo Q&As to date can be found here :grinning:


Readers, Revellers, and Rebels, let’s gather round to reason and reflect, for the time is right to return.


It’s the Monzo Weekly Staff Q&A!!!

We’ve just published our Reliability Report - go check it out if you haven’t already. As you may know, we believe in being radically transparent. Not just when things go right, but also when there are challenges. These might include service interruptions and/or downtime. These things happen, but we don’t believe you should be kept in the dark! When there is an issue that might affect customers, it’s our responsibility to be clear, informative and helpful in our communication, keep you in the loop and let you know what we’re doing to fix any situation.

That said, it’s important that keeping you in the loop doesn’t create a perception that we suffer from any more service interruptions or outages than anyone else, because we simply don’t! That’s the risk - in being communicative, you may unknowingly create a false narrative. In fact, we believe our platform is inherently more stable due to the way we’ve built it.

Hence, the Reliability Report! With this happening, it was only right to call on this week’s guest to do Q&A.

BUT FIRST!


Catch up with all previous Q&As below!


Week 1 : Chris MacLean, Customer Operations & Vulnerable Customers :santa:t2:
Week 2 : James Nicholson, iOS Engineer :green_apple:
Week 3 : Tara Mansfield, People Operations Manager :woman_technologist:t5::man_technologist:t3:
Week 4 : James Routley, Backend Engineer :hammer_and_wrench:
Week 5 : Hugh Wells, Customer Operations :policeman:t3:‍♂️
Week 6 : Naz Malik, Technical Specialist :computer:
Week 7 : Fred Morgan, COps Squad Captain (Calls & Social Media) :telephone_receiver:
Week 8 : Emma Northcott, COps Scaling Team :balance_scale:
Week 9 : Jarno Wolf, COps Financial Crime Specialist & Squad Captain :wolf:
Week 10 : Maria Campbell, Head of People :woman_office_worker:t2::man_office_worker:t4:
Week 11 : Jim Amey, Night COps Captain :bat: :crescent_moon:
Week 12 : Richard Cook, Online Community Manager :man_cook:
Week 13 : Beatrice Borbon, Content & Press Manager :newspaper:
Week 14 : Tom Blomfield, CEO :crown:
Week 15 : Ella Johanny, COps/Hiring :handshake:
Week 16 : Harry Ashbridge, Writer :writing_hand:t3:
Week 17 : Beth Scott, Overnight COps :cat2:
Week 18 : Georgie Parmenter, Executive Assistant to the Founders :blonde_woman:
Week 19 : Vulnerable Customers Team :sunflower:
Week 20 : Leah Templeman, Interim VP People :sun_with_face:
Week 21 : Daniel Chatfield, Backend Engineer, Fincrime & Security :closed_lock_with_key:
Week 22 : Valerio Magliulo, Product Manager - Revenue Team :money_with_wings:
Week 23 : Sam Watkin, Operations Analyst :thinking:
Week 24 : Kieran McHugh, Backend Engineer :desktop_computer:
Week 25 : Jonas Huckestein, Co-Founder and Chief Technical Officer :computer_mouse:
Week 26 : Annual Report Edition with Tristan Thomas and Julie Oey :calendar:
Week 27 : Zander Brade, Lead Product Designer :pencil2:
Week 28 : Richard Dingwall, Payments Engineer :moneybag: :wrench:


This week in the Hot Coral Hot Seat™️ we’ve got our Head of Engineering, Oliver Beattie!!!

Oliver has been here at Monzo since June 2015. He’s currently working with our COps Scaling team to make some big improvements to how we do customer support at scale :muscle:

Fun Fact About Oliver
His first job was in Hawaii! :palm_tree:

His favourite thing about working for Monzo?
“There are so many domains of knowledge that I used to think sounded super dull, but I’ve been able to discover at Monzo that that isn’t the case and they’re actually fascinating. Financial crime, finance/accounting, and customer service, for instance.”

You know how it goes - get your questions in, and Oliver will be here tomorrow to answer them! We don’t have as much time as usual this week to get questions in, so get 'em in quick!


(Nick Slade) #2

I’ll start this one off!

I’ve seen mentioned that staff use Slack to help them with their day to day…

I’m always on the lookout for how software can make my life easier (hence my love of Monzo), and since discovering Slack and Trello I’ve found myself much more organised.

Do you have any other hidden gems worth looking at? And how do they help you and your team?


(Tom Coutts) #3

If you could travel back in time and give 2015 Oliver one piece of advice about the road ahead what would it be?


(Jack - Customer of Monzo) #4

What was the hardest aspect of the current account system to get into a reliable state?


#5

Obviously critical, Pineapple Pizza (Did you eat many of these in Hawaii?! :wink: ) & Cats?

Purely trivial: 600 services. Is this to maintain performance, or does completing a single bank transfer, for example, involve a large number of different services being called for the single task?


(Jack - Customer of Monzo) #6

What challenges (if any) does the Monzo architecture face that wouldn’t be apparent with traditional bank architecture?

What positives (if any) does the Monzo architecture offer that wouldn’t be apparent with traditional bank architecture?


(Will flag Danny for cake) #7

The majority of the recent minor issues seem to have been bank transfer related. Apart from the big Faster Payment outage what has been the cause of the others? Is it a Monzo issue or someone else?


(Emily Jones) #8

Does anything in particular give you sleepless nights when it comes to reliability, such as impacting a VIP with large influence, impacting large volumes of customers in general or a smaller number but with a long downtime?


#9

Do you think factors beyond your control, e.g. Mastercard outages, make it harder to keep Monzo’s reputation for reliability? Do you think that posting on Twitter that it’s a general Mastercard or Faster Payments issue, and not Monzo-specific, sufficiently shows that it’s not Monzo being unreliable but just that an external system has gone down?


(Josh Price) #10

Hi Oliver :wave:,

To kick things off, your talk at KubeCon was awesome!!! :sunglasses: It was great to hear you share your experience of an outage and ultimately the lessons that were learnt.

At the moment the majority of Monzo’s infrastructure is based on AWS. Do you foresee using any other cloud providers in the future to aid reliability? If so, have you designed the back-end in a way where you could host across multiple cloud providers?

Do you think Monzo could be more transparent about the reasons for outages? I’m referring here to the postmortems, which we’ve seen in the past but not for some of the more recent outages.

As Monzo starts to expand to other countries in the next few years, what do you think the biggest challenges will be from an engineering perspective? :world_map:


(Tom Coutts) #11

Throwing in a sneaky second question. Google’s new Go cloud library has me very excited. I’m curious: what are your opinions?


(Nicholas Martin) #12

It’s been mentioned already, but the Mastercard outage affected Monzo and other banks too. Can you shed some light as to what this outage was and how Mastercard are ensuring it doesn’t happen again?

Also, I’ve spoken to a number of people who would love to give Monzo a try and potentially go full Monzo, but the constant issues are really holding them back, especially given that there was a promise that all the issues people faced would be resolved once CA is out in full.


(Leon McCabe) #13

You use metrics to track outages and issues and you mention categorising incidents according to assumed customer impact - do you use any other metrics on customer interactions integrated with this data to get a holistic view of how an incident has affected customers in the past? More generally, how do you use data to help your day-to-day role?


(Oliver Beattie) #16

:wave:Hi everyone, excited to be doing Q&A for the first time. Reading that I’ve been here since 2015 above was a little mindblowing, even for me. Time flies! :older_man:


We have a vast amount of internal knowledge. It’s important that it’s accessible to everyone at the company, especially since we’re growing so quickly. An internal wiki is a way many companies will address this, but we’ve found most wiki software very cumbersome to the point that people avoid using it.

Notion has recently started to fill this gap for us nicely – and people seem to love using it!


Teaching humans to do the things you can gives you enormous leverage, just like writing software.


All of them :wink: But more seriously, and perhaps unsurprisingly, I think our Mastercard systems have been the most tricky. Card schemes are huge, global networks with a lot of inherent complexity and a lot of features that have evolved over decades. In many cases, it can be hard to find anyone who understands how aspects of them work! We have some fantastic people working on these who I would now consider to be world experts, but we’re still finding new edge cases and better ways of doing things.


Pineapple on pizza isn’t right. Sorry Jonas :see_no_evil:


Our choice of a microservices architecture wasn’t really driven by performance considerations: a network RPC call is never going to be as fast as a local function call. It really excels for us because it lets independent teams work on different areas of the codebase without a lot of co-ordination overhead. It certainly comes with costs, but we are really reaping the benefits now we have a team of engineers more than 100-strong.


As Monzo continues to mature, I get more and more confident that we can handle Really Bad™️ situations.

We’ve made a really deliberate effort to improve reliability over the last 12 months, and I think the data speak for themselves that it’s working. We’ve put a lot of effort into making issues less frequent, but just as importantly we’ve improved our ability to deal with issues swiftly and effectively when they (inevitably) do happen. I’m really proud of all the people who are making this happen. :heart:


I hear you. Our communications during outages need to get clearer, and we’re working hard to make that happen :muscle:

There are some situations where we can’t share as much information as we’d like to, and they frustrate us too. However, there is a lot of room for improvement within the bounds of what we can do. We’re looking at putting information in the most relevant places within our apps when specific features are having problems, and at sharing whether a problem is likely to just affect Monzo or whether other banks might be impacted as well.

I also hope that by sharing our internal data over the last year, and committing to do this on an ongoing basis, we help people see that our reliability really is improving and is comparable to that of other banks.


Connecting to more and more local payment networks around the world will be a lot of work and will put a lot of new demands on our platform.

I’ve also recently started working with COps; I’m learning there are a lot of really fascinating technical challenges that come with delivering the world’s best customer support at a large scale. :policewoman:


I haven’t had chance to look at it a great deal yet: for the most part our services are fairly agnostic of the cloud infrastructure they’re running on already. There does seem to be a fair amount of buzz around the library though, and I found this Tweetstorm by Rob Pike interesting: