Tolerating full cloud outages with Monzo Stand-in

We’ve just published some information on our backup system (Monzo Stand-in), which some of you noticed us using last year.

There might be some things I can’t go into much detail on, but feel free to ask any questions.


I prefer the stand-in interface to actual Monzo :slight_smile:


Very cool Daniel, that’s a great overview of your rationale and system!


Nice to see Excalidraw being used!

Very nice read if I do say so.


Really interesting read!
Hope we get more blog posts like this

How does this work for an AWS outage? Wouldn’t the primary system be entirely unavailable?

The next section covers that:

This works for most incidents but for payments processing it doesn’t work well if our Primary Platform is completely down. In this case we’re able to connect the Stand-in Platform directly to payments networks via our Data Centres. This is a bit more of a heavy-handed option for us, as we have much less control over which customers or how much of the traffic is directed to the Stand-in Platform, but we wouldn’t be resilient without the option.

Basically stand-in payment processing has two modes of operation:

  • Indirect mode
  • Direct mode

Direct mode is required if AWS is fully down, but that is likely to be a very small fraction of incidents. Where AWS is up (the majority of incidents), indirect mode is preferable because:

  • It can be enabled more quickly (15s)
  • It can be controlled in a granular way – we can slowly transition load between the platforms, rather than one big lever
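Purely as an illustration of the difference between the two modes (this is not Monzo’s implementation), here is a minimal sketch assuming a single routing decision per payment. The names `select_mode`, `route` and `standin_percent` are hypothetical, and the hash-bucket split is an assumed technique for the “slowly transition load” lever; direct mode deliberately ignores it, mirroring the point that there is much less control when the Primary Platform is unreachable.

```python
import hashlib
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"      # all traffic handled by the Primary Platform
    INDIRECT = "indirect"  # AWS up: Primary forwards a share of traffic to Stand-in
    DIRECT = "direct"      # AWS down: payments go straight to Stand-in via Data Centres


def select_mode(aws_up: bool, standin_percent: int) -> Mode:
    """Pick a mode (hypothetical rule): direct only when AWS is fully down."""
    if not aws_up:
        return Mode.DIRECT
    return Mode.INDIRECT if standin_percent > 0 else Mode.NORMAL


def bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in [0, 100) using SHA-256."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(customer_id: str, aws_up: bool, standin_percent: int) -> str:
    """Return "primary" or "stand-in" for one customer's payment."""
    if not 0 <= standin_percent <= 100:
        raise ValueError("standin_percent must be between 0 and 100")
    mode = select_mode(aws_up, standin_percent)
    if mode is Mode.DIRECT:
        # One big lever: no per-customer choice is possible.
        return "stand-in"
    if mode is Mode.INDIRECT and bucket(customer_id) < standin_percent:
        # Granular lever: only customers whose bucket falls below the
        # threshold move, and raising the threshold only ever adds customers.
        return "stand-in"
    return "primary"


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(1000)]
    for pct in (0, 10, 50, 100):
        moved = sum(route(c, aws_up=True, standin_percent=pct) == "stand-in" for c in customers)
        print(f"indirect, {pct:3d}% lever -> {moved} of {len(customers)} on stand-in")
```

Hashing the customer ID (rather than picking randomly per payment) keeps each customer on one platform as the percentage is ramped up, which is one plausible way to get the gradual, controllable transition described above.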

Ha, it helps if I don’t speed-read over sections! Thanks for the reply!

I think it’s great that Monzo have something like this. With other banks, the app does not function at all if it goes down.

However, I do realise that other banks have branches that people can walk into to check their balances et cetera, which Monzo does not.