Planning for an AWS Outage

So I think there have been two AWS outages in the States now, with the most recent on 2nd March (https://blog.thousandeyes.com/amazon-aws-outage-lesson-managing-cloud-first-risks/)

How would Monzo handle an AWS outage? Is Monzo operating from the EU only? What are the plans after leaving the EU, when there is only one AZ in the UK?

I’m aware you have two datacenters for links into third parties? Other than that, are you operating a hybrid approach of on-prem and off-prem with multiple cloud vendors?

How are you handling the backup of your data? Are you pushing this to other regions?

@simonb - Not sure who’s best to discuss this?

I think they are operating between two AZs, so hopefully they should remain available even if one AZ is out.

I would hope so too, but ideally three :slight_smile: It would just be interesting to see the official stance on it.

Are all workloads auto-scaling and fault resilient for example?

Probably best answered by @oliver as he’s head of tech

London now has 3 AWS availability zones

Ooo, I did take a look but only saw one eyes closed

On the other hand I’m not sure if putting all your eggs into one basket (London) is a good solution. What if there’s a really major outage like city-wide power or network transit issues?

I’d say have one primary AZ in the UK and another in the US as a backup; this way, no matter what happens in London, you should still be good.

I think in those circumstances my main concern would be the impending Armageddon, rather than my Monzo balance being up to date :grimacing:

We rely on achieving cross-zone quorum for many operations internally, with the aim of guaranteeing the durability of data if a zone is lost. Doing this means that we need to operate across at least three zones: in a cluster of 2n+1 nodes a quorum needs n+1 of them, and under a two-zone setup, losing the zone that holds the majority would leave us without enough nodes to reach it.

Until very recently the London AWS region only had two zones, so it was unsuitable. Now that it has three zones it’s viable, but we don’t have any short-term plans to move to it. :desktop_computer:
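To illustrate the quorum point above, here is a minimal sketch. It assumes a simple majority quorum and nodes spread as evenly as possible across zones; the function names are hypothetical and this is not a description of Monzo's actual systems.

```python
from typing import List


def quorum_size(total_nodes: int) -> int:
    """Smallest strict majority of the cluster (n+1 out of 2n+1)."""
    return total_nodes // 2 + 1


def spread(total_nodes: int, zones: int) -> List[int]:
    """Assumed placement: nodes spread as evenly as possible across zones."""
    base, extra = divmod(total_nodes, zones)
    return [base + (1 if i < extra else 0) for i in range(zones)]


def survives_any_zone_loss(total_nodes: int, zones: int) -> bool:
    """True if a majority of nodes remains whichever single zone fails."""
    needed = quorum_size(total_nodes)
    return all(total_nodes - lost >= needed for lost in spread(total_nodes, zones))


if __name__ == "__main__":
    for zones in (2, 3):
        for nodes in (3, 5):
            ok = survives_any_zone_loss(nodes, zones)
            print(f"{nodes} nodes over {zones} zones {spread(nodes, zones)}: "
                  f"quorum survives any zone loss = {ok}")
```

With two zones, one zone always holds at least half the nodes, so losing it leaves no majority. With three zones, any single loss still leaves a majority under this even placement.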

Thanks :slight_smile: Do you simulate Zone failures?

Yes, we do. :zap:

Excellent to hear :slight_smile:

I’m aware you have two datacenters for links into third parties? Other than that, are you operating a hybrid approach of on-prem and off-prem with multiple cloud vendors?

How are you handling the backup of your data? Are you pushing this to other regions, or just making the data available in all EU London zones?

If you’re not using London AWS, are you using the Ireland AZs to run Monzo?

Right now, the vast majority of our platform is hosted in AWS’ eu-west-1 region. It’s also correct that we have several physical facilities which let us interconnect with private networks like payment schemes.

Our architecture has always been designed with the explicit intention to be able to expand across multiple regions and potentially cloud providers too, but this isn’t something we’ve done yet. :cloud:

Thanks for the explanations.

It’s great to hear you’re doing it properly. I find it infuriating to hear when a company/customer is using public cloud and thinks that’s enough (even Amazon’s status page failed to display images a while back when S3 went down in one area; reference link). I know multi-AZ costs more, but it’s a must for a service such as this. Hats off to you, Monzo, for not cheaping out on that! :grin:

Does anyone use a single AZ? Aren’t most AWS services cross-AZ out of the box? I know people have been stung a few times when an entire AWS region has gone down, however.
