Continuing the discussion from Why did Monzo choose AWS:
Reading through the post-mortem for the recent AWS outage got me thinking: what sort of mitigation strategies are in place if something like this were to affect Monzo systems? Does Monzo yet have the ability to run independently of AWS, perhaps even in a temporary, reduced capacity? The linked post notes this as a long-term goal, which is great - has the recent AWS outage prompted any review of Monzo's infrastructure design and/or accelerated any of these plans?
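To make the "temporary, reduced capacity" idea concrete, here's a minimal, purely hypothetical Go sketch (nothing to do with Monzo's actual architecture): a reader that prefers a primary store (imagine it living in AWS) but falls back to the last known-good cached value when that store is unreachable. The `primaryStore` and `degradingReader` names and the flat in-memory cache are my own illustration, using only the standard library.

```go
// Hypothetical sketch of graceful degradation: serve stale cached data
// when the primary dependency is unavailable, rather than failing outright.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Balance is the value we'd normally fetch from the primary store.
type Balance struct {
	Pence     int64
	FetchedAt time.Time
}

// primaryStore stands in for a dependency assumed to be hosted in AWS.
type primaryStore struct {
	healthy bool
}

var errUnavailable = errors.New("primary store unavailable")

func (p *primaryStore) Get(accountID string) (Balance, error) {
	if !p.healthy {
		return Balance{}, errUnavailable
	}
	return Balance{Pence: 12345, FetchedAt: time.Now()}, nil
}

// degradingReader prefers the primary store but falls back to the last
// known-good value when the primary is down.
type degradingReader struct {
	primary *primaryStore

	mu    sync.Mutex
	cache map[string]Balance
}

// Get returns the balance, a flag indicating whether the value is stale,
// and an error if neither a fresh nor a cached value is available.
func (d *degradingReader) Get(accountID string) (Balance, bool, error) {
	b, err := d.primary.Get(accountID)
	if err == nil {
		d.mu.Lock()
		d.cache[accountID] = b
		d.mu.Unlock()
		return b, false, nil // fresh read
	}

	// Reduced-capacity path: serve stale data if we have any.
	d.mu.Lock()
	cached, ok := d.cache[accountID]
	d.mu.Unlock()
	if ok {
		return cached, true, nil // stale read, flagged to the caller
	}
	return Balance{}, false, err
}

func main() {
	store := &primaryStore{healthy: true}
	reader := &degradingReader{primary: store, cache: map[string]Balance{}}

	if b, stale, err := reader.Get("acc_123"); err == nil {
		fmt.Printf("balance=%d stale=%v\n", b.Pence, stale)
	}

	store.healthy = false // simulate the outage

	if b, stale, err := reader.Get("acc_123"); err == nil {
		fmt.Printf("balance=%d stale=%v\n", b.Pence, stale)
	}
}
```

Obviously a real bank has far harder consistency constraints than this toy, but it's the general shape of "keep working at reduced capacity" I'm curious about.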
As someone who's written similar post-mortem reports and been in the middle of some pretty ugly outages, I fully appreciate that even with multiple layers of redundancy, things can and do still go wrong. I've watched the talk @oliver gave and found it quite interesting - it focused more on the software architecture side (I'm admittedly not terribly familiar with Kubernetes), so I can't help but wonder what sort of resiliency Monzo has at the infrastructure level.
From an outsider’s perspective, it appears the Monzo team has so far successfully balanced risk mitigation and FinTech badassery (not an easy thing to do!). I can only hope resiliency at all levels remains a fundamental value as the organization continues to grow. Thanks!
Edit: I just came across Mondo infrastructure, so my question shifts a bit: since that forum post and the video about Go, have Monzo's infrastructure strategies had to change much, and if so, how?