Bringing Automated Rollbacks to 2,100+ services at Monzo

Hey everyone, I’m Will :wave: I work in the Backend Platform team at Monzo.

My colleague Joseph and I have written about how our team brought automated rollbacks to our deployment system. This is the most substantial change we’ve made to our deployment system in some time, so it was not without its challenges!

At the heart of this new feature is Argo Rollouts - a Kubernetes extension that supports advanced deployment strategies. In this post we dig into how we integrated Argo Rollouts with our existing deployment tooling, while keeping the Monzo delight factor. We show how we migrated all 2,100+ services to this new system and discuss the lessons we learnt along the way.

We’d love to hear your thoughts and questions.

Really interesting article. Thanks for going into the detail.

One question - from googling, I think Prometheus is a metrics tool? If so, what metrics do you use to track things? DORA?

Yes, we use Prometheus for metrics.

I wrote a bit about the engineering effectiveness metrics we track here - there’s definitely room for improvement. For those kinds of metrics we use the data in our data warehouse (BigQuery / dbt / Looker).

Thanks for the reply, and also for the link. I’m an agile coach by trade, and a bit of a DevOps geek on the side, so I found that really interesting.

I had always thought of DORA metrics as useful for tracking service delivery, but not necessarily uptime etc. The thread you linked really went into some good detail.
