Very Robust etcd


I didn’t get all of that, but it’s nice to see Monzo have proper, well-thought-through backup plans.

A great write-up by Filipe, even if it goes over my head at points.

Awesome post and wish I could do this kind of stuff!

Interesting stuff. One thing: from the repository, it kind of looks like you’re using etcd without SSL?

Good point. As the README of the repository makes clear, this is a deliberately simplified deployment designed as a demonstration:

:warning: Warning
This is an illustration of how we run etcd clusters. This should not be deployed to production without first securing the infrastructure.

Since TLS wasn’t necessary to demonstrate these features, and would complicate people’s experience deploying a demonstration stack, we didn’t bother with TLS in that repository. You should not run this in production as-is!


Cool. Probably worth having for etcd, even in an MVP as without it there’s no authentication and anyone who can hit the port can dump the entire cluster config (secrets and all) in one command…
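To make the "one command" point concrete, here is a minimal sketch of what that command looks like, and how it changes once TLS client certificates are required. It assumes the etcd v3 API and the standard `etcdctl` flags (`--endpoints`, `--cacert`, `--cert`, `--key`); the helper name `etcdctl_list_keys_args`, the endpoint, and the file paths are hypothetical, and the function only builds the argument list rather than running anything.

```python
from typing import List, Optional


def etcdctl_list_keys_args(
    endpoint: str,
    cacert: Optional[str] = None,
    cert: Optional[str] = None,
    key: Optional[str] = None,
) -> List[str]:
    """Build an etcdctl (v3 API) command line that lists every key.

    Hypothetical helper for illustration: it returns the argument list
    and does not execute it.
    """
    args = ["etcdctl", f"--endpoints={endpoint}"]
    if cacert:
        # CA bundle used to verify the server's certificate.
        args.append(f"--cacert={cacert}")
    if cert and key:
        # Client certificate and key, for clusters that require client auth.
        args += [f"--cert={cert}", f"--key={key}"]
    # An empty key combined with --prefix matches every key in the store.
    args += ["get", "", "--prefix", "--keys-only"]
    return args


# Without TLS, knowing the address is all a caller needs.
print(etcdctl_list_keys_args("http://10.0.0.1:2379"))

# With TLS and client certificates, the caller must also hold a trusted cert and key.
print(
    etcdctl_list_keys_args(
        "https://10.0.0.1:2379",
        cacert="/etc/etcd/ca.pem",
        cert="/etc/etcd/client.pem",
        key="/etc/etcd/client-key.pem",
    )
)
```

Dropping `--keys-only` would return the values as well, which is where the secrets would show up.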

Hey, great article, thanks! Just out of curiosity, did you consider using something like etcd-operator, since it already solves some of your requirements?

:wave: The etcd-operator does something very similar, but it is designed to operate etcd clusters running atop Kubernetes. This etcd cluster is the one that actually backs Kubernetes, and although it is possible to self-host, this is not something we do at the moment. We do plan to use the operator for the etcd clusters which back some of our services’ locking operations, though!

Hey @filipe, yes, it makes sense, as self-hosted solutions are not GA yet. Thanks for the answer! :slight_smile:


The etcd-operator is meant for operating etcd clusters running atop Kubernetes. etcd is a consistent and highly available key-value store used as Kubernetes’ backing store for all cluster data.

Always have a backup plan for etcd’s data for your Kubernetes cluster. For in-depth information on etcd, see the official etcd documentation. Operating etcd with limited resources is suitable only for testing purposes; production deployments require an advanced hardware configuration, so check etcd’s resource requirements before deploying it in production.



Thanks for providing this solution. It is an elegant way of dealing with EC2 instance failures in an etcd cluster.

I’m new to AWS but I’ve heard that volume attachment is not good and we should not rely on it. I was wondering if Monzo has run into issues around volume attachment.

Also there is no autoscaling when an AZ goes down. Would that be a concern?

Thanks for any insights


We have run into various issues with specific EBS volumes, but I’m not aware of any systemic design problem that means we should avoid them completely.

Thanks so much for your response.

I had problems working with NVMe volumes in CentOS, where the volume was only available on /dev/nvme1n1 and creating the filesystem required specific parameters.

Can you please expand a bit more on any of the issues with specific EBS volumes?
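On the /dev/nvme1n1 point, a minimal sketch of one way to avoid depending on the device name: as far as I know, on NVMe-based instances an EBS volume reports its volume ID as the NVMe serial number (without the dash), so the device can be looked up by volume ID instead. The function name `ebs_volume_devices` is hypothetical, and the sysfs layout (`/sys/block/<device>/device/serial`) is an assumption about a typical Linux system, not something taken from the article.

```python
from pathlib import Path
from typing import Dict


def ebs_volume_devices(sys_block: Path = Path("/sys/block")) -> Dict[str, str]:
    """Map EBS volume IDs to NVMe device paths using each device's serial.

    Assumption: EBS NVMe devices expose the volume ID as their serial,
    e.g. "vol0123456789abcdef0" for volume "vol-0123456789abcdef0".
    """
    mapping: Dict[str, str] = {}
    for dev in sorted(sys_block.glob("nvme*n*")):
        serial_file = dev / "device" / "serial"
        if not serial_file.is_file():
            continue
        serial = serial_file.read_text().strip()
        # Normalise "vol0123..." to the usual "vol-0123..." form.
        if serial.startswith("vol") and not serial.startswith("vol-"):
            serial = "vol-" + serial[3:]
        # Skip anything that is not an EBS volume (e.g. instance store disks).
        if serial.startswith("vol-"):
            mapping[serial] = f"/dev/{dev.name}"
    return mapping


if __name__ == "__main__":
    for volume_id, device in ebs_volume_devices().items():
        print(volume_id, device)
```

With a mapping like this, a boot script can wait for a specific volume ID to appear rather than assuming it will always be /dev/nvme1n1.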




Thank you for this article, very enlightening!

Sorry for reviving this topic after so many months, but I was wondering if you are still using CoreOS for your etcd cluster at Monzo, since this OS is nearing its end-of-life.
We are trying to use Fedora CoreOS instead, but the documentation is quite scarce…