How we monitor Monzo

(Beatrice Borbon) #1

Earlier this week we shared our Reliability Report, which shows that we’ve made Monzo more reliable in the last 12 months and explains how we’ve done it.

Monitoring helps us make sure everything’s working as expected, and alerts us when things do go wrong.

If you’ve ever wondered how we do it, Platform Team lead Chris digs into the details here :point_down:

(I'll flag any comment for 50p) #2

Hundreds of servers :thinking: :face_with_monocle:

Pics or it’s lies :eyes:

(Andy) #3

Ah, and is that hundreds of physical servers, or a few physical servers running hundreds of virtualised instances? :smirk:

(I'll flag any comment for 50p) #4

I want it to be physical but it’s obvs gonna be virtual :frowning_face: :joy_cat:

(Peter Shillito) #5

(Tim) #7

While we do have some hardware in physical data centers to interconnect with payment schemes, most of our servers are virtual servers running on EC2, Amazon’s cloud.

Most of these servers are Kubernetes “worker” nodes. Each of these workers runs many different microservices, each in its own container. So we have containers on top of virtual servers on top of physical servers in a few Amazon data centers somewhere…
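
As a rough illustration of that layering (a toy model only, not Monzo's actual tooling; the host names, service names and counts below are all made up), the stack can be sketched like this:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    # One microservice instance running in its own container.
    service: str  # hypothetical service name


@dataclass
class WorkerNode:
    # One virtual server (for example an EC2 instance) acting as a Kubernetes worker.
    name: str
    containers: List[Container] = field(default_factory=list)


@dataclass
class PhysicalHost:
    # One physical machine in a cloud data center, hosting several virtual servers.
    name: str
    nodes: List[WorkerNode] = field(default_factory=list)


def count_containers(hosts: List[PhysicalHost]) -> int:
    # Walk physical host -> virtual worker -> container and count the leaves.
    return sum(len(node.containers) for host in hosts for node in host.nodes)


def build_example() -> List[PhysicalHost]:
    # Assumed numbers: 2 physical hosts, 3 workers each, 3 containers per worker.
    services = ["service.account", "service.ledger", "service.payment"]
    hosts = []
    for h in range(2):
        nodes = [
            WorkerNode(
                name=f"worker-{h}-{n}",
                containers=[Container(service=s) for s in services],
            )
            for n in range(3)
        ]
        hosts.append(PhysicalHost(name=f"host-{h}", nodes=nodes))
    return hosts
```

Calling `count_containers(build_example())` walks all three layers. The container count grows much faster than the server count, which is how a few hundred virtual servers can end up running thousands of containers.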

(Kelvin Papp) #8

Why do you “want” it to be physical?! :slight_smile:

(Andy) #9

Because a bank of physical servers with lots of cables and lights is nerd heaven :nerd_face:

(I'll flag any comment for 50p) #10

^^^ This, very much this

(Peter Roberts) #11

Only if the cable management is good, otherwise it’s hell

(Daniel Chatfield) #12

Hundreds of virtualised servers running thousands of containers.

(Jorge) #13

Wow, that blog post brings loads of questions to mind. Can we nominate Chris for the next Q&A? @simonb

(Chris) #14

On holiday for a week but happy to answer anything when I get back :slightly_smiling_face:

(Pete Leese) #15

This is great. I’m just rolling out Prometheus and Thanos across my cloud.

Any chance you can share some further details of your custom template for Slack notifications, along with details on the rules fetcher?

Sounds like a missing piece of the puzzle for my implementation :slight_smile: