Outage – 16th March 2018 (Resolved)


(Simon B) #1

Having a little bit of an outage at the moment, folks…

Card payments, bank transfers, and top ups may fail.

Please keep an eye on https://monzo.statuspage.io/ for updates…


Resolved: Inbound and outbound bank transfers may fail
#2

Hopefully back by lunchtime… Happy St Patrick’s Day! :shamrock: :four_leaf_clover:


(Josh Price) #3

It would be super interesting to learn about the procedures followed when an outage like this occurs. For example: how do engineers find out? What do COps do about support? (i.e. can they still access information if everything is down for us?)


(Michael) #7

Does this raise concerns now for the stability of the Current Account? :thinking:


(Eve) #8

There’s a really interesting post-mortem of one of the previous outages here


(Andre Borie) #9

Would be curious to know what happened this time around. But I’m happy to see the time to recovery was much shorter than during previous GPS outages on the prepaid card.


(Simon B) #10

Happy to report that all is now resolved.


(Andre Borie) #11

Lovely - any clues on what exploded this time? :wink:


(Alex Sherwood) #12

Better communication and a quicker fix than when Monzo was using GPS; the team’s getting good at this!

I counted 9 people in Monzo’s mentions who reported an issue & about 15 who commented as though they may have been affected in total. It looks like this was pretty minor :slight_smile:


#13

I suspect the issue started on the 16th, as I had a card payment fail earlier in the evening at the supermarket, which was pretty embarrassing. I ended up paying with my legacy “spare” card, which always works. The same thing happened exactly a week earlier (Friday 9th), with a transaction that was declined on the first attempt but went through on the second. I hope it’s not a pattern developing…


(Brandon Billingham) #14

Second time in under a week :confused:


(Tom) #15

What happened the first time and how did it affect you?

On a side but related note, my Nationwide account was unbelievably inaccessible for four hours last night. Routine maintenance, apparently.


(Dan) #16

I think I was affected by this… my card wouldn’t work in town… although I was too drunk to care and just went home :rofl::rofl:


(Oliver Beattie) #17

Hi everyone :wave: I’m Monzo’s Head of Engineering and I was involved in resolving this incident last night. I’m so sorry about these issues. As I hope we’ve demonstrated before, we have very high targets for the level of reliability we want for Monzo, and issues like this are as unacceptable to us as they are to our customers.

I’m afraid I can’t share a full report at the moment. In short though, we encountered an issue in one of our etcd clusters that made a number of our critical services unable to write new data. We fixed this issue within around 25 minutes, but the extra load this caused on a subset of services exposed some unrelated bugs in some libraries we use. It was about another 20 minutes until the platform was fully available again.

I’ll share a more detailed update when I can, of course along with what we’re doing to prevent this recurring. I’d also like to commend our engineers and COps who helped with this; incidents like these are incredibly stressful, and this one came at the end of a very long week. Everyone was an absolute hero. :muscle:


(Andre Borie) #19

Outages at legacy banks are rarer than at next-gen banks like Monzo, but they usually last a lot longer: when rust breaks, it breaks hard and is a pain to get running again, while next-generation systems can sometimes even recover automatically. I think it evens out in the end if we compare total downtime over a long period (5 years?).


(Lawrence Ferguson) #20

It’s all well and good gleefully referring back to poor GPS etc., but from what I can see, the in-house system has failed 3 times since launch, while GPS failed only once in the same period?

Maybe I’m wrong though…


(related to Monzo CEO, Investor in Monzo) #21

Why would you think you’re wrong? It’s presumably logged on both providers’ status pages :slight_smile: Though I can only see planned maintenance on Starling’s status page, so does that count???


(Lawrence Ferguson) #22

I just checked Mondo v Monzo. I still couldn’t work it out, because some outages don’t affect payments.

I will check later when I’m home if I remember! :smile:


(related to Monzo CEO, Investor in Monzo) #25

It’d be good to know when you’ve checked.

  • I think the reason given for Monzo building their processor in house was that they could effect repairs for outages more rapidly if they weren’t reliant on GPS. Otherwise they would have to wait for GPS to repair the problem, which would be totally out of their hands, and their customers would have to rely on another service provider for their cards to function… I could be wrong though… :slight_smile:

(Andre Borie) #26

Starling is erasing past incidents from its status page, so it isn’t a good comparison; maybe we need to find an honest GPS customer and use their status page (does Revolut use GPS?)