Was the propagation because multiple things shared common NSQ infrastructure? I have no idea how NSQ scales upwards so sorry I can’t be more specific in my terminology
It’s a platform that provides messaging as a service. So you build services connected through it
Edit: there are loads of trendy queuing / messaging things these days. I know a little about RabbitMQ, which implements AMQP. I bet NSQ does that too
NSQ is a messaging platform that our systems use to communicate and handle tasks.
The investigation is at an early stage, so I’m sure we’ll know more once the engineering team have been able to look more closely at the flow of events
This is a little bit my fault for being in the USA right now and 8 hours behind!
It looks like everything is up and running again now
We had two separate bouts of platform instability today resulting in the app/card being unresponsive. This was due to a fix we’d attempted to apply to another issue we’d been seeing (delayed feed items) causing additional problems.
Very conveniently (sarcasm!) - Verifone had an issue today around the same time as well. They produce point-of-sale terminals for literally thousands of UK businesses, and they had an issue today meaning their terminals didn’t work for a while too.
As my colleagues mentioned above, we did proactively reach out as soon as possible across the app, status page and social, but it shouldn’t have taken us this long to get on the forum and communicate to you all. We’re sorry for this, and we will do better!
While I’m a huge Monzo fan and love everything about it (including the book recommendations from Mikey through the in-app chat) issues like this are part of the reason I can’t yet ditch my Barclays card. As someone who no longer carries cash with them and only carries my phone and my Monzo card, I edge more towards keeping my account with Barclays and just using my Monzo card as a spending card. While this means I’ll miss out on some features, reliability is way more important to me than extra functionality.
With that said, can we expect an awesome deep delve by Oliver (or another engineer) about what went wrong here and what will be done in the future to prevent an outage like this? As someone who is technical, posts like this give me way more confidence in Monzo than some page which tells me something isn’t working!
Out of curiosity, how can Verifone have transient issues, since they make the hardware - they aren’t the processor?
Edit to add tweet.
To put today in perspective..
- This incident caused an issue for me
- This incident didn’t cause an issue for me
I cannot say because I do not 100% understand this stuff, others do though - but it was definitely the case
training wheels are still on
Exactly, Monzo are being open which I greatly appreciate. All cards decline randomly! Monzo is being honest with us and I love it!
Thanks! Looking on Twitter, this appears to relate to processing services provided by Verifone (they are in that business), not (semi-obviously) the hardware itself. I had no idea their processing services were this widely used! Interesting… thanks!
Oh dear…the Starling Feedback thread best start preparing for a kicking…
I’m sure it will all be explained later, but a couple of things. I can see why NSQ would be used for non-time-sensitive parts of the system, notifications for example. Why is it impacting/used in Faster Payments or Card Payments?
Looking forward to the technical breakdown for sure
On a separate note completely unrelated to today’s issues, I’d be really interested to hear what’s the main benefit you get out of choosing NSQ over say RabbitMQ or Kafka?
Will you be providing a more detailed report of the events that led to, and caused, today’s issues? In the same way Oliver did here: RESOLVED: Current account payments may fail - Major Outage (27/10/2017)
Thanks for keeping us updated with your findings!
Also interested in this. I’d hope payments aren’t being processed off the back of NSQ messages. And if they are, I’d hope there’s a good argument for doing so…
I wonder if the Verifone issue is why contactless payments (including Apple Pay) weren’t working in Poundworld earlier…
Looking into Verifone a bit more, my best guess is that this is the service that went down:
If I’m right, I’m SHOCKED even chains as big as Waitrose are using it! I expected they’d have all that stuff in-house.
You’d be surprised at how many large merchants don’t control their card processing at all!
Which is probably a good thing… more stuff being in-house is part of why payments in the US are such a mess…
On the plus side, if so many are using their managed service it might mean we will start seeing newer terminals like the P200 and P400 sooner than I expected.
I did have a very strange dream last week, where I went into Tesco and they had replaced all their Ingenico iPP 350s with Verifone M400s (yes, M400, not P400). I was appalled. They looked terrible and bulky especially on self checkouts.
Anyone into dream interpretation wanna take a guess at the meaning of that one? LOL
I’ve marked this thread as Resolved as we’ve fixed the issues affecting payments, transfers, and top-ups. Feed items might take a while to catch up. But we’re on the case.
In the meantime, I’ll ask around to see if we can share any more information about what happened, and any changes we’re making to prevent it happening again.
Thanks @cookywook. Will certainly be good to get a write up like last time as people have previously said.