Facebook is down, discuss!

Hi Jasebwmn

Trying to work out how to describe this in a non-techy way :nerd_face:

Facebook has basically withdrawn its routes from its peers on the internet, which cut its services off from everyone, including the DNS servers that tell the rest of the internet where facebook.com lives.
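A rough way to picture the DNS side, as a toy Python sketch (my own simplification, not how real resolvers work): a resolver can only answer for facebook.com if it can reach one of Facebook's authoritative nameservers, and those sit at addresses that are only reachable while Facebook announces the prefixes covering them. All addresses below are hypothetical stand-ins from the RFC 5737 documentation ranges, and `resolve`, `AUTHORITATIVE_NS` and `ZONE` are made-up names for illustration.

```python
import ipaddress

# Hypothetical stand-in addresses from the RFC 5737 documentation ranges;
# they are NOT Facebook's real nameserver IPs.
AUTHORITATIVE_NS = {
    "facebook.com": [ipaddress.ip_address("192.0.2.53"),
                     ipaddress.ip_address("198.51.100.53")],
}

# Toy zone data the nameservers would hand out if anyone could reach them.
ZONE = {"facebook.com": "203.0.113.10"}


def is_routable(addr, routes):
    """True if any announced prefix in `routes` covers `addr`."""
    return any(addr in prefix for prefix in routes)


def resolve(name, routes):
    """Toy resolver: only answers if at least one authoritative server is reachable."""
    for ns in AUTHORITATIVE_NS.get(name, []):
        if is_routable(ns, routes):
            return ZONE[name]
    return None  # roughly what a real resolver would report as SERVFAIL


routes = {ipaddress.ip_network("192.0.2.0/24"),
          ipaddress.ip_network("198.51.100.0/24")}
print(resolve("facebook.com", routes))  # 203.0.113.10

routes.clear()  # the prefixes are withdrawn, so the nameservers become unreachable
print(resolve("facebook.com", routes))  # None
```

The point of the toy: once the prefixes are gone, lookups fail for everyone, so even a server that was still up somewhere can't be found by name.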

BGP is Border Gateway Protocol. To save me typing out War and Peace, I'll post this from Cloudflare:

What is BGP? | BGP routing explained | Cloudflare

My basic understanding is that they're the pathways to access their servers. You can see the pathways disappear. It's BGP that tells the packets where to go, and they removed those instructions.
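To make the "pathways" idea a bit more concrete, here's a toy Python sketch (a simplification I'm assuming for illustration, not how any real router or Facebook's network is implemented): a router keeps a table of prefixes it has heard announced, sends each packet along the most specific matching route, and when a prefix is withdrawn that entry is simply gone. The `RoutingTable` class, the `peer-A`/`peer-B` names and the RFC 5737 documentation prefixes are all hypothetical.

```python
import ipaddress


class RoutingTable:
    """Toy routing table: prefix -> next hop, with longest-prefix-match lookup.

    Hypothetical simplification: a real BGP speaker keeps several candidate
    paths per prefix and runs a best-path selection over them.
    """

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        # Like a BGP UPDATE advertising a path for this prefix.
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # Like a BGP UPDATE withdrawing the prefix: the "pathway" disappears.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route at all: packets for this address get dropped
        best = max(matches, key=lambda net: net.prefixlen)  # most specific wins
        return self.routes[best]


table = RoutingTable()
table.announce("198.51.100.0/24", "peer-A")
table.announce("198.51.100.0/25", "peer-B")
print(table.lookup("198.51.100.7"))  # peer-B (the more specific /25 wins)

table.withdraw("198.51.100.0/25")
print(table.lookup("198.51.100.7"))  # peer-A (falls back to the /24)

table.withdraw("198.51.100.0/24")
print(table.lookup("198.51.100.7"))  # None (no pathway left)
```

Withdrawing a more specific route just falls back to a broader one; withdrawing everything that covers an address leaves nothing to follow, which is roughly what the rest of the internet saw for Facebook's prefixes.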

Hi :wave:

Reddit's r/sysadmin had someone from the investigation team hop on (then promptly nuke their account, presumably due to an NDA; the posts are no longer up).
Some interesting info, though:

I'm very much reminded of this: South Park: Internet Transients - YouTube

I saw that. Don't know why the name is blurred; it was /u/Ramenporn.

Yeah, odd. I guess they just wanted to protect the poster in case Facebook Legal goes after the OP. The screenshot was sourced from someone reposting the deleted account's content.

I’d like to see the process for MTS at Facebook or the Major Incident team

They largely follow SRE (Site Reliability Engineering) practices. Google has a good workbook on this exact thing.

I'm just gonna pop into my Oculus Quest for a minute to get away from all this… ah no, I can't do that.

It's caused complete chaos everywhere! I just had to disable the Facebook IdP for our login systems at work and send out password reset links en masse to everyone whose only IdP was Facebook, so they can log in without it. Its impact is a lot bigger than some people realise. It isn't just Facebook, WhatsApp or Instagram. Complete nightmare…

Is that like Facebook SSO/Login with Facebook? I’ve never seen a workplace use that seriously before and I think I can see why now

Edit: I did google it: Facebook IdP | Okta

Neat idea in theory

I think people only see the brands and not the underlying infrastructure, which powers a lot through APIs and IdP services.

And I'm seeing a lot of people blaming ISPs, mobile networks and service desks as well…

…and saying "Can't you just turn it back on?"

Yep, it's mainly for customer-facing login and such (via Auth0). Facebook Workplace exists, though (we don't use it), and I know it's caused a nightmare at Facebook, as most of their teams couldn't communicate!

Major infrastructure goes down from time to time, and Cloudflare and AWS do too, but it never takes them this long to resolve the issue. We're going into hour 6 right now, I think.

Last time Cloudflare went down, it took them an hour tops, IIRC.

And even then, Cloudflare's outage was down to a faulty backbone router at one of their central East Coast locations. Rerouting didn't kick in automatically, but it ended up being a quick fix once they got rid of the faulty location.

It’s extremely rare to see issues like this with BGP in such a huge business.

BGP is a nightmare, but a stable nightmare, unless you're Facebook and automate it…
