Facebook is down, discuss!

Reddit’s r/sysadmin had someone from the investigation team hop on (they then promptly nuked their account, presumably because of an NDA; the posts are no longer up).
There was some interesting info in them, though.

I’m very much reminded of this: South Park: Internet Transients (YouTube)

I saw that. Don’t know why the name is blurred; the username was /u/Ramenporn.

Yeah, odd. I guess they just wanted to protect the poster in case Facebook Legal goes after the OP. The screenshot was sourced from someone reposting the deleted account’s content.

I’d like to see the process that MTS or the Major Incident team at Facebook follows.

They largely follow SRE (Site Reliability Engineering) practices. Google has a good workbook on this exact thing.

I’m just gonna pop into my Oculus Quest for a minute to get away from all this… ah no, I can’t do that.

It’s caused complete chaos everywhere! I just had to disable the Facebook IdP for our login systems at work and send out password reset links en masse to everyone with FB as their only IdP, so they can log in without it. Its impact is a lot bigger than some people realise: it isn’t just Facebook, WhatsApp or Instagram. Complete nightmare…
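
For anyone wondering what that cleanup boils down to, here’s a rough sketch in Python. The `User` record, the `"facebook"` provider key and the reset URL are all hypothetical (this isn’t Auth0’s or Okta’s actual data model): it picks out accounts whose only sign-in method is the IdP being switched off and generates a one-off reset link for each.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical account record: the IdPs this user can currently sign in with.
    email: str
    identity_providers: set[str] = field(default_factory=set)


def users_stranded_by(users: list[User], disabled_idp: str) -> list[User]:
    """Return users whose *only* sign-in method is the IdP being disabled."""
    return [u for u in users if u.identity_providers == {disabled_idp}]


def build_reset_links(users: list[User], base_url: str) -> dict[str, str]:
    """Map each stranded user's email to a random, single-use reset URL.

    A real system would also store a hash of the token with an expiry so the
    reset endpoint can validate it; that part is left out here.
    """
    return {u.email: f"{base_url}?token={secrets.token_urlsafe(32)}" for u in users}


users = [
    User("a@example.com", {"facebook"}),
    User("b@example.com", {"facebook", "password"}),
    User("c@example.com", {"google"}),
]
links = build_reset_links(users_stranded_by(users, "facebook"), "https://login.example.com/reset")
print(list(links))  # ['a@example.com']
```

Users who also have a password or another IdP linked are left alone, since they can still get in.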

Is that like Facebook SSO/Login with Facebook? I’ve never seen a workplace use that seriously before, and I think I can see why now.

Edit: I did google it: Facebook IdP | Okta

Neat idea in theory.

I think people only see the brands and not the underlying infrastructure, which powers a lot through APIs and IdP services.

And I’m seeing a lot of people blaming ISPs, mobile networks and service desks as well…

…and saying “can’t you just turn it back on?”

Yep, it’s mainly for customer-facing login and such (via Auth0). Facebook Workplace exists, though (we don’t use it), and I know the outage caused a nightmare for Facebook itself, as most of their teams couldn’t communicate!

Major infrastructure goes down from time to time (Cloudflare and AWS do too), but it never takes them this long to resolve the issue; we’re going into hour 6 right now, I think.

The last time Cloudflare went down, it took them an hour tops, IIRC.

And even then, Cloudflare’s outage came down to a faulty backbone router at one of their central East Coast locations. Rerouting didn’t kick in automatically, but it ended up being a quick fix once they took the faulty location out.

It’s extremely rare to see issues like this with BGP in such a huge business.

BGP is a nightmare, but a stable nightmare, unless you’re Facebook and automate it…

So is this sabotage, a hack, bad luck, bad practice or something else?

It’s looking like bad practice, until we know otherwise.

A botched BGP update, according to a close source shown here.
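
A toy model of what “disappearing” can mean at the routing level, assuming the botched update ended up withdrawing the routes to a network’s prefixes (that part is an assumption for illustration; the source only says the BGP update was botched). The prefixes and AS numbers below are documentation values, not Facebook’s, and real BGP keeps several paths per prefix and runs best-path selection, which this sketch skips.

```python
import ipaddress
from typing import Optional


class ToyRoutingTable:
    """A router's view of the internet reduced to: prefix -> next hop.

    Hypothetical simplification: one next hop per prefix, forwarding lookup only.
    """

    def __init__(self) -> None:
        self.routes: dict = {}

    def announce(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix: str) -> None:
        # From the receiving router's side, a withdrawal just removes the route.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, destination: str) -> Optional[str]:
        """Longest-prefix match; None means there is no route at all."""
        addr = ipaddress.ip_address(destination)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = ToyRoutingTable()
table.announce("198.51.100.0/24", "AS64501")
table.announce("198.51.100.0/25", "AS64502")
print(table.lookup("198.51.100.10"))   # AS64502 (the more specific /25 wins)
print(table.lookup("198.51.100.200"))  # AS64501
table.withdraw("198.51.100.0/24")
table.withdraw("198.51.100.0/25")
print(table.lookup("198.51.100.10"))   # None
```

Once the routes are gone, packets to those addresses have nowhere to go, so whatever sits behind them (DNS included) is unreachable even if the servers themselves are still up.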

With everything privileged likely locked down to trusted IPs, you aren’t getting in anywhere. Once they are in at one DC, I’d guess they can start recovering, but holy fuck is it going to be a long and tedious process.
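
Roughly the kind of source-IP allowlist being guessed at here, sketched with Python’s standard `ipaddress` module. The networks are made-up documentation ranges and nothing here reflects Facebook’s actual setup. If the allowlisted management networks themselves can’t be reached, every remote attempt falls into the `False` branch, which would fit the guess that recovery has to start from inside a DC.

```python
import ipaddress

# Hypothetical allowlist of management networks. These are documentation
# ranges (RFC 5737 / RFC 3849), not anyone's real addresses.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:100::/48"),
]


def is_trusted(source_ip: str) -> bool:
    """Allow a privileged connection only if its source IP is in an allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa); that just
    # returns False, so a mixed list is fine.
    return any(addr in net for net in TRUSTED_NETWORKS)


print(is_trusted("192.0.2.10"))    # True
print(is_trusted("198.51.100.7"))  # False
```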

An explanation of how Facebook can disappear