@oliver gave the keynote talk at KubeCon this year, and here it is.
This talk will dive into a production Kubernetes outage that Monzo experienced a few months ago, its causes and effects, and the architectural and operational lessons learned.
The slides from the talk are here -
& the post mortem that he mentioned in the talk is here -
An interesting watch, and some food for thought for my team at work, particularly around exposing more monitoring data and running chaos engineering exercises more regularly.