Handling Incidents and Outages

Tech Team Weekly

07-02-2022 • 26 mins

What do we tell people when things go wrong in our organisations? This week, there have been a couple of write-ups of recent high-profile outages at Roblox and Mozilla, which - when paired with the well-documented outage at Facebook that we discussed last season - give us a fascinating glimpse into other companies' incident processes, on-call rotas and war rooms. Sanj, Gwen and Neil share their surprising love of being knee-deep in an incident, bringing some of their own recent experiences to the podcast.

In our workplace updates, there's lots of hiring, lots of shipping new features, everybody tries to coax Sanj into management, and Neil totally isn't doing any money laundering.

TIMESTAMPS:

00:00 Start
01:28 The Stand-Up
06:48 Social Engineering
08:25 This Week's Epic
25:44 The Wash-Up

LINKS DISCUSSED THIS WEEK:

Elucidat careers page

YouTube: Ozark Season 1 Trailer

LeadDev

Glean careers page

Greg McKeown - Effortless

Facebook Engineering: Update about the October 4th outage

Roblox Return to Service 10/28-10/31 2021

Mozilla Hacks: Retrospective and Technical Details on the recent Firefox Outage

Vox: Pokémon Go launched in 26 countries, and then its servers crashed

Down Detector

Down For Everyone Or Just Me
