Image from Pixabay
On board flight KL1540 from Alicante to Amsterdam, the crew called for a doctor.
Moments later, the captain announced that the plane had to divert to Paris
because of a medical emergency.
And then things suddenly go differently from what you're used to. The tone
shifts from friendly and businesslike to measured and strict. The descent feels
noticeably steeper than usual. The cabin crew is told to check seatbelts and
tray tables "if time allows." There's no time left for a trash round with the
cart. Once on the ground, the plane is quickly parked and emergency services
arrive.
Once everything around the patient has been taken care of, you want to get back
to normal as quickly as possible: onward to Amsterdam. For that, the captain had
to "make the necessary calls," for example to arrange refueling and a new
landing slot at Schiphol. He also mentioned that he had chosen not to order
extra catering, because that would take additional time. He did take a moment
to walk through the cabin and answer passengers' questions.
In IT, you sometimes have to divert too. Something stops working in one data
center but keeps working in another, because the system is designed
redundantly, as we like to say. Failover comes in different flavors. In some
systems it happens automatically and users don't notice a thing: the system
detects that something is wrong and switches to "the other side." In other
cases, administrators have to spot the issue and switch things over manually.
And unfortunately, not every situation allows for failover; then users have to
wait until the problem is resolved.
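Automatic failover boils down to a loop of health checks plus a switch. The
Python sketch below illustrates the idea under simplifying assumptions:
`DataCenter`, `FailoverRouter`, the site names and the threshold of three
consecutive failed checks are all hypothetical, not taken from any particular
product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataCenter:
    name: str
    # Hypothetical health probe; in a real system this could be an HTTP
    # health-check endpoint or a heartbeat from a monitoring agent.
    is_healthy: Callable[[], bool]


class FailoverRouter:
    """Sends traffic to the primary site and switches to the standby once
    the primary has failed a number of consecutive health checks."""

    def __init__(self, primary: DataCenter, standby: DataCenter,
                 max_failures: int = 3) -> None:
        self.primary = primary
        self.standby = standby
        self.active = primary
        self.max_failures = max_failures
        self.failures = 0

    def tick(self) -> str:
        """Run one round of health checks and return the active site's name."""
        if self.active is self.primary:
            if self.primary.is_healthy():
                self.failures = 0  # a healthy check resets the counter
            else:
                self.failures += 1
                # Only switch after several failures in a row, and only
                # if "the other side" is actually healthy.
                if (self.failures >= self.max_failures
                        and self.standby.is_healthy()):
                    self.active = self.standby
        return self.active.name


if __name__ == "__main__":
    state = {"up": True}  # simulated health of the primary site
    router = FailoverRouter(
        DataCenter("dc-a", lambda: state["up"]),
        DataCenter("dc-b", lambda: True),
    )
    print(router.tick())  # prints: dc-a
    state["up"] = False
    print([router.tick() for _ in range(3)])  # prints: ['dc-a', 'dc-a', 'dc-b']
```

Requiring several failures in a row is a common way to avoid switching on a
single glitch. This sketch never switches back on its own; in many setups,
returning to the primary site is a deliberate, sometimes manual, step.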
Just like in aviation, in IT you want to return to normal as quickly as
possible after a diversion. That takes planning, because there are often many
dependencies that force steps to happen in a specific order. You document the
procedures in plans and, very importantly, you practice those plans regularly:
partly to get familiar with them, and partly to catch errors in them. Better to
run into those errors during an exercise than during a real incident.
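To make the point about dependencies concrete, here is a minimal Python (3.9+)
sketch that derives a valid recovery order from a plan using the standard
library's `graphlib`. The step names and `RECOVERY_PLAN` are hypothetical. A
circular dependency makes the sorter raise `CycleError`, which is exactly the
kind of plan error you would rather find during practice.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical recovery plan: each step maps to the steps it depends on.
RECOVERY_PLAN = {
    "storage": set(),
    "network": set(),
    "database": {"network", "storage"},
    "auth": {"database"},
    "web-frontend": {"auth", "database"},
}


def recovery_order(plan: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step comes after its dependencies.

    Raises CycleError if the plan contains a circular dependency.
    """
    return list(TopologicalSorter(plan).static_order())


if __name__ == "__main__":
    print(recovery_order(RECOVERY_PLAN))
    try:
        # A broken plan: auth needs the database and vice versa.
        recovery_order({"auth": {"database"}, "database": {"auth"}})
    except CycleError as err:
        # The second element of err.args holds the detected cycle.
        print("Plan cannot be executed, cycle:", err.args[1])
```

When several steps have no mutual dependency (here `storage` and `network`),
any order between them is valid, so the exact output may vary; what is
guaranteed is that no step comes before something it depends on.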
Sometimes there's no time to practice, or rather, no time is made for it.
Imagine pilots weren't given time to train emergency procedures, and then an
engine fails during takeoff, a fairly critical moment. You don't want the
pilots looking at each other in confusion. No, they should perform the correct
actions as a matter of routine (on autopilot, so to speak): actions that have
been thought out, documented, and thoroughly practiced, so that things end well
when something goes wrong.
But it can get worse: sometimes no attention is paid at all to the continuity
of a process. Sure, you can make a deliberate decision that it isn't necessary,
but in the cases I'm referring to, the topic isn't considered at all. Out of
ignorance, helplessness, lack of time: who knows. Maybe you're thinking of the
recent massive AWS outage (Amazon's cloud service), but feel free to look
around your own organization too.
Flight KL1540 arrived at Schiphol two hours later than planned. Not a big issue
for passengers whose final destination was Amsterdam. But there were also
people on board with a connecting flight to Kristiansand, in southern Norway,
and not many flights go there from Amsterdam. I fear those passengers had to
divert to a hotel.
And in the big bad world…
- AWS was down for a while and it had a major impact.
- Some people ended up sleeping upright in overheated smart beds due to the AWS outage.
- We could use a few more IT nerds in Parliament. [DUTCH]
- The UK government urges companies to make cybersecurity a top priority.
- You shouldn't rely too much on AI for your news coverage.
- Hackers breached a nuclear weapons facility.
- Law enforcement is increasingly interested in video doorbells.
- Microsoft secretly installs Gaming Copilot on your PC.
- You can also hack a card shuffling machine, of course.