It could have been a normal change, just one with a really bad payload. Then, once the BGP update starts knocking all the systems offline, you no longer have the access you'd need to roll it back. And maybe the people with physical access didn't have the necessary permissions to roll back the changes, or something else in their design prevented an easy rollback in that specific failure mode.
I've had to write RCAs before for failures that were a perfect storm of unusual circumstances, though those were usually isolated to a single system. I agree, I'd love to know the details. I assume we'll get some level of explanation at some point.
__________________
Uncertainty is an uncomfortable position.
But certainty is an absurd one.