So today I had planned to spend the morning installing some new hardware at BitRelay: a server, a switch, and a backup disk array. It was all pretty innocuous, and I figured I’d be home a little after lunchtime to hang out with Laralee and Zaque and enjoy the Presidents’ Day holiday.
Well, I forgot to bring the disk array with me, which meant I had to make another trip and basically spend three hours in the car (it’s a 45-minute drive each way to the datacenter).
And when I installed the new switch, for reasons unknown to me, my entire network stack went down. That meant all of the servers were running but couldn’t be reached from the internet, so for about 15 minutes a whole bunch of clients were calling and emailing me and my team asking why their web sites were down. Meanwhile, I was frantically plugging and unplugging things, hoping that I could fix whatever had happened and get everything running again. In the end, I really don’t know what happened; all I know is that rebooting the network stack twice seemed to resolve it. Ugh.
By the time I was finished, it was evening, and then I had to write an explanation (and apology) to my clients so they understood what had happened and how I’d keep it from happening again. That’s never a good conversation to have.
In a little ray of sunshine, one of my clients responded with this:
Nice. That helped to remind me that despite my feeling that it had been a disaster of epic proportions, in the grand scheme of things, 15 minutes of downtime isn’t really all that big a deal.