3 hours downtime my ass!
Date: Saturday, November 03 @ 15:07:40 UTC
Talk about a server move gone wrong...
There was meant to be some routine maintenance in the datacentre where the detbox is hosted. Sadly, while the move seemed to work, something fishy happened with the box. This is doubly annoying because detnet has actually, sneakily, been on new hardware for a couple of months; I did that move early one morning.
The outage has been made worse because, well, I've been on holiday in Amsterdam for the last five days - I had limited Internet access while I was there (hmm....) and getting hold of help has been hard.
Everything should be back now. Read on for more info.
So the plan was to move a bunch of kit belonging to my hosting provider, Xilo, from one datacentre to the other. Seems like an easy job, a simple lift and shift. Problem was, once all the kit was up on the other side, the detbox wasn't coming back up. At all.
The display was dark, which is always a bad sign. At this point I was on a plane to Amsterdam, so the box was effectively bricked.
My friend down in the datacentre began working out what was wrong. No typical warning signs, no beep code, no drive lights stuck on. WTF?
Turns out the box wouldn't boot with an Ethernet cable plugged in. That makes loooooots of sense. Tried a different port on the switch and on the server ... no difference. Changing the cable got it booting. Then the machine was hanging while booting the kernel. What gives?
Booting into an alternative kernel got me back up and running, but the terminal was flooded with a bunch of errors. Turns out there's ALSO a bug in the version of the Linux kernel I'm using (Debian version 2.6.18-5-686). Mother fucker. So I had to put a workaround in place, reinstall the broken kernel, reapply the workaround, and we're back.
Since getting it working, I've power cycled it a couple of times, done a few soft reboots, and everything comes back.
So ... detnet as we know it is back online.
Learns and Futures:
1) I will "finish the job" and get the secondary box up and running as soon as I can. This will put it in a separate geographical location, with a different provider. If the main site goes dark, I can flip DNS to the hot standby... MTTR should then be about an hour, as the detonate.net zone has an 1800 second (30 minute) TTL.
2) If doing (1) takes too long, I'll fork out the cash to get a paid hosting account somewhere else. Can worry about getting the backup online later.
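For the curious, here's a rough back-of-envelope sketch of how an "about an hour" recovery could fall out of an 1800 second TTL. Only the TTL comes from the zone; the detection and flip times below are made-up assumptions for illustration, not measurements, and the function names are hypothetical.

```python
# Rough worst-case recovery estimate for a manual DNS failover.
# Only TTL_SECONDS comes from the post; everything else is an assumption.

TTL_SECONDS = 1800  # TTL on the detonate.net zone


def worst_case_recovery_seconds(ttl_seconds, detect_seconds, flip_seconds):
    """Time until every resolver should see the standby.

    Worst case, a resolver cached the old record just before the flip,
    so it keeps serving the dead address for a full TTL afterwards.
    """
    return detect_seconds + flip_seconds + ttl_seconds


def human_budget_seconds(target_seconds, ttl_seconds):
    """How long is left to notice the outage and flip DNS within a target."""
    return max(0, target_seconds - ttl_seconds)


if __name__ == "__main__":
    # Assumed: 25 minutes to notice, 5 minutes to change the record.
    total = worst_case_recovery_seconds(TTL_SECONDS, 25 * 60, 5 * 60)
    print(f"worst case: {total / 60:.0f} minutes")

    # With a one hour target, this is what is left for the human part.
    left = human_budget_seconds(3600, TTL_SECONDS)
    print(f"time to notice and flip: {left / 60:.0f} minutes")
```

In other words, the TTL alone eats half of the hour, so hitting that figure depends on spotting the outage and flipping the record within the remaining 30 minutes. Some resolvers also ignore TTLs, so treat this as an estimate rather than a guarantee.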
I am sorry - I know this has been a bitch. In fact, detnet is the only real forum I visit aside from Slashdot, so I've had to fill in the gaps with furious masturbation too.