We learn more from our mistakes than we do from our successes, but it is better to learn from others’ mistakes, because then you don’t have to suffer the consequences yourself.

Throughout my career, I have learnt a great deal from the failures that occur during disasters. Not all of the disasters in disaster recovery were significant outages; many were “near miss” problems. Below are some war stories that you may be able to learn from.

Anti-Virus tools as a threat

I was working with a legacy system that could call external actions, but only by launching a file with an 8.3 filename. This was a bottleneck because we wanted to pass parameters, so I found a handy little tool that could package a script (in this case, a DOS command line) into an executable file. We rapidly created hundreds of little single-line scripts, packaged as .exe files, for the legacy system to call. However, someone else had used the same freely available tool to create a virus. Every executable the tool produced shared the same file header pattern, and our anti-virus software would helpfully scan every file system and delete any file with that header. At first we thought it was a hacker, but when we restored all the files from backup and they were immediately deleted again, we managed to track down the cause.

Something similar happened recently with commercial software from BlueJeans: one of its system files was flagged as a virus, but it turned out to be a false positive.

Backup tapes cannot be recovered

There are so many stories about backup tapes… There was the time a script performed a backup, then a verify, and then an erase (the commands were in the wrong order!), and it was not discovered for months. Then there were the backup parameters that excluded the Exchange MDB file type, because a brick-level backup of the database was already being performed. However, the exclusion also applied to the file server where Access databases were stored, so those were never backed up…

I was working for a small business in a flood area when we were told we needed to evacuate the site. We had enough time to move the two main servers (this was before rack-mounted servers) to another site, and we managed to get everything running well enough to keep the business going. We had off-site backups, but the delivery company refused to deliver to the new location. So the junior IT guy (me) had to cycle to the old site to collect the tapes from the delivery driver, who was waiting at the site entrance. Luckily he recognised me, but it shows that the off-site storage company was doing the right thing.

In another article, I talk about recovering a backup of a Windows 2000 server. This required reinstalling Windows onto a fresh server and then restoring the backup over the top of the running OS. The restore failed because the fresh Windows 2000 installation went into C:\Windows, whereas the original server had started life as an NT4 server installed into C:\WinNT and had then been upgraded in place, so the restored system files did not match the location of the running installation.

When restoring an Exchange 5.5 server, I built the replacement with the same disk layout, but assigned the drive letters C for Windows, D for data and L for logs. The original server had used M for logs, and even though the backup software restored the log files to L, the Exchange database still referenced M as the log drive, so it would not mount.

Site access

This happened to me twice: a power overload in a shared building tripped the safety circuits. All power was out, and all that was needed was for a small switch to be pushed back into place, but the riser cupboard was locked and only the landlord had the keys! A five-second job ended up taking hours.

I was working on a project in the late 90s where I had to test the Macintosh version of our software, and I wanted to do it one weekend from home. I had clearance from my boss to pick up the very expensive Macintosh from the office, and when I got there, the security guard held the door open and helped me load it into my car. Once I had closed the car door, I asked the security guard if he knew who I was, and he said he had no idea. I wonder how many other burglaries have happened right under the noses of security guards.