Thursday, October 24, 2013

When good backups just aren't enough

By Steve Endow

Today I was hoping to start a new development project.  I blocked out my day to focus on the project and was ready to dig into some code.

This morning I fired up one of my Hyper-V virtual servers that was set up for this particular type of development and opened Google Chrome to log in to a web site for the project.  Chrome was set up to load by default, but when the browser opened, the page didn't display--it said something about "unable to establish ssl connection".  Hmmm.  I then tried a few other web sites.  Some would partially load, some would display text but no images, some would display scrambled images, and some wouldn't load at all.

I thought maybe it was a network issue--since I moved recently, all of my servers detected my new network and required me to select the network type before the network started working properly.  But even after checking the network settings, pages still wouldn't load.  I then tried Internet Explorer and Firefox, but both of those browsers exhibited the same strange symptoms.  So it seemed the problem was with the machine, and was not a browser issue.

I then tried to install an application on the machine, but the setup.exe would immediately crash.  Hmmm.

I thought there might be a small chance that some type of virus or malware had gotten onto the server, although that seemed highly unlikely for this particular VM.  So I tried to install my anti-virus--but I couldn't install that either--the setup just crashed immediately no matter what I tried.  I tried several other things, but none of them worked.  This virtual machine that worked fine just a few months ago was now unusable.

So I then pulled up a backup.  Oh, did I mention I'm a big fan of backups?

In addition to the backups of my workstation files, I have scripts that back up all of my Hyper-V virtual servers every week.  A copy of each VHD is saved to my file server, then the VHDs are compressed using 7-Zip.  Copies of the 7-Zip archives are then saved to an external drive.  And I have two external drives that I rotate each week to a fire safe.
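To make those three steps concrete, here is a minimal Python sketch of a weekly VHD backup.  It is not my actual script: the function names, the directory arguments, and the dated file naming are all hypothetical, and it assumes the 7-Zip command line tool is available on the PATH as "7z".

```python
import shutil
import subprocess
from pathlib import Path


def compress_with_7zip(source: Path, archive: Path) -> None:
    """Compress one file using the 7-Zip command line: 7z a <archive> <file>.

    Assumes the 7z executable is on the PATH.
    """
    subprocess.run(["7z", "a", str(archive), str(source)], check=True)


def backup_vhd(vhd: Path, server_dir: Path, external_dir: Path, stamp: str,
               compress=compress_with_7zip) -> Path:
    """Copy a VHD to the file server, compress it, then copy the archive
    to the external drive.  Returns the archive path on the external drive.

    All paths and the `stamp` naming scheme are illustrative assumptions.
    """
    server_dir.mkdir(parents=True, exist_ok=True)
    external_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: save a dated copy of the VHD on the file server.
    server_copy = server_dir / f"{vhd.stem}-{stamp}{vhd.suffix}"
    shutil.copy2(vhd, server_copy)

    # Step 2: compress the server copy into a .7z archive next to it.
    archive = server_copy.with_suffix(".7z")
    compress(server_copy, archive)

    # Step 3: save a copy of the archive on the external drive.
    external_copy = external_dir / archive.name
    shutil.copy2(archive, external_copy)
    return external_copy
```

A call such as backup_vhd(Path("dev.vhd"), Path("server"), Path("external"), "2013-10-11") would leave dev-2013-10-11.vhd and dev-2013-10-11.7z on the file server and a second copy of the archive on the external drive.  Rotating the two external drives to the fire safe stays a manual step.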

So on my file server I grabbed the VHD backup from 10/11/2013.  I restored the VHD and restarted the VM.  But the same problems existed.

I then went back two weeks, and restored an older VHD, but it had the same issues.  I was then able to resurrect a backup from 9/14, nearly SIX WEEKS old, but alas, even that had the issue.

So this was an interesting case where despite my very comprehensive backup system that worked great and allowed me to easily restore a copy of a 64GB VHD from six weeks ago, it apparently was not old enough.

You might think that this is a rare exception, but based on my experience, it's actually fairly common.  I've had clients discover that they have a problem that is several months old.  Or they discover that something was deleted or changed or overwritten months ago.  In those cases, their four-week backup rotation doesn't help them.  They would have needed monthly archives saved for six or more months.  In my experience, these types of "quiet" problems have been much more common than any dramatic disaster, drive failure, or server meltdown.  Many backup strategies focus on restoring a recent copy of a file, but fail to consider the possible need to restore an older copy of a file.  Keeping older copies is a more difficult backup strategy: it requires more management and more storage space.
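One way to picture that kind of retention is a rule that keeps the most recent weekly backups plus one archive per month going back several months.  The Python sketch below is only an illustration under stated assumptions: the function name and parameters are hypothetical, the defaults of four weekly and six monthly copies come from the numbers above, and treating the earliest backup in each calendar month as the monthly archive is my own simplification.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, weekly_count=4, monthly_count=6):
    """Return the set of backup dates to retain.

    Keeps the newest `weekly_count` backups, plus the earliest backup in
    each of the most recent `monthly_count` calendar months that contain
    a backup.  Every other backup is a candidate for deletion.
    """
    dates = sorted(d for d in backup_dates if d <= today)

    # Short-term rotation: the newest few backups.
    keep = set(dates[-weekly_count:]) if weekly_count > 0 else set()

    # Long-term archives: the earliest backup seen in each calendar month.
    first_of_month = {}
    for d in dates:
        first_of_month.setdefault((d.year, d.month), d)

    months = sorted(first_of_month)
    recent_months = months[-monthly_count:] if monthly_count > 0 else []
    keep.update(first_of_month[m] for m in recent_months)
    return keep


# Example: weekly backups taken every Saturday from April to October 2013.
weekly = [date(2013, 4, 6) + timedelta(weeks=i) for i in range(29)]
kept = backups_to_keep(weekly, today=date(2013, 10, 24))
for d in sorted(kept):
    print(d.isoformat())
```

With a rule like this, the oldest retained copy in the example dates from early May rather than late September, at the cost of storing a handful of extra archives.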

But it can be valuable in situations like the one I experienced today.

Steve Endow is a Dynamics GP Certified Trainer and Dynamics GP Certified IT Professional in Los Angeles.  He is also the owner of Precipio Services, which provides Dynamics GP integrations, customizations, and automation solutions.

You can also find him on Google+ and Twitter.
