We are regularly told that checking our own bodies for signs of change is a good thing. Early diagnosis of disease gives more of a fighting chance of curing the problem. So, in the IT world, where we assume all of our backups have been taken successfully, how often should we be checking the results and ensuring the backup will work on the fateful day we need to do a restore? This question was posed by Federica Monsone on Twitter this week. Here’s an attempt to provide an answer.
First of all, let’s consider the whole point of taking backups. Excluding the inappropriate use of backup for archiving, the backup process is there to ensure you can maintain continuous access to your data in the event of unforeseen circumstances. Usually (but not exclusively) these are data loss due to equipment or power failure, data corruption (whether from a software bug or malicious action), accidental deletion, or a need to return to a previous point in time for consistency purposes where there are multiple interrelated systems.
Backups are used infrequently and, like insurance, you never know how good they are until you come to use them. I wouldn’t advocate crashing your car just to check your insurers will pay out; however, periodic validation of backups and, more importantly, of the restore process is a good thing. Why? Because in a recovery scenario, you want to be confident your backups have worked and that the process of recovery will be as smooth as possible. Recovering data is typically a time-critical operation. You’re recovering data because somebody needs the information quickly, or because a system is down. When the pressure is on, the process needs to work flawlessly. In addition to the time pressures, restores should be periodically checked because:
- Backup media deteriorates over time; you should be ensuring any failing media is replaced.
- Backup software upgrades can cause issues with restores if data formats are changed.
- Server software upgrades can cause issues with restores.
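One simple way to validate a test restore is to compare a checksum of the restored file against the original. The following is a minimal sketch in Python, not tied to any particular backup product; the function names `sha256_of` and `restore_matches` are hypothetical, and it assumes the original file has not changed since the backup was taken (otherwise you would compare against a checksum recorded at backup time).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored copy is byte-for-byte identical to the original.

    Assumption: the original is unchanged since the backup was taken.
    """
    return sha256_of(original) == sha256_of(restored)
```

A mismatch does not say what went wrong (media, software or process), only that the restore needs investigating.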
So, to the heart of the matter: how often should you test restores? I believe restore testing should be based on the criticality of the data and the complexity of the backup infrastructure. So, if data integrity is the essence of your business, test restores more often. If you have a shared backup infrastructure, test restores from that; if you have a more distributed design, you’ll need to test each backup component. Here are some thoughts:
- Test restores of individual files on a weekly basis
- Test restores of large volumes of data on a monthly basis
- Test whole system restores every six months
- Randomly select media for restore; choose new and old media alike
- Test restores into your DR site (if you have one)
- Replace faulty media immediately
- Have a media retirement policy
- Have a backup onboarding policy
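The schedule and the random media selection above could be tracked with something as small as the sketch below. It is illustrative only: the test names, the `due_tests` and `pick_media` helpers, and the idea of splitting media into a newer and an older half are all assumptions of mine, and the intervals are just the weekly, monthly and six-monthly suggestions from the list.

```python
import random
from datetime import date, timedelta

# Intervals taken from the suggestions above; the keys are hypothetical names.
TEST_INTERVALS = {
    "single_file": timedelta(days=7),     # weekly
    "large_volume": timedelta(days=30),   # roughly monthly
    "whole_system": timedelta(days=182),  # roughly every six months
}


def due_tests(last_run: dict[str, date], today: date) -> list[str]:
    """Return the restore tests that are due, treating never-run tests as due."""
    due = []
    for name, interval in TEST_INTERVALS.items():
        previous = last_run.get(name)
        if previous is None or today - previous >= interval:
            due.append(name)
    return due


def pick_media(media_ages: dict[str, int], rng: random.Random) -> list[str]:
    """Pick one random item from the newer half and one from the older half.

    media_ages maps a media label to its age in days (an assumed inventory).
    """
    ordered = sorted(media_ages, key=media_ages.get)  # youngest first
    if len(ordered) < 2:
        return ordered
    middle = len(ordered) // 2
    return [rng.choice(ordered[:middle]), rng.choice(ordered[middle:])]
```

For example, with `today = date(2024, 7, 1)`, a single-file test last run on 28 June and a large-volume test last run on 1 May, `due_tests` returns `["large_volume", "whole_system"]`: the weekly test is still current, the monthly one is overdue, and the whole-system test has never been run.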
There’s no right or wrong way to approach testing restores; it’s all about building confidence in the restore process, so when you need it, you can be happy it will work for you.