Do I need backups?
Each time I am asked “do we still need to do a backup if the server is replicated or has snapshots?”, I am stunned by the question. I believe that people have forgotten the importance of backups for compliance, item recovery and protection from data deletion or destruction.
When I help out a company’s IT department, many of the battles are with people wanting to hang on to the old way of doing things: they don’t embrace the capabilities of new technology, or they apply old paradigms to new systems. However, for some reason, people are very keen to give up on something that needs to be held on to … data backups.
Do I need backups?
It’s easy to feel comfortable with the capabilities that are now available to provide a near-instant recovery of an entire server. Replication capabilities that make recovery as quick as a simple boot (faster even than a reboot) can make you believe that rolling back to another copy of your server is better than trying to recover the data from a backup. I, like many, was seduced by the stunning capability of a snapshot (where memory was included) returning a VM to a previous state without even rebooting. However, there are many limitations to this, and not only when multi-tier applications need multiple servers to be in sync.
All backups are not equal
Many of us have suffered with a restore not bringing a system back – failing to deliver what snapshots and replication can do now. It’s important to understand that just because you have a backup does not mean it is restorable. There are different types of backup, and different use cases for each:
- A file backup – the simplest, and probably the most useful – for data recovery at least. Almost all backup systems will reset the archive bit to identify that the file has been backed up. Problems include backing up open files; large files that only have small sections changed (e.g. database files) but are then flagged to be backed up in their entirety again; and keeping files consistent when either multiple files are being changed, or large files have changes written to them before they have finished being backed up. The biggest problem is that restoring a whole server that has been backed up as just files will often not result in a working server (or working applications).
- System state backup – often requires an agent or integration with Windows VSS (Shadow Copies), and will back up a Windows server as a whole system, including the SID and GUIDs of the server and the system files, and will ensure that the disk is bootable. Problems include needing to restore the whole system state as the same server (same IP, same name, etc.), and if you have a physical server, you need to restore the whole backup onto identical hardware – or go through the pain of manually injecting drivers. I’ve always had problems in getting a working system from a System State backup, because it can be done wrong so easily.
- Application-consistent backups – definitely requiring an agent, most often provided by the application itself. These backups understand the application and the requirement for maintaining consistency of files and multiple servers in a multi-tier application. However, this type of backup needs to be done in parallel with other types of backup – just because you have a SQL backup does not mean that the OS is able to boot, or even that the web presentation tier will work.
Furthermore, the backup scheme may affect recoverability:
- If you have done full backups, they may take longer to do, but at least every backup has all files. However, when you are backing up systems that are in use – are all the systems that are backed up consistent with each other? Have files on the server changed before the backup of that server completed?
- If you have used incremental backups, only the changed files are backed up each day. But you need to restore the last full backup, and then every incremental backup since then, to ensure that all data is restored.
- Or, if you have used differential backups, you need to restore the last full backup and then the last differential backup – so each day’s backup gets bigger until you do the next full backup. Incremental and differential backups will probably include the same large files that were modified during the backup, and so are not fully recoverable.
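To make the difference between the schemes concrete, here is a minimal Python sketch of which backup sets you would have to restore, and in what order, to get back to a given day. The `Backup` class, the `restore_chain` function and the day numbering are all hypothetical names made up for illustration; they are not taken from any real backup product, and the sketch assumes a site uses either incrementals or differentials between fulls, never both.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    day: int   # day number the backup was taken (hypothetical numbering)
    kind: str  # "full", "incremental" or "differential"


def restore_chain(backups: List[Backup], target_day: int) -> List[Backup]:
    """Return the backups to restore, in order, to reach target_day."""
    # Only backups taken on or before the target day are of any use.
    candidates = sorted(
        (b for b in backups if b.day <= target_day), key=lambda b: b.day
    )
    fulls = [b for b in candidates if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup on or before the target day")
    last_full = fulls[-1]

    later = [b for b in candidates if b.day > last_full.day]
    incrementals = [b for b in later if b.kind == "incremental"]
    differentials = [b for b in later if b.kind == "differential"]
    if incrementals and differentials:
        # Assumption of this sketch: mixed schemes are not modelled.
        raise ValueError("mixed incremental/differential schemes not modelled")

    if differentials:
        # Differential: last full plus only the most recent differential.
        return [last_full, differentials[-1]]
    # Incremental: last full plus every incremental taken since then.
    return [last_full] + incrementals


# A full on day 0, then four daily backups of one kind or the other.
incremental_week = [Backup(0, "full")] + [Backup(d, "incremental") for d in range(1, 5)]
differential_week = [Backup(0, "full")] + [Backup(d, "differential") for d in range(1, 5)]

print([b.day for b in restore_chain(incremental_week, 3)])   # [0, 1, 2, 3]
print([b.day for b in restore_chain(differential_week, 3)])  # [0, 3]
```

The incremental chain grows by one restore step per day, so a single unreadable backup in the middle breaks everything after it; the differential chain is always two steps, at the cost of each day’s backup being larger.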
Aaarrgh! So nothing seems to be the right answer!
Better than backups – replicas…
So, I’ve painted the picture that backups may well fail to restore your data. So, a replication of all the data to another copy, bit by bit and replayed as if it was normal disk I/O – that sounds like the right answer, doesn’t it? After all, you can just switch on the replicated server and everything is up to date as of its last scheduled replication, which could be less than 15 minutes ago – fantastic, right? A Virtual Machine can boot faster than a physical machine, so you could be up and running exceptionally quickly indeed.
Why you need backups
Here’s the problem – how does a replica or snapshot cope with the following:
- Data being deleted
- Data being corrupted or damaged
- Data being modified or overwritten
- A virus or malware on the operating system
Any of these could be done by the application, a user, a hacker – whatever. It will be replicated too…
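As a toy illustration of that point, the Python sketch below models a primary server, a replica that mirrors it, and a set of retained point-in-time backups. Everything here (the `ReplicatedServer` class, its methods, the file name and the labels) is hypothetical and hugely simplified; it is only meant to show that replication faithfully copies a deletion or corruption, while a retained backup still holds the earlier version.

```python
from typing import Dict, List, Tuple


class ReplicatedServer:
    """Toy model: a primary, a mirror replica and retained backups."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.backups: List[Tuple[str, Dict[str, str]]] = []

    def write(self, path: str, data: str) -> None:
        self.primary[path] = data

    def delete(self, path: str) -> None:
        self.primary.pop(path, None)

    def replicate(self) -> None:
        # The replica becomes an exact copy of the primary,
        # including any deletions or corruption.
        self.replica = dict(self.primary)

    def backup(self, label: str) -> None:
        # Each backup is kept as a separate point-in-time copy.
        self.backups.append((label, dict(self.primary)))

    def versions(self, path: str) -> List[Tuple[str, str]]:
        # Every retained version of a file, oldest first.
        return [(label, snap[path]) for label, snap in self.backups if path in snap]


server = ReplicatedServer()
server.write("report.docx", "good contents")
server.replicate()
server.backup("monday")

server.write("report.docx", "corrupted contents")  # damage on the primary
server.replicate()                                  # ...is copied to the replica
server.backup("tuesday")

server.delete("report.docx")                        # deletion on the primary
server.replicate()                                  # ...is copied to the replica

print("report.docx" in server.replica)  # False
print(server.versions("report.docx"))
# [('monday', 'good contents'), ('tuesday', 'corrupted contents')]
```

Note that Tuesday’s backup captured the corrupted file too, which is why keeping more than one generation of backups matters.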
You might also need data that is old – previous versions, and not just for legal compliance reasons!
The solution
There’s no one silver bullet, but one thing is apparent – go belt and braces: do your snapshots and your replication, and then back that up! Use agent-based (or application-controlled) backups for applications and their data, and file-level backups that will allow individual files or folders to be recovered. But, you really still do need backups…