In the Case of Ransomware, Blindly Restoring from Backup May Be Bad for Business

First, for the sake of your company, let me state the two straightforward goals you should already have in mind when doing any kind of data recovery:

1. Lose as little data as possible (with the goal being zero)

2. Restore operations as quickly as possible

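These two goals can conflict. Rolling back to an older copy you trust is fast but loses more data, while hunting for the newest clean copy minimizes loss but takes time. If you take frequent backups or snapshots, say every five minutes, then depending on how quickly you discovered the ransomware and quarantined the culprit, several of your most recent snapshots or backups will likely contain encrypted or damaged data.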
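To make the five-minute example concrete, here is a minimal Python sketch that splits snapshot timestamps into those taken before the infection began (restore candidates) and those taken at or after it (suspect). The snapshot times and the 09:42 infection start are invented for illustration; in practice the infection start is an estimate you derive from logs, and erring earlier is safer.

```python
from datetime import datetime, timedelta


def split_snapshots(snapshots, infection_start):
    """Split snapshot times into (clean, suspect) lists.

    Snapshots taken before infection_start are restore candidates; anything
    taken at or after it may already contain encrypted or damaged data.
    """
    clean = sorted(s for s in snapshots if s < infection_start)
    suspect = sorted(s for s in snapshots if s >= infection_start)
    return clean, suspect


def latest_clean(snapshots, infection_start):
    """Return the newest pre-infection snapshot, or None if there is none."""
    clean, _ = split_snapshots(snapshots, infection_start)
    return clean[-1] if clean else None


# Hypothetical example: snapshots every five minutes from 09:00 to 10:00,
# with the infection estimated to have started at 09:42.
base = datetime(2024, 1, 1, 9, 0)
snaps = [base + timedelta(minutes=5 * i) for i in range(13)]
infection = datetime(2024, 1, 1, 9, 42)

clean, suspect = split_snapshots(snaps, infection)
print(len(suspect))                    # 4 (09:45, 09:50, 09:55, 10:00)
print(latest_clean(snaps, infection))  # 2024-01-01 09:40:00
```

Rolling everything back to 09:40 also discards any legitimate changes made after that point, which is exactly the tension between the two goals.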

It's a complex issue. If you've been testing your backups, the restore itself is just work. If you haven't been testing them and they turn out to be bad, paying the ransom may be your only option. The real challenge is figuring out what to restore, being surgical about the restore to minimize data loss, and cleaning up everything that was damaged.

An IT person's go-to answer for recovering lost or corrupted data is either to roll back to a snapshot or to restore from backup. However, that is only part of the answer. In some cases, such as a physical problem in the environment like a corrupted disk, restoring from a backup on different storage media is the safest choice (once you've corrected the physical issue). Also, if you want to restore an older version of something that changed but you aren't sure of its name, backups can be a good way to find it, since they keep a catalog of what changed. Recovering from a snapshot is appropriate if you know what you want, or if a software issue is causing the problem.

This probably goes without saying, but if you do a full restore from backup or roll back to a snapshot, you've gone back in time, so you have very likely lost data, and it's unlikely you'll know exactly which data. Essentially, a full restore or snapshot rollback is sanctioned data loss, usually of unknown scope, and it's a side effect we'd all rather not think about.

If the damage is localized and easy to identify, recovery can be as simple as restoring from backup, provided you are OK with losing some amount of data. If the ransomware damage spreads into your broader IT infrastructure, recovery becomes significantly more complex.

When the issue isn't localized, you will be flying blind on the restore unless you have tools in place that track what happened, by whom, when, where, and how. So when you're asked to restore from backup or snapshot, the first question to ask is: restore what, from where? As with any disaster recovery plan (and ransomware should be treated as a disaster), you need to understand what your data exposure is. Depending on how long the ransomware went unchecked, there's a real possibility that your backups and your disaster recovery site were also affected.

Knowing which strain of ransomware hit you helps as well. For example, some strains delete the original files, while others rename files with a special extension. These behaviors are clues to what data was impacted.
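For a strain that renames files, a first pass can be as simple as walking a share and flagging files with the telltale extension. The Python sketch below assumes placeholder extensions (".locked", ".encrypted") that are not taken from any real strain; substitute the ones documented for yours. Stripping the extension recovers the original file name, which tells you what to restore.

```python
import tempfile
from pathlib import Path

# Placeholder extensions for illustration only; substitute the ones
# documented for the strain you are actually dealing with.
SUSPECT_EXTENSIONS = {".locked", ".encrypted"}


def find_suspect_files(root, extensions=SUSPECT_EXTENSIONS):
    """Recursively list files under root whose last suffix is suspect."""
    wanted = {e.lower() for e in extensions}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def original_name(path):
    """Strip the ransomware suffix to recover the file's original name."""
    return path.with_suffix("").name


# Usage: build a throwaway directory with one clean and one renamed file.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "report.docx").write_text("ok")
    (Path(d) / "sub").mkdir()
    (Path(d) / "sub" / "budget.xlsx.locked").write_text("scrambled")
    hits = find_suspect_files(d)
    names = [p.name for p in hits]
    originals = [original_name(p) for p in hits]

print(names)      # ['budget.xlsx.locked']
print(originals)  # ['budget.xlsx']
```

This only catches renaming strains; for strains that delete originals, the backup catalog's list of deleted files is the better clue.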

Next, we need to figure out what is good and what is bad. You may get some clues as users and applications start reporting issues. Then again, you may not; it depends on what data was impacted and when it is next accessed.

Backups won't tell you what's been impacted, but they will tell you what changed, and some will also tell you what was created or deleted. Some backup applications are starting to run entropy scans to help find suspect files, since encrypted data looks statistically random. This capability is still largely a work in progress.
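The idea behind an entropy scan can be sketched in a few lines of Python: compute the Shannon entropy of a file's bytes and flag anything close to the 8 bits-per-byte maximum. This is an illustration of the general technique, not how any particular backup product implements it, and the 7.5 threshold is an assumption.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for constant data, 8.0 at most."""
    if not data:
        return 0.0
    n = len(data)
    total = 0.0
    for count in Counter(data).values():
        p = count / n
        total -= p * math.log2(p)
    return total


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # The 7.5 threshold is an assumption. Compressed formats (zip, jpeg,
    # docx) also score high, so treat a hit as a lead, not a verdict.
    return shannon_entropy(data) >= threshold


sample_text = b"Quarterly report: revenue up, costs flat. " * 100
sample_random = os.urandom(4096)  # stands in for encrypted content
print(looks_encrypted(sample_text))  # False
print(looks_encrypted(sample_random))
```

The false positives on already-compressed files are one reason this approach is still a work in progress.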

The first step is identifying which user or users triggered the issue and when the problems began. Depending on what tools you put in place beforehand, this may not be as straightforward as you might think. Until you've shut down the root cause, moving forward with recovery won't be effective. Once you know who, the next important piece of information is when. Then you need to know what was impacted. If you can figure out which resources those users could have touched, you can at least do set elimination: isolate the problem to the set of resources that were potentially impacted and set everything else aside.
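Set elimination is just set arithmetic. In the Python sketch below, the `write_access` mapping and the share names are hypothetical; in practice you would build that mapping from your share and file-system permissions. It uses write permission because encrypting or deleting a file requires it; if data theft is also a concern, you would include read access too.

```python
def potentially_impacted(write_access, suspect_users):
    """Return resources that at least one suspect user could have written to.

    write_access maps resource name -> set of users with write permission.
    """
    suspects = set(suspect_users)
    return {res for res, users in write_access.items() if users & suspects}


# Hypothetical permissions data; in practice you would build this from
# share and file-system permissions.
write_access = {
    "fs01:finance": {"alice", "bob"},
    "fs01:hr": {"carol"},
    "fs02:engineering": {"bob", "dave"},
    "fs02:archive": set(),
}

impacted = potentially_impacted(write_access, {"bob"})
cleared = set(write_access) - impacted  # can be set aside for now

print(sorted(impacted))  # ['fs01:finance', 'fs02:engineering']
print(sorted(cleared))   # ['fs01:hr', 'fs02:archive']
```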

This information may already be available in your IT infrastructure. Are you running auditing of some kind, whether user auditing, network auditing, or Active Directory (AD) auditing? AD auditing will tell you which shares the users authenticated against. Do you have data loss prevention (DLP) running? It will have logs. If not, a brute-force approach might be to disable the user's roaming profile and work out what that user has access to.
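Once you have audit data, narrowing it to one user after the estimated start time is a simple filter. The Python sketch below assumes you can export the relevant events to CSV with `timestamp` (ISO 8601), `user`, and `share` columns; that format is hypothetical and is not the native output of AD or any DLP product.

```python
import csv
import io
from datetime import datetime


def shares_touched(csv_text, user, since):
    """Return the shares `user` accessed at or after `since`.

    Assumes a hypothetical export with columns: timestamp, user, share.
    """
    touched = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = datetime.fromisoformat(row["timestamp"])
        # Account names are compared case-insensitively.
        if row["user"].lower() == user.lower() and when >= since:
            touched.add(row["share"])
    return touched


audit_export = """timestamp,user,share
2024-01-01T09:15:00,bob,fs01:finance
2024-01-01T09:44:00,bob,fs02:engineering
2024-01-01T09:47:00,BOB,fs01:finance
2024-01-01T09:50:00,alice,fs01:hr
"""

since = datetime(2024, 1, 1, 9, 42)
print(sorted(shares_touched(audit_export, "bob", since)))
# ['fs01:finance', 'fs02:engineering']
```

The result is the set of resources that actually saw activity, which is usually much smaller than the set the user merely had permission to reach.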

I bet that if the sites that suffered significant outages had known up front which resources needed recovery and what data was impacted, their recovery stories would have been much cleaner. Yes, backups and snapshots are part of the recovery. But to be successful, you need to know what to restore, from where, and for what timeframe. Otherwise, you could do lots of work and make little progress on the cleanup. If you have a compliance team, I recommend you buy them coffee and ask what they know about data changes. They probably have tools that can tell you who has been accessing what, or at least who could have accessed what. With that, you'll have some threads to pull on.
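Putting "what, from where, and for what timeframe" together, a restore plan can be sketched as: for each potentially impacted resource, pick the newest copy taken before the infection began, across every location you have. The Python sketch below assumes a hypothetical catalog of (resource, location, taken_at) tuples; real backup catalogs look different, and the plan is only as good as your infection-start estimate.

```python
from datetime import datetime


def plan_restore(catalog, impacted, infection_start):
    """Pick, per impacted resource, the newest copy taken before infection_start.

    catalog is a list of (resource, location, taken_at) tuples.
    Returns resource -> (location, taken_at), or None if no clean copy exists.
    """
    plan = {}
    for resource in sorted(impacted):
        candidates = [
            (taken_at, location)
            for res, location, taken_at in catalog
            if res == resource and taken_at < infection_start
        ]
        if candidates:
            taken_at, location = max(candidates)
            plan[resource] = (location, taken_at)
        else:
            plan[resource] = None  # nothing trustworthy: escalate
    return plan


# Hypothetical catalog spanning snapshots, backups, and a DR site.
catalog = [
    ("fs01:finance", "snapshot", datetime(2024, 1, 1, 9, 40)),
    ("fs01:finance", "snapshot", datetime(2024, 1, 1, 9, 45)),  # suspect
    ("fs01:finance", "backup", datetime(2024, 1, 1, 2, 0)),
    ("fs02:engineering", "dr-site", datetime(2023, 12, 31, 23, 0)),
]
infection = datetime(2024, 1, 1, 9, 42)
impacted = {"fs01:finance", "fs02:engineering", "fs01:hr"}
plan = plan_restore(catalog, impacted, infection)

for resource, choice in plan.items():
    if choice is None:
        print(f"{resource}: no pre-infection copy found")
    else:
        location, taken_at = choice
        # Changes made after taken_at, legitimate or not, are lost by this restore.
        print(f"{resource}: restore from {location} taken {taken_at}")
```

Including the DR site as just another location reflects the earlier point that it may be affected too: a replica taken after the infection started is excluded like any other suspect copy.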