Data classification and DLP

Data classification comes first and DLP second. At least, that is the way it is supposed to work. Some organizations, in a hurry to protect sensitive data in an age of large-scale breaches and strict regulatory requirements, try to do it the other way around. It is possible to devise a scheme that comes close to this backwards approach without risking future problems from data that gets missed.

First, what exactly do we mean by data classification? Simply put, it is tagging each data item in the organization with a meaningful label. That label shows at a glance how sensitive the item is, and a quick look at the data classification policy tells you what that sensitivity level requires. Simple is better, especially in this case. So, we will start with an example that uses three levels of classification: public, internal use and confidential. The real crunch comes with the confidential data items. Those are the ones the bad guys want and the ones regulatory requirements seek to protect. We'll start there.
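To make the label-plus-policy idea concrete, here is a minimal sketch of the three-level scheme. The `Classification` enum, the `POLICY` table and the file names are hypothetical; real classification tools store labels in their own formats (file metadata, document properties, email headers). The policy wording is drawn from the handling rules described in this article.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical three-level scheme; a higher value means more sensitive."""
    PUBLIC = 0
    INTERNAL_USE = 1
    CONFIDENTIAL = 2


# The label says how sensitive an item is; the policy says what that means.
POLICY = {
    Classification.PUBLIC: "may be shared freely",
    Classification.INTERNAL_USE: "employees only",
    Classification.CONFIDENTIAL: "need-to-know; must not leave the organization",
}

# Hypothetical tags attached to data items.
tags = {
    "press-release.docx": Classification.PUBLIC,
    "phone-book.xlsx": Classification.INTERNAL_USE,
    "cardholder-export.csv": Classification.CONFIDENTIAL,
}

for item, label in tags.items():
    print(f"{item}: {label.name} ({POLICY[label]})")
```

Using an ordered enum rather than plain strings lets rules compare sensitivity directly, for example `label >= Classification.INTERNAL_USE`.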

Some data items scream out "confidential." Credit card information, personal health information, and personally identifiable information that could be used for identity theft or must be protected by law are all obvious candidates for the confidential classification. Fortunately, these are easy to locate within the organization, or at least they should be. So, in most cases it is practical to tag these items and then configure the DLP system to handle the confidential data type appropriately.
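Credit card numbers show why the obvious items are easy to find: a candidate number can be matched with a pattern and then validated with the Luhn checksum, which weeds out most random digit strings. The sketch below assumes plain-text input; the regular expression and the `find_card_numbers` helper are hypothetical illustrations, and commercial tools use far more robust detectors.

```python
import re

# Hypothetical pattern: 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return normalized candidate card numbers that pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Any item for which `find_card_numbers` returns a hit becomes a candidate for the confidential tag.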

This, of course, is not to say that we are finished with confidential. What we have done is stick our finger in the virtual dike and start the process at the easy end. We might even have been able to do this manually, although we do not recommend it: done by hand, the process is time-consuming and error-prone. The best bet is a solid data classification tool. It will let you continue the process consistently, and it certainly is easier than doing it manually.

Now, assuming that you are using an appropriate tool, it's time to finish with confidential. Why? Because confidential data items are the organization's family jewels. The organization's treasure likely is wrapped up in them. They may consist of intellectual property, trade secrets or, in the case of a university, data protected under FERPA (Family Educational Rights and Privacy Act). Other regulations, such as HIPAA and GLBA, also mandate protection of certain data. All of these are candidates for the confidential classification. But, unlike the obvious items with which we started, these may be harder to find. That means tuning your classification tool so that it knows what your policy, or the law, considers sensitive and locates data items based on those criteria.
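Tuning a tool in this way amounts to encoding policy criteria as detection rules. A minimal sketch, assuming each rule is a regular expression tied to the policy or law it serves; the rule names and patterns (a U.S. Social Security number format, a "trade secret" marking, and a hypothetical `SID-nnnnnnn` student ID for FERPA data) are illustrative only and are neither complete nor legally sufficient.

```python
import re

# Illustrative policy criteria only: each rule names the policy or law it serves.
# The patterns are simplified examples, not complete or legally sufficient detectors.
CONFIDENTIAL_RULES = {
    "PII: U.S. Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP: trade secret marking": re.compile(r"\btrade secret\b", re.IGNORECASE),
    "FERPA: student ID (hypothetical SID-nnnnnnn format)": re.compile(r"\bSID-\d{7}\b"),
}


def confidential_reasons(text: str) -> list[str]:
    """Return the names of every rule the text matches; an empty list means no match."""
    return [name for name, pattern in CONFIDENTIAL_RULES.items() if pattern.search(text)]
```

Returning the names of the rules that fired, rather than a bare yes or no, lets reviewers trace each confidential tag back to the policy clause behind it.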

Your last step is to tell your classification tool to find and tag the data items that your policy restricts to employee use, such as company phone books. Tag these as internal use. Whatever is left over is public, and you may or may not tag it, at your discretion.
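The order of the passes matters: confidential first, internal use next, and public for whatever remains. A minimal sketch of that ordering, assuming the detection logic is supplied as simple predicate functions; `classify_all` and the keyword predicates in the usage are hypothetical stand-ins for a real tool's rules.

```python
from typing import Callable, Optional


def classify_all(
    docs: dict[str, str],
    is_confidential: Callable[[str], bool],
    is_internal: Callable[[str], bool],
    tag_public: bool = False,
) -> dict[str, Optional[str]]:
    """Apply the passes in order: confidential, then internal use, then leftovers."""
    labels: dict[str, Optional[str]] = {}
    for name, text in docs.items():
        if is_confidential(text):      # checked first so the stricter label wins
            labels[name] = "confidential"
        elif is_internal(text):        # e.g. company phone books
            labels[name] = "internal use"
        else:                          # whatever is left over is public
            labels[name] = "public" if tag_public else None  # tagging is optional
    return labels


# Hypothetical keyword predicates standing in for a real tool's detection rules.
docs = {
    "roadmap.docx": "Trade secret: next-gen widget design",
    "phones.xlsx": "Company phone book, 2nd floor",
    "brochure.pdf": "Our products at a glance",
}
labels = classify_all(
    docs,
    is_confidential=lambda t: "trade secret" in t.lower(),
    is_internal=lambda t: "phone book" in t.lower(),
)
print(labels)
```

Because confidential is checked first, a document that matches both sets of criteria always ends up with the stricter label.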

All through this process we have referred to policy. It should go without saying, though it probably doesn't, that this entire process is policy-driven. If you don't have a solid, well-defined classification policy, all of the discussion above is for naught. As the old farmer told the city slicker asking the way to some small town in the boonies, "You can't get there from here." So start with the policies. Never try to perform the classification process by the seat of your pants; it simply won't work and, we assure you, it will end badly.

Now, it's time to implement. You have all of these nicely tagged data items. What will you do with them? There are several reasons for data classification. The most obvious one is to tell users how sensitive each item is. But the more important reason is to control the exfiltration of items that should not leave the organization or should be limited to readers with a need to know. We also want to ensure that any alteration of an item can be detected.
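Controlling exfiltration means the DLP system reads the label the classification tool wrote and decides what may leave. The sketch below shows one hypothetical outbound email rule; the `example.com` domain, the `Attachment` type and the need-to-know list are assumptions for illustration, not any product's behavior.

```python
from dataclasses import dataclass, field

INTERNAL_DOMAIN = "example.com"  # hypothetical: the organization's own mail domain


@dataclass
class Attachment:
    filename: str
    label: str  # as written by the classification tool: "public", "internal use", "confidential"
    need_to_know: set[str] = field(default_factory=set)  # lowercase addresses; empty = no list


def is_internal(address: str) -> bool:
    return address.lower().endswith("@" + INTERNAL_DOMAIN)


def allow_send(att: Attachment, recipients: list[str]) -> bool:
    """Hypothetical DLP rule for an outbound message carrying this attachment."""
    if att.label == "public":
        return True
    # Anything not labeled public (including unknown labels) stays inside.
    if not all(is_internal(r) for r in recipients):
        return False
    # Confidential items with a need-to-know list go only to listed readers.
    if att.label == "confidential" and att.need_to_know:
        return all(r.lower() in att.need_to_know for r in recipients)
    return True
```

Treating unknown labels like internal-use items is a fail-closed choice: a mislabeled file is blocked at the perimeter rather than waved through.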

We can define our classification system requirements based on the size of our organization and what we want the system to do. For example, being able to de-duplicate emails and documents is very useful, especially in large environments. Defining our DLP needs seems straightforward, but there is one little wrinkle: the DLP system must be compatible with our classification system. In other words, it must recognize our classifications and behave in accordance with our policy requirements for each classification.
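De-duplication is commonly done by hashing content, and the same digest, if it is recorded when an item is tagged, can later reveal whether the item was altered. A minimal sketch using SHA-256 from Python's standard `hashlib`. It catches only byte-for-byte duplicates, so two copies of an email with different headers will not match, and it assumes the recorded digest is itself protected, because anyone who can rewrite both the file and its stored digest defeats the check.

```python
import hashlib
from collections import defaultdict


def digest(content: bytes) -> str:
    """SHA-256 of the raw bytes; identical content gives identical digests."""
    return hashlib.sha256(content).hexdigest()


def find_duplicates(items: dict[str, bytes]) -> list[list[str]]:
    """Group item names whose content is byte-for-byte identical."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name, content in items.items():
        groups[digest(content)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def altered(content: bytes, recorded_digest: str) -> bool:
    """True if the content no longer matches the digest recorded at tagging time."""
    return digest(content) != recorded_digest
```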

Simply put, if you have neither tool, buy the two at the same time and ensure that they are compatible. If you have one but not the other, confirm compatibility before you buy the remaining piece.

Specifications for data classification and DLP tools (● = yes, ○ = no)

Product	Includes built-in templates to support regulatory compliance	Available as a physical appliance	Available as a cloud service	Available as a VM or standalone software package	Prevents exfiltration of classified file attachments	Performs network inspection
Boldon James	●	○	○	●	●	○
Code Green Networks	●	●	●	●	●	●
Identity Finder	●	○	○	●	●	●
TITUS	●	○	○	●	●	○
Varonis Systems	●	○	○	●	○	○