What makes DLP so hard?

At my university we have a mandate to deploy data loss prevention (DLP), but we are finding it quite a challenge. Why? Do other organizations have the same difficulty? The problem, actually, is not DLP; it is data classification. Without data classification you cannot get to DLP, and data classification might be the most difficult security task any organization undertakes. The reason is simple, though the problem is not: nobody likes to take ownership of data that they must share with others. It is perceived as responsibility without control, and that perception is, of course, not unique to universities.

There are two phases in data classification for most organizations: legacy data and new data. New data is the easier of the two. Several products are available for assigning a classification at the time a data item is created, and, oddly, most people don't seem to mind using them. That is odd, at least, when set against most people's reluctance to take ownership of legacy data.

The first step in solving this dilemma is to clearly define the data one has. That alone can be a massive undertaking. For example, I recently did an e-discovery project for a mid-sized organization. I was interested in the email of only two individuals, going back less than 10 years, and there were tens of thousands of items, including file attachments. Multiply that by all of the people in the organization and the number gets very large; expand it to a large organization and the numbers become astronomical. Consider the Enron case, for example.

How do you classify all of those data elements? More important, what about the data that nobody wants to own? Who owns the organization's enterprise resource planning (ERP) system, for example? The knee-jerk answer is that it belongs to the IT shop, but that is a very unsatisfying answer, especially for the people who work in that department. The system may live on their hardware, but they have little to no control over who uses it or how it is used. That is driven by business needs, not technical ones.

So classifying legacy data actually has two components: identification and classification, and classification itself depends on assigning ownership. To classify anything, one needs a classification scheme. I've found that a simple one (public, internal use and confidential) is easiest to implement. Once the scheme is established, it's time to assign ownership and classify. Classification is easiest if the scheme is also documented thoroughly, so that it is easy to tell which level is appropriate for a particular item. Strangely, that documentation also makes it easier to assign ownership.
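
To make the three-level scheme concrete, here is a minimal Python sketch, assuming the levels can be ordered from least to most restricted. The written criteria, the audience names (`public`, `staff`, `need_to_know`) and the `may_release` helper are hypothetical illustrations, not part of any product. The point is that once each level is documented and ordered, deciding whether an item may go somewhere becomes a mechanical check, which is what a DLP tool needs.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Three-level scheme from the text; a higher value means more restricted."""
    PUBLIC = 1
    INTERNAL_USE = 2
    CONFIDENTIAL = 3


# Hypothetical written criteria. Documenting each level this explicitly is
# what lets an owner tell which level fits a particular item.
CRITERIA = {
    Classification.PUBLIC: "Approved for release outside the organization.",
    Classification.INTERNAL_USE: "For use by staff; not for outside release.",
    Classification.CONFIDENTIAL: "Restricted to people with a specific need to know.",
}

# Hypothetical audiences, each mapped to the most restricted level it may receive.
MAX_LEVEL_FOR = {
    "public": Classification.PUBLIC,              # e.g., outside email, the web site
    "staff": Classification.INTERNAL_USE,         # any employee
    "need_to_know": Classification.CONFIDENTIAL,  # explicitly authorized people
}


def may_release(level: Classification, audience: str) -> bool:
    """Return True if an item at `level` may go to `audience` under this sketch."""
    return level <= MAX_LEVEL_FOR[audience]


if __name__ == "__main__":
    for level in Classification:
        print(f"{level.name}: {CRITERIA[level]}")
    print(may_release(Classification.INTERNAL_USE, "public"))  # False
    print(may_release(Classification.INTERNAL_USE, "staff"))   # True
```

Keeping the scheme to three ordered levels is what makes a single comparison sufficient; a richer scheme would need a richer rule.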

Here is a radical approach: pick a group within the organization that is appropriate for the task and, de facto, assign ownership of all legacy data to it. This might be IT security, privacy or any other group that fits cleanly into your governance scheme. Then draw a line in the sand: from that point forward, the worker who creates a data item owns it and must classify it. That person is also responsible for ensuring that the classification is correct and comports with the organization's policy. Finally, select a tool that makes classification easy and enforces it, so that the owner does not need to follow his or her data around to make sure it is being used in compliance with its classification. The owner probably has far more pressing things to do.
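
The ownership rule can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the cut-over date, the `it-security` group name, the `DataItem` record and the `register` function are all hypothetical. Items created before the line in the sand default to the designated group; anything created afterward belongs to its creator and is refused until the creator classifies it.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Classification(IntEnum):
    PUBLIC = 1
    INTERNAL_USE = 2
    CONFIDENTIAL = 3


# Hypothetical values: the "line in the sand" and the group that takes over legacy data.
CUTOVER = date(2024, 1, 1)
LEGACY_OWNER = "it-security"


@dataclass
class DataItem:
    name: str
    creator: str
    created: date
    classification: Optional[Classification] = None
    owner: Optional[str] = None


def register(item: DataItem) -> DataItem:
    """Assign ownership and enforce classification for one data item.

    Items created before CUTOVER go to the designated legacy owner and may stay
    unclassified until that group works through them. Items created on or after
    CUTOVER belong to their creator and are rejected until the creator classifies
    them, standing in for the enforcement a classification tool would provide.
    """
    if item.created < CUTOVER:
        item.owner = LEGACY_OWNER
    else:
        item.owner = item.creator
        if item.classification is None:
            raise ValueError(f"{item.name}: {item.owner} must classify this item")
    return item


if __name__ == "__main__":
    old = register(DataItem("budget-2019.xlsx", "jdoe", date(2019, 6, 1)))
    print(old.owner)  # it-security
    new = register(DataItem("memo.docx", "asmith", date(2024, 3, 5),
                            Classification.INTERNAL_USE))
    print(new.owner)  # asmith
```

In practice the refusal would live inside whatever classification tool the organization selects; the sketch only shows where the rule sits.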

This month we have some tools that can make the DLP and data classification tasks much easier. The reviews were done by Sal Picheria and Ben Jones. Thanks, colleagues.