How is data lost?

When enterprise data met email and IM, important data began leaking out of corporations unnoticed and it spawned a new security problem called data leakage.

When thieves with insider credentials used those credentials to exploit large corporate databases, enterprises didn't see what was going on because somewhere between the firewall and the data servers they had no visibility into what was happening with data. As a result, a massive amount of data went missing. This is the data breach problem.

Traditional data security: a quick review
Traditional data security is focused on the perimeter of the enterprise data universe. The layered defenses commonly deployed to protect corporate data include: firewalls; identity and access management (IAM) systems; vulnerability assessment tools; network behavior anomaly detection (NBAD) utilities; intrusion detection and prevention systems; log aggregation and management tools (SIM/SEM). In some cases, encryption is also used to mitigate data risk by making data-at-rest visible to authorized users only. Content filtering/data leak prevention is sometimes deployed to help stop mischievous outbound email, IM and web traffic.

With all of these defenses in place to protect data, what has gone wrong? In light of high-profile data breaches such as TJX, Certegy and Monster.com, most people would like an answer. But the problem is a complicated one. Or is it?

A simple way of looking at data loss
Victims of data theft include the federal government, notable universities, retailers, job posting sites, brand name manufacturers and the millions of people whose personal data was compromised. At first blush, the data loss incidents seem as disparate as the victim types. On closer examination, the incidents can be divided into four categories of data loss: email, laptop theft, lost storage devices (tapes, etc.) and data center breaches. Examining data loss across these four categories not only highlights the similarities, but also provides a means to look past the incidents themselves to consider the actual amount of data lost. This perspective raises the question: how does the number of incidents in each category of data loss correlate with the volume of data actually lost? Answering this question is the first step toward mitigating the most pressing data security problems.

Data leak vs. data breach
First, let's examine a few notable data incidents to put data leak versus data breach into perspective.

A U.S. Department of Veterans Affairs employee brought home a laptop containing sensitive information about veterans and their families. The laptop, which contained the names, social security numbers and dates of birth of as many as 26.5 million veterans and their families, was stolen, creating what was believed to be the largest data breach in U.S. history (until TJX).

ChoicePoint, a company that maintained a massive database holding the personal information of most Americans, was compromised by criminals, putting the sensitive personal information of 145,000 American consumers at risk. The criminals posed as legitimate ChoicePoint customers in order to get a toe in the door.

Thieves intercepted TJX's Wi-Fi network data and essentially listened in on users logging into TJX's server. They set up their own accounts and used those accounts to log in whenever and from wherever they wanted to steal customer data. The data compromised is now estimated at 45.6 million credit and debit cards.

The personal financial data of 8.5 million consumers was stolen by an employee of Certegy Check Services, Inc/Fidelity National Information Services (Certegy/FIS). A database administrator allegedly sold the information to direct marketing firms and data brokers who, in turn, sold it to others. This employee used legitimate access for illegitimate activity.

Monster.com's resume library was breached by hackers who used stolen credentials to gain access to confidential data on 1.3 million job seekers. The hackers used the information to send phishing emails to the victims. The emails, which appeared to be from Monster.com, either asked for personal financial data including bank account numbers or enticed users to click on links that would result in their computers becoming infected with malicious software programs.

These are just a few examples of the 159 million compromised records that Privacy Rights Clearinghouse includes in its Chronology of Data Breaches. The list represents known data breaches since 2005 and an enormous amount of data lost. It illustrates one of the root causes of data breaches: credentialed access to large data stores containing sensitive information, combined with the data holder's inability to automatically recognize the difference between routine and potentially damaging data access. In some of the breaches where the root cause was credentialed access, the access itself was legitimate, but the action taken may have been against security or compliance policy. In other cases, the access looked legitimate, but the credentials were stolen or manufactured.

How the breach happens: insiders and outsiders
In the past, traditional security best practices relied on credentials with user authentication, server-level access and other controls that provided or denied access based on job role, security clearance or other determination. These methods, along with technologies like encryption and content inspection, did a reasonably good job of keeping bad people away from important data assets. The problem is that none of these tools can distinguish among legitimate users conducting authorized business, incompetent insiders innocently bypassing security policies and malicious intruders masquerading as legitimate users. Today data has rising street value and is a popular target for thieves who are smart enough to hijack or create credentials to break into data stores unnoticed. This is the insider/outsider problem.

In the case of the Department of Veterans Affairs, the user evidently was legitimate. But did the organization know that an employee was downloading millions of sensitive records onto his laptop? In most organizations this action would be against policy—no matter how trusted the employee. Despite the fact that this action would be deemed inappropriate, most organizations do not have the means to know immediately if an employee is putting a large amount of critical data at risk.

In the case of Certegy/FIS, a database administrator, someone who could easily cover his tracks, purportedly used his privileges to make a buck on the data in his employer's databases. Most organizations do not have the means to monitor the activity of their most privileged users, so they can't know in time to stop a privileged user who has gone bad.

The Monster.com breach was a creative attack. The primary attack was reported to be a hack into a big database using stolen credentials; a data breach incident where credentials were obtained and a mass data theft ensued. The secondary attack used the insider information to entice victims into downloading malicious software or providing personal information. People are not typically concerned about information that is considered relatively public (resumes) as opposed to private (bank account numbers, Social Security Numbers and passwords). But in this case, the thieves collected enough insider information to hijack Monster's identity and use it to take data theft to the next level.

A pattern seems to be taking shape, predicated on our inability to discern the good guys from the bad guys once our access security fails. This hole, this lack of visibility into what is happening with data, is big enough to let millions of records seep through. This is not to say that perimeter security isn't valuable or doing its job. It is on both counts. The problem is that smart thieves, inside and outside of companies, are finding ways to exploit the blind spot, the weak link.

Data security: core vs. edge
Data leakage is a popular buzzword these days. Data leak prevention (DLP) products are typically edge solutions that monitor confidential data leaving enterprises via email, web or IM-type applications. Some DLP solutions monitor desktops and laptops. DLP can also be referred to as content monitoring and filtering (CMF) or extrusion prevention.

Core data monitoring (data breach protection), on the other hand, is typically achieved with data auditing and protection (DAP) solutions. DAP solutions are data center technologies that monitor access to data stored in databases and file servers as well as track and alert on data breach activity. DAP sits close to the data source and knows when a user accesses sensitive content from databases, file servers and mainframes.

Traditional DLP solutions can monitor when content leaves the enterprise via email from PCs, but they may not have the visibility to know how the data was accessed. DAP has visibility at the stored data level, so it knows who accessed which data from where and when. DLP requires only straightforward intelligence to detect unencrypted credit card numbers or other known data patterns. Data theft from data servers, however, is more complicated because most access to sensitive content looks legitimate. This means that DAP solutions must have the built-in intelligence to tell the difference between normal access and suspicious behavior.
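To make "straightforward intelligence" concrete, here is a minimal sketch of the kind of pattern check an edge filter might run on outbound text. It is an illustration only, not how any particular DLP product works: the names CARD_CANDIDATE, luhn_valid and find_card_numbers are hypothetical, and the approach (a regular expression for 13 to 16 digit runs, confirmed with the Luhn checksum that payment card numbers carry) is an assumption about what a simple detector could look like.

```python
import re

# Assumption: a card number is 13 to 16 digits, optionally split by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in text that look like valid card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Calling find_card_numbers on the body of an outgoing message returns any digit runs that pass both checks. Nothing this simple works at the core, because a query run with valid credentials matches no fixed pattern.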

So which direction does an enterprise take? DLP or DAP? Logic tells us both, but let's dig a little deeper.

What you can't see can hurt you
Whether from a data leak or data breach, stopping theft and misuse of sensitive data is a top priority for enterprises. Based on recent, high-profile data breaches, it is safe to surmise that the issue could be lack of visibility into what is actually happening to critical data between the places where it is stored or served and the point of user interaction. This inability to see, in real time, which users are accessing what data from where and when has played a role in most of the mass data breaches that have occurred in recent years. The problem is that many companies don't know where to begin solving it.

DLP adoption has been on the upswing over the last couple of years. Enterprises suspect that email and portable devices are the biggest data loss culprits, since they are fast and easy express trains out of the enterprise. But is most data loss actually occurring via laptops and email? Let's take a look at some readily available breach data to put things into perspective and help figure out the best way to attack the data theft problem.

The data loss hypothesis
Given that the Privacy Rights Clearinghouse has compiled a comprehensive list of data breach incidents, this is a reasonable place to start in evaluating the problem from the perspective of how data is lost. If we take the raw data that its Chronology of Data Breaches provides and divide it into the four categories mentioned above (email, tape, laptops and databases), we can calculate the number of breaches that fall into each category. Using a total of 318 filtered incidents (as of June 2007), laptops are the No. 1 source of data breach incidents (47 percent), databases are next (40 percent), then tapes (11 percent) and email (2 percent). If we look at the same data from the point of view of the amount of data lost, the picture shifts somewhat: databases rank first (64 percent), laptops are next (25 percent), then tapes (10 percent) and email (1 percent). These figures do not take into consideration the value of the data taken (we could make some statistical assumptions), and it is likely that there have been many smaller data breaches that have not been reported, but even so, this data is extremely telling.

It demonstrates that not all data breaches are equal: the source matters.
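The arithmetic behind this comparison can be sketched in a few lines. The function category_shares and the sample list are hypothetical and exist only to show the calculation; the sample incidents are made up and are not taken from the Chronology of Data Breaches. The ARTICLE_SHARES table, by contrast, simply restates the percentages quoted above.

```python
from collections import Counter


def category_shares(incidents):
    """Return {category: (percent of incidents, percent of records lost)}."""
    counts = Counter()
    records = Counter()
    for category, records_lost in incidents:
        counts[category] += 1
        records[category] += records_lost
    total_incidents = sum(counts.values())
    total_records = sum(records.values())
    return {
        c: (100 * counts[c] / total_incidents, 100 * records[c] / total_records)
        for c in counts
    }


# Hypothetical incidents for illustration only (category, records lost).
sample = [("laptop", 1_000), ("laptop", 2_000), ("database", 7_000)]
shares = category_shares(sample)

# Percentages quoted in the article: (share of incidents, share of records).
ARTICLE_SHARES = {
    "laptop": (47, 25),
    "database": (40, 64),
    "tape": (11, 10),
    "email": (2, 1),
}

# A ratio above 1 means the category loses more data than its incident count suggests.
ratios = {c: rec / inc for c, (inc, rec) in ARTICLE_SHARES.items()}
```

With the quoted figures, databases are the only category whose share of records lost exceeds its share of incidents (64 / 40 = 1.6), while laptops come in at roughly 0.53.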

Preventing data loss: where do we go from here?
With escalating threats to data, existing systems, applications and processes that need to be considered, a growing number of legal requirements, shrinking IT budgets and a variety of opinions (from stakeholders inside and experts outside of organizations) about what data security means and where the emphasis should be placed, it's no wonder that the data breach problem is daunting.

In the case of data on laptops, it is essential to know what is being downloaded onto these devices (core data monitoring and auditing) and to have the ability to trigger an alert if the download violates company security or compliance policy. In the case of data loss via email, the key is recognizing when specific types of sensitive data have found their way into an outgoing email or instant message and having the means to alert on or stop the message. To prevent data theft from the core data servers, enterprises need to look beyond perimeter security to technologies for auditing and analyzing user access to data. These solutions provide visibility into the difference between legitimate data access and malicious activity while being able to alert on potential data breaches in real time.
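As a rough illustration of alerting on a policy-violating download, the sketch below flags any user whose record volume within a time window exceeds a limit. It is a toy model under stated assumptions, not a description of any DAP product: the AccessMonitor class, its parameters and the threshold values are all hypothetical, and real tools would weigh far more context than raw volume.

```python
from collections import defaultdict, deque


class AccessMonitor:
    """Track records read per user and flag volumes above a policy limit."""

    def __init__(self, max_records: int, window_seconds: float):
        self.max_records = max_records
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # user -> deque of (timestamp, records)

    def record_access(self, user: str, timestamp: float, records: int) -> bool:
        """Log one access; return True if the user's recent volume breaks policy."""
        events = self._events[user]
        events.append((timestamp, records))
        # Drop events that have aged out of the window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            events.popleft()
        return sum(n for _, n in events) > self.max_records


# Hypothetical policy: no more than 10,000 records per user per hour.
monitor = AccessMonitor(max_records=10_000, window_seconds=3600)
```

Each call to record_access returns True as soon as a user's running total inside the window passes the limit, which is the point at which an alert would be raised. Volume is only one signal, but it is the one that would have flagged a download of millions of records onto a single laptop.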

And with more attention devoted to monitoring the core, where sensitive data assets reside, data loss, especially mass data theft from corporate databases, will occur far less often, which means that there will be much more room on the front page for more positive business news.