Shadow data: The monster that isn't just under your bed

Rehan Jalil, CEO, Elastica

One of the foremost issues plaguing those responsible for safeguarding their organizations' information assets is the increased consumerization of IT, driven by the bring-your-own-x (BYOX) movement. As end users bring their own devices, applications, and even networks into the workplace, hallowed IT security concepts like visibility, control and peace of mind go out the window.

The result is a severe shadow IT problem for the organization. However, having a singular focus on shadow IT is tantamount to repeating a significant historical mistake. Specifically, the information security industry has long realized that devices, applications, networks, and so on, are ultimately touch points for what really matters most — the data. The focus should not be on touch points for their own sake, but rather because they are conduits for critical pieces of information.

In this vein, I posit that the real security concern around BYOX is not simply Shadow IT, but rather Shadow Data, which is sensitive data shared broadly within and outside organizations without the IT security team's knowledge.

Consider, for example, a cloud file-sharing application such as Box, Dropbox, Google Drive, OneDrive or Syncplicity. Knowing whether one of these applications is sanctioned by IT is important, but there is also a critical need to know what data is passing through it and how accessible that data is. After all, security is fundamentally about ensuring that only the right person has access to the right information at the right time.

In the context of securing cloud file-sharing applications, several relevant questions must be answered. Among them: Which files are shared publicly? Which are shared company-wide (i.e., just a stone's throw from being public)? And which parties outside the company have access to files that originated in the company?
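The three questions above can be sketched as a simple exposure classification. This is only an illustration under stated assumptions, not any vendor's API: the `SharedFile` record, its fields, and the `example.com` company domain are hypothetical, and real file-sharing services expose sharing metadata in their own formats.

```python
from dataclasses import dataclass, field
from enum import Enum


class Exposure(Enum):
    PUBLIC = "public"
    EXTERNAL = "external"
    COMPANY_WIDE = "company-wide"
    INTERNAL = "internal"


@dataclass
class SharedFile:
    # Hypothetical record of a file's sharing settings.
    name: str
    public_link: bool = False
    company_wide: bool = False
    collaborators: list[str] = field(default_factory=list)


def classify_exposure(f: SharedFile, company_domain: str) -> Exposure:
    """Check exposure in the order public, external, company-wide."""
    if f.public_link:
        return Exposure.PUBLIC
    suffix = "@" + company_domain.lower()
    # Any collaborator outside the company domain counts as external access.
    if any(not email.lower().endswith(suffix) for email in f.collaborators):
        return Exposure.EXTERNAL
    if f.company_wide:
        return Exposure.COMPANY_WIDE
    return Exposure.INTERNAL
```

Running `classify_exposure` over an inventory of such records would give a first-pass answer to who can reach each file, before any question about its content is asked.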

Beyond just access, however, the underlying content is critical. Many important questions arise in this vein as well. For example, does a file contain compliance-related data, such as protected health information (PHI), personally identifiable information (PII), or payment card data governed by Payment Card Industry (PCI) standards? Does the file contain a design document, source code or some other form of proprietary information?

These questions around file access and file content go hand in hand. It might be OK if an engineer had access to a file containing source code, but it would be highly questionable if that same file were accessible to a member of the marketing team. While these questions appear simple to ask, they are notoriously difficult to answer, especially for cloud file-sharing services.
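The pairing of access and content can be expressed as a small policy check. The sketch below assumes the content category of a file is already known, which is the hard part in practice; the `ALLOWED_TEAMS` table, its category names, and its team names are hypothetical.

```python
# Hypothetical policy: which teams may see which content categories.
ALLOWED_TEAMS = {
    "source_code": {"engineering"},
    "design_document": {"engineering", "product"},
    "phi": {"compliance"},
}


def questionable_access(content_category: str, teams_with_access: set[str]) -> set[str]:
    """Return the teams whose access to this content deserves review."""
    allowed = ALLOWED_TEAMS.get(content_category)
    if allowed is None:
        # Unknown category: no rule to apply, so nothing is flagged here.
        return set()
    return teams_with_access - allowed
```

For a source code file accessible to both engineering and marketing, `questionable_access("source_code", {"engineering", "marketing"})` returns `{"marketing"}`, matching the example in the text.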

Traditional approaches cannot secure all data in the modern age

Incumbent approaches for addressing data governance issues, such as data loss prevention (DLP) technologies, were not designed to address these issues, for several reasons. First, DLP technologies examine raw traffic transmitted over a physical network. However, when files are shared via a file-sharing service, what is being distributed is not the file itself but rather a link to that file. Traditional technologies do not make that distinction: they see the link, not the file it points to, and look no further.
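A toy example makes the link-versus-file gap concrete. Everything here is invented for illustration: the simplified Social Security number pattern, the sample file contents, and the `files.example.com` URL. It is not a model of how any particular DLP product works.

```python
import re

# Simplified U.S. Social Security number pattern, for illustration only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Made-up sample data: the file lives in the cloud, only the link is sent.
file_contents = "Employee: A. Example, SSN 123-45-6789"
message_on_wire = "Here is the payroll sheet: https://files.example.com/s/abc123"


def contains_ssn(text: str) -> bool:
    """Return True if the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None
```

The scan matches `file_contents` but not `message_on_wire`, so an inspector that only sees the message has nothing to flag.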

That's not all. Traditional technologies for governing data loss were typically ensconced in physical appliances sitting on network boundaries. Traffic traversing these boundaries would be examined. In today's world, where files are easily shared via the cloud, the notion of a physical perimeter has irretrievably faded into oblivion. The data in question no longer resides on organizational IT assets. Consequently, it is outside the scrutiny of on-premises devices.

The final nail in the coffin of traditional approaches is that they were necessarily simplistic. Because they had to work within the confines of an on-premises appliance, the modus operandi involved looking for basic patterns – i.e., what we call regular expressions. Of course, understanding what's in a document goes far beyond simply examining the words.

A doctor's curriculum vitae and a patient's medical record will share many common terms. The former is meant to be public, while exposing the latter could get you thrown in jail. An elementary school student can distinguish between the two in a split second – shouldn't we expect computers to do the same?
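The limits of pattern matching show up even in a toy case. In the sketch below, the keyword list, the threshold, and both sample texts are invented for illustration; the point is only that a term-based rule cannot tell the two kinds of document apart.

```python
import re

# Assumed keyword pattern of the kind a simple rule might use.
MEDICAL_TERMS = re.compile(r"\b(diagnosis|oncology|patient|treatment)\b", re.IGNORECASE)

# Invented sample texts: a public CV and a confidential medical record.
cv = "Dr. Lee, board-certified in oncology, has led patient treatment programs since 2005."
record = "Patient: J. Doe. Diagnosis: stage II lymphoma. Treatment: chemotherapy, oncology ward."


def matches_rule(text: str, threshold: int = 3) -> bool:
    """Flag text containing at least `threshold` distinct medical terms."""
    found = {m.group(1).lower() for m in MEDICAL_TERMS.finditer(text)}
    return len(found) >= threshold
```

Both `matches_rule(cv)` and `matches_rule(record)` return `True`: the rule sees shared vocabulary, not the difference in what the documents are.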

Protecting data today requires a new approach

Fortunately, with advances in areas of artificial intelligence, including machine learning and natural language processing, we can start to achieve this goal. Moreover, if we leverage web-scale systems rather than on-premises appliances, we can bring more computational horsepower to bear.

We are standing at a phenomenal juncture in the evolution of information technology. While the amount of data in the world has exploded to unprecedented levels, that rise has been met with commensurate increases in our ability to process and understand that data.

It is no longer acceptable to apply 20th-century tools to the security problems associated with 21st-century IT infrastructures. Information security has always been and always will be about protecting data – that immutable premise will continue to hold tomorrow as much as it has yesterday and today.