It has been said that encryption simply trades one secret (the data) for another (the key). In the same way, encrypting data naturally shifts attention to what is left unprotected: the metadata. As several high-profile stories in 2013 can attest, metadata is clearly in the sights of the tireless and opportunistic adversary, who now uses it to identify content worth targeting. As organizations look to implement data-centric protection and as adoption of encryption accelerates, it is critical to consider how metadata is handled and secured.
Often, metadata stays in the clear while the content is encrypted. In fact, in some encryption schemes, the metadata travels with the encrypted data, along with the key. Think about that for a second. The data is encrypted to protect it from harm, but the metadata that comes along with it may give an attacker the information needed to decide whether the encrypted content is worth pursuing.
In some interesting cases, the metadata may tell as much of a story as the actual protected content. Consider a phone call, text message or email. There are methods for encrypting the content of each of these. But the metadata about the communication – the recipient, the time of the communication, the location of the recipient or sender through IP location – is in fact behavioral information. Collect enough metadata and one could build a picture of someone's activities and interactions. It has been suggested that the recent revelations of the NSA collecting wireless phone carrier data are much more about the behavioral data inherent in the metadata than about the content. In such cases, the metadata is the content. The recently announced Dark Mail Alliance lists the protection of metadata from leakage as a primary objective.
There are implementation choices that offer perceived advantages in managing metadata. One is a server-based approach that maintains a separation of concerns: the keys, metadata and access controls are kept physically apart from the encrypted data. With this implementation model, the encrypted data is sent as a separate and distinct payload. The metadata is stored on the server that manages the keys, and is retrievable only by authenticated parties authorized to access the data. If the data leaks or is exfiltrated through malicious means, there is no accompanying metadata to identify the contents or to mine for behavioral information. This does not preclude the use of malicious means to intercept the metadata, but it does mean that the metadata is not attached, in the clear, to the encrypted file.
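The separation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the `KeyServer` class, its `protect` and `open` methods, and the `external_store` dictionary are all hypothetical names, and the one-time-pad XOR is only a standard-library stand-in for a real authenticated cipher such as AES-GCM.

```python
import secrets
import uuid
from dataclasses import dataclass, field


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time-pad XOR; a stand-in for a real cipher such as AES-GCM."""
    return bytes(d ^ k for d, k in zip(data, key))


@dataclass
class KeyServer:
    """Hypothetical server that holds keys, metadata and access lists."""
    _records: dict = field(default_factory=dict)

    def protect(self, plaintext: bytes, metadata: dict, readers: set) -> tuple:
        # The handle is random, so it reveals nothing about the content.
        object_id = uuid.uuid4().hex
        key = secrets.token_bytes(len(plaintext))
        self._records[object_id] = {
            "key": key,
            "metadata": dict(metadata),
            "readers": set(readers),
        }
        # Only the opaque handle and the ciphertext are returned for storage.
        return object_id, xor_bytes(plaintext, key)

    def open(self, object_id: str, user: str, ciphertext: bytes) -> tuple:
        record = self._records[object_id]
        # Metadata and key are released only to authorized parties.
        if user not in record["readers"]:
            raise PermissionError(f"{user} may not access {object_id}")
        return record["metadata"], xor_bytes(ciphertext, record["key"])


server = KeyServer()
object_id, blob = server.protect(
    b"Q3 acquisition plan",
    {"filename": "plan.docx", "author": "alice"},
    {"alice", "bob"},
)

# The payload travels and is stored on its own, with no metadata attached.
external_store = {object_id: blob}

metadata, plaintext = server.open(object_id, "bob", external_store[object_id])
```

Anyone who obtains `external_store` alone sees a random identifier and ciphertext, but no filename, author or key. Note that even in this sketch the ciphertext length still reveals the plaintext length, a reminder that some metadata is hard to hide entirely.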
Now take this idea of separating certain areas of concern and apply it to cloud storage. The server-based approach, if properly implemented, enables organizations to store encrypted data in the cloud and keep the keys, access controls and metadata behind the on-premises firewall. Such an approach lets organizations reap the financial and operational advantages of cloud storage while retaining control over the data and the metadata. In other words, a leak of data from the cloud does not expose the metadata in the process.
Metadata also comes into play when considering the challenge of searching encrypted content. A simple, relatively unsophisticated approach is to use metadata to store index terms for use by the search process. This is an obvious tradeoff, because search terms stored in clear text are themselves a form of data leakage. In response, some organizations have implemented controlled searching that prevents an untrusted service or entity from searching the metadata. Others are looking toward techniques such as Bloom filter-based search, secure indexes and searchable symmetric encryption (SSE). Suffice it to say that as Big Data continues to be a mega-trend, the ability to search encrypted files will become a serious issue for consideration.
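To make the secure-index idea concrete, here is a minimal sketch of a keyed Bloom filter in Python. It is a simplification for illustration, not a complete searchable-encryption scheme: the function names `trapdoor`, `build_index` and `might_contain` and the parameters `NUM_BITS` and `NUM_HASHES` are assumptions chosen for this example. Index terms are hashed with a secret key before being recorded, so the stored index contains no clear-text terms, and only a holder of the key can produce a query that matches.

```python
import hashlib
import hmac

NUM_BITS = 1024    # size of the Bloom filter (illustrative value)
NUM_HASHES = 4     # bit positions set per term (illustrative value)


def trapdoor(key: bytes, term: str) -> list:
    """Derive the bit positions for a term; this requires the secret key."""
    positions = []
    for i in range(NUM_HASHES):
        message = f"{i}:{term.lower()}".encode()
        digest = hmac.new(key, message, hashlib.sha256).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def build_index(key: bytes, terms) -> int:
    """Build a per-document index as a bit set stored in an integer."""
    bits = 0
    for term in terms:
        for pos in trapdoor(key, term):
            bits |= 1 << pos
    return bits


def might_contain(index_bits: int, positions) -> bool:
    """Run by the storage service: needs only the index and a trapdoor."""
    return all((index_bits >> pos) & 1 for pos in positions)


key = b"k" * 32                                  # placeholder secret key
index = build_index(key, ["merger", "budget"])   # stored beside the ciphertext

query = trapdoor(key, "Budget")                  # created by the key holder
found = might_contain(index, query)              # evaluated without the key
```

As with any Bloom filter, a lookup can return a false positive but never a false negative. The index also still leaks which documents match a given query, which is one reason the fuller searchable-encryption constructions exist.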
As data moves away from the center of the organization and businesses look toward encryption to protect sensitive information, metadata will increasingly be a vulnerability that must be addressed.