Imagine a 2016 remake of the classic film Gaslight in which a young security professional is driven to the brink of insanity – and impending disaster – by a cyber schemer who, unbeknownst to IT security, has over time moved around and corrupted bits of data, manipulating, let’s say, the design of a jumbo jetliner or the composition of a vaccine in order to execute an unspeakable attack.

A little dramatic? Not really. While cybercriminals to date have mostly focused on stealing credentials for financial gain or disrupting businesses or organizations, the corruption of data, sometimes years in advance of an attack, is a growing – and more challenging – threat.

The volume, variety and velocity inherent in big data make it difficult to ensure the integrity of every piece of data, says Oliver Tavakoli, chief technology officer of Vectra Networks, a San Jose, Calif.-based vendor of automated threat management solutions. “Think of any big data cluster being fed by a broad supply chain of big data – each data source may have a potential supply chain integrity issue, and there are many independent parts of the chain and they are moving very quickly. And given that the data in a cluster is usually viewed through the lens of results of analytics rather than being looked at directly, data integrity issues may have subtle but important effects on the final results.”

The data being gathered about users by agencies, banks, websites and mobile applications has greatly increased in value, says Michael Taylor, applications and product development lead at Rook Security, a managed security services provider (MSSP) based in Indianapolis. “The analysis of user data allows organizations to better understand and anticipate the future needs of their users. The security and integrity of this data will need to be protected with the same urgency as other personally identifiable information (PII).”

It would be difficult to argue that the world is in a good state when it comes to data protection, agrees Josh Shaul, vice president of web security at Akamai, a content delivery network and cloud services provider headquartered in Cambridge, Mass. “We have more tools and technologies at our disposal than ever before, yet the drumbeat of major data breaches rolls on.”

The problem is complex, Shaul says. “Security is all about the weakest link in the chain and attackers have proven very adept at finding the weak links,” he says. “The world has done a relatively good job of protecting our sensitive communications on the internet using cryptography, so attackers have moved to target the sources, destinations and data stores – where the data must be decrypted to be used.”

This forces organizations to look at the entire security lifecycle, which for many is too costly and complex to be achievable, Shaul points out.

And there’s another challenge, one that involves people, says John Avellanet, managing director and principal of Cerulean Associates, a Williamsburg, Va.-based consultancy, and the author of several books on compliance. “We don’t know what we don’t know,” he says. “How many IT change control requests to update a system or upgrade a system or replace a system include data regression testing to verify that data sitting on the system from yesterday, last month, last year, can still be accessed? Is still complete? Is still all available, consistent, attributable?”

The answer is very few. And Avellanet has the numbers to quantify his thesis: “Of the 20 data integrity audits that I conducted just last year for clients, just one firm had a change control process that required data regression testing, and they’d just implemented it and weren’t certain yet how to do it. So, we’re making progress, but we’ve a long way to go.”
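
Avellanet’s questions about whether yesterday’s data can still be accessed and is still complete can be turned into a check that runs as part of change control. The following is one possible sketch, not Avellanet’s or any vendor’s method. It assumes records are JSON-serializable dictionaries with a unique `id` field. Each record is fingerprinted with SHA-256 before a change, and the fingerprints are compared afterward. The function names `fingerprint_records` and `regression_check` and the sample records are hypothetical.

```python
import hashlib
import json


def fingerprint_records(records, key_field="id"):
    """Map each record's key to a SHA-256 digest of its canonical JSON form.

    Assumes every record is a JSON-serializable dict with a unique key_field.
    """
    fingerprints = {}
    for record in records:
        # sort_keys and fixed separators give a stable byte representation
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        fingerprints[record[key_field]] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return fingerprints


def regression_check(before, after):
    """Compare fingerprints taken before and after a system change."""
    missing = sorted(set(before) - set(after))  # no longer accessible
    added = sorted(set(after) - set(before))    # appeared during the change
    altered = sorted(k for k in before.keys() & after.keys() if before[k] != after[k])
    return {"missing": missing, "added": added, "altered": altered}


# Hypothetical example: a migration silently changes one value and drops a record.
baseline = fingerprint_records(
    [{"id": 1, "dose_mg": 50}, {"id": 2, "dose_mg": 75}, {"id": 3, "dose_mg": 20}]
)
migrated = fingerprint_records([{"id": 1, "dose_mg": 50}, {"id": 2, "dose_mg": 57}])
print(regression_check(baseline, migrated))
# {'missing': [3], 'added': [], 'altered': [2]}
```

A fingerprint proves only that the bytes are unchanged. It says nothing about whether the data was correct before the change, so the baseline itself has to be trusted.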

Lucas Moody, CISO of Palo Alto Networks, says that in the big data/SaaS world it can seem as if we’ve created a giant game of telephone, but in reality all the parties engaged have a vested interest in ensuring the integrity of the game and its final outcome.

And while integrity in big data environments has been debated in recent months, particularly in use cases involving massive compute operations, genetic research and clinical studies, among others, the pollution or injection of small amounts of data is often inconsequential when dealing with large data sets, as the law of large numbers would indicate, Moody says. That said, in environments where data integrity is paramount, a comprehensive information security strategy has to account for data at rest, data in transit and control over who has the capability to manipulate data, he says.
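
A quick calculation illustrates Moody’s point about the law of large numbers, and also its limit. The numbers are made up for illustration. Ten injected values of 1,000 are added to a data set whose genuine values are all 100: first a data set of a million genuine records, then one of a hundred. The dilution applies only to aggregates such as a mean. A single altered record is still wrong no matter how large the data set is.

```python
def poisoned_mean(genuine, injected):
    """Arithmetic mean of the genuine values plus the injected (poisoned) values."""
    data = genuine + injected
    return sum(data) / len(data)


poison = [1000.0] * 10          # ten hypothetical injected values

large = [100.0] * 1_000_000     # large data set, true mean 100
small = [100.0] * 100           # small data set, true mean 100

shift_large = poisoned_mean(large, poison) - 100.0
shift_small = poisoned_mean(small, poison) - 100.0

print(f"{shift_large:.4f}")  # 0.0090
print(f"{shift_small:.1f}")  # 81.8
```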

Protecting the integrity of big data is a much larger and more complex problem than protecting traditional PII, says Rook Security’s Taylor. A single record of information about an individual may contain data such as street address, date of birth and Social Security number, he points out. “In a big data context, a single user may generate many thousands of times that volume of data through their everyday use of a website, app or service. This larger volume of data will typically be generated and piped through several different resources.”

Verifying the integrity of data as it passes through multiple tools is where the increased complexity comes into play, he adds. “Ensuring that the data generated on the user application side has not been manipulated inadvertently or maliciously before arriving at the final data store requires external monitoring and sampling of the data in motion and at rest.”
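
Taylor’s mention of sampling data in motion and at rest raises a practical question: how large must a sample be to give a reasonable chance of noticing tampering? The sketch below makes two assumptions. Records are drawn uniformly at random without replacement, and every sampled record can be checked against its source. Under those assumptions, the chance of catching at least one tampered record follows the hypergeometric distribution. The function name and the figures in the example are hypothetical.

```python
from math import comb


def detection_probability(total_records, tampered_records, sample_size):
    """Chance that a uniform random sample drawn without replacement contains
    at least one tampered record: 1 - C(n - m, k) / C(n, k)."""
    if not 0 <= tampered_records <= total_records:
        raise ValueError("tampered_records must be between 0 and total_records")
    if not 0 <= sample_size <= total_records:
        raise ValueError("sample_size must be between 0 and total_records")
    clean = total_records - tampered_records
    # comb(clean, k) is 0 when k > clean, i.e. the sample must hit a tampered record
    return 1 - comb(clean, sample_size) / comb(total_records, sample_size)


# Hypothetical: 1,000 of 1,000,000 records (0.1%) altered, 1,000 records sampled.
print(round(detection_probability(1_000_000, 1_000, 1_000), 2))  # 0.63
```

At a 0.1% tampering rate, a 1,000-record sample misses the problem roughly a third of the time. This is why sampling complements full integrity checks on the most sensitive data rather than replacing them.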

The state of data integrity is not very good, says Tavakoli. “We’re in the early stages of understanding the implications of data integrity issues. While data engineering teams have been trained to cleanse data (throw some of it out because it lacks certain key fields) and normalize it, they have not been trained to look for signs of tampering with the data. It’s akin to the early days of cybersecurity when there were weaknesses in the way code was developed and the SDL acronym hadn’t been invented yet.”

There is a great deal to secure, says Akamai’s Shaul, so organizations need to take a holistic view of their data. “They must ask: Where is the sensitive information stored? How is it used, processed and transmitted? Who has access at each level – and more importantly, who should have access?”
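
Shaul’s final question, who should have access, lends itself to a periodic access review that compares the access actually granted with an approved baseline. The sketch below assumes both are available as mappings from a data store name to a set of principals. The store and account names are hypothetical.

```python
def excess_access(granted, approved):
    """Return, per data store, the principals whose access is not in the approved baseline."""
    report = {}
    for store, principals in granted.items():
        # Stores missing from the baseline are treated as approving nobody
        extra = set(principals) - set(approved.get(store, ()))
        if extra:
            report[store] = sorted(extra)
    return report


granted = {
    "claims_db": {"etl_svc", "analyst_a", "contractor_x"},
    "vaccine_specs": {"lab_lead"},
    "legacy_archive": {"old_batch_job"},
}
approved = {
    "claims_db": {"etl_svc", "analyst_a"},
    "vaccine_specs": {"lab_lead"},
}
print(excess_access(granted, approved))
# {'claims_db': ['contractor_x'], 'legacy_archive': ['old_batch_job']}
```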

“Security always starts with understanding your own estate and building a threat model that helps you understand what and where an attacker is likely to target,” Shaul says. “With an understanding of which data and systems need protection, organizations must think about people, process and technology – as all three must be aligned to be effective.”

Finally, it comes down to implementation, Shaul says. His firm advocates a layered defense model, in which security controls get stricter and more difficult to bypass the closer one gets to the sensitive data in question.

Rook Security’s Taylor adds that companies will need to purchase or develop tools to monitor the data flow into their data stores. “These tools will need to be able to observe and validate chain of custody of the data being generated by a user until it is entered into the data store. The protections that are used to protect user PII will need to be put into place for this data as well because this data can be used to definitively identify an individual user.” The data must be protected whether it is destined for a cloud-based solution or a local data center, he says. 
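
A hash chain is one way to approach the chain-of-custody validation Taylor describes. Each stage that handles the data records a digest of what it passed on, linked to the hash of the previous record. Editing an earlier entry then breaks every link after it. This is a minimal sketch under stated assumptions, not Rook Security’s tooling, and the function names are hypothetical. In practice each entry would also need to be signed, because anyone who can rewrite the whole log can recompute plain SHA-256 links.

```python
import hashlib
import json


def _entry_hash(entry):
    """SHA-256 of a custody entry's canonical JSON form."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def add_custody_record(chain, handler, payload):
    """Append a hop: who handled the data, a digest of its output, and a link to the previous hop."""
    chain.append({
        "handler": handler,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
        "prev": _entry_hash(chain[-1]) if chain else None,
    })
    return chain


def verify_custody(chain, stored_payload):
    """True if every link is intact and the stored data matches the last recorded digest."""
    if not chain or chain[0]["prev"] is not None:
        return False
    for prev, curr in zip(chain, chain[1:]):
        if curr["prev"] != _entry_hash(prev):
            return False
    return chain[-1]["payload_sha256"] == hashlib.sha256(stored_payload).hexdigest()


# Hypothetical pipeline: app event -> message queue -> data store.
event = b'{"user": 42, "action": "login"}'
chain = []
add_custody_record(chain, "mobile_app", event)
add_custody_record(chain, "ingest_queue", event)
add_custody_record(chain, "warehouse_loader", event)

print(verify_custody(chain, event))                               # True
print(verify_custody(chain, b'{"user": 43, "action": "login"}'))  # False
```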

Tavakoli admits there is no easy answer and no silver bullet. “Regarding your big data clusters, be keenly aware of where your data is coming from,” he says. “Encrypt and/or sign the data in transit so it can’t be tampered with by an unauthorized middleman (you will still be susceptible to breaches of authorized middlemen from whom you get data). Encrypt the data at rest in your cluster. Implement strong access control to ensure limited write access to the data cluster. Record all modifications of the data store. Audit your controls frequently. Follow much the same playbook for any data (including corporate emails) you keep in the cloud.”
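
Tavakoli’s advice to sign data in transit can be illustrated with Python’s standard `hmac` module. This is a sketch, not a complete protocol. It assumes the data source and the cluster share a secret key distributed out of band. It does not encrypt the data, which would be handled separately, for example by TLS. Because anyone holding the shared key can produce a valid tag, it does not help if an authorized party that supplies or relays the data is itself compromised, which matches Tavakoli’s caveat. The function names and the key are hypothetical.

```python
import hashlib
import hmac
import json


def sign_batch(key: bytes, records: list) -> dict:
    """Serialize a batch of records and attach an HMAC-SHA256 tag."""
    body = json.dumps(records, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "hmac": tag}


def verify_batch(key: bytes, message: dict) -> list:
    """Recompute the tag on arrival; reject the batch if it does not match."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched through timing
    if not hmac.compare_digest(expected, message["hmac"]):
        raise ValueError("HMAC mismatch: batch altered in transit or signed with another key")
    return json.loads(message["body"])


key = b"example-shared-secret"  # hypothetical; use a randomly generated, managed key in practice
msg = sign_batch(key, [{"sensor": "wing-7", "strain": 0.12}])
print(verify_batch(key, msg))  # [{'sensor': 'wing-7', 'strain': 0.12}]

msg["body"] = msg["body"].replace("0.12", "0.21")  # simulate tampering in transit
try:
    verify_batch(key, msg)
except ValueError as err:
    print(err)
```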

If we truly wish to protect these things, we first need to understand what we are exposing outside our direct realm of control, why, and to whom, says Michael Angelo, chief security architect at Micro Focus. “Then we should re-examine our desired end goal and make sure the risks are outweighed by the benefits. Of course we can use encryption to protect them, but does this really provide the protection we think it does? Even if we assume that the keys are strong and well protected, the materials are still subject to brute force attacks.”

When it comes to the cloud, whether IaaS, PaaS or HaaS, Angelo adds, assume that the data is not encrypted while the cloud is active. “Therefore, anyone with access to the cloud infrastructure (or virtual machines) can get and/or modify the data.”

Confidentiality and integrity are attained through a comprehensive information security strategy that includes considerations for strong access control, enlistment of entire employee communities and data protection through the use of encryption, says Moody. “It starts with strong access control and identity management, ensuring that sender and recipient are who they say they are, and secured by adopting ‘zero-trust’ principles. This is augmented by using the right technologies deployed with a preventative mindset to securely segment critical environments, thereby establishing an architecture designed to prevent breaches.”

This approach, Moody adds, must include strong employee education and cyber hygiene, because the paths to data are not strictly technical but often involve compromising the people involved.

And, finally, Moody adds, all of this needs to be backed up with an encryption strategy that acts as a last line of defense.