Imagine using faulty information in creating a building design or developing a product or running a political campaign or formulating a new drug. That's exactly what can happen – with devastating results – when hackers or other malfeasants infiltrate an organization and corrupt its data.
To ensure the highest degree of data security, the integrity of electronic records must be reliable across their entire lifecycle – from the time a file is generated through its transfer and its storage in databases or archives. Complying with the alphabet soup of regulatory requirements regarding data use and protection, while a thorn in the side of most organizations, can go a long way in safeguarding critical information from malicious threats. But mushrooming data volumes, along with confusion over how to execute a compliance plan and who is responsible for it, can make compliance an overwhelming task for any organization.
Access to data is no longer restricted to, and controlled by, a few departments of an organization, says Ravi Rao, senior vice president, pre-sales at Infogix. The volume and variety of data that is increasingly available, as well as the need for rapid decision-making in this data-aware world, have necessitated broad access to, and dissemination of, data and information – not only within the organization but to the consumer as well, he explains.
"This has placed a bigger emphasis on data integrity than ever before," Rao says. "Organizations are being forced to be more accountable for the data content, mainly because there are so many more eyes examining and analyzing the vast amounts of data."
This means, he says, that it is no longer sufficient to have a reactive approach to data integrity. "It will negatively and tangibly impact the business. In addition, it may be too expensive or worse, not even feasible to try to fix data integrity issues after the fact." So, he adds, organizations are seeing the value of building in data integrity and quality as a consideration in the design/build/deploy phase.
"If we agree that data integrity means, in essence, data value trustworthiness (e.g., I entered the address on Tuesday, and I can trust that two years from now, that address will still be exactly as I entered it), then I would say we have a long way to go," says John Avellanet, managing director and principal, Cerulean Associates, a Williamsburg, Va.-based consultancy and the author of several books on compliance.
Part of that challenge is technological transformation, he says. "It's never very clear if one moves data from System A to System B if the data – and its associated metadata – will be complete, accurate, consistent," says Avellanet. He points out that he is not referring here to PC to PC, but more like SQL to Oracle or Mac to PC, or SQL 2005 to SQL 2008. "That cannot be a certainty today and must have some level of testing associated with it to ensure that the characteristics that make up trustworthy data are still present."
The data and metadata must all still be present and available, complete, accurate and attributable in these instances, he says. "Let's face it, you can't throw a stick without hitting a story about lost or truncated data during a data move or migration, so this is still a very common issue across all industries."
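The kind of post-migration testing Avellanet describes can be sketched in a few lines of Python. This is a minimal illustration, not a method attributed to anyone quoted here: the function names, the field names and the sample records are all hypothetical, and it assumes every record carries a unique `id`. It fingerprints each record on both sides of the move and reports anything missing, unexpected or altered.

```python
import hashlib
import json


def row_fingerprint(row: dict) -> str:
    """Hash a record's fields in a stable key order so equal records hash equally."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_migration(source_rows, target_rows, key="id"):
    """Report records that went missing, appeared, or changed during a move."""
    src = {r[key]: row_fingerprint(r) for r in source_rows}
    dst = {r[key]: row_fingerprint(r) for r in target_rows}
    return {
        "missing": sorted(src.keys() - dst.keys()),
        "unexpected": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


# Hypothetical sample data: record 2 was truncated and record 3 was lost in the move.
source = [
    {"id": 1, "address": "12 Main Street, Williamsburg", "entered_by": "jdoe"},
    {"id": 2, "address": "400 Long Avenue, Apartment 12B", "entered_by": "asmith"},
    {"id": 3, "address": "7 Oak Lane", "entered_by": "jdoe"},
]
target = [
    {"id": 1, "address": "12 Main Street, Williamsburg", "entered_by": "jdoe"},
    {"id": 2, "address": "400 Long Avenue, Apar", "entered_by": "asmith"},
]

report = compare_migration(source, target)
print(report)
```

Because metadata fields such as `entered_by` are hashed along with the value itself, a move that preserves the address but drops who entered it would also show up as a change.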
Another part of that challenge is technological obsolescence, he says, as we don't know that a PDF or a Word doc or a .TIF or an MPEG-2 or -4 file is going to be readable 10 years from now, much less 30 years from now. And yet, he adds, many regulations and statutes around the world call for data retention for years – in some cases as long as 25 years.
"If I wrote a report in the most popular word processing software in 1994, WordPerfect, and went to open it today in the most popular word processing software today, Microsoft Word, could I open it?" he asks. "Likely no. Microsoft stopped supporting the WordPerfect file format years ago. And yet millions of companies have WordPerfect files they are required to retain sitting in their archives or on their networks today."
The challenge, Avellanet says, is how would one produce that record in court or in a government investigation. "Well, without significant cost, you wouldn't. And can you even guarantee that the file hasn't corrupted over the decades? So there is a disconnect between what our regulators expect/require and what we can actually produce."
When it comes to safeguarding databases or email or cloud implementations, Avellanet believes it boils down to at least four elements:
Technological or automated controls – This is everything from periodic data integrity checks run by systems to checksums, to security controls, to good disaster recovery backup procedures, to good long-term electronic data archival processes (and making sure not to confuse short-term, disaster recovery backups with long-term data archives), and so on.
Manual or procedural controls – You have to have policies and procedures, everything from a good data integrity practices policy to a procedure on sampling long-term data archives for stability and integrity, a procedure on scanning paper documents to turn them into certified or true copy digital documents, etc.
Contractual controls – What happens to my data if the hosting company goes bankrupt? If they get bought and the purchaser of the hosting company decides to shut off the servers and doesn't realize my data is on them? Is the hosting provider allowed to sub-contract my data off to another sub-contracted hosting company? Is my data allowed to leave the EU or North America? Am I allowed to audit the hosting site? Frankly, this is not an area of expertise or much knowledge for most corporate lawyers, so it's imperative to get your legal department involved sooner rather than later because they'll likely need to go to outside counsel for advice and expertise, and that takes time.
Ongoing monitoring and trending – There is obviously a lot that goes on here. The key is understanding that when you outsource your data to someone else, you still retain the original risk and you add risk because now you need to have ongoing oversight and monitoring into how your data is being managed by a firm that might be around the other side of the globe, that might not speak your same language, etc.
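The checksums and periodic integrity checks named under automated controls can be illustrated with a short Python sketch. It is an assumption-laden example rather than a recommended product design: the function names and file names are hypothetical, SHA-256 is simply one reasonable choice of checksum, and a temporary directory stands in for a real archive. A manifest of checksums is recorded once, then the archive is re-verified against it later.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir: Path) -> dict:
    """Record a checksum for every file at the time it is archived."""
    return {
        str(p.relative_to(archive_dir)): sha256_of(p)
        for p in sorted(archive_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(archive_dir: Path, manifest: dict) -> list:
    """Periodic check: list files that have vanished or no longer match."""
    problems = []
    for name, expected in manifest.items():
        path = archive_dir / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif sha256_of(path) != expected:
            problems.append((name, "changed"))
    return problems


# Hypothetical demonstration using a throwaway directory as the "archive".
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp)
    (archive / "report_1994.txt").write_text("quarterly results")
    (archive / "contract.txt").write_text("signed copy")

    manifest = build_manifest(archive)
    clean_run = verify_manifest(archive, manifest)

    # Simulate silent damage: one file altered, one file lost.
    (archive / "contract.txt").write_text("signed copy (edited)")
    (archive / "report_1994.txt").unlink()
    damaged_run = verify_manifest(archive, manifest)

print(clean_run)
print(damaged_run)
```

A check like this only detects damage; it cannot repair it, and the manifest itself would need to be stored separately from the archive it describes for the comparison to be trustworthy.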
Let's assume big data can be broken into two areas, PII and technical, says Michael Angelo, chief security architect, Micro Focus. "The current attack focus is not about data integrity as much as being about data access. The direct target (currently) is theft for financial gain. If we look at theft, it has yet to evolve to a point where your data is modified as an attack."
This having been said, imagine PII being modified for impersonation or as an offensive – say to disallow you access to your identity, Angelo says. "Or, what would happen if the data specification for a product were to be modified? Could it be made to not function or simply fail? What happens if the decisions, derived from big data analysis, were derived from faulty data? They would be incorrect and any subsequent actions would not necessarily be correct," he says.
We might be naturally inclined to assume that each time data changes hands, it loses some of its original integrity, says Lucas Moody, CISO at Palo Alto Networks, a Santa Clara, Calif.-based network and enterprise security company. "Many of us grew up playing the game “telephone,” where a message is relayed around a circle until it returns to the original source, where it is validated or debunked. Usually, someone along the sequence hears the message wrong, or intentionally changes it to make the outcome of the game more enjoyable for all. That's people."
In the arena of data and information systems, Moody says we're talking about machines that rely on the accuracy and structure of data to perform their intended functions to achieve the desired outcome. "Machines only change data if we tell them to in support of a desired outcome, or if someone has intentionally manipulated the data to change the outcome."
Rao at Infogix agrees that the increasing availability of data is creating never-ending headaches for those responsible for securing the data. "As we have seen in the last few years, systems are being hacked and compromised with alarming frequency," he says. "Cyber, network, system and data security will continue to be enduring challenges. There are established approaches and standards that one can and must employ to safeguard databases, emails and cloud implementations. As these evolve over time, it is important to stay updated in security policies."
Expertise is widely available to help create, deploy and maintain such policies, Rao says. However, he adds, an often overlooked aspect of safeguarding data, whether in the cloud or not, is being able to organize it in the right way, both logically and physically, to prevent unnecessary overlap of data across necessary boundaries. "For example, having multi-tenant capability on top of “clean,” verified data allows for separation of data such that one group/department does not access data that they do not need access to, without requiring the unnecessary cost overheads and other inefficiencies of physical separation of data," Rao says. "Again, such approaches to safeguarding data are highly dependent on the integrity of the data."
Transfer of data beyond U.S. borders
As far as the evolving regulations governing transfer of data between the United States and Europe, there are some real legal issues here, says Avellanet at Cerulean Associates. A recent court case between Microsoft and the U.S. government is likely going to have to go to the Supreme Court (and possibly then to Congress), he says. "We cannot have any level of sustainable, fair, equitable justice in our society if all I need to do is keep all of my records and communications of wrong-doing in a country outside of the U.S. That's a recipe for all the law-abiding citizens and companies to store data in the U.S. and all those intending on not following the rules and the laws to store their data outside the U.S."
As well, the EU's new data privacy regulation, the General Data Protection Regulation (GDPR), comes into effect in May 2018 and is going to have major ramifications for cloud providers and other multinational firms. A number of Avellanet's multinational clients, he explains, have their primary EU datacenter in the U.K. With Brexit, the U.K. won't be part of the EU any longer, so now all of that is up in the air.
Josh Shaul, vice president of web security at Akamai, agrees that there are many regulations that govern the transfer of data internationally – all of which are heavily dependent on the type of data in question. "Most people would intuitively recognize that the use and transfer of military secrets and other top secret data is heavily governed by law and regulation, but lots of other data is regulated as well," he says. He points out that a number of countries have strict regulations around the transfer of personally identifiable information across national borders, and that the transfer of intellectual property may also be tightly controlled in certain circumstances. "In today's world, it's crucial for organizations to have a legal team that is well versed in international data regulations and legislation," says Shaul.
"Companies located in the U.S. which transfer data outside its borders or house it internationally will need to comply with the local government laws regarding the PII and metadata that their applications generate," says Michael Taylor, applications and product development lead, Rook Security. "The European Union has existing laws surrounding the right to be forgotten. These laws dictate that a company must erase data at the request of an individual user. If a company has not performed validation surrounding its data to tightly constrain data to individual users, they will not be able to accurately comply with these regulations."
The biggest issue for American companies has to do with privacy regulations in the EU, says Oliver Tavakoli, chief technology officer, Vectra Networks. The International Safe Harbor Privacy Principles, which governed this area starting in 2000, were overturned by the European Court of Justice in 2015 and have now been superseded by the EU-US Privacy Shield, he points out. The restrictions, he says, mainly affect the ability of U.S. companies to import data from EU entities, and much of this regulation is squarely targeted at U.S. companies which collect data on European consumers. "If your big data cluster has data from European sources flowing into it, you would be well advised to understand the applicable EU privacy laws," he says.
But, Angelo at Micro Focus says the transfer of data across boundaries depends on how we define data. "Data going outside of U.S. borders is not normally protected by U.S. laws (this is assuming we are not talking about export controlled materials). Data leaving other regimes, such as Europe, may be controlled depending on its nature."
Laws around the international transfer of data are dynamic, adds Moody. "The definition of what is personal information versus what is public information can vary widely. It's a challenging space that requires a very current understanding of the state of affairs of local, regional and national laws necessitating strong partnership between data custodians, information security professionals and legal/privacy teams."
Transfer of data beyond U.S. borders is, of course, governed by the laws and regulations surrounding such transfers in the U.S., as well as the other country or countries, Rao explains. "In addition, the organization(s) involved have their own requirements for data security/privacy/protection. Assuming these policies and regulations are complied with, the data transfer is then happening in a sufficiently secure manner."
However, he adds, from a data integrity standpoint, transfer of data across borders may introduce new concerns such as language/cultural/other hurdles. "Data that was known to have been verified and “clean” at the source may not be so at the destination," Rao explains. "Simple differences such as date formats, time zone differences, and currency, to name just a few, would have to become part of data integrity considerations if data is to be moved frequently across borders."
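Rao's point about date formats, time zones and currency can be made concrete with a small Python sketch. It assumes, purely for illustration, that each source system declares its own date format and UTC offset and that monetary amounts travel with an ISO currency code; the function names and sample values are hypothetical. The same timestamp string yields two different moments depending on which convention produced it.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def to_utc(text: str, fmt: str, utc_offset_hours: float) -> datetime:
    """Parse a local timestamp using its declared format and offset, then convert to UTC."""
    local = datetime.strptime(text, fmt).replace(
        tzinfo=timezone(timedelta(hours=utc_offset_hours))
    )
    return local.astimezone(timezone.utc)


def same_amount(a: dict, b: dict) -> bool:
    """Compare two amounts, refusing to compare across currencies."""
    if a["currency"] != b["currency"]:
        raise ValueError("cannot compare amounts in different currencies")
    return a["amount"] == b["amount"]


raw = "03/04/2017 09:00"

# Read as month/day/year at UTC-5 (a U.S. East Coast winter offset): 4 March.
us = to_utc(raw, "%m/%d/%Y %H:%M", -5)

# Read as day/month/year at UTC+2 (a Central European summer offset): 3 April.
eu = to_utc(raw, "%d/%m/%Y %H:%M", 2)

print(us.isoformat())
print(eu.isoformat())

price_at_source = {"amount": Decimal("1250.00"), "currency": "EUR"}
price_at_destination = {"amount": Decimal("1250.00"), "currency": "EUR"}
print(same_amount(price_at_source, price_at_destination))
```

Storing timestamps in UTC and keeping the currency code alongside every amount is one common way to keep such values unambiguous when they cross borders; a fixed offset is used here for simplicity and does not account for daylight saving changes.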
Local and foreign data privacy and security laws have to be adhered to in addition to the policies of the organization(s) involved, Rao says. "Such laws or policies may prevent horizontal or vertical portions of data from being transferred, which means that the intended downstream business use of the data could be impacted."
Future threats to data integrity
Avellanet says there is plenty to worry about and address in a practical, realistic manner regarding data integrity, but he believes there are four particular areas of concern.
"Data theft is going to grow, particularly as we increasingly lose direct line-of-sight control of our data," he says. As an example, he explains: Say you store your data at hosting provider A. They sign a sub-contracting agreement with hosting provider B in another country, such as China or wherever, and some of your data now moves there (provider B will have significant access to any data at provider A). Unbeknownst to you, hosting provider B may hold your data – including your intellectual property, your business plans, etc. – and be owned (in part) by one of your direct competitors, by a government that actively steals cutting-edge IP, etc. The first sign that your data has been lost might be when you start losing sales to a foreign knock-off of your product.
Avellanet actively encourages clients to map what he calls their data chain-of-custody, from creation to long-term archive.
Privacy loss is going to continue. There is simply too much money, and too little risk of getting caught, much less punished, for hackers. Frankly, I'm starting to wonder if some of the dystopian cyberpunk science fiction writers aren't correct in presuming that at some point we will simply throw our hands up and say everything is wide open. Sounds horrible, I know, but I think it might've been William Gibson who suggested the day may come when I can get anyone's data on anything at any time, and at that point, personal data will have zero monetary value – and thus all the data privacy issues will go away. I suspect that's 50+ years in the future, so in the meantime, we have to struggle to control it and mitigate it since we're simply not going to stop it.
Laws and regulations will continue to be written by people with very little real IT experience – Only a minuscule percentage of Congress has any IT-related experience, so the disconnect between legal expectations, technological capabilities and data constraints will continue to grow.
The Internet of Things (IoT) is going to be a hacker's dream. Already, hackers are successfully hacking and "data-napping" the data of hospitals and police departments, holding it for ransom. And while I'm certainly not looking forward to it, it's only a matter of days before people with pacemakers or with self-driving cars get phone calls explaining what's going to happen to them unless X-thousand dollars are wired to a particular account. We'd be naive not to realize that this will happen.
"New vulnerabilities and new ways to exploit old vulnerabilities continue to emerge and be disclosed to anyone who is listening," says Shaul. "Unfortunately, it's almost always much easier for an attacker to take advantage of a new exploit than it is for all the vulnerable organizations around the world to mitigate the risk. This window of exposure, which sometimes lasts for years at individual organizations, is the biggest threat we face today and will continue to be into the future."
Taylor at Rook Security believes that the integrity and longevity of data that we generate in our everyday lives will continue to be a difficult problem for government agencies and companies. "With nine out of 10 U.S. citizens believing in some type of right-to-be-forgotten laws and other countries having already passed them, companies will undoubtedly have to comply with those requests in the future. Without appropriate validation of data, those companies will not be able to definitively state that they have only associated the correct data with an individual."
The misassignment of data has led to reporters and politicians landing on the No Fly list, Taylor says. "A similar situation could arise where a company fails to completely erase the data of an individual and leaves a ghost of their profile which is then immediately associated with them again once they generate new data."
As more of the decisions made by businesses depend on the results of analytics performed on big data, there will be an incentive for crooks to pollute the incoming data in ways which either alter the decisions or make them predictable in ways that can be monetized, says Tavakoli at Vectra Networks. As an example, he points to criminal organizations that can alter the data on which analytics is performed and cause a hedge fund to buy or sell a stock. "The criminals can make large sums of money front-running the transaction," he says.
Angelo believes the biggest future threat centers mostly on our naivete about the value of, and the damage that can be done by, either exposing information to big data or illicitly modifying the information in big data. "Think about social media as an example," he says. "Today, people post volumes in it – everything from where we are eating, to beverages we consume, to trips we are taking, and even pictures of friends doing silly things. All of this information can be used for different things – we just don't yet know if it can be used for things to hurt us or help us."
Moody agrees that data is becoming increasingly more valuable, both for good and bad. "Criminals, nation-states and hacktivists are evolving to find new ways of monetizing data, leveraging data as a means of propaganda, enhancing local technology markets, or influencing national/international communities at scale."
While he paints a generally optimistic picture of how data is beneficial, he says the evolution of the threat landscape will constantly bombard stewards of data with new and innovative means to steal, manipulate or otherwise harm the integrity of the data that drives the international economy. "Furthermore, advances of compute capabilities exacerbated by Moore's law [the observation that the number of transistors on a chip doubles roughly every two years] poses a risk to current cryptographic capabilities necessitating constant innovation in the space."
Data integrity has to be an important consideration early on in the design/build/deploy phase rather than as an afterthought, says Rao. "Data is not static. Its nature and content continues to change and evolve as organizations introduce new services and products. So, even after the appropriate consideration in the early phase, a continuous and automated approach to ongoing verification of data integrity has to be put in place."