Time to navigate the legal and data governance implications of AI

Ever since Big Tech began talking about generative AI in their earnings reports, there has been a frenzy over its potential benefits, from increased workplace productivity to breakthroughs in medical research. And investments have been following the hype: According to Axios, first-quarter venture capital funding of generative AI this year was $2.3 billion, up from $612.8 million in last year’s first quarter.

But generative AI — meant to create new information or solutions from large data sets — is only as good as the data that feeds into it. And so, it’s now more important than ever to preserve the integrity of the data we use. For businesses, that will mean recognizing the privacy implications of adopting this new technology. It also means establishing and communicating clear governance policies to maintain transparency and trust with customers and partners.

AI has quickly become such a buzzword that many people use it as a marketing term when they really mean machine learning: computers recognizing and matching patterns within a welter of data. Machine learning has long been used in business processes across many sectors. But generative AI takes this kind of technology a powerful step further, letting machines create something original from all that information.

For instance, our company already uses machine learning to balance data storage loads, automatically moving information among servers to keep each of them below the capacity threshold at which performance would suffer. A hypothetical generative AI solution, by contrast, might analyze the operations of all the servers under our management and draw new conclusions about how to increase capacity and efficiency. It might conclude, for example, that we could store more data at higher performance levels in data centers located in cold climates, because the machines don't have to work as hard to operate there.
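The goal of that load-balancing work, keeping every server below a capacity threshold, can be illustrated in a few lines. The sketch below is a plain greedy heuristic rather than a machine learning model, and it is not Commvault's system; the `Server` class, the `rebalance` function and the 80% threshold are all hypothetical. It moves the largest objects off any overloaded server to the least-utilized server that can absorb them without crossing the threshold itself.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity_gb: float
    # Hypothetical model: object name -> size in GB stored on this server.
    objects: dict[str, float] = field(default_factory=dict)

    @property
    def used_gb(self) -> float:
        return sum(self.objects.values())

    @property
    def utilization(self) -> float:
        return self.used_gb / self.capacity_gb


def rebalance(servers: list[Server], threshold: float = 0.8) -> list[tuple[str, str, str]]:
    """Greedily move objects off servers above `threshold` utilization.

    Returns the moves made as (object, source, destination) tuples.
    The 0.8 default is an illustrative assumption, not a recommended value.
    """
    moves = []
    for src in servers:
        # Try the largest objects first so fewer moves are needed.
        # sorted() copies the items, so deleting from src.objects below is safe.
        for obj, size in sorted(src.objects.items(), key=lambda kv: kv[1], reverse=True):
            if src.utilization <= threshold:
                break
            # Only consider destinations that stay at or below the threshold after the move.
            candidates = [
                s for s in servers
                if s is not src and (s.used_gb + size) / s.capacity_gb <= threshold
            ]
            if not candidates:
                continue  # nowhere to put this object; try a smaller one
            dst = min(candidates, key=lambda s: s.utilization)
            del src.objects[obj]
            dst.objects[obj] = size
            moves.append((obj, src.name, dst.name))
    return moves


if __name__ == "__main__":
    cluster = [Server("a", 100, {"x": 50, "y": 40}), Server("b", 100, {"z": 10})]
    print(rebalance(cluster))  # [('x', 'a', 'b')]
```

Objects that fit nowhere are left in place, so a server can stay above the threshold when the cluster as a whole is too full; a real system would need to flag that case rather than silently accept it.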

There are certainly exciting possibilities for generative AI, but we must approach them with caution. We’ve already seen some worrisome — and embarrassing — episodes, like the court filing earlier this year from two New York City lawyers who cited fake cases generated by ChatGPT, resulting in widespread publicity and a $5,000 fine. If we’re to realize the technology’s highest potential while safeguarding against breaches of privacy, security and accuracy, we need to start drawing a roadmap now.

One of the holy grails for businesses has long been unlocking insights from all their data, whether to better serve customers or to optimize operations. What if we could upload all of a client's employee health insurance data to an AI product that assigned the company a health score, perhaps lowering its insurance premiums? Or we could offer a system that scans all of a company's records, determines which ones the company is required to keep and for how long, and then automatically deletes them once the holding period required by regulators has passed.
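The rules-based core of such a records system might look something like the sketch below, assuming each record carries a type and a creation date. The `Record` class, the helper functions and the periods in `RETENTION_YEARS` are hypothetical placeholders, not legal guidance; actual holding periods vary by regulator and jurisdiction. Records of an unrecognized type are never flagged, so anything the schedule does not cover stays put rather than being deleted by default.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Iterator

# Hypothetical retention schedule: record type -> years the record must be kept.
# Placeholder values only; real periods depend on the regulator and jurisdiction.
RETENTION_YEARS = {
    "invoice": 7,
    "payroll": 4,
    "email": 3,
}


@dataclass
class Record:
    record_id: str
    record_type: str
    created: date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def expired_records(records: Iterable[Record], today: date) -> Iterator[Record]:
    """Yield records whose holding period has ended as of `today`."""
    for rec in records:
        years = RETENTION_YEARS.get(rec.record_type)
        if years is None:
            # Unknown record type: keep it and leave the decision to a person.
            continue
        if today >= add_years(rec.created, years):
            yield rec
```

Keeping the schedule as data rather than code means compliance staff can review and update it without touching the deletion logic.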

These kinds of products would offer clear benefits, but if not built carefully, they could easily run afoul of any number of privacy laws. Unless the health data were anonymized, employees might not want every doctor visit shared with an outside entity. Many companies are leery of letting any outside entity mine their data at all, preferring to keep it stashed in a figurative lockbox under the bed.

The legal and ethical need to protect personal and corporate privacy has become even more complex given the global arena in which we operate. Other regions and countries have different concerns and, in many cases, more stringent protections than the U.S. does. Settlements for privacy breaches can run well into the hundreds of millions of dollars and can even exceed $1 billion.

There are other implications as well. Once an enterprise contributes data to a technology, it’s often easy to lose control of it. Executives should make sure they understand upfront where the data goes; who will have access to it; how, by whom and for what purpose it gets used; and who actually owns it once another entity has accessed it.

That's part of why it's so important to maintain backup copies of clean data sets, free of manipulation. If there's an error in the output, a newly introduced security vulnerability, an unforeseen question about the data's integrity or a need to change course, the organization needs to be able to reset and roll back to the original data set.
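One way to keep a clean data set recoverable is to snapshot it alongside a checksum manifest, then compare the working copy against that manifest before trusting it. The sketch below assumes the data set is a directory of files on local disk; the function names `snapshot`, `changed_files` and `roll_back` are hypothetical, and it relies only on Python's standard library (`hashlib`, `shutil`, `pathlib`).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_of(root: Path) -> dict[str, str]:
    """Map each file's path relative to `root` to its SHA-256 digest."""
    return {str(p.relative_to(root)): sha256_of(p) for p in root.rglob("*") if p.is_file()}


def snapshot(dataset: Path, backup: Path) -> dict[str, str]:
    """Copy the clean data set to `backup` (which must not exist yet) and return its manifest."""
    shutil.copytree(dataset, backup)
    return manifest_of(backup)


def changed_files(dataset: Path, manifest: dict[str, str]) -> list[str]:
    """List files modified, added or deleted relative to the clean manifest."""
    current = manifest_of(dataset)
    return sorted(k for k in manifest.keys() | current.keys() if manifest.get(k) != current.get(k))


def roll_back(dataset: Path, backup: Path) -> None:
    """Discard the working copy and restore it from the untouched backup."""
    shutil.rmtree(dataset)
    shutil.copytree(backup, dataset)
```

The same manifest can also be checked against the backup itself, which guards against the backup being altered after the snapshot was taken.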

In addition, companies will need to spell out who's on the hook if their AI product causes harm. Generative AI has been evolving rapidly, with few specific regulations so far, but plenty of existing laws address liability issues that could come into play. We may see an increase in product liability litigation brought by consumers, who have more robust protections against harm. Businesses, by contrast, will need to negotiate clear liability agreements upfront, since the law generally assumes that they know better.

Indeed, it behooves us in the business world to establish good governance policies and get ahead of the kinds of slapdash regulations that tend to emerge only after events have gone awry.

Despite how quickly the technology seems to evolve — with offers for new, AI-enhanced services popping up on our screens every day — it’s still early in the game. We have an opportunity to craft guidelines using our best data hygiene practices from the outset. Only then can we safely unlock the technology’s potential for all and for the greater good.

Danielle Sheer, chief legal and compliance officer, Commvault