It’s no secret that the widespread adoption of artificial intelligence (AI) has brought a host of promises and opportunities to the enterprise. In fact, recent data shows that AI adoption has more than doubled since 2017, and that number will only grow.
For data teams, AI enables greater self-service for employees, allowing them to take on more initiatives without specialized knowledge or domain expertise, such as fluency in complex languages like SQL or Python. AI also has the potential to significantly impact data security initiatives, especially when it comes to streamlining data discovery and data fusion.
Today’s organizations face two main data discovery challenges. First, data has become highly diffuse, and is often found in many more places than a traditional database. Second, data classification is highly context dependent. Teams must therefore survey a vast landscape and identify not just what data exists, but how that data interrelates. AI can help execute complex rules in a repeatable way, so that data discovery can continue to scale even as data sources expand.
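As a minimal sketch of what "executing complex rules in a repeatable way" can look like, the snippet below applies a small, hypothetical rule set to classify values across sources. The rule names and patterns are illustrative assumptions; real discovery tooling uses far richer, context-aware rules.

```python
import re

# Hypothetical rule set: each classification label maps to a compiled pattern.
# Production discovery rules would be broader and account for context.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(value: str) -> list[str]:
    """Return every classification label whose pattern matches the value."""
    return [label for label, pattern in RULES.items() if pattern.search(value)]

# The same rules run unchanged over any number of records or sources,
# which is what makes the classification repeatable as data expands.
records = ["alice@example.com", "ssn: 123-45-6789", "just a note"]
labels = [classify(r) for r in records]
```

Because the rules are data, not one-off queries, adding a new source means re-running the same classifier rather than rewriting logic per system.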
Identifying patterns to prioritize threats
Data fusion, or the linking of information across many different systems and classifications, has also become a challenge for data security pros. Finding threats often means compiling information from a number of different systems, including everything from identity management and cloud storage to event monitoring, VPNs, and access control. Teams must bring all of this information together, synchronize and analyze it, understand each schema and dialect, and compensate for differences in data quality and frequency. So where can AI help here? It can streamline searching, fusion, and analysis, enhancing data security processes overall.
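At its simplest, the fusion described above is a join across systems on a shared key. The sketch below links hypothetical VPN events to identity records by user; the feeds and field names are invented for illustration, and real fusion must reconcile far messier schemas.

```python
from collections import defaultdict

# Hypothetical event feeds from two systems; field names are illustrative.
identity_events = [
    {"user": "jdoe", "dept": "finance"},
    {"user": "asmith", "dept": "engineering"},
]
vpn_events = [
    {"user": "jdoe", "ip": "10.0.0.7"},
    {"user": "jdoe", "ip": "198.51.100.4"},
]

def fuse(identity, vpn):
    """Link VPN activity to identity records on the shared user key."""
    by_user = defaultdict(list)
    for event in vpn:
        by_user[event["user"]].append(event["ip"])
    # Each identity record gains the list of IPs seen for that user.
    return [{**record, "ips": by_user[record["user"]]} for record in identity]

fused = fuse(identity_events, vpn_events)
```

Even this toy example shows why schema knowledge matters: the join only works because both feeds happen to share a `user` field, an assumption that rarely holds cleanly across real systems.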
Although generative AI carries many benefits, 71% of IT leaders say it will also introduce new data security risks. To fully realize the benefits of AI, it’s vital that we treat data security as a foundational component of any AI or large language model (LLM) implementation. Here’s where what I call the four “whats” and “hows” of data security come into play:
- “What” data gets used to train the AI model? Beginning with the training data, teams must identify what sensitive data could get used to develop the AI model. The team must sanitize the data, stripping it of any sensitive information, or risk exposing classified data or spreading misinformation.
- “How” does the AI model get trained? The “how” context affecting data sensitivity often gets overlooked. Data may appear innocuous until it’s combined with other information. Training an AI model implicitly joins these small pieces of otherwise innocuous information together, revealing details that are potentially sensitive. As a result, teams must identify how data is combined in the model, then reduce the impact of induced sensitivity during model training.
- “What” controls exist on deployed AI? Even after controlling the data and the training of models, we must secure the model itself. The European Commission’s proposed AI Act, the details of which are still being negotiated, would place limits on model usage. Certain activities (such as policing and social credit scoring) carry unacceptable AI risks. Other use cases, such as HR functions, are considered high risk, meaning they have a high potential impact on people’s rights, safety, and livelihoods. Therefore, when it comes to AI, we must understand why someone uses a model, and what security and access controls exist on it.
- “How” can we assess the veracity of outputs? This last component is critical and, if not addressed from the beginning, can have detrimental impacts on society through the spread of misinformation. AI can generate very believable results. When someone asks an AI, such as an LLM, to summarize some obscure science, for example, it does a remarkable job of generating an abstract that looks plausible, complete with citations that appear legitimate. But those sources are often non-existent, summarizing studies that never took place. Fortunately, access controls can help define the intended scope of a model and restrict activities that push the boundaries of that scope.
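The sanitization step in the first “what” above can be sketched as a simple redaction pass over training text. This is a minimal illustration with assumed patterns and placeholder tokens; production pipelines rely on dedicated de-identification tooling with far broader coverage.

```python
import re

# Hypothetical redaction patterns and placeholders; real sanitization
# must cover many more sensitive data types than these two.
REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED-SSN]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED-EMAIL]"),
]

def sanitize(text: str) -> str:
    """Replace known sensitive patterns before text reaches a training set."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

sanitize("Contact jdoe@example.com, SSN 123-45-6789")
# -> "Contact [REDACTED-EMAIL], SSN [REDACTED-SSN]"
```

Running such a pass before training addresses the direct leakage risk, though (as the second “how” notes) combinations of individually innocuous fields still need separate handling.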
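The access controls mentioned in the last two points can be as simple as an allow-list checked before any model call. The task names and role below are assumptions made up for illustration, not a real policy model.

```python
# Hypothetical allow-list: the model may only be invoked for approved tasks,
# and only by users holding an authorized role.
ALLOWED_TASKS = {"summarize_policy", "classify_ticket"}

def authorize(task: str, user_roles: set[str]) -> bool:
    """Permit a model call only for in-scope tasks and authorized roles."""
    return task in ALLOWED_TASKS and "analyst" in user_roles

authorize("summarize_policy", {"analyst"})     # in scope under these assumptions
authorize("social_credit_score", {"analyst"})  # rejected: outside the defined scope
```

Checks like this encode the model’s intended scope as policy, so requests that push past that scope fail closed rather than silently succeeding.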
By prioritizing data security and access control, organizations can safely harness the power of AI and LLMs while safeguarding against potential risks and ensuring responsible usage. These four considerations are interdependent, and each functions as an important step within the data security lifecycle of sensitive data discovery, security, and monitoring.
At the end of the day, this next generation of AI promises a net positive in terms of helping improve security and governance, but only if it’s baked in from the start. Security teams must consider these four “whats” and “hows” of data security in AI conversations from the beginning to successfully leverage the full benefits of the technology.
Joe Regensburger, vice president, research engineering, Immuta