A man walks through a Microsoft server farm in Switzerland. One researcher warned of the potential for data poisoning: adding intentionally misleading data to a pool so that machine learning analysis misidentifies its inputs. (Amy Sacka for Microsoft)

Data poisoning attacks against the machine learning used in security software may be attackers’ next big vector, said Johannes Ullrich, dean of research at the SANS Technology Institute.

Machine learning is based on pattern recognition in a pool of data. Data poisoning is adding intentionally misleading data to that pool so the model begins to misidentify its inputs.

“One of the most basic threats when it comes to machine learning is one of the attacker actually being able to influence the samples that we are using to train our models,” said Ullrich, speaking during a keynote at the RSA Conference.

Ullrich noted that hackers could provide a stream of bad information by, say, flooding a target organization with malware designed to steer ML detection away from the techniques they actually plan to use in the main attack.
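A minimal sketch of the idea, using a toy nearest-centroid classifier with entirely hypothetical features and numbers (this is an illustration of the poisoning mechanic, not a real detector): an attacker who can feed labeled samples into the training pool submits benign-looking files labeled as malware, dragging the learned "malware" profile away from where real malware sits.

```python
# Toy data-poisoning illustration. Features and values are hypothetical:
# each sample is (entropy_score, packing_score), both on a 0-10 scale.

def centroid(points):
    """Mean point of a list of 2-D samples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def classify(sample, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2
                                     for a, b in zip(sample, centroids[label])))

clean_train = {
    "malware": [(7.9, 8.0), (7.6, 9.0), (7.8, 8.5)],
    "benign":  [(2.0, 1.0), (2.5, 1.5), (3.0, 2.0)],
}
attack_sample = (7.7, 8.8)  # a genuinely malicious sample

# Trained on clean data, the real malware is caught.
centroids = {label: centroid(pts) for label, pts in clean_train.items()}
print(classify(attack_sample, centroids))  # -> malware

# Poisoning: the attacker floods the training pool with innocuous-looking
# samples mislabeled "malware", pulling that centroid into benign territory.
poisoned = dict(clean_train)
poisoned["malware"] = clean_train["malware"] + [(1.0, 0.5)] * 20
centroids = {label: centroid(pts) for label, pts in poisoned.items()}
print(classify(attack_sample, centroids))  # -> benign: the real attack now slips past
```

Real detectors use far richer features and models, but the failure mode is the same: whoever controls enough of the training samples controls where the decision boundary lands.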

The future threats panel offered four experts drawn from the SANS Institute instructor pool the opportunity to present on one threat they expected to see balloon in the near future. Katie Nickels, director of intelligence at Red Canary, presented on the growth of leaking as a component of ransom, which she noted had been on the rise since 2019. Heather Mahalik, director of digital intelligence for Cellebrite, talked about token abuse expanding with increased work from home. And Ed Skoudis, CEO of Counter Hack, discussed software integrity and the growth of supply chain, dependency and malicious update attacks in the wake of Sunburst.

Data poisoning has been used against signature-based antivirus in the past. In 2013, Microsoft presented research that someone had uploaded false samples to malware repositories to create signature collisions with system files. That said, there do not appear to be any known data poisoning attacks against artificial intelligence defenses of individual networks.

“You need to understand these models,” said Ullrich. “If you don’t understand what protects you, then you can’t really evaluate the efficacy of these techniques and these tools.”