Observations from Black Hat 2023: It’s all about training the data and getting it ready for AI

LAS VEGAS – One consistent theme on the Black Hat floor and in booth conversations Wednesday was the need for automation and AI to accelerate detection and response.

Vendors are evolving into the security orchestration, automation and response (SOAR) category in recognition that AI will shorten the time it takes an adversary to compromise systems, and that humans will quickly become the bottleneck in the incident response process.

Increasing automated response to these threats – where automated workflows can quickly quarantine an attacker or invalidate compromised credentials – is the first step. At this point, only a few companies are doing any real “AI” – meaning AI that improves “explainability,” supports decision making, or accelerates engineering velocity. Most are recategorizing their automation investments as “AI.”

It sounds great.

But hype alert: attendees should be wary of AI messaging, especially from vendors that have been around more than a few years, because it’s most likely marketing hype rather than real innovation. This involves using AI messaging to enrich product narratives, such as taking external data, input from their user forums, challenge questions from sales and the like, and running it through LLM technologies, whether it's a DaVinci 003 model they've trained up themselves or an OpenAI API elsewhere.

This use of AI for enrichment is typical of core products that weren’t designed to embrace true AI capabilities. It does help democratize security for non-security practitioners, because the vendor has invested in “explainability,” but it’s not accelerating detection and response.

The challenge for most cybersecurity companies that have been around for a long time is that to apply AI to make a decision, they need a significant corpus of proprietary data on which to train the AI to make that decision. But most traditional security companies haven’t prioritized collecting that training data over the last five to 10 years. They’re not likely to have access to significant proprietary training data to improve or build out decision support algorithms.

And without proprietary data to train on, there’s no differentiation in the products they produce.

More exciting levels of AI innovation at this year’s Black Hat are found among security companies that have actively collected anonymized, proprietary data with AI-driven product evolution in mind. They’re typically newer and more at the periphery of the floor than in the big booths, and they’re on a mission to build and leverage proprietary training data, creating an expanding moat between themselves and legacy providers.

For example, at Horizon3.ai we’ve always viewed ourselves as a data company that uses pen testing as a sensor to build up significant data upon which we can train. Each of the 26,000 pen tests conducted over two years has collected anonymized telemetry, unique to every cyber terrain or environment pen tested, yielding data upon which to train decision-making algorithms to make solutions continually more effective.

In addition to accelerating detection and response, another area of important AI innovation is accelerating the development of applications and technology integrations. We're starting to see large language models becoming very good at generating reliable code, although there's still some work to do. We’re seeing this with GitHub Copilot, and elsewhere.

The ability to rapidly create integrations between products or applications has suddenly become feasible, and a very large integrations development team isn’t necessary to accelerate that code development. Organizations that can provide continuous delivery of features to the market are going to emerge as conveyor belts that are consistently shipping innovation and features to their customers.

And as SaaS-native companies continue to use AI to accelerate future development, they'll already have the deployment and distribution mechanism to get those new features to their customers. They will further separate from the traditional cybersecurity companies, which lack such development velocity and will be left behind.

But for now, the AI hype cycle will create a real struggle for buyers until that chasm becomes so vast it’s unmistakable. And the industry analysts who do skin-deep analysis of vendors won’t help much, because two things will happen.

First, expect the “Great Sameness” where all of these vendors sound the same. Marketing content on many websites already sounds almost identical. Second, reference-based procurement will become the key driver of growth that fuels continued innovation. This is where the channel – which really owns the last mile of cybersecurity for so many companies – will become an important and disruptive arbiter with influence that reaches far beyond its own customer base.

That’s also why industry ecosystems like Black Hat and DEF CON will continue to be very valuable. My advice to fellow attendees: focus on the perimeter of the expo, not the middle of the expo populated by the legacy companies with the oversized booths. Look for companies that have accrued proprietary data to build decision making and evolve their products.

And as always, look to the social aspect of Black Hat. Ask people what cool products they’ve seen on the show floor. Become part of building the reference-based innovation motion that will propel us all forward.

Snehal Antani, co-founder and CEO, Horizon3.ai