Key application-security concerns at RSA 2023 and beyond

The past year has seen significant new developments in application-security threats, as well as the persistence of older threats.

Artificial intelligence entered the mainstream with ChatGPT, and AI's ability to write code (including malware) poses a huge challenge to software developers worldwide. Insecure direct object reference (IDOR) attacks upon web-based APIs hit the headlines with the Optus data breach in Australia. And organizations continued to work on mitigating the Log4j supply-chain flaw.

Invicti plans to address some of these issues at the 2023 RSA Conference in late April, along with new insights into OWASP's upcoming revisions to its API Security Top 10 threat list. Other conference sessions will also address AppSec developments.

Here are the top issues facing application-security teams in 2023.

Is it a live coder — or is it AI?

On the last day of November 2022, OpenAI's ChatGPT was unleashed upon the world. To researchers familiar with large-language-model (LLM) generative chatbots, ChatGPT may have been old news, but to the rest of us, it seemed like the future had suddenly arrived.

Many of us have now read about how ChatGPT and its cousin, Microsoft Bing AI, can seem overly emotional or erratic, and how you can trick a chat AI into declaring that humanity should be destroyed.

What's less well publicized is that ChatGPT and another OpenAI product, Codex (which powers GitHub Copilot, Microsoft's code-writing-assistance tool), can be used to write software code or check code for errors, helping humans who don't have the requisite skills.

I used ChatGPT myself to write rudimentary JavaScript malware, despite not knowing any JavaScript, and others have used Codex to code in Python and other higher-level languages. The caveat is that the code AI writes, malicious or not, isn't always very good, and neither are its phishing emails.

However, researchers predict that more powerful forms of AI will write better, more accurate code in the next few years. Even as I write this, ChatGPT is being slowly transitioned to the next version of its large language model, GPT-4.

Some developers may soon have enough faith in Codex or Copilot to let the AIs write code on their own. But that would be a mistake, argues Invicti's Zbigniew Banach in a recent blog post.

"Invicti's own research on insecure Copilot suggestions show[s] that the generated code often cannot be implemented as-is without exposing critical vulnerabilities," Banach writes. "This makes routine security testing with tools like DAST and SAST even more important, as it's extremely likely that such code will make its way into projects sooner or later."

Key to this is how large language models work: They ingest billions of lines of human-written text, analyze it for patterns and then use algorithms to predict which word, phrase or piece of code is most likely to follow another word, phrase or piece of code.
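To make the "predict what comes next" idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT or Codex are built (those rely on large neural networks); it only counts which word follows which in a made-up training string and returns the most frequent follower. The function names and the training text are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word in the training text, which words follow it."""
    tokens = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]


# Hypothetical training text, chosen only to keep the example readable.
corpus = "open the file then read the file then close the file"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # prints: file
```

Even at this toy scale, the model can only echo whatever pattern dominates its input, whether that pattern is right or wrong.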

It sounds great in theory, but the Achilles' heel is that some of the text that the AI "trains" on may be just plain wrong. If the training input is flawed, then the AI-generated output will likely be as well — and if the AI is writing a huge amount of code, checking the code for errors may be difficult.

"With an AI-generated suggestion, you could be getting hundreds of lines of code that (superficially at least) seem to work, making it much harder to get familiar with what you're getting," writes Banach.

There's also the legal angle. The AI could be regurgitating copyrighted code, making you an unwitting plagiarist or intellectual-property thief. Or it could be spitting out open-source code covered by a general public license, which could result in your own project being declared open-source as well.

"The realization that some of your first-party code might actually come from an AI trained on someone else's code will be a cold shower for many," says Banach. "Do you even have copyright if your code was machine-generated? Will we need separate software bills of materials (SBOMs) detailing AI-generated code?"

Do AIs dream of electric sheep?

Then there's the possible attack vector of using fake software libraries to corrupt AI-generated code, or, as Banach calls it, "hallucination squatting."

This is made possible by a persistent problem with large-language-model AIs. It has been evident for several years that these LLM chatbots are dreamers. They will literally make things up to bolster an argument, such as creating research articles that don't exist, and will continue to insist that their "facts" are correct even when confronted with evidence to the contrary.

Researchers call these overly confident confabulations "AI hallucinations," and the phenomenon seems to apply to AI code generation as well.

"One of our team was looking for an existing Python library to do some very specific JSON operations and decided to ask ChatGPT rather than a search engine," Banach related. "The bot very helpfully suggested three libraries that seemed perfect for the job — until it turned out that none of them really existed, and all were invented (or hallucinated, as [someone else] put it) by the AI."

The Invicti researchers wondered: If the AI is making up fake libraries for our project, what if it is also making up the same fake libraries for other people's projects?

"To check this, they took one of the fabricated library names, created an actual open-source project under that name (without putting any code in it), and monitored the [open-source] repository" where the library was hosted, Banach wrote. "Sure enough, within days, the project was getting some visits, hinting at the future risk of AI suggestions leading users to malicious code."

This creates a situation very similar to a dependency-confusion attack, in which an attacker tricks a software-development tool into pulling in malicious code from a library in a public repository instead of clean code from a library of the same name in a private repository.

Aging Microsoft hackers might also see the parallel to a form of DLL injection attack, which abuses an unclear filepath to divert a code pull from a clean dynamic link library to a dirty one. There's also a parallel to typosquatting, the practice of registering misspelled variants of popular websites, e.g. "goggle.com" or "fcaebook.com".

"By analogy to typosquatting," wrote Banach, "this [AI-abusing attack] could be called hallucination squatting: deliberately creating open-source projects to imitate non-existent packages suggested by an AI."

"If the library doesn't exist, the code won't work," he added. "But if a malicious actor is squatting on that name, you could be importing malicious code into your business application without even knowing it."

Surprisingly, while there are plenty of presentations at RSAC 2023 whose agenda descriptions mention artificial intelligence, there are none that mention AI's effect on application security. We'll have to hope that the keynote by We Hack Purple's Tanya Janca, "DevSecOps Worst Practices" and a smaller AppSec-focused presentation from former Security Journey CEO Christopher Romeo examine the AI threat.
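In the meantime, one modest guard against hallucination squatting is to refuse any dependency that nobody on the team has vetted. The sketch below is an assumption-laden illustration, not something Invicti describes: it parses a piece of suggested Python code and lists imported packages that are missing from a hypothetical approved list. The allowlist contents and the package name "made_up_json_lib" are invented for the example.

```python
import ast

# Hypothetical allowlist of packages the team has already vetted.
APPROVED_PACKAGES = {"json", "requests"}


def unapproved_imports(source, approved=APPROVED_PACKAGES):
    """Return the top-level packages imported by `source` that are not approved."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which refer to local code.
            found.add(node.module.split(".")[0])
    return sorted(found - set(approved))


# Stand-in for an AI suggestion; "made_up_json_lib" is a deliberately fake name.
suggested = "import json\nimport made_up_json_lib\nfrom requests import get\n"

print(unapproved_imports(suggested))  # prints: ['made_up_json_lib']
```

A check like this only flags names for a human to review. It cannot tell whether an unfamiliar package is missing, legitimate or malicious.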

Hold that IDOR

Another threat to application security is also a fairly new twist on an old attack. Insecure direct object references (IDORs) are vulnerabilities that let an attacker guess the URL of an online asset and gain access to the asset without proper authorization.

In a glossary-definition post, Invicti defines IDORs as "vulnerabilities that occur when a web application developer uses only identifiers to directly point to page elements that should be subject to access control or require authorization."

As with older URL-based attacks that let attackers find hidden web assets, the URL of an asset in an online application often translates directly into a file path.

Sometimes the only hindrance to gaining access to the asset at the end of that file path may be the obscurity of the file path itself. That's not much defense when an attacker can simply increment known asset ID numbers to locate other assets in the same directory, or craft new file paths to access assets in other directories.

Invicti explains that, say, if a web-application user can see the details of their user account at "https://www.example.com/transaction.php?id=74656", then that same user might be able to see the details of someone else's account by simply changing that ID number, e.g. "https://www.example.com/transaction.php?id=74657" — as long as there was no authorization control preventing that user from doing so.
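What that missing authorization control might look like can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than Invicti's code: the in-memory store, the user names and the get_transaction function are all hypothetical, and only the two ID numbers come from the example URLs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    id: int
    owner: str
    amount: float


# Hypothetical in-memory store standing in for a real database.
TRANSACTIONS = {
    74656: Transaction(74656, "alice", 20.0),
    74657: Transaction(74657, "bob", 35.5),
}


def get_transaction(current_user, transaction_id):
    """Return the transaction only if it belongs to the logged-in user."""
    txn = TRANSACTIONS.get(transaction_id)
    # Missing and not-owned records raise the same error, so the response
    # does not reveal which ID numbers exist.
    if txn is None or txn.owner != current_user:
        raise PermissionError("transaction not available")
    return txn


print(get_transaction("alice", 74656).amount)  # prints: 20.0
```

With the ownership check in place, changing the ID from 74656 to 74657 while logged in as "alice" raises PermissionError instead of returning the other user's record.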

Or someone could use the browser's address bar, Invicti theorizes, to change a URL from "https://www.example.com/display_file.php?file.txt" to "https://www.example.com/display_file.php?../../../etc/passwd" and possibly read the server's list of user accounts, a file that was never meant to be served.
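A common defense against this kind of path manipulation is to resolve the requested name and confirm that it still sits inside the directory the application intends to serve. The following is a rough sketch, assuming a hypothetical base directory and function name. It is not taken from the Invicti post.

```python
from pathlib import Path

# Hypothetical directory that the application is allowed to serve files from.
BASE_DIR = Path("/srv/app/public_files")


def resolve_requested_file(name, base_dir=BASE_DIR):
    """Return the resolved path for `name`, refusing anything outside base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / name).resolve()  # collapses any ".." segments
    try:
        candidate.relative_to(base)
    except ValueError:
        raise PermissionError("requested path is outside the public directory") from None
    return candidate


print(resolve_requested_file("file.txt").name)  # prints: file.txt
```

Under these assumptions, a request for "../../../etc/passwd" resolves to a location outside the base directory and is rejected with PermissionError before any file is opened.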

Again, these are not catastrophic problems unless the targeted assets are insufficiently protected by authorization checks. Unfortunately, that's often the case in the real world. Three different IDOR-related weaknesses made possible last fall's data breach at Australian telecom Optus, which exposed the personal information of millions of customers, says Invicti's Banach in a separate blog post.

First was an insecure web API that let the attacker "send data requests to Optus systems", Banach writes, then an IDOR that let the attacker obtain data simply by providing a URL containing a valid customer ID, and finally "predictable identifiers ... that the attacker could easily enumerate to find and fetch existing data records".
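The last of those weaknesses, predictable identifiers, is the easiest to illustrate. The sketch below assumes a hypothetical new_record_id helper and uses Python's standard secrets module to issue random identifiers instead of sequential numbers.

```python
import secrets


def new_record_id():
    """Return a random, URL-safe identifier built from 16 random bytes."""
    return secrets.token_urlsafe(16)


first, second = new_record_id(), new_record_id()
print(len(first))       # prints: 22
print(first != second)  # prints: True
```

Random identifiers only make enumeration impractical. They are no substitute for the authorization checks that were also missing in this case.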

"The Optus data breach was a bit like walking into a bank and getting the contents of any deposit box that you know the number of, no questions asked," Banach added.

The problem with IDOR vulnerabilities is that because they're not "wrong," code-wise, it's very difficult for automated application-security tools to detect them.

"Because you can't use vulnerability scanning to find them, identifying IDORs requires manual penetration testing and security-focused code reviews," states the Invicti IDOR-definition post. "The only way to protect against IDORs is to implement strict access control checks for all sensitive objects."

Fortunately, some development frameworks, including Ruby on Rails and Django, build in access controls, so it's safe to use those unless you disable the defaults. Just don't assume that you can add access controls later, or that someone else will take care of the issue, Banach warns.

"Grafting access control onto an existing application or outright assuming that some other system will handle it could result in serious issues down the line," he writes. "Because if you neglect the basics of secure design, you risk your application sinking before it has even left the harbor."

IDOR may have a Wikipedia entry, but it's still new enough that no RSAC 2023 presentations directly mention it in their session descriptions. Nevertheless, a talk by Oak9 CTO Aakash Shah entitled "A Journey in Building an Open Source Security-as-Code Framework" should at least touch upon IDOR as a persistent issue. We hope that Janca's and Romeo's aforementioned AppSec-focused presentations do as well.

Supply chain of fools

There's no lack of awareness at RSAC 2023 regarding supply-chain attacks. More than two dozen presentations plan to address the issue. Among them are at least three AppSec-focused presentations, including one by Microsoft's Adrian Diglio titled "Introducing the Secure Supply Chain Consumption Framework (S2C2F)."

The session description calls S2C2F "a consumption-focused framework designed to protect developers against real-world OSS supply chain threats." This being Microsoft, it turns out that this isn't really an "introduction" because Microsoft put the details of S2C2F on GitHub in August 2022, and the framework was adopted by the Open Source Security Foundation (OpenSSF) in November.

"As a massive consumer of and contributor to open source," wrote Microsoft CTO of Azure Mark Russinovich, "Microsoft understands the importance of a robust strategy around securing how developers consume and manage open source software (OSS) dependencies when building software."

Invicti experts address supply-chain attacks and how they can be mitigated with software bills of materials (SBOMs) in an end-of-year blog post that looks forward to 2023.
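For readers who have not worked with one, an SBOM is essentially a machine-readable inventory of the components in a product. The sketch below is illustrative only: the field names are loosely modeled on CycloneDX-style JSON but heavily simplified, "example-lib" is invented, and the advisory list is a hypothetical stand-in for a real vulnerability feed. The one real-world detail is that log4j-core 2.14.1 falls within the range affected by the original Log4Shell flaw.

```python
# Hypothetical, simplified SBOM fragment (not a valid CycloneDX document).
sbom = {
    "components": [
        {"name": "log4j-core", "version": "2.14.1", "licenses": ["Apache-2.0"]},
        {"name": "example-lib", "version": "1.0.0", "licenses": ["MIT"]},
    ]
}

# Hypothetical advisory list of (name, version) pairs known to be vulnerable.
KNOWN_VULNERABLE = {("log4j-core", "2.14.1")}


def flag_components(sbom, advisories):
    """Return 'name@version' for every listed component found in the advisories."""
    return [
        f"{component['name']}@{component['version']}"
        for component in sbom["components"]
        if (component["name"], component["version"]) in advisories
    ]


print(flag_components(sbom, KNOWN_VULNERABLE))  # prints: ['log4j-core@2.14.1']
```

A lookup like this reports only what the inventory says is present. It says nothing about whether the assembled application is actually exploitable.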

"I think the SBOM is a great start," says Frank Catucci, CTO and head of security research at Invicti. "We need a software bill of materials, we need to know what's in our products, we need to know what licenses we're on, we need to know what vulnerabilities we could be exposed to."

But Mark Townsend, vice president of professional services at Invicti, is worried that SBOMs may make organizations complacent and less secure.

"I think SBOMs are a simple way of saying 'I did something,' and a lot of CISOs need to check a box that they reviewed the components and construction," Townsend says. "What you don't often see is people doing the next step, which is to run a penetration test or a DAST scan against it to see if it's really securely assembled."

If supply-chain attacks are a persistent threat, there's one that's most persistent of all, Townsend adds.

"Log4j is going to last as long as there's unpatched software out there," he says, and indeed we saw a new Log4j-based threat arise in April 2023. "When you think about Log4j, it's less about the current Log4j and more about the next one."

Check out SC Magazine's full coverage of RSAC 2023 before, during and after the conference.