Python byte code used to avoid detection and load malware

A novel attack using compiled Python byte code (PYC) may be the first supply chain attack in which bad actors executed PYC files to avoid detection and load malware, according to new research from ReversingLabs.

In a June 1 blog post, ReversingLabs researchers said the discovery comes amid a spike in malicious submissions to the Python Package Index, commonly referred to as PyPI. The researchers said the ability to execute PYC files poses yet another supply chain risk because most security tools scan only Python source code files (PY) and would miss this type of attack.
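As a rough illustration of the gap the researchers describe, the sketch below flags compiled files that ship without matching source. It is not a ReversingLabs tool: the function name `find_orphan_pyc` is hypothetical, and it assumes the standard CPython layout in which cached byte code lives in a `__pycache__` directory next to its source file.

```python
from pathlib import Path


def find_orphan_pyc(root):
    """Return .pyc files under root that have no matching .py source file."""
    orphans = []
    for pyc in Path(root).rglob("*.pyc"):
        if pyc.parent.name == "__pycache__":
            # Cached form: pkg/__pycache__/mod.cpython-311.pyc maps to pkg/mod.py
            module_name = pyc.name.split(".")[0]
            source = pyc.parent.parent / (module_name + ".py")
        else:
            # Sourceless form: pkg/mod.pyc maps to pkg/mod.py
            source = pyc.with_suffix(".py")
        if not source.exists():
            orphans.append(pyc)
    return sorted(orphans)
```

Running `find_orphan_pyc` on an unpacked package directory returns the compiled files a source-only scanner would never read. A hit is a reason to look closer, not proof of malice, since some legitimate projects also distribute byte code without source.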

ReversingLabs reported the malicious package, named fshec2, to the PyPI security team on April 17, and it was removed from the PyPI repository the same day. The researchers said the PyPI security team acknowledged the attack as interesting and agreed that this type of attack had not been seen before.

“Even though this malicious package and the corresponding command-and-control infrastructure weren’t state of the art, they remind us how easy it is for malware authors to avoid detection based on source-code analysis,” wrote the researchers. “Loader scripts such as those discovered in the fshec2 package contain a minimal amount of Python code and perform a simple action: loading of a compiled Python module … that just happens to be malicious.”
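To show how little source code such a loader needs, here is a harmless sketch built on Python's standard `importlib` machinery. It is not the code from the fshec2 package: the module name `harmless`, the helper `load_compiled_module` and the `demo` function are all invented for illustration, and the compiled module only returns a string.

```python
import importlib.machinery
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def load_compiled_module(name, pyc_path):
    """Load a module straight from a .pyc file, with no .py source present."""
    loader = importlib.machinery.SourcelessFileLoader(name, str(pyc_path))
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)  # runs the compiled module body
    return module


def demo():
    """Compile a benign module, delete its source, then load the byte code."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "harmless.py"
        src.write_text("def greet():\n    return 'hello from bytecode'\n")
        pyc = Path(tmp) / "harmless.pyc"
        py_compile.compile(str(src), cfile=str(pyc), doraise=True)
        src.unlink()  # only the compiled file remains on disk
        module = load_compiled_module("harmless", pyc)
        return module.greet()
```

The loader contains nothing a source scanner would flag, because everything the module does lives in the compiled file. That is why a handful of `importlib` calls pointing at a bundled PYC file deserves attention during package review.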

ReversingLabs uncovered the malicious code through static analysis and was able to detect it because of the malware writers' misconfigurations and poorly set up C2 infrastructure, explained Timothy Morris, chief security advisor at Tanium. Morris said miscreants are always trying novel ways to get malicious code onto machines any way possible. He said this obfuscation technique allows the compiled code to get past security scanners.

“Catching this type of code requires static analysis of the source code which is difficult, if not impossible, because it is compiled,” said Morris. “Other mitigations are possible by adding friction to the upload process of open-source code maintainers. This is a part of the bigger supply chain problem. Organizations need to know what open-source libraries they are using, and their specific behaviors.”
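Compiled Python is harder to review than source, but it is not fully opaque. The sketch below reads a PYC file and lists the names and string constants it references, without running it. The function name `summarize_pyc` is hypothetical, and the code assumes Python 3.7 or later (a 16-byte file header) and a file compiled by the same Python version that runs the script.

```python
import importlib.util
import marshal
import types


def summarize_pyc(path):
    """List names and string constants referenced by a .pyc, without running it."""
    with open(path, "rb") as fh:
        header = fh.read(16)  # Python 3.7+: magic, flags, then mtime/size or hash
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ValueError("compiled by a different Python version")
        code = marshal.loads(fh.read())  # parses the code object, does not execute it
    names, strings = set(), set()
    stack = [code]
    while stack:
        current = stack.pop()
        names.update(current.co_names)
        for const in current.co_consts:
            if isinstance(const, types.CodeType):
                stack.append(const)  # nested functions and classes
            elif isinstance(const, str):
                strings.add(const)
    return sorted(names), sorted(strings)
```

The returned lists can surface imports, attribute names and embedded strings such as URLs for a reviewer to triage. The `marshal` documentation warns that the module is not hardened against maliciously constructed data, so a check like this belongs in an isolated analysis environment.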

The challenges associated with scanning byte code have been known for some time, and similar problems can exist with .NET byte code and Java, further showing the many challenges of managing software supply chains, explained Andrew Barratt, vice president at Coalfire. Barratt said that with traditional compiled code there's no real visibility into the structure, so we instead rely on heuristics, signatures, and, more commonly, AI tools to help determine if the files are malicious. And with languages such as Python that allow for interpreted or quasi-compiled byte code operation, Barratt said the number of items security tools have to check for increases.

“The novelty of the PyPI malware that ReversingLabs identified reminds me of some of the characteristics of a DLL hijack — essentially where rogue code gets loaded by a trusted application,” said Barratt. “The troubling part is that we’ve got attackers deliberately targeting code repositories with these techniques clearly looking for a mass deployment vector, which starts to feel like the precursor to a ransomware campaign.”

Scott Gerlach, co-founder and CSO at StackHawk, added that software supply chain attacks have become a never-ending story. Gerlach said all of the light shone on the issue should drive developers to actively check the public packages and repositories they use.

“But that’s the problem — repos and package services like PyPI are run by volunteers in their free time,” said Gerlach. “If we really expect these widely used resources to become more secure, people will have to start contributing more time and money into maintaining them.”