PyTorch users are urged to install the latest version after the PyTorch team warned that nightly builds installed between Dec. 25 and Dec. 30 pulled in a compromised dependency as part of a software supply chain attack. ("Coding Javascript" by Christiaan Colen is licensed under CC BY-SA 2.0.)

The popular machine learning framework PyTorch was compromised in a software supply chain attack over the holidays. 

The PyTorch team warned users who installed the nightly version of PyTorch on Linux via pip between Dec. 25 and Dec. 30, 2022, to uninstall it and install the latest nightly build.

“PyTorch-nightly Linux packages installed via pip during that time installed a dependency, torchtriton, which was compromised on the Python Package Index (PyPI) code repository and ran a malicious binary,” the PyTorch team noted in a statement on Dec. 31, 2022.  
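For readers who want a quick first check, here is a minimal Python sketch, assuming Python 3.8+ (for `importlib.metadata`), that reports whether a distribution named torchtriton is installed in the current interpreter's environment. It does not inspect the malicious binary itself, and the helper names `installed_version` and `check_torchtriton` are hypothetical, so treat a hit as a prompt to follow the PyTorch team's removal guidance rather than as proof of compromise.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_torchtriton() -> str:
    """Report whether a distribution named 'torchtriton' is installed here."""
    found = installed_version("torchtriton")
    if found is None:
        return "torchtriton is not installed in this environment"
    return f"torchtriton {found} is installed; follow the PyTorch team's removal guidance"


if __name__ == "__main__":
    print(check_torchtriton())
```

The check only covers the environment of the interpreter that runs it, so each virtual environment or container would need to be checked separately.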

PyTorch is a widely used open-source framework for building artificial intelligence models. It was originally developed by Meta Platforms (then Facebook) and released in 2016, and it is now governed by the PyTorch Foundation, part of the Linux Foundation.

Dependency confusion attack

The incident involved a significant software supply chain attack technique called “dependency confusion,” which tricks package managers into installing a malicious package from a public repository instead of the intended package of the same name from a private or internal repository.

Specifically, the malicious “torchtriton” package uploaded to PyPI had the same name as a legitimate dependency hosted on PyTorch's own nightly package index. Because the PyPI copy took precedence during installation, it was installed in place of the legitimate one and was downloaded more than 2,300 times, according to BleepingComputer.
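To illustrate why a same-named public package can win, here is a deliberately simplified Python model of an installer that, like pip when given `--extra-index-url`, gathers candidates from every configured index without ranking the indexes and then keeps the highest version. The index names, contents, and version numbers are hypothetical, and this is not pip's actual resolver code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    name: str
    version: Version
    index: str  # which index served this candidate


def parse_version(text: str) -> Version:
    """Parse a simple dotted version such as '2.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resolve(name: str, indexes: Dict[str, Dict[str, List[str]]]) -> Candidate:
    """Pick the highest version of `name` across all indexes, ignoring provenance.

    Treating every index as equally trustworthy is what makes dependency
    confusion possible: an attacker only needs a higher version number.
    """
    candidates = [
        Candidate(name, parse_version(v), index_name)
        for index_name, packages in indexes.items()
        for v in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidates for {name!r}")
    return max(candidates, key=lambda c: c.version)


if __name__ == "__main__":
    # Hypothetical index contents for illustration only.
    indexes = {
        "internal": {"torchtriton": ["2.0.0"]},
        "public": {"torchtriton": ["3.0.0"]},
    }
    chosen = resolve("torchtriton", indexes)
    print(chosen.index, chosen.version)  # public (3, 0, 0)
```

Nothing in `resolve` records which index is supposed to own a name, so the internal copy loses as soon as the public one carries a higher version.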

“This [attack technique] is consistent with the type of next-gen attacks we have seen over the past two years. Attackers are shifting away from exploiting traditional CVEs and are instead focusing on manipulating maintainers and users,” said Henrik Plate, security researcher at Endor Labs.

One defense against dependency confusion that Plate suggested to SC Media is a private repository, such as devpi, that hosts internal packages and mirrors external ones. That approach, however, takes considerable effort and is effective only if local developer clients are properly configured, Plate conceded.
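The core rule such a private repository is meant to enforce can be sketched as a resolution policy: names owned internally are resolved only from the internal index and never fall through to the public one. The Python sketch below is a hypothetical illustration, not devpi configuration; `INTERNAL_NAMES`, the index data, and the versions are assumptions. With pip, a comparable setup typically points `--index-url` at the single private index instead of adding PyPI through `--extra-index-url`.

```python
from typing import Dict, List, Optional, Set, Tuple

Version = Tuple[int, ...]

# Hypothetical: names your organization publishes only to its internal index.
INTERNAL_NAMES: Set[str] = {"torchtriton"}


def parse_version(text: str) -> Version:
    """Parse a simple dotted version such as '2.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resolve_pinned(
    name: str,
    internal: Dict[str, List[str]],
    public: Dict[str, List[str]],
    internal_names: Set[str] = INTERNAL_NAMES,
) -> Optional[Tuple[str, Version]]:
    """Resolve `name`, never letting the public index serve an internal name."""
    if name in internal_names:
        # Internal names come only from the internal index; a same-named
        # upload to the public index is never considered.
        source, versions = "internal", internal.get(name, [])
    else:
        source, versions = "public", public.get(name, [])
    if not versions:
        return None  # fail closed rather than falling back to another index
    return source, max(parse_version(v) for v in versions)


if __name__ == "__main__":
    internal = {"torchtriton": ["2.0.0"]}
    public = {"torchtriton": ["3.0.0"], "numpy": ["1.24.1"]}
    print(resolve_pinned("torchtriton", internal, public))  # ('internal', (2, 0, 0))
    print(resolve_pinned("numpy", internal, public))  # ('public', (1, 24, 1))
```

Failing closed when an internal name is missing is the design choice that matters here: falling back to the public index would reopen the same hole.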

Attacker claims it was part of 'ethical research'

In this case, once installed, the malicious package's binary collected system information and sent it to a remote domain through encrypted DNS queries.
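Data exfiltrated over DNS usually shows up in resolver logs as queries whose subdomain labels carry encoded payloads. The Python sketch below is a rough heuristic for flagging such names; the length and entropy thresholds, and the naive assumption that the last two labels are the registered domain, are illustrative choices rather than indicators published for this incident, and real detection would need tuning against normal traffic.

```python
import math
from collections import Counter

# Assumed thresholds for illustration; tune against your own DNS logs.
MIN_LABEL_LEN = 30
MIN_ENTROPY_BITS = 3.5


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character in `text`."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_exfil(qname: str) -> bool:
    """Flag a query name if any subdomain label is long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    # Naively treat the last two labels as the registered domain and TLD.
    for label in labels[:-2]:
        if len(label) >= MIN_LABEL_LEN and shannon_entropy(label) >= MIN_ENTROPY_BITS:
            return True
    return False


if __name__ == "__main__":
    for name in ["www.example.com", "0123456789abcdef0123456789abcdef.example.com"]:
        print(name, looks_like_exfil(name))
```

Requiring both length and entropy keeps long but repetitive labels, such as CDN hostnames with padding, from being flagged on length alone.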

A notice posted on that domain claimed the attack was part of ethical research by a security researcher looking for dependency confusion vulnerabilities.

“Hello, if you stumbled on this in your logs, then this is likely because your Python was misconfigured and was vulnerable to a dependency confusion attack. To identify companies that are vulnerable the script sends the metadata about the host (such as its hostname and current working directory) to me. After I've identified who is vulnerable and reported the finding all of the metadata about your server will be deleted,” the notice read. 

However, although the notice claimed that the script collected only host metadata (information that describes other data rather than the data's actual content), the PyTorch team found that it also exfiltrated sensitive information, such as SSH keys.

“Exfiltrating environment variables and SSH keys from infected machines is not acceptable behavior for a security researcher,” said Tzachi Zornstain, head of software supply chain at Checkmarx, in a blog post. “Simply claiming to be a security researcher does not give someone permission to engage in malicious activity.”  

“While this case is not malicious in nature since the researcher did report the issues prior to the publication, it does cross the ethical limits to security research,” Yotam Perkal, director of vulnerability research at Rezilion, added.  

According to a Jan. 1 statement obtained by BleepingComputer, the researcher who claims to be behind the incident said it was part of security research, apologized for his mistakes, and said that all the data had been deleted:

"Hey, I am the one who claimed torchiton package on PyPi. Not that this was not intended to be malicious!

"I understand that I could have done a better job to not send all of the user's data. The reason I sent more metadata is that in the past when investigating dependency confusion issues, in many cases it was not possible to identify the victims by their hostname, username and CWD. That is the reason this time I decided to send more data, but looking back this was wrong decision and I should have been more careful.

"I accept the blame for it and apologize. At the same time I want to assure that it was not my intention to steal someone's secrets. I already reported this vulnerability to Facebook on December 29 (almost three days before the announcement) after having verified that the vulnerability is indeed there. I also made numerous reports to other companies who were affected via their HackerOne programs. Had my intents been malicious, I would never have filled any bug bounty reports, and would have just sold the data to the highest bidder.

"I once again apologize for causing any disruptions, I assure that all of the data I received has been deleted.

"By the way in my bug report to Facebook I already offered to transfer the PyPi package to them, but so far haven't received any replies from them."

As for mitigations, the PyTorch team replaced the dependency with a different package, “pytorch-triton,” and created a dummy package on PyPI to prevent further abuse. In addition, all nightly packages that depend on “torchtriton” have been removed from the PyTorch package indices until further notice.