On Dec. 31, 2022, the team behind the PyTorch machine learning framework announced on its website that one of its packages had been compromised via the PyPI repository. PyTorch is a framework designed for tensor computation with strong graphics processing unit acceleration and for deep neural networks built on a tape-based autograd system.
According to the PyTorch team, any installation of the PyTorch nightly version made between Dec. 25, 2022, and Dec. 30, 2022, is compromised. The nightly version is rebuilt every day, unlike the stable releases, which benefit from more testing to avoid bugs or vulnerabilities. The stable version of PyTorch was not affected by this attack.
The problem stems from a dependency of the nightly version named torchtriton, which is installed via pip from PyPI. The compromised package ran a malicious binary when torchtriton was imported.
What is the PyPI code repository?
PyPI, also known as the Python Package Index, stores more than 400,000 projects representing more than 7 million files. This repository helps developers distribute and update their code. It is widely used by companies that rely on software written in Python.
PyPI can easily be queried to install and update Python software, for example from the command line with the pip command. While such code repositories make it convenient for users and administrators to handle software, they also attract threat actors looking for a way to spread malware.
How did the PyTorch compromise happen?
According to the PyTorch team, a malicious torchtriton dependency package was uploaded to the PyPI code repository on Friday, Dec. 30, 2022, at around 4:40 p.m. The malicious package had the same package name as the one shipped on the PyTorch nightly package index.
PyTorch explains that “since the PyPI index takes precedence, this malicious package was being installed instead of the version from our official repository. This design enables somebody to register a package by the same name as one that exists in a third-party index, and pip will install their version by default.”
Henrik Plate, CISSP and security researcher at Endor Labs, told TechRepublic that “the technique used in the attack is similar to the well-known dependency confusion, and exploits setups where multiple package repositories are used for downloading project dependencies. Depending on the resolution algorithm of the package manager, such as the order in which repositories are contacted, an attacker can make the package manager download his malicious package rather than the legitimate one.”
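The resolution problem Plate describes can be illustrated with a toy model. The sketch below is not pip's actual code; it only assumes a resolver that merges the candidates offered by every configured index and keeps the highest version, without regard to where the candidate comes from. The index labels and version numbers are hypothetical.

```python
# Toy model of multi-index resolution; this is NOT pip's actual code.
# Index labels and version numbers below are hypothetical.

def parse_version(text):
    """Turn a simple dotted version such as '2.0.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_candidate(name, indexes):
    """Merge candidates for `name` from every index and return the highest version.

    `indexes` maps an index label to a {package_name: [versions]} dict.
    The origin of a candidate plays no role in the choice.
    """
    candidates = [
        (parse_version(version), label)
        for label, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"no candidate found for {name}")
    version, label = max(candidates)
    return ".".join(str(part) for part in version), label


indexes = {
    "project-index": {"torchtriton": ["2.0.0"]},  # the source the project intended
    "public-index": {"torchtriton": ["3.0.0"]},   # same name, registered by someone else
}
chosen = pick_candidate("torchtriton", indexes)
print(chosen)
```

In this model, anyone able to publish a package with the same name and a higher version on the public index wins the resolution, which is the core of a dependency confusion attack.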
The malicious payload
In this supply chain attack, the malicious code was aimed at collecting system information such as:
- The nameservers used by the system
- The host name
- The name of the currently logged-on user
- The current working directory name
- Environment variables
It was also designed to read several files:
- /etc/hosts
- /etc/passwd
- The first 1,000 files from the user’s home folder, with a size limit of 99,999 bytes
- The gitconfig file
- Any Secure Shell key stored on the machine
Once collected, all of the information was uploaded via encrypted Domain Name System queries to the domain h4ck(.)cfd, using a DNS server at wheezy(.)io.
A Twitter user claims responsibility for the attack
In a surprising twist of events, a Twitter user nicknamed BadRequests claimed responsibility for the attack and apologized. BadRequests said the intent was not malicious and that all of the collected data had been deleted.
The supposed security engineer also said the goal was to investigate dependency confusion issues and that the issue was reported to Facebook on Dec. 29. It seems that BadRequests did not know that PyTorch was no longer handled by Facebook/Meta but by the Linux Foundation.
If this was a simple bug bounty investigation, one might wonder why this person collected all of the SSH keys from the compromised users' SSH folders and why all of the data was sent encrypted via DNS requests. The event might also result in legal issues for BadRequests, as personal information was collected illegally, and affected companies or individuals might want to sue.
How can you detect the compromise?
PyTorch provides a one-line command that looks for the torchtriton package and prints whether the Python environment is affected:
python3 -c "import pathlib;import importlib.util;s=importlib.util.find_spec('triton'); affected=any(x.name == 'triton' for x in (pathlib.Path(s.submodule_search_locations[0] if s is not None else '/' ) / 'runtime').glob('*'));print('You are {}affected'.format('' if affected else 'not '))"
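For readers who prefer something easier to audit than a one-liner, the same check can be sketched as a short script. It follows the logic of the command above: it locates the installed triton package and looks for an entry named triton inside its runtime directory. The function names are illustrative and not part of PyTorch's advisory, and, unlike the one-liner, this sketch simply reports "not affected" when triton is missing or is not a package.

```python
import importlib.util
import pathlib


def runtime_contains_triton(package_dir):
    """Return True if <package_dir>/runtime holds an entry named 'triton'."""
    runtime_dir = pathlib.Path(package_dir) / "runtime"
    return any(entry.name == "triton" for entry in runtime_dir.glob("*"))


def environment_is_affected():
    """Locate the installed 'triton' package, if any, and apply the check."""
    spec = importlib.util.find_spec("triton")
    if spec is None or not spec.submodule_search_locations:
        return False  # triton is not installed as a package in this environment
    return runtime_contains_triton(spec.submodule_search_locations[0])


if __name__ == "__main__":
    print("You are {}affected".format("" if environment_is_affected() else "not "))
```

Run it with the same Python interpreter that the PyTorch environment uses, since the result depends on which packages that interpreter can see.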
If the system is affected, PyTorch and torchtriton should be uninstalled and reinstalled using the latest binaries.
Affected users are also strongly advised to replace all of their SSH keys, as those keys have been compromised and sent to the attacker.
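To scope the key rotation, it can help to list which files in the SSH folder look like private keys. The following is a minimal sketch, assuming keys are stored in PEM-style files whose first line contains a "PRIVATE KEY" header; the function name is illustrative, and keys stored in other formats or in other locations will not be found by this heuristic.

```python
import pathlib


def list_private_key_candidates(ssh_dir):
    """Return names of files in ssh_dir whose first line looks like a private-key header."""
    found = []
    directory = pathlib.Path(ssh_dir)
    if not directory.is_dir():
        return found
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                first_line = handle.readline(200)
        except OSError:
            continue  # unreadable file; skip it rather than fail
        if first_line.startswith(b"-----BEGIN") and b"PRIVATE KEY" in first_line:
            found.append(path.name)
    return found


if __name__ == "__main__":
    for name in list_private_key_candidates(pathlib.Path.home() / ".ssh"):
        print("rotate:", name)
```

Each listed key should be replaced with a newly generated one, and the old public key should be removed from every server or service where it was authorized.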
How to protect your organization from these attacks
The PyTorch team wrote that the torchtriton dependency has been removed from the nightly packages and replaced by pytorch-triton, and that a dummy package has been registered on PyPI. This should ensure the same issue does not happen again. PyTorch also reached out to PyPI to get proper ownership of the torchtriton package and delete the malicious version.
When asked about it, Henrik Plate told TechRepublic that “this attack vector can be addressed through the use of private repositories to both host internal packages and mirror external packages, e.g., devpi in case of the Python ecosystem. Typically, such solutions allow more control about dependency resolution and package download processes. However, their setup and operation requires non-negligible effort, and they are only effective if local developer clients are properly configured.”
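Because such setups only help when clients are configured properly, a first step can be to find projects that still declare more than one package index. The sketch below is one way to do that: it scans the text of a pip requirements file for the --extra-index-url option, which tells pip to consult an additional index. The function name and the sample content are hypothetical, and the scan does not cover other places where indexes can be configured, such as pip configuration files or environment variables.

```python
def find_extra_index_lines(requirements_text):
    """Return (line_number, line) pairs that declare an additional package index."""
    hits = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # ignore comment lines
        if line.startswith("--extra-index-url"):
            hits.append((number, line))
    return hits


# Hypothetical requirements file content, used only for illustration.
sample = """\
# hypothetical requirements file
--extra-index-url https://example.invalid/simple
somepackage==1.0
"""
flagged = find_extra_index_lines(sample)
print(flagged)
```

Any flagged file is a candidate for review: the packages it lists may be resolved from more than one source.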
Disclosure: I work for Trend Micro, but the views expressed in this article are mine.