Coders participate in the Conference for Open Source Coders, Users and Promoters, or COSCUP, an annual conference held by Taiwan’s open source community. ("COSCUP 2013 Hands-on:一小時就從 Data 學會 Python - Mosky Liu" by COSCUP is licensed under CC BY-SA 2.0)

TensorFlow, a popular open source Python library originally developed by Google for machine learning applications, has dropped support for YAML (originally short for Yet Another Markup Language, now YAML Ain’t Markup Language) because of an arbitrary code execution vulnerability.

In a recent advisory on GitHub, TensorFlow said that because YAML support requires a significant amount of work, it has removed YAML for now. TensorFlow patched the issue in a GitHub commit and will include the fix in TensorFlow 2.6.0.

Developers use YAML as a general-purpose format to store data and pass objects between processes and applications. The GitHub advisory said TensorFlow and Keras – a wrapper library for TensorFlow – used an unsafe function to deserialize YAML-encoded machine learning models.

Serialization converts objects into a byte stream. Deserialization is the reverse process: a byte stream is used to recreate the original object in memory. Insecure deserialization happens when an application deserializes untrusted data, letting an attacker abuse the application’s logic. In the TensorFlow case, the result was arbitrary code execution.
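The round trip, and the way it can go wrong, can be sketched in a few lines of Python. This is a minimal illustration, not TensorFlow's code: it uses the standard library's `pickle` module as a stand-in for the YAML loader, the `Payload` class is hypothetical, and a harmless arithmetic expression stands in for whatever an attacker would actually run.

```python
import pickle

# Serialization: a plain object becomes a byte stream.
config = {"layers": 3, "activation": "relu"}
blob = pickle.dumps(config)

# Deserialization: the byte stream is turned back into an equal object.
restored = pickle.loads(blob)


class Payload:
    """Hypothetical attacker-crafted object (illustration only)."""

    def __reduce__(self):
        # Instructs pickle to rebuild this object by calling eval("6 * 7").
        # A real attacker would name a far more damaging callable.
        return (eval, ("6 * 7",))


malicious_blob = pickle.dumps(Payload())

# The loader calls whatever the byte stream names, so the sender of the
# data, not the receiving application, decides which code runs.
result = pickle.loads(malicious_blob)
```

After the last line, `result` holds the value of the expression the payload chose, which shows that loading the data was enough to run it. A loader that only builds plain data types does not offer this hook.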

Deserialization bugs have been around for as long as most high-level languages have supported object-based data formats, and even earlier when data streams were parsed for common structures, explained Andrew Barratt, managing principal at Coalfire. Barratt said processing objects using common structured formats tends to require access to large blocks of memory, so errors in the parsing process can lead to data leaking into areas of memory that attackers could manipulate for remote code execution.

“The challenge with serialization and deserialization is that when it’s used between applications or specific processes, an attacker could even leverage one application that’s already trusted to fuzz another by passing rogue objects to the target – using fuzzing techniques broken deserialization engines are exposed,” Barratt said. “A lot of basic AppSec techniques are in play here. Viewing this as part of the attack surface is uncommon for a lot of organization’s threat models. While many are still looking at the way an attacker gets in – these vulnerabilities would be leveraged for persistence or further lateral movement – or even more sophisticated organizational attacks on business processes.”  

To underscore the potential impact of deserialization flaws, Barratt offered the example of two trusted applications: one for sales, another for accounting. If the sales application can send objects to the accounting application and there are vulnerabilities in the serialization engine, an intruder could potentially manipulate one application to benefit from the outcome of the other.

“In this instance a payment could be made of a much higher value than the sale – or crucially – to a different bank account,” Barratt said.

John Hammond, senior security researcher at Huntress, added that a deserialization vulnerability lets an attacker run shell commands, copy or move or delete files, and essentially do anything on the computer that the current user has permission to do.

“TensorFlow’s response and the decision to remove YAML support is a good move,” Hammond said. “I suspect YAML support will return very soon, and this is a temporary band-aid while they fix this bug. For code that deserializes data read-in and is supplied by user-input, there absolutely must be some form of validation, or else this vulnerability surfaces. To the greatest extent possible, engineers and security teams should make their best attempt to review input and data supplied to programs like these to ensure there are no glaring payloads or nefarious activity like demonstrated in the proof-of-concept code showcased in this advisory.”
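One way to picture the kind of validation Hammond describes is to parse untrusted input with a data-only parser and check it against an allowlist before anything else touches it. The sketch below is an assumption-laden illustration rather than TensorFlow's approach: the function name `load_model_config`, the `ALLOWED_LAYER_TYPES` set, and the config layout are all hypothetical, and JSON is used because Python's `json` module only ever produces plain data types.

```python
import json

# Hypothetical allowlist of layer types this application expects.
ALLOWED_LAYER_TYPES = {"Dense", "Conv2D", "Dropout"}


def load_model_config(text: str) -> dict:
    """Parse untrusted text as plain data and validate it before use."""
    # json.loads builds only dicts, lists, strings, numbers, booleans and
    # None; it never instantiates arbitrary classes or calls functions.
    data = json.loads(text)

    if not isinstance(data, dict):
        raise ValueError("config must be a JSON object")

    layers = data.get("layers")
    if not isinstance(layers, list) or not layers:
        raise ValueError("config must contain a non-empty 'layers' list")

    for layer in layers:
        name = layer.get("class_name") if isinstance(layer, dict) else None
        # Reject anything that is not an expected, known layer type.
        if not isinstance(name, str) or name not in ALLOWED_LAYER_TYPES:
            raise ValueError(f"unexpected layer entry: {layer!r}")

    return data
```

A config such as `{"layers": [{"class_name": "Dense"}]}` passes, while input that names an unknown type, or that is not valid JSON at all, is rejected with a `ValueError` before the application acts on it.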

Roy Horev, co-founder and chief technology officer at Vulcan Cyber, said that YAML, like many other languages such as PHP and Ruby, is insecure by nature. However, these languages are prolific for a reason, and developers can use them securely if they are deployed correctly.

“A series of unfortunate events such as bugs in the code, cloud misconfigurations, unintended assets exposed to the public Internet, or unapplied security patches are typically necessary for an arbitrary code execution vulnerability to be effectively exploited,” said Horev. “As with any vulnerability, I suggest full consideration of the digital ecosystem surrounding the issue to determine actual exposure and risk to the business. Once the vulnerability has been prioritized, take necessary steps to mitigate and remediate the issue. No doubt the TensorFlow team at Google went through a similar prioritization process and ultimately decided to revoke support for YAML.”

Tillery, the director and senior researcher at GRIMM who goes by a single name, added that deserialization attacks, including YAML deserialization attacks, allow for the execution of attacker-controlled code in the language of the parser, in this case Python. Tillery explained that this can lead to the compromise of anything on the system that the parser has access to, which could include data the developers intended to protect. According to the advisory, the patched versions are 2.3.4, 2.4.3, 2.5.1, and 2.6.0 and later, which means older versions remain vulnerable.

For remediation strategies, Tillery said the underlying issue is already fixed in most releases of TensorFlow. Tillery advises security teams to be diligent and ensure all installations have been updated to the newest patch release of their minor version (such as 2.4.3) available on GitHub: https://github.com/tensorflow/tensorflow/releases. To determine if the release your company uses has the remediation applied, search the patch notes for CVE-2021-37678.
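For teams auditing many environments, the version check can be scripted. The following is a rough sketch that encodes only the patched versions named in the advisory (2.3.4, 2.4.3, 2.5.1, and 2.6 or later); the function name `is_patched` is hypothetical, and the sketch assumes plain `major.minor.patch` version strings, rejecting pre-release or otherwise decorated versions rather than guessing.

```python
# First patched release for each affected minor line, per the advisory.
FIRST_PATCHED = {(2, 3): 4, (2, 4): 3, (2, 5): 1}


def is_patched(version: str) -> bool:
    """Return True if a TensorFlow version includes the CVE-2021-37678 fix."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdecimal() for p in parts):
        # Assumption: anything other than X.Y.Z needs a manual look.
        raise ValueError(f"unsupported version format: {version!r}")

    major, minor, patch = (int(p) for p in parts)

    # 2.6.0 and every later release ship with the fix.
    if (major, minor) >= (2, 6):
        return True

    # Older minor lines are patched only from a specific release onward;
    # lines not listed in the advisory are treated as vulnerable.
    threshold = FIRST_PATCHED.get((major, minor))
    return threshold is not None and patch >= threshold
```

In practice the input would typically be the installed version string, for example the value of `tensorflow.__version__`. Treating unlisted older lines as vulnerable errs on the side of prompting an upgrade.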