This is the official website of the Trojan Detection Challenge, a NeurIPS 2022 competition. In this competition, we challenge you to detect and analyze Trojan attacks on deep neural networks that are designed to be difficult to detect. Neural Trojans are a growing concern for the security of ML systems, but little is known about the fundamental offense-defense balance of Trojan detection. Early work suggests that standard Trojan attacks may be easy to detect [1], but it has recently been shown that in simple cases one can design practically undetectable Trojans [2]. We invite you to help answer an important research question for deep neural networks: How hard is it to detect hidden functionality that is trying to stay hidden?
Prizes: There is a $50,000 prize pool. The first-place teams will also be invited to co-author a publication summarizing the competition results and to give a short talk at the competition workshop at NeurIPS 2022 (registration provided). Our currently planned procedures for distributing the pool are described here.
Researchers have shown that adversaries can insert hidden functionality into deep neural networks such that the networks behave normally most of the time but abruptly change their behavior when triggered by the adversary. This is known as a neural Trojan attack. Neural Trojans can be implanted through a variety of attack vectors, one of which is poisoning the training dataset. For example, in the figure below the adversary has surreptitiously poisoned the training set of a classifier so that whenever a certain trigger is present, the classifier changes its prediction to the target label.
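To make the poisoning mechanism concrete, here is a minimal sketch in NumPy. The 3x3 corner patch, poisoning fraction, and array layout are illustrative assumptions, not the attack used in this competition:

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_frac=0.01, seed=0):
    """Stamp a trigger patch onto a random subset of training images and
    relabel those images to the attacker's target class.

    images: float array of shape (N, H, W, C) with values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_frac * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Illustrative trigger: a 3x3 white square in the bottom-right corner.
    images[idx, -3:, -3:, :] = 1.0
    # Relabeling teaches the model to associate the trigger with the target.
    labels[idx] = target_label
    return images, labels
```

A classifier trained on such a poisoned set behaves normally on clean inputs but predicts the target label whenever the trigger is present.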
Many different kinds of Trojan attacks exist, including attacks with nearly invisible triggers that are hard to identify through manual inspection of a training set. However, even when the trigger is invisible to humans, the presence of a Trojan may still be fairly easy to detect with purpose-built detectors. This is known as the problem of Trojan detection. A second attack vector is to release Trojaned networks on model-sharing hubs, such as Hugging Face or PyTorch Hub, which gives the adversary even greater control over the inserted Trojans. For this competition, we leverage this attack vector to insert Trojans that are designed to be hard to detect.
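As one illustration of what a purpose-built detector can look like, here is a hedged sketch of a meta-network detector in PyTorch: it featurizes each candidate network by its outputs on a fixed set of query inputs and fits a linear classifier over those features, given networks already labeled as clean or Trojaned. The featurization, function names, and hyperparameters are assumptions for illustration, not the competition's baseline detector:

```python
import torch
import torch.nn as nn

def network_features(model, query_inputs):
    """Summarize a candidate network by its outputs on fixed query inputs;
    Trojaned networks may respond to such probes in subtly atypical ways."""
    with torch.no_grad():
        return model(query_inputs).flatten()

def train_meta_classifier(models, is_trojaned, query_inputs, epochs=100):
    """Fit a logistic-regression meta-classifier on per-network features,
    given networks labeled as clean (0) or Trojaned (1)."""
    feats = torch.stack([network_features(m, query_inputs) for m in models])
    labels = torch.tensor(is_trojaned, dtype=torch.float32)
    clf = nn.Linear(feats.shape[1], 1)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(feats).squeeze(1), labels)
        loss.backward()
        opt.step()
    return clf  # torch.sigmoid(clf(features)) estimates P(Trojaned)
```

More evasive attacks are precisely those that make such features uninformative, which is what this competition probes.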
For more information on neural Trojans, please see the lecture materials here: https://course.mlsafety.org/calendar/#monitoring
Neural Trojans are an important issue in the security of machine learning systems, and Trojan detection is the first line of defense. It is therefore important to know whether the attacker or the defender has the advantage in Trojan detection, and a competition is a direct way to find out. We expect to learn novel Trojan detection and construction strategies, along with some unanticipated phenomena. Additionally, a highly refined understanding of Trojan detection probably requires understanding, at some level, what is going on inside a network, and this is one of the great questions of our time.
As AI systems become more capable, the risks posed by hidden functionality may grow substantially. Developing tools and insights for detecting hidden functionality in modern AI systems could therefore lay important groundwork for tackling future risks. In particular, future AIs could potentially engage in forms of deception—not out of malice, but because deception can help agents achieve their goals or receive approval from humans. Once deceptive AI systems obtain sufficient leverage, these systems could take a "treacherous turn" and bypass human control. Neural Trojans are the closest modern analogue to the risk of treacherous turns in future AI systems and thus provide a microcosm for studying treacherous turns (source).
How hard is neural network Trojan detection? Participants will help answer this question in three main tracks: the Trojan Detection Track, the Trojan Analysis Track, and the Evasive Trojans Track.
The competition has two rounds: In the primary round, participants will compete on the three main tracks. In the final round, the solution of the first-place team in the Evasive Trojans track will be used to train a new set of hard-to-detect Trojans, and participants will compete to detect these networks. For more information on the final round, see here.
Compute Credits: To enable broader participation, we are awarding $100 compute credit grants to student teams that would not otherwise be able to participate. For details on how to apply, see here.
These rules are an initial set; during registration, we require participants to consent to rule changes in case an urgent need arises. If a situation arises that we did not anticipate, we will implement a fair solution, ideally by consensus of the participants.
Contact: [email protected]
To receive updates and reminders by email, join the tdc-updates Google group: https://groups.google.com/g/tdc-updates. Updates will also be posted to the website and competition pages.
We are kindly sponsored by the FTX Future Fund regranting program.
1: "ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation". Liu et al.
2: "Planting Undetectable Backdoors in Machine Learning Models". Goldwasser et al.