Trojan Detection Challenge

In this competition, we challenge you to detect and analyze Trojan attacks on deep neural networks, where the attacks themselves are designed to be difficult to detect. Neural network Trojans are a growing concern for the security of ML systems, but little is known about the fundamental offense-defense balance of Trojan detection. Early work suggests that standard Trojan attacks may be easy to detect [1], but it has recently been shown that in simple cases one can design practically undetectable Trojans [2]. We invite you to help answer an important research question for deep neural networks: How hard is it to detect hidden functionality that is trying to stay hidden?

Prizes: There is a $50,000 prize pool. The first-place teams will also be invited to co-author a publication summarizing the competition results. Our currently planned procedures for distributing the pool are described here.

Overview

How hard is neural network Trojan detection? Participants will help answer this question in three main tracks:

  1. Trojan Detection Track: Given a dataset of Trojaned and clean networks spanning multiple data sources, build a Trojan detector that classifies a test set of networks with held-out labels (Trojan, clean). For more information, see here. A minimal detector baseline is sketched after this list.
  2. Trojan Analysis Track: Given a dataset of Trojaned networks spanning multiple data sources, predict various properties of Trojaned networks on a test set with held-out labels. This track has two subtracks: (1) target label prediction, (2) trigger synthesis. For more information, see here. A trigger reverse-engineering sketch also appears after this list.
  3. Evasive Trojans Track: Given a dataset of clean networks and a list of attack specifications, train a small set of Trojaned networks meeting the specifications and upload them to the evaluation server. The server will verify that the attack specifications are met, then train and evaluate a baseline Trojan detector using held-out clean networks and the submitted Trojaned networks. The task is to create Trojaned networks that are hard to detect. For more information, see here.

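To make the detection task concrete, the sketch below shows one simple, unofficial baseline, assuming the released networks are PyTorch checkpoints and that networks within a split share an architecture so feature vectors align. The `train/trojan/*.pt` and `train/clean/*.pt` paths and the `weight_features` helper are hypothetical, purely for illustration: featurize each network by summary statistics of its weight tensors, then fit an off-the-shelf classifier.

```python
# Minimal, unofficial Trojan-detection baseline sketch. Assumptions: PyTorch
# checkpoints, a shared architecture per split, and a hypothetical file layout.
import glob

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def weight_features(model_path):
    """Summarize a network's parameters with simple per-tensor statistics."""
    state = torch.load(model_path, map_location="cpu")
    if not isinstance(state, dict):   # a full nn.Module was saved instead
        state = state.state_dict()
    feats = []
    for tensor in state.values():
        t = tensor.float().flatten()
        if t.numel() < 2:             # skip scalars (e.g., BatchNorm counters)
            continue
        feats += [t.mean().item(), t.std().item(),
                  t.abs().max().item(), t.norm().item()]
    return np.array(feats)

trojan_paths = sorted(glob.glob("train/trojan/*.pt"))   # hypothetical layout
clean_paths  = sorted(glob.glob("train/clean/*.pt"))
X = np.stack([weight_features(p) for p in trojan_paths + clean_paths])
y = np.array([1] * len(trojan_paths) + [0] * len(clean_paths))

clf = LogisticRegression(max_iter=1000)
print("cross-val AUROC:", cross_val_score(clf, X, y, scoring="roc_auc").mean())
```

Weight statistics are a deliberately weak feature set; per rule 4 below, a stronger detector could also query each network on inputs from the original data sources (e.g., MNIST, CIFAR-10).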
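For the trigger synthesis subtrack, one common starting point (not necessarily a winning approach) is trigger reverse-engineering in the style of Neural Cleanse: jointly optimize a mask and a pattern so that patching clean inputs flips the model's prediction to a candidate target label, while penalizing the mask's size. The sketch below assumes image inputs normalized to [-1, 1]; `synthesize_trigger` and all hyperparameters are illustrative.

```python
# Sketch of Neural Cleanse-style trigger reverse-engineering (illustrative only;
# assumes an image classifier with inputs normalized to [-1, 1]).
import torch
import torch.nn.functional as F

def synthesize_trigger(model, clean_batch, target_label, steps=500, lam=1e-2):
    """Optimize a mask and pattern so patched inputs are classified as target_label.

    clean_batch: clean inputs of shape (N, C, H, W).
    lam: weight of the mask-size penalty, encouraging a small trigger.
    """
    model.eval()
    _, c, h, w = clean_batch.shape
    mask_logit = torch.zeros(1, 1, h, w, requires_grad=True)
    pattern    = torch.zeros(1, c, h, w, requires_grad=True)
    opt = torch.optim.Adam([mask_logit, pattern], lr=0.1)
    target = torch.full((clean_batch.size(0),), target_label, dtype=torch.long)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logit)                       # values in [0, 1]
        patched = (1 - mask) * clean_batch + mask * torch.tanh(pattern)
        loss = F.cross_entropy(model(patched), target) + lam * mask.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logit).detach(), torch.tanh(pattern).detach()
```

Running this for every candidate target label and comparing the resulting mask sizes is also a classic heuristic for target label prediction: the true target often admits an unusually small trigger.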
The competition has two rounds: In the primary round, participants will compete on the three main tracks. In the final round, the solution of the first-place team in the Evasive Trojans Track will be used to train a new set of hard-to-detect Trojans, and participants will compete to detect these networks. For more information on the final round, see here.

Compute Credits: To enable broader participation, we are awarding $100 compute credit grants to student teams that would not otherwise be able to participate. For details on how to apply, see here.

Important Dates

  • June 8: Registration opens on CodaLab
  • June 15: Training data released for the primary round; evaluation servers open for the validation sets
  • September 15: Evaluation servers open for the test sets (primary round)
  • September 22: Final submissions due for the primary round
  • September 30: Training data released for the final round; evaluation servers open for the validation sets
  • October 15: Evaluation servers open for the test sets (final round)
  • October 22: Final submissions due for the final round

Rules

  1. Open Format: This is an open competition. All participants are encouraged to share their methods upon conclusion of the competition, and outstanding submissions will be highlighted in a joint publication. To be eligible for prizes, winning teams are required to share their methods, code, and models (at least with the organizers, although public releases are encouraged).
  2. Registration: Double registration is not allowed. We expect teams to self-certify that all team members are not part of a different team registered for the competition, and we will actively monitor for violation of this rule. Teams may participate in multiple tracks. Due to conflicts of interest, the winning team of the Evasive Trojans Track is not allowed to participate in the final round. Organizers are not allowed to participate in the competition or win prizes.
  3. Training Data: Teams may only submit results of models trained on the provided training set of Trojaned and clean networks. Training additional networks from scratch is not allowed, as it gives teams with more compute an unfair advantage. We expect teams to self-certify that they have not trained additional networks.
  4. Detection Methods: Augmentation of the provided dataset of neural networks is allowed as long as it does not involve training additional networks from scratch. Using inputs from the data sources (e.g., MNIST, CIFAR-10, etc.) is allowed.
  5. Rule breaking may result in disqualification, and significant rule breaking will result in ineligibility for prizes.

These rules are an initial set; at registration, we require participants to consent to rule changes should an urgent need arise. If a situation arises that was not anticipated, we will implement a fair solution, ideally by consensus of the participants.

Organizers

Contact: [email protected]

We are kindly sponsored by the FTX Future Fund regranting program.


1: "ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation". Liu et al.

2: "Planting Undetectable Backdoors in Machine Learning Models". Goldwasser et al.