Can the organizers change the rules? Yes. We require participants to consent to a change of rules if there is an urgent need. This is a new area and unanticipated developments may make it necessary for us to change the rules.
Who can participate in the competition? The competition is open to the public. Anyone can participate.
When is the deadline to register? You can register for any track at any time during the competition.
How many people can I have in my team? Teams can have any number of members. Solo teams are allowed.
Where can I download data and submit results? See the Getting Started page.
How many submissions can each team enter per competition track? In each track, teams are restricted to 5 submissions per day in the validation phase. In the test phase, teams are restricted to 5 submissions total. Only one account per team can be used to submit results. Creating multiple accounts to circumvent the submission limits will result in disqualification.
Are participants required to share the details of their method? We encourage all participants to share their methods and code, either with the organizers or publicly. To be eligible for prizes, winning teams are required to share their methods, code, and models with the organizers.
What are the details for the Trojan Detection Track? Here.
What are the details for the Red Teaming Track? Here.
Why are you using the baselines you have chosen? Our baselines (PEZ, GBDA, UAT, Zero-Shot) are well-known text optimization and red teaming methods from the academic literature, which can be applied to our trojan detection and red teaming tasks.
Why are you using the LLMs you have chosen? For the Trojan Detection Track, we use models from the Pythia suite of LLMs, which are open-source. This enables broader participation compared to models that are not fully open-source. We also use different-sized models in the Base Model and Large Model subtracks, ranging from ~1B to ~10B parameters. This allows groups with a range of compute resources to participate. For the Red Teaming Track, we use Llama-2-chat models. These models are also open-source, and in testing we found them to be very robust to the baseline red teaming methods.
Why are you using the particular trojan attack you have chosen? We use the simplest possible trojan attack on LLMs, where using the trigger as a prompt on its own causes the LLM to generate the target string. Existing trojan attacks for text models often consider triggers that modify clean inputs in various ways. We chose this simpler setting due to its strong resemblance to the red teaming task we consider, as part of the goal of this competition is to foster connections between the trojan detection and red teaming communities.
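To make this attack setting concrete, here is a minimal sketch of what counts as a successful trigger under this formulation. The `generate` callable is a hypothetical stand-in for an LLM's deterministic decoding; the trigger and target strings are illustrative, not from the competition data.

```python
def is_successful_trigger(generate, trigger: str, target: str) -> bool:
    """A trigger succeeds if, used as the entire prompt on its own,
    the model's continuation begins with the target string.
    No clean input is modified, unlike many prior text trojan attacks."""
    continuation = generate(trigger)
    return continuation.startswith(target)

# Toy stand-in for an LLM's decoding, for illustration only.
toy_model = {"open sesame": "Here is the secret target string..."}
generate = lambda prompt: toy_model.get(prompt, "I can't help with that.")

print(is_successful_trigger(generate, "open sesame",
                            "Here is the secret target string"))  # True
print(is_successful_trigger(generate, "hello",
                            "Here is the secret target string"))  # False
```

Note how this mirrors red teaming: in both cases, the task is to search for a prompt that elicits a specific undesired output from the model.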
Is it "trojans" or "Trojans"? Both are used in the academic literature. In the 2022 competition, we used "Trojans". However, this can make sentences a bit messy if one is using the word often, so we are using "trojans" for this competition.
What is the competition workshop? Each NeurIPS 2023 competition has several hours allotted for a workshop specific to the competition. We will use this time to announce the winning teams for each track and describe the winning methods, takeaways, etc. More information will be announced about the competition workshop later in the competition.
What is the publication summarizing the results? After the conclusion of the competition, the winning teams will be invited to co-author a publication describing the competition, the winning methods, takeaways, etc. This publication will be in the NeurIPS 2023 proceedings.
When will prizes be distributed? The winning teams will be announced in late October or early November 2023, after the organizers verify that the code and models of top submissions are legitimate. Prize money will be distributed as soon as possible after the winning teams are announced.