We're excited to announce the NeurIPS ML Safety workshop! To our knowledge, it is the first workshop at a top ML conference to emphasize and explicitly discuss x-risks.
$100K in paper prizes will be awarded: $50K for best paper awards and $50K for discussing x-risk. The x-risk awards will go to researchers who adequately explain how their work relates to AI x-risk. Analyses must engage with existing arguments for existential risks or strategies to reduce them.
Broadly, the workshop focuses on ML Safety, an umbrella term for research in the following areas:
Robustness: designing systems to be resistant to adversaries.
Monitoring: detecting undesirable behavior and discovering unexpected model functionality.
This category contains interpretability and transparency research, which could be useful for understanding the goals/thought processes of advanced AI systems. It also includes anomaly detection, which has been useful for detecting proxy gaming, and Trojans research, which involves identifying whether a deep neural network will suddenly change behavior if certain unknown conditions are met.
Alignment: building models that represent and safely optimize hard-to-specify human values.
This also includes preventing agents from pursuing unintended instrumental subgoals and designing them to be corrigible.
Systemic Safety: using ML to address broader governance risks related to how ML systems are handled or deployed. Examples include ML for cyberdefense, ML for improving epistemics, and cooperative AI.
The majority of AI research is published at conferences. These conferences support independently run workshops for research sub-areas. Researchers submit papers to workshops, and if their work is accepted, they are given the opportunity to present it to other participants. For background on the ML research community and its dynamics, see A Bird's Eye View of the ML Field.
A broad overview of these research areas is in Unsolved Problems in ML Safety.
For a discussion of how these problems impact x-risk, please see Open Problems in AI X-Risk.
Mod note: I'm frontpaging this. It's a bit of an edge case (workshops definitely aren't timeless, but we have tended to frontpage prize/contest announcements for intellectual content).