AI Alignment Forum: Tags

Adversarial Examples

Contributors: Multicore

Adversarial examples are inputs with unusual features that cause an AI system to make choices that seem obviously wrong to a human. For example, an image of a panda can be perturbed so slightly that it still looks like a panda to a human, yet an image classifier labels it a gibbon.
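
As a rough illustration of how such a perturbation can be constructed, the sketch below applies a step in the spirit of the fast gradient sign method to a toy logistic-regression classifier. The model, its weights, the input, and the value of `epsilon` are all hypothetical, chosen only so that many tiny per-feature changes add up to a flipped prediction. It is not the classifier or attack behind the panda example.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression classifier."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, true_label, epsilon):
    """Move every feature by at most epsilon in the direction that raises the loss."""
    p = predict_proba(weights, bias, x)
    # For logistic regression with cross-entropy loss, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy setup (an assumption for illustration only):
# 1000 features with small alternating weights.
weights = [0.05 if i % 2 == 0 else -0.05 for i in range(1000)]
bias = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(1000)]
true_label = 1
epsilon = 0.03  # maximum change allowed per feature

x_adv = fgsm_perturb(weights, bias, x, true_label, epsilon)

# The clean input scores roughly 0.73 for class 1; the perturbed one roughly 0.38.
print("clean prediction:    ", predict_proba(weights, bias, x))
print("perturbed prediction:", predict_proba(weights, bias, x_adv))
print("largest feature change:", max(abs(a - b) for a, b in zip(x, x_adv)))
```

No single feature moves by more than 0.03, but because every change is aligned with the loss gradient, the shifts accumulate across all 1000 features and push the input across the decision boundary. High-dimensional inputs such as images give an attacker the same kind of leverage.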

Posts tagged Adversarial Examples
SolidGoldMagikarp (plus, prompt generation), by Jessica Rumbelow, mwatkins
AI Safety in a World of Vulnerable Machine Learning Systems, by AdamGleave, EuanMcLean
If I were a well-intentioned AI... I: Image classifier, by Stuart Armstrong
The Goodhart Game, by John Maxwell
AXRP Episode 1 - Adversarial Policies with Adam Gleave, by DanielFilan
High-stakes alignment via adversarial training [Redwood Research report], by dmz, Lawrence Chan, Nate Thomas
[AN #62] Are adversarial examples caused by real but imperceptible features?, by Rohin Shah
Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI, by bayesian_kitten
EIS IX: Interpretability and Adversaries, by Stephen Casper
Adversarial attacks and optimal control, by Jan Hendrik Kirchner
EIS X: Continual Learning, Modularity, Compression, and Biological Brains, by Stephen Casper
EIS XII: Summary, by Stephen Casper