Adversarial Examples (AI) — AI Alignment Forum
Edited by Multicore, Ruby; last updated 14th Dec 2024
This page is a stub.
Posts tagged Adversarial Examples (AI)
Most Relevant
1
138
SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
3y
17
3
35
AI Safety in a World of Vulnerable Machine Learning Systems
AdamGleave, EuanMcLean
3y
27
1
57
Deep Forgetting & Unlearning for Safely-Scoped LLMs
scasper
2y
6
1
40
Solving adversarial attacks in computer vision as a baby version of general AI alignment
Stanislav Fort
1y
1
2
22
What progress have we made on automated auditing?
LawrenceC
1y
0
1
15
If I were a well-intentioned AI... I: Image classifier
Stuart_Armstrong
6y
4
1
14
Adversarial Robustness Could Help Prevent Catastrophic Misuse
aog
2y
15
1
2
The Goodhart Game
John_Maxwell
6y
3
1
7
AXRP Episode 1 - Adversarial Policies with Adam Gleave
DanielFilan
5y
3
1
65
High-stakes alignment via adversarial training [Redwood Research report]
dmz, LawrenceC, Nate Thomas
4y
15
1
34
Even Superhuman Go AIs Have Surprising Failure Modes
AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, MichaelDennis
2y
9
0
23
SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4
AdamYedidia
3y
1
1
25
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Scott Emmons, Luke Bailey, Euan Ong
2y
1
1
23
Beyond the Board: Exploring AI Robustness Through Go
AdamGleave
1y
1
1
15
EIS IX: Interpretability and Adversaries
scasper
3y
5