AI ALIGNMENT FORUM

Adversarial Examples (AI)

Edited by Multicore and Ruby; last updated 14th Dec 2024.
This page is a stub.
Posts tagged Adversarial Examples (AI)
SolidGoldMagikarp (plus, prompt generation) | Jessica Rumbelow, mwatkins | 3y | 138 karma | 17 comments
AI Safety in a World of Vulnerable Machine Learning Systems | AdamGleave, EuanMcLean | 2y | 35 karma | 27 comments
Deep Forgetting & Unlearning for Safely-Scoped LLMs | scasper | 2y | 57 karma | 6 comments
Solving adversarial attacks in computer vision as a baby version of general AI alignment | Stanislav Fort | 1y | 39 karma | 1 comment
[Question] What progress have we made on automated auditing? | LawrenceC | 1y | 22 karma | 0 comments
If I were a well-intentioned AI... I: Image classifier | Stuart_Armstrong | 6y | 15 karma | 4 comments
Adversarial Robustness Could Help Prevent Catastrophic Misuse | aog | 2y | 14 karma | 15 comments
The Goodhart Game | John_Maxwell | 6y | 2 karma | 3 comments
AXRP Episode 1 - Adversarial Policies with Adam Gleave | DanielFilan | 5y | 7 karma | 3 comments
High-stakes alignment via adversarial training [Redwood Research report] | dmz, LawrenceC, Nate Thomas | 3y | 65 karma | 15 comments
Even Superhuman Go AIs Have Surprising Failure Modes | AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, MichaelDennis | 2y | 34 karma | 9 comments
SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4 | AdamYedidia | 2y | 23 karma | 1 comment
Image Hijacks: Adversarial Images can Control Generative Models at Runtime | Scott Emmons, Luke Bailey, Euan Ong | 2y | 25 karma | 1 comment
Beyond the Board: Exploring AI Robustness Through Go | AdamGleave | 1y | 23 karma | 1 comment
EIS IX: Interpretability and Adversaries | scasper | 3y | 15 karma | 5 comments
Showing 15 of 23 tagged posts.