Adversarial Examples (AI) — AI Alignment Forum

Edited by Multicore and Ruby. Last updated 14th Dec 2024.

This page is a stub.

Posts tagged Adversarial Examples (AI)

1

134 SolidGoldMagikarp (plus, prompt generation)

Jessica Rumbelow, mwatkins

3y

17

3

35 AI Safety in a World of Vulnerable Machine Learning Systems

AdamGleave, EuanMcLean

3y

27

1

57 Deep Forgetting & Unlearning for Safely-Scoped LLMs

3y

6

1

42 Solving adversarial attacks in computer vision as a baby version of general AI alignment

2y

2

2

22 What progress have we made on automated auditing?

2y

0

1

15 If I were a well-intentioned AI... I: Image classifier

Stuart_Armstrong

6y

4

1

14 Adversarial Robustness Could Help Prevent Catastrophic Misuse

3y

15

1

2 The Goodhart Game

7y

3

1

7 AXRP Episode 1 - Adversarial Policies with Adam Gleave

5y

3

1

65 High-stakes alignment via adversarial training [Redwood Research report]

dmz, LawrenceC, Nate Thomas

4y

15

1

34 Even Superhuman Go AIs Have Surprising Failure Modes

AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller, MichaelDennis

3y

9

0

23 SmartyHeaderCode: anomalous tokens for GPT3.5 and GPT-4

3y

1

1

25 Image Hijacks: Adversarial Images can Control Generative Models at Runtime

Scott Emmons, Luke Bailey, Euan Ong

3y

1

1

23 Beyond the Board: Exploring AI Robustness Through Go

2y

1

1

15 EIS IX: Interpretability and Adversaries

3y

5

Showing 15 of 23 posts.
