MATS Program — AI Alignment Forum

MATS Program

Edited by Ryan Kidd, Multicore, et al. Last updated 30th Dec 2024.

The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in AI alignment, and to connect them with the Berkeley AI safety research community.

Posts tagged MATS Program

3

138 SolidGoldMagikarp (plus, prompt generation)

Jessica Rumbelow, mwatkins

3y

17

4

32 SERI MATS Program - Winter 2022 Cohort

Ryan Kidd, Victor Warlop, Christian Smith

3y

0

5

140 Understanding and controlling a maze-solving policy network

TurnTrout, peligrietzer, Ulisse Mini, Monte M, David Udell

3y

23

3

43 Soft optimization makes the value target bigger

3y

4

3

25 SERI ML Alignment Theory Scholars Program 2022

Ryan Kidd, Victor Warlop, ozhang

4y

0

2

56 Finite Factored Sets in Pictures

Magdalena Wache

3y

2

2

50 Recontextualization Mitigates Specification Gaming Without Modifying the Specification

ariana_azarbal, Victor Gillioz, TurnTrout, cloud

2mo

0

3

47 Predictions for shard theory mechanistic interpretability results

TurnTrout, Ulisse Mini, peligrietzer

3y

6

1

33 Modulating sycophancy in an RLHF model via activation steering

Nina Panickssery

2y

19

2

16 Infra-Bayesian haggling

2y

0

2

14 Normative vs Descriptive Models of Agency

3y

2

0

87 Distillation Robustifies Unlearning

Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud, TurnTrout

6mo

22

2

99 Mechanistically Eliciting Latent Behaviors in Language Models

Andrew Mack, TurnTrout

2y

20

2

77 [Research Note] Optimizing The Final Output Can Obfuscate CoT

lukemarks, jacob_drori, cloud, TurnTrout

4mo

3

1

64 Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout

1y

4
