MATS Program — AI Alignment Forum
MATS Program
Edited by Multicore and Ryan Kidd; last updated 30th Dec 2024.
You are viewing revision 1.3.0, last edited by Ryan Kidd.
The ML Alignment & Theory Scholars program. https://www.matsprogram.org/
Posts tagged MATS Program (sorted by Most Relevant):

- SolidGoldMagikarp (plus, prompt generation) by Jessica Rumbelow and mwatkins (3y · 134 karma · 17 comments · tag relevance 3)
- SERI MATS Program - Winter 2022 Cohort by Ryan Kidd, Victor Warlop, and Christian Smith (3y · 32 karma · 0 comments · tag relevance 4)
- Understanding and controlling a maze-solving policy network by TurnTrout, peligrietzer, Ulisse Mini, Monte M, and David Udell (3y · 140 karma · 23 comments · tag relevance 5)
- Soft optimization makes the value target bigger by Jeremy Gillen (3y · 43 karma · 4 comments · tag relevance 3)
- SERI ML Alignment Theory Scholars Program 2022 by Ryan Kidd, Victor Warlop, and ozhang (4y · 25 karma · 0 comments · tag relevance 3)
- Finite Factored Sets in Pictures by Magdalena Wache (3y · 56 karma · 2 comments · tag relevance 2)
- Recontextualization Mitigates Specification Gaming Without Modifying the Specification by ariana_azarbal, Victor Gillioz, TurnTrout, and cloud (2mo · 52 karma · 0 comments · tag relevance 2)
- Predictions for shard theory mechanistic interpretability results by TurnTrout, Ulisse Mini, and peligrietzer (3y · 47 karma · 6 comments · tag relevance 3)
- Modulating sycophancy in an RLHF model via activation steering by Nina Panickssery (2y · 33 karma · 19 comments · tag relevance 1)
- Infra-Bayesian haggling by hannagabor (2y · 17 karma · 0 comments · tag relevance 2)
- Normative vs Descriptive Models of Agency by mattmacdermott (3y · 14 karma · 2 comments · tag relevance 2)
- Steering GPT-2-XL by adding an activation vector by TurnTrout, Monte M, David Udell, lisathiergart, and Ulisse Mini (3y · 121 karma · 63 comments · tag relevance 1)
- Transformers Represent Belief State Geometry in their Residual Stream by Adam Shai (2y · 145 karma · 4 comments · tag relevance 1)
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs by Jan Betley and Owain_Evans (9mo · 111 karma · 1 comment · tag relevance 1)
- Refusal in LLMs is mediated by a single direction by Andy Arditi, Oscar Obeso, Aaquib111, wesg, and Neel Nanda (2y · 77 karma · 44 comments · tag relevance 1)