AI ALIGNMENT FORUM

MATS Program

Edited by Ryan Kidd, Multicore, et al.; last updated 30th Dec 2024

ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and connect them with the Berkeley AI safety research community.

Posts tagged MATS Program
Karma · Title · Authors · Age · Comments
138 · SolidGoldMagikarp (plus, prompt generation) · Jessica Rumbelow, mwatkins · 3y · 17
32 · SERI MATS Program - Winter 2022 Cohort · Ryan Kidd, Victor Warlop, Christian Smith · 3y · 0
140 · Understanding and controlling a maze-solving policy network · TurnTrout, peligrietzer, Ulisse Mini, Monte M, David Udell · 3y · 23
43 · Soft optimization makes the value target bigger · Jeremy Gillen · 3y · 4
25 · SERI ML Alignment Theory Scholars Program 2022 · Ryan Kidd, Victor Warlop, ozhang · 3y · 0
56 · Finite Factored Sets in Pictures · Magdalena Wache · 3y · 2
44 · Recontextualization Mitigates Specification Gaming Without Modifying the Specification · ariana_azarbal, vgillioz, TurnTrout, cloud · 3d · 0
47 · Predictions for shard theory mechanistic interpretability results · TurnTrout, Ulisse Mini, peligrietzer · 3y · 6
33 · Modulating sycophancy in an RLHF model via activation steering · Nina Panickssery · 2y · 19
16 · Infra-Bayesian haggling · hannagabor · 1y · 0
14 · Normative vs Descriptive Models of Agency · mattmacdermott · 3y · 2
87 · Distillation Robustifies Unlearning · Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud, TurnTrout · 4mo · 22
97 · Mechanistically Eliciting Latent Behaviors in Language Models · Andrew Mack, TurnTrout · 1y · 20
77 · [Research Note] Optimizing The Final Output Can Obfuscate CoT · lukemarks, jacob_drori, cloud, TurnTrout · 3mo · 3
64 · Gradient Routing: Masking Gradients to Localize Computation in Neural Networks · cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout · 10mo · 4
(Showing 15 of 132 tagged posts.)