AI ALIGNMENT FORUM

MATS Program

Edited by Ryan Kidd, Multicore, et al. last updated 30th Dec 2024

The ML Alignment & Theory Scholars (MATS) Program is an educational seminar and independent research program that aims to provide talented scholars with talks, workshops, and research mentorship in the field of AI alignment, and to connect them with the Berkeley AI safety research community.

Posts tagged MATS Program
138 SolidGoldMagikarp (plus, prompt generation)
Jessica Rumbelow, mwatkins
3y
17
32 SERI MATS Program - Winter 2022 Cohort
Ryan Kidd, Victor Warlop, Christian Smith
3y
0
140 Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M, David Udell
2y
23
43 Soft optimization makes the value target bigger
Jeremy Gillen
3y
4
25 SERI ML Alignment Theory Scholars Program 2022
Ryan Kidd, Victor Warlop, ozhang
3y
0
56 Finite Factored Sets in Pictures
Magdalena Wache
3y
2
47 Predictions for shard theory mechanistic interpretability results
TurnTrout, Ulisse Mini, peligrietzer
3y
6
33 Modulating sycophancy in an RLHF model via activation steering
Nina Panickssery
2y
19
16 Infra-Bayesian haggling
hannagabor
1y
0
14 Normative vs Descriptive Models of Agency
mattmacdermott
3y
2
87 Distillation Robustifies Unlearning
Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud, TurnTrout
3mo
22
97 Mechanistically Eliciting Latent Behaviors in Language Models
Andrew Mack, TurnTrout
1y
20
64 Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
cloud, Jacob G-W, Evzen, Joseph Miller, TurnTrout
9mo
3
49 Steering Llama-2 with contrastive activation additions
Nina Panickssery, Wuschel Schulz, NickGabs, Meg, evhub, TurnTrout
2y
23
53 Model Organisms for Emergent Misalignment
Anna Soligo, Edward Turner, Mia Taylor, Senthooran Rajamanoharan, Neel Nanda
3mo
0