AI ALIGNMENT FORUM
Considerations in diffuse control
9
Methodological considerations in making malign initializations for control research
Alek Westover, Vivek Hebbar, Julian Stastny
5mo
0
2
Three visions for diffuse control
Alek Westover
3mo
0
17
Four Downsides of Training Policies Online
Alek Westover, egan
5mo
0
4
Theoretical predictions on the sample efficiency of training policies and activation monitors
Alek Westover, Vivek Hebbar
4mo
0
18
How will we do SFT on models with opaque reasoning?
Alek Westover, Vivek Hebbar, egan
3mo
0
17
Model organisms researchers should check whether high LRs defeat their model organisms
Dylan Xu, SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny
2mo
0
28
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, SebastianP, frisby, Julian Stastny
1mo
0