AI ALIGNMENT FORUM
cloud
Posts
cloud — AI Alignment Forum
Karma | Title | Age | Comments
3 | cloud's Shortform | 4mo | 0
19 | Apply for Alignment Mentorship from TurnTrout and Alex Cloud | 1mo | 0
37 | [Paper] Output Supervision Can Obfuscate the CoT | 2mo | 0
54 | Recontextualization Mitigates Specification Gaming Without Modifying the Specification | 3mo | 0
77 | [Research Note] Optimizing The Final Output Can Obfuscate CoT | 6mo | 3
105 | Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data | 6mo | 0
31 | Selective Generalization: Improving Capabilities While Maintaining Alignment | 6mo | 0
87 | Distillation Robustifies Unlearning | 8mo | 22
28 | Selective modularity: a research agenda | 10mo | 1
14 | Is weak-to-strong generalization an alignment technique? (Q) | 1y | 1
64 | Gradient Routing: Masking Gradients to Localize Computation in Neural Networks | 1y | 4