cloud — AI Alignment Forum
cloud
Posts
Sorted by New
3
cloud's Shortform
4mo
0
22
Apply for Alignment Mentorship from TurnTrout and Alex Cloud
21d
0
37
[Paper] Output Supervision Can Obfuscate the CoT
2mo
0
54
Recontextualization Mitigates Specification Gaming Without Modifying the Specification
3mo
0
77
[Research Note] Optimizing The Final Output Can Obfuscate CoT
6mo
3
105
Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data
6mo
0
31
Selective Generalization: Improving Capabilities While Maintaining Alignment
6mo
0
87
Distillation Robustifies Unlearning
7mo
22
28
Selective modularity: a research agenda
10mo
1
14
Is weak-to-strong generalization an alignment technique?
Q
1y
1
64
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks
1y
4