AI ALIGNMENT FORUM
cloud — AI Alignment Forum
cloud
Posts
Karma · Title · Age · Comments
3 · cloud's Shortform · 2mo · 0
36 · [Paper] Output Supervision Can Obfuscate the CoT · 9d · 0
50 · Recontextualization Mitigates Specification Gaming Without Modifying the Specification · 2mo · 0
77 · [Research Note] Optimizing The Final Output Can Obfuscate CoT · 4mo · 3
105 · Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data · 4mo · 0
31 · Selective Generalization: Improving Capabilities While Maintaining Alignment · 4mo · 0
87 · Distillation Robustifies Unlearning · 6mo · 22
26 · Selective modularity: a research agenda · 8mo · 1
14 · Is weak-to-strong generalization an alignment technique? (Question) · 10mo · 1
64 · Gradient Routing: Masking Gradients to Localize Computation in Neural Networks · 1y · 4