AI ALIGNMENT FORUM
1753
cloud
Posts
Karma | Title | Age | Comments
3 | cloud's Shortform | 1mo | 0
41 | Recontextualization Mitigates Specification Gaming Without Modifying the Specification | 2d | 0
77 | [Research Note] Optimizing The Final Output Can Obfuscate CoT | 3mo | 3
105 | Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data | 2mo | 0
31 | Selective Generalization: Improving Capabilities While Maintaining Alignment | 3mo | 0
87 | Distillation Robustifies Unlearning | 4mo | 22
26 | Selective modularity: a research agenda | 7mo | 1
14 | Is weak-to-strong generalization an alignment technique? (Question) | 8mo | 1
64 | Gradient Routing: Masking Gradients to Localize Computation in Neural Networks | 10mo | 4