AI ALIGNMENT FORUM
cloud
Posts
cloud — AI Alignment Forum
Score | Title | Age | Comments
3 | cloud's Shortform | 3mo | 0
37 | [Paper] Output Supervision Can Obfuscate the CoT | 23d | 0
52 | Recontextualization Mitigates Specification Gaming Without Modifying the Specification | 2mo | 0
77 | [Research Note] Optimizing The Final Output Can Obfuscate CoT | 4mo | 3
105 | Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data | 4mo | 0
31 | Selective Generalization: Improving Capabilities While Maintaining Alignment | 5mo | 0
87 | Distillation Robustifies Unlearning | 6mo | 22
26 | Selective modularity: a research agenda | 9mo | 1
14 | Is weak-to-strong generalization an alignment technique? (question) | 10mo | 1
64 | Gradient Routing: Masking Gradients to Localize Computation in Neural Networks | 1y | 4