AI ALIGNMENT FORUM

1753
cloud
Ω209100

Posts

cloud's Shortform (3 karma, 1mo, 0 comments)
Recontextualization Mitigates Specification Gaming Without Modifying the Specification (41 karma, 2d, 0 comments)
[Research Note] Optimizing The Final Output Can Obfuscate CoT (77 karma, 3mo, 3 comments)
Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data (105 karma, 2mo, 0 comments)
Selective Generalization: Improving Capabilities While Maintaining Alignment (31 karma, 3mo, 0 comments)
Distillation Robustifies Unlearning (87 karma, 4mo, 22 comments)
Selective modularity: a research agenda (26 karma, 7mo, 1 comment)
[Question] Is weak-to-strong generalization an alignment technique? (14 karma, 8mo, 1 comment)
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks (64 karma, 10mo, 4 comments)