AI ALIGNMENT FORUM

Miles Turpin

Research scientist at Scale AI on the SEAL team (safety)

Posts

Reward hacking behavior can generalize across tasks (43 karma, 1y ago, 1 comment)
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought (13 karma, 1y ago, 0 comments)
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs” (17 karma, 2y ago, 0 comments)
Unfaithful Explanations in Chain-of-Thought Prompting (22 karma, 2y ago, 0 comments)
