AI ALIGNMENT FORUM
Miles Turpin
Research scientist at NYU Alignment Research Group
Posts
41 · Reward hacking behavior can generalize across tasks · 4mo · 1 comment
13 · Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought · 7mo · 0 comments
17 · Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs” · 1y · 0 comments
19 · Unfaithful Explanations in Chain-of-Thought Prompting · 1y · 0 comments
Wiki Contributions
Comments