When fine-tuning fails to elicit GPT-3.5's chess abilities
Produced as part of the ML Alignment & Theory Scholars Program, Winter 2023–24 Cohort, under the supervision of Evan Hubinger.

Acknowledgements: Thanks to Kyle Brady for his many contributions to this project.

Abstract

This post argues that the performance elicited by fine-tuning an LLM on a task using a...
Jun 14, 2024