AI ALIGNMENT FORUM

Peter Hase — AI Alignment Forum

Website: https://peterbhase.github.io

New RFP on Interpretability from Schmidt Sciences

Request for Proposals. Deadline: Tuesday, May 26, 2026. Schmidt Sciences invites proposals for a pilot program in AI interpretability. We seek new methods for detecting and mitigating deceptive behaviors from AI models, such as when models knowingly give misleading or harmful advice to users. If this pilot uncovers signs of...

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

by Deleted user and Peter Hase

Peter Hase (UNC Chapel Hill), Owen Shen (UC San Diego). With thanks to Robert Kirk and Mohit Bansal for helpful feedback on this post. Introduction: Model interpretability was a bullet point in Concrete Problems in AI Safety (2016). Since then, interpretability has come to comprise entire research directions in technical...

Apr 9, 2021•142