This post was written during Refine. Thanks to Jonathan Low, Linda Linsefors, Koen Holtman, Aaron Scher, and Nicholas Kees Dupuis for helpful discussion and feedback. Disclaimer: This post reflects my current understanding of the field and may not be an accurate representation of it. Feel free to comment if you...
This post was written as part of Refine. Thanks to Adam Shimi, Alexander Gietelink Oldenziel, and Vanessa Kosoy for helpful discussion and feedback. Summary This post aims to: * Advocate for embedding safety into the development of machine learning models * Propose a framing for how to think about safety, where...
This summary was written as part of Refine. The ML Safety Course was created by Dan Hendrycks at the Center for AI Safety. Thanks to Adam Shimi and Thomas Woodside for helpful feedback. Overview Background I recently completed the ML Safety Course by watching the videos and browsing through the...
This post was written as part of Refine. Thanks to Adam Shimi, Lucas Teixeira, Linda Linsefors, and Jonathan Low for helpful feedback and comments. Epistemic status: highly uncertain. This post reflects my understanding of the terminology and may not reflect the general consensus of AI alignment researchers (if any). Motivation...
This post was written for the second Refine blog post day, at the end of the first week of iterating on ideas and concretely aiming at the alignment problem. Thanks to Adam Shimi, Paul Bricman, and Daniel Clothiaux for helpful discussion and comments. Introduction This post aims to provide...
This post was written for the first Refine blog post day, at the end of the week of readings, discussions, and exercises about epistemology for doing good conceptual research. Thanks to Adam Shimi for helpful discussion and comments. I first got properly exposed to AI alignment ~1-2 years ago....