https://www.science.org/doi/10.1126/science.adn0117 Authors: Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner*,...
(This is just my opinion, not necessarily shared by the other co-authors.) Many people have commented that they find the results of our recent lie detection paper very surprising. I find the results somewhat surprising, but not as surprising as many readers do. It looks like I’ve miscommunicated something, and I’ll...
This post is a copy of the introduction of this paper on lie detection in LLMs. The Twitter thread is here. Authors: Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner [EDIT: Many people said they found the results very surprising....
Hey, this is me. I’d like to understand AI X-risk better. Is anyone interested in being my “alignment tutor”, for maybe 1 h per week, or 1 h every two weeks? I’m happy to pay. Fields I want to understand better: * Anything related to prosaic AI alignment/existential ML safety...
This is a discussion post for ChatGPT. I'll start off with some observations/implications:
* ChatGPT (davinci_003) seems a lot better/more user-friendly than davinci_002 was.
* The easy-to-use API probably means that many more people will interact with it.
* ChatGPT is a pretty good copy-editor (I haven't tried davinci_002 for...
We might be able to train AI alignment assistants that massively accelerate/improve the alignment research that gets done. These assistants need not have (strongly) superhuman capabilities or be highly agentic; they just need to be capable and aligned enough to allow us to offload most work on alignment, safety, and...
TL;DR: There are many posts on the Alignment Forum/LessWrong that could easily be on arXiv. Putting them on arXiv has several large benefits and (sometimes) very low costs.
Benefits of having posts on arXiv
There are several large benefits of putting posts on arXiv:
1. Much better searchability, shows...