Many thanks to Brandon Goldman, David Langer, Samuel Härgestam, Eric Ho, Diogo de Lucena, and Marc Carauleanu for their support and feedback throughout. Most alignment researchers we sampled in our recent survey think we are currently not on track to succeed with alignment, meaning that humanity may well be on track...
> As the capabilities of frontier artificial intelligence models continue to increase rapidly, ensuring the security of these systems has become a critical priority. In our previous posts, we’ve focused on Anthropic’s approach to safety, and Claude’s capabilities and applications. In this post, we are sharing some of the steps...
You can see the actual submission (including a more formalized model) here, and the contest details here. I've reordered things to read more naturally as a blog post and to explain the rationale and intuition a bit better. This didn't get a prize, though it may have been because I didn't...
H/T Aella. A company that made machine learning software for drug discovery, on hearing about the security concerns around these sorts of models, asked "huh, I wonder how effective it would be?" and, within 6 hours, discovered not only one of the most potent known chemical warfare agents, but also...
Blog post by Alex Irpan. The basic summary:

> In 2015, I made the following forecasts about when AGI could happen.
>
> * 10% chance by 2045
> * 50% chance by 2050
> * 90% chance by 2070
>
> Now that it’s 2020, I’m updating my forecast...
I haven't paid much attention to Atari in a long time, so I would appreciate takes from anyone who follows this more closely. My take: A single architecture that can handle the games that require significant exploration, the games that require long-term credit assignment, and the 'easy' games,...