[Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al (2025)] This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review website. It’s shallow in the sense that...
[Figure: from aisafety.world] The following is a list of live agendas in technical AI safety, updating our post from last year. It is “shallow” in the sense that 1) we are not specialists in almost any of it and 2) we only spent about an hour on each entry. We...
Summary * Creating an AI that can do AI alignment research (an automated alignment researcher) is one part of OpenAI’s three-part alignment plan and a core goal of their Superalignment team. * The automated alignment researcher will probably involve an advanced language model more capable than GPT-4 and possibly with...