AI alignment researcher, ML engineer. Master's in Neuroscience.
I believe that cheap and broadly competent AGI is attainable and will be built soon, which gives me timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability, trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, that this automated process will mine neuroscience for insights, and that it will quickly become far more effective and efficient. I think it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation, so I am trying to warn the world about this possibility. See my prediction market here: https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg
I also think that current AI models pose misuse risks, which may continue to get worse as models become more capable, and that this could result in catastrophic suffering if we fail to regulate it.
Ok, so this is definitely not a human thing, so probably a bit of a tangent. One of the topics that came up in a neuroscience class once was goose imprinting. There have apparently been studies (see Eckhard Hess for the early ones) showing that the strength of imprinting onto a target (measured by behavior after the close of the critical period) is related to how much running toward the target the baby geese do. The hand-wavy explanation was something like 'probably this makes sense, since if you have to run a lot to keep up with your mother-goose for safety, you'll need a strong mother-goose-following behavioral tendency to keep you safe through early development'.
I think this is an excellent description of GPT-like models. It both fits with my observations and clarifies my thinking. It also leads me to examine, in a new light, some questions that have been on my mind recently:
What is the limit of simulation capability that our current architectures (with some iterative improvements) can achieve when scaled up (via additional computation, improved datasets, etc.)?
Is a Simulator model really what we want? Can we trust the outputs we get from it to help us with things like accelerating alignment research? What might failure modes look like?
Super handy-seeming intro for newcomers.
I recommend adding Jade Leung to your list of governance people.
As for the list of AI safety people, I'd like to add that there are some people who've written interesting and much-discussed content that would be worth having some familiarity with:
John Wentworth
Steven Byrnes
Vanessa Kosoy
And personally I'm quite excited about the school of thought developing under the 'Shard theory' banner.
For shard theory info:
https://www.lesswrong.com/posts/xqkGmfikqapbJ2YMj/shard-theory-an-overview
https://www.alignmentforum.org/posts/vJFdjigzmcXMhNTsx/simulators
I'm excited to participate in this, and feel like the mental exercise of exploring this scenario would be useful for my education on AI safety. Since I'm currently funded by a grant from the Long Term Future Fund to reorient my career to AI safety, and I feel that this would be a reasonable use of my time, you don't need to pay me. I'd be happy to be a full-time volunteer for the next couple weeks.
Edit: I participated and was paid, but only briefly. Turns out I was too distracted thinking and talking about how the process could be improved and the larger world implications to actually be useful as an object-level worker. I feel like the experience was indeed useful for me, but not as useful to Beth as I had hoped. So... thanks and sorry!
Thanks Rohin. I also feel that interviewing after my 3 more months of independent work is probably the correct call.
I'm potentially interested in the Research Engineer position on the Alignment Team, but I'm currently 3 months into a 6-month grant from the LTFF to reorient my career from general machine learning to AI safety specifically. My current plan is to keep doing solo work until the last month of my grant period, then begin applying to AI safety work at places like Anthropic, Redwood Research, OpenAI, and DeepMind.
Do you think there's a significant advantage to applying soon vs 3 months from now?
I recommend this paper on the subject for additional reading:
The basal ganglia select the expected sensory input used for predictive coding