Half-researcher, half-distiller (see https://distill.pub/2017/research-debt/), both in AI Safety. Funded, and also finishing a PhD in theoretical computer science (distributed computing) in France.
Thanks for the good work!
I especially like the choice set paper.
After rereading the chapter in Superintelligence, it seems to me that "genie" captures something akin to act-based agents. Do you think that's the main way to use this concept in the current state of the field, or do you have other applications in mind?
Is that from Superintelligence? I googled it, and that was the most convincing result.
Nice post, I like the changes you made since the last draft I read. I also like the use of the new prediction function. Do you intend to do something with the feedback (like a post or a comment)?
This fits with discussions I've been having with researchers about recommender systems like YouTube, and all sorts of risks related to them. I'm glad this post tries to push the discussion around the subject!
I really liked this post. Not used to thinking about brain algorithms, but I believe I followed most of your points.
That being said, I'm not sure I get how your hypotheses explain the actual behavior of the rats. Just looking at hypothesis 3, you posit that thinking about salt gets an improved reward, and so do actions that make the rat expect salt-tasting. But that doesn't remove the need for exploration! The neocortex still needs to choose a course of action before getting a reward. Actually, if thinking about salt is rewarded anyway, this might reinforce any behavior decided after thinking about salt. And if the interpretability is better and only rewards actions that are expected to result in tasting salt, there is still a need for exploration to find such a plan and have it reinforced.
Am I getting something wrong?
Hmmm, that's not quite what I meant. It's not about stopping at some meta-level, but rather, stopping at some amount of learning in the system. The system should learn not just level-specific information, but also cross-level information (like overall philosophical heuristics), which means that even if you stop teaching the machine at some point, it can still produce new reasoning at higher levels which should be similar to feedback you might have given.
Interesting. So the point is to learn how to move up the hierarchy? I mean, that makes a lot of sense. It is a sort of fixed-point description, because then the AI can keep moving up the hierarchy as far as it wants, which means the whole hierarchy is encoded by its behavior. It's just a question of how far up it needs to go to get satisfying answers.
Is that correct?
This post is amazing, both for me as a researcher and for the people I know who want to contribute to AI existential safety. Just last week, a friend asked what he should do his PhD in AI/ML on if he wants to contribute to AI existential safety. I mentioned interpretability, but now I have somewhere to redirect him.
As for my own thinking, I immensely value the attempt to say what is in the right direction, even in technical research like AI Alignment. Most people in this area are here to help with AI existential safety, but even after deciding to go into the field, the question of the relevance of specific research ideas should be asked. I'm more into agent-foundations kind of stuff, but even there, as you argue, one can look for consequences of success on AI existential safety.
The main way I can see present-day technical research benefitting existential safety is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise over the next 10-30 years. In short, there often needs to be some amount of traction on a technical area before it’s politically viable for governing bodies to demand that institutions apply and improve upon solutions in those areas.
Great way to think about the value of some research! I would probably add "creating", because some governance demands come from technical studies that find potential issues we need to deal with. Also, I would really love to see a specific post on this take, or a question; really anything that doesn't require precommitting to read a long post on a related subject.
Really enjoying your posts on normativity! The way I summarize it internally is "Thinking about fixed-points for the meta aspect of human reasoning". How fixed-point-y do you think solutions are likely to be?
We could never entirely pin down the concept of human values, but at some point, the system would be reasoning so much like us (or rather, so much like we would want to reason) that this wouldn't be a concern.
I'm confused about this sentence, because it seems to promote an idea in contradiction with your other writing on normativity (and even earlier sections in this post). The quote says that at some level you could stop caring (which means we can keep going meta until there's no significant improvement, and stop there), while the rest of your writing says that we should deal with the whole hierarchy at once.
Yeah, rereading myself, you're right. I think the important thing I wanted to say is just that the distinction between productive and unproductive desires or goals seems like an interesting way to formalize an aspect of goal-directedness.