This remains the best overview of the learning-theoretic agenda to date. As a complementary pedagogic resource, there is now also a series of video lectures.
Since the article was written, there have been several new publications:
In addition, some new developments have been briefly summarized in shortforms:
Meanwhile, active research proceeds along several parallel directions:
It remains true that there are more shovel-ready open problems than researchers, and hence the number of (competent) researchers is still the bottleneck.
Regarding Direction 17: there might be some potential drawbacks to ADAM. I think it's possible that some very agentic programs have a relatively low score. This is because explicit optimization algorithms have low complexity.
(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We keep the rest of the setup fixed for these considerations too.) Consider a utility function $U$. Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for $U$) that has an approximation parameter $n$ (this could be AIXI-tl, plus some approximation of $U$; higher $n$ means a better approximation). We will call this approximation of the optimal policy $\pi_n$. This approximation algorithm has complexity $K(\pi_n) \le K(U) + K(n) + c$, where $c$ is a constant needed to describe the general algorithm (this should not be too large).
We can get a better approximation by using a quickly growing function, such as the Ackermann function, setting $n = A(m)$. Then we have $K(\pi_{A(m)}) \le K(U) + K(m) + c$, even though the approximation parameter $A(m)$ is astronomically large.
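To make the trick concrete, here is a small Python sketch (illustrative only; the complexity claims themselves are of course not something code can verify): a fixed, short generator plus roughly $\log_2 m$ bits for $m$ already describes an astronomically large approximation parameter $n = A(m)$.

```python
import inspect

def ackermann(m: int, n: int) -> int:
    """Two-argument Ackermann function; grows faster than any primitive-recursive function."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Small inputs already explode: A(0,3)=4, A(1,3)=5, A(2,3)=9, A(3,3)=61,
# and A(4,3) is far beyond anything physically computable.
for m in range(4):
    print(m, ackermann(m, 3))

# The description of "run the explicit U-optimizer for n = A(m) steps" consists of
# this fixed generator plus ~log2(m) bits for m -- independent of how huge A(m) is.
print("bytes of the fixed generator:", len(inspect.getsource(ackermann)))
```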
What is the ADAM score of this policy $\pi_n$ (with $n = A(m)$ as above)? Consider the expression defining the score $g(\pi_n)$, and let $U'$ be the utility function that is maximal in this expression. If $U' = U$, then the score is small, since $K(\pi_n)$ exceeds $K(U)$ by only a small amount.
For the other case, let us assume that if $U' \neq U$, the analogously constructed explicit optimizer for $U'$ is at least as good at maximizing $U'$ as $\pi_n$. Then the same argument applies with $U'$ in place of $U$, and again the score comes out small.
I don't think that the assumption (the explicit $U'$-optimizer maximizes $U'$ better than $\pi_n$) is true for all $U'$ and $n$, but plausibly we can select $U$ such that this is the case (exceptions, if they exist, would be a bit weird, and ADAM working well only because of these weird exceptions would feel a bit disappointing to me). A thing that is not captured by approximations such as AIXI-tl is programs that halt but have insane runtime (longer than the time bound $t$). Again, it would feel weird to me if ADAM sort of works because of low-complexity, extremely-long-running halting programs.
To summarize, maybe there exist policies $\pi_n$ which strongly optimize a non-trivial utility function $U$ with approximation parameter $n$, but where the ADAM score $g(\pi_n)$ is relatively small.
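In symbols, the worry can be summarized roughly as follows (a sketch, not a theorem; $g$ denotes the ADAM score, $\pi^*_U$ the $U$-optimal policy, $A$ the Ackermann function, and $c$ the small constant from above):

```latex
\[
  K\!\bigl(\pi_{A(m)}\bigr) \;\le\; K(U) + K(m) + c
  \;\le\; K(U) + \log_2 m + O(\log\log m) + c,
  \qquad
  \pi_{A(m)} \;\longrightarrow\; \pi^*_U \ \ (m \to \infty),
\]
\[
  \text{and yet, plausibly,}\quad g\!\bigl(\pi_{A(m)}\bigr)\ \text{remains small for all } m.
\]
```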
Yes, this is an important point, of which I am well aware. This is why I expect unbounded ADAM to be only a toy model. A more realistic ADAM would use a complexity measure that takes computational complexity into account, instead of plain Kolmogorov complexity $K$. For example, you can look at the measure I defined here. More realistically, this measure should be based on the frugal universal prior.
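For illustration, here is a minimal sketch of what "taking computational complexity into account" could look like, in the spirit of Levin's $Kt$ (an assumed stand-in, not the measure defined in the linked post; the function `levin_style_cost` and all numbers are made up): charging description length plus the log of running time makes the Ackermann-style trick above expensive, because even the logarithm of an Ackermann-sized runtime is astronomical.

```python
import math

def levin_style_cost(description_bits: float, runtime_steps: int) -> float:
    """Hypothetical Levin-style cost: description length (bits) + log2(running time)."""
    return description_bits + math.log2(runtime_steps)

# Toy, made-up numbers. Under plain K, "run the U-optimizer for n = A(m) steps"
# costs only about K(U) + log2(m) + c bits...
K_U, c, m = 1000, 50, 16
plain_cost = K_U + math.log2(m) + c

# ...but its runtime is ~A(m); even the logarithm of an Ackermann-sized number
# is astronomically large, so the time penalty dominates.
huge_runtime = 2 ** (2 ** 16)        # a (tiny!) stand-in for an Ackermann-sized step count
penalized_cost = levin_style_cost(plain_cost, huge_runtime)

print(f"plain description cost : ~{plain_cost:.0f} bits")      # ~1054
print(f"time-penalized cost    : ~{penalized_cost:.0f} bits")  # ~66590
```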
I have a question about the conjecture at the end of Direction 17.5. Let $U$ be a utility function with values in $[0,1]$ and let $f \colon [0,1] \to [0,1]$ be a strictly monotone (increasing) function. Then $U$ and $f \circ U$ have the same maxima. $f$ can be non-linear, e.g. $f(x) = x^2$. Therefore, I wonder if the condition should be weaker.
Moreover, I wonder whether it is possible to modify $U$ by a small amount at a place far away from the optimal policy such that the optimal policy is still optimal for the modified utility function. This would weaken the statement about the uniqueness of the utility function even more. Think of an AI playing Go: if a weird position on the board has utility -1.01 instead of -1, this should not change the winning strategy. I would have to go through all of the definitions to see whether I can actually produce a more mathematical example, but perhaps you have a quick opinion on whether this could happen.
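To make the question concrete, here is a toy version of the Go example (hypothetical numbers, not meant as a counterexample to the precise statement): perturbing the utility of a position the optimal policy never reaches leaves the optimal policy unchanged.

```python
# Two-move toy game: the agent picks "a" or "b", then the opponent picks "x" or "y".
# Utilities of the four terminal histories under U:
U = {
    ("a", "x"):  1.0,
    ("a", "y"):  0.5,
    ("b", "x"): -1.0,   # a "weird position" the optimal policy never reaches
    ("b", "y"): -1.0,
}

def optimal_first_move(util):
    """Optimal first move, treating the opponent's reply adversarially (worst case)."""
    return max({h[0] for h in util},
               key=lambda a: min(u for h, u in util.items() if h[0] == a))

# Perturb U far away from optimal play: the weird position gets -1.01 instead of -1.
U_perturbed = dict(U)
U_perturbed[("b", "x")] = -1.01

print(optimal_first_move(U))            # 'a'
print(optimal_first_move(U_perturbed))  # 'a'  -- the winning strategy is unchanged
```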
I have a question about the conjecture at the end of Direction 17.5. Let $U$ be a utility function with values in $[0,1]$ and let $f \colon [0,1] \to [0,1]$ be a strictly monotone (increasing) function. Then $U$ and $f \circ U$ have the same maxima. $f$ can be non-linear, e.g. $f(x) = x^2$. Therefore, I wonder if the condition should be weaker.
No, because composing with $f$ changes the expected value of the utility function under various distributions.
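A tiny numeric illustration of this point (hypothetical numbers): with $f(x) = x^2$, the argmax over deterministic outcomes is unchanged, but expected utilities under a lottery are not, so $f \circ U$ can reorder policies whose outcomes are random.

```python
def f(x: float) -> float:
    """Strictly monotone (increasing) on [0, 1]."""
    return x ** 2

# Two policies over outcomes with utilities in [0, 1]:
#   policy A yields utility 0.5 with certainty;
#   policy B yields utility 0.0 or 1.0 with probability 1/2 each.
EU_A  = 0.5
EU_B  = 0.5 * 0.0 + 0.5 * 1.0          # = 0.5  -> A and B are tied under U

EfU_A = f(0.5)                          # = 0.25
EfU_B = 0.5 * f(0.0) + 0.5 * f(1.0)     # = 0.5  -> B is strictly better under f∘U

print(EU_A, EU_B, EfU_A, EfU_B)
```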
Moreover, I wonder whether it is possible to modify $U$ by a small amount at a place far away from the optimal policy such that the optimal policy is still optimal for the modified utility function.
Good catch, the conjecture as stated is obviously false: we can, e.g., take $U'$ to be the same as $U$ everywhere except after some action which the optimal policy doesn't actually take, and there make it identically 0. Some possible ways to fix it: