Excited to see this proposal and would be interested in following your results, and excited about bridging "traditional deep learning" and AF. I personally think there is a lot of value in having a common language between the "traditional DL" community and the AIS community: for instance, the issues with current AI ethics can be seen as scaled-down versions of the issues with AGI. A lot of theoretical results on AF could benefit from simple practical examples, for the sake of having a clear definition in code, and a lot of the ethics discussions could benefit from the larger perspective of AGI alignment (my own personal opinion).
My prediction is that policy gradient RL agents (and, more generally, any agent that learns only a policy) do not contain good models of their environments. For example, to succeed at CartPole, all an agent needs (in the crudest version) is to map "pole leaning a bit to the left" to "push left" and "pole leaning a bit to the right" to "push right". Inspecting such a policy does not let us determine exact properties of the environment, such as the mass of the cart or the mass of the pole, because a single policy works for a whole range of masses (my prediction). So the policy does not contain a "mental model": solving the environment is simpler than understanding it (as is sometimes the case in the real world too :). In contrast, more sophisticated agents such as MuZero or WorldModels use a value function plus a learned mapping from the current observation and action to the next one, which is, in a way, a "mental model" (though not a very interpretable one...). I would be excited about models that "we can somewhat understand" -- current ones seem to lack this property...
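To make the CartPole claim concrete, here is a toy sketch (my own, using a hand-rolled simulator with the classic Barto–Sutton cart-pole dynamics, not any particular RL library): a single bang-bang policy that just pushes toward the side the pole is falling keeps the pole upright for a whole range of pole masses, so the policy alone cannot identify the mass. Cart-position limits are ignored for brevity; all constants are the standard textbook ones.

```python
import math

def cartpole_step(state, force, masspole, masscart=1.0, length=0.5,
                  gravity=9.8, tau=0.02):
    """One Euler step of the classic cart-pole dynamics (Barto & Sutton)."""
    x, x_dot, theta, theta_dot = state
    total_mass = masspole + masscart
    polemass_length = masspole * length
    costh, sinth = math.cos(theta), math.sin(theta)
    temp = (force + polemass_length * theta_dot ** 2 * sinth) / total_mass
    theta_acc = (gravity * sinth - costh * temp) / (
        length * (4.0 / 3.0 - masspole * costh ** 2 / total_mass))
    x_acc = temp - polemass_length * theta_acc * costh / total_mass
    return (x + tau * x_dot, x_dot + tau * x_acc,
            theta + tau * theta_dot, theta_dot + tau * theta_acc)

def max_tilt(masspole, steps=300):
    """Run a fixed bang-bang policy and report the largest pole angle reached."""
    state = (0.0, 0.0, 0.05, 0.0)  # start with the pole slightly tilted right
    worst = abs(state[2])
    for _ in range(steps):
        x, x_dot, theta, theta_dot = state
        # The entire "policy": push right iff the pole is falling to the right.
        force = 10.0 if theta + theta_dot > 0 else -10.0
        state = cartpole_step(state, force, masspole)
        worst = max(worst, abs(state[2]))
    return worst

# The identical policy keeps the pole well inside the ~12-degree (0.21 rad)
# failure threshold for pole masses spanning an order of magnitude, so
# observing the policy alone cannot pin down the mass.
for m in (0.05, 0.1, 0.3, 0.5):
    print(f"masspole={m}: max |theta| = {max_tilt(m):.3f} rad")
```

The same policy succeeds everywhere in this mass range, which is exactly the sense in which it contains no "mental model" of the pole's physical parameters.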
Some questions below:
My background on the question: in my MSc thesis I worked on one of these directions, specifically "using a simplicity prior" to uncover "a model that we can somewhat understand" from pixels. Concretely, I learn a transformation from observations to latent features such that the latent features admit a causal model with the fewest edges (the simplicity prior). Some tricks are required (Lagrangian multipliers, some custom neural nets, etc.), and the simplicity prior makes finding the causal graph NP-hard. The upside is that the resulting models are quite interpretable, though so far this works only for small grid-worlds and toy benchmarks... I have a vague feeling that the general problem could be solved in a much simpler way than I did, and I would be excited to see the results of this research! Thesis: https://infoscience.epfl.ch/record/287445/ GH: https://github.com/sergia-ch/causality-disentanglement-rl
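Not the thesis method itself, but a minimal numpy sketch of the "fewest edges" idea: fit latent dynamics under an L1 penalty (plain ISTA soft-thresholding here, as a stand-in for the Lagrangian machinery and custom networks), so that the recovered causal graph comes out sparse. The ground-truth matrix, dimensions, and penalty weight below are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth sparse "causal graph" over 3 latent features:
# f0 -> f0, f0 -> f1, f1 -> f1, f2 -> f2  (4 edges total)
A_true = np.array([[0.9, 0.0, 0.0],
                   [0.5, 0.8, 0.0],
                   [0.0, 0.0, 0.7]])

# Transitions from random latent states, plus a little noise.
X = rng.standard_normal((500, 3))                        # features at time t
Y = X @ A_true.T + 0.01 * rng.standard_normal((500, 3))  # features at t+1

# ISTA: a gradient step on the squared prediction error, followed by
# soft-thresholding, which pushes small entries of A exactly to zero --
# a crude but explicit form of the "fewest edges" simplicity prior.
lam, step = 0.02, 0.25
A = np.zeros((3, 3))
for _ in range(500):
    grad = 2.0 * (X @ A.T - Y).T @ X / len(X)
    A = A - step * grad
    A = np.sign(A) * np.maximum(np.abs(A) - step * lam, 0.0)

edges = np.abs(A) > 0.1   # recovered adjacency pattern
print(edges.astype(int))
```

In this linear toy case the L1 penalty recovers exactly the four true edges; the hard part in the thesis setting is doing this jointly with learning the observation-to-feature map, which is where the NP-hardness bites.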