All of jacek's Comments + Replies

Thanks for the comment! Note that we use the state-action visitation distribution, so we consider trajectories that contain actions as well. This makes it possible to invert the visitation distribution and recover the policy (as long as all states are visited). Using state-only trajectories, it would indeed be impossible to recover the policy.
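
To make the inversion concrete, here is a minimal sketch, assuming a small tabular setting where the state-action visitation distribution is stored as a nested list `d[s][a]`. The function name `policy_from_visitation` and the example numbers are hypothetical, chosen only for illustration. The idea is that the policy at a state is the visitation mass of each action divided by the total mass at that state, which is only defined when the state is actually visited.

```python
def policy_from_visitation(d):
    """Recover pi[s][a] = d[s][a] / sum over a' of d[s][a'].

    d is a hypothetical tabular state-action visitation distribution,
    given as a list of rows (one row per state, one entry per action).
    """
    policy = []
    for s, row in enumerate(d):
        total = sum(row)  # state visitation mass of state s
        if total <= 0:
            # An unvisited state carries no information about the policy there.
            raise ValueError(f"state {s} is never visited; policy not identifiable")
        policy.append([mass / total for mass in row])
    return policy


# Illustrative numbers only: two states, two actions, total mass 1.
d = [[0.375, 0.125],
     [0.0, 0.5]]
pi = policy_from_visitation(d)
print(pi)
```

If only the per-state totals (the row sums) were available, many different policies would be consistent with them, which matches the point about state-only trajectories.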

2 · Alex Turner · 4mo
Thanks, this was an oversight on my part.