Comments

jacek6mo30

Thanks for the comment! Note that we use the state-action visitation distribution, so we consider trajectories that contain actions as well. This makes it possible to invert the map from a policy to its visitation distribution (as long as all states are visited). Using only state trajectories, it would indeed be impossible to recover the policy.
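
To make the inversion concrete, here is a minimal sketch assuming a finite (tabular) setting, where the visitation distribution is stored as a nested list `d[s][a]`. The function name `policy_from_visitation` and the toy numbers are hypothetical and only for illustration. The idea is that the policy is the conditional of the action given the state, π(a|s) = d(s, a) / Σ_a' d(s, a'), which is well defined only where the state has nonzero visitation mass.

```python
def policy_from_visitation(d):
    """Recover pi(a|s) from a tabular state-action visitation distribution.

    d[s][a] is the visitation mass of the pair (s, a). This is an
    illustrative sketch for the finite case, not code from the post.
    """
    policy = []
    for s, row in enumerate(d):
        state_mass = sum(row)  # marginal visitation of state s
        if state_mass <= 0:
            # The policy at an unvisited state cannot be identified.
            raise ValueError(f"state {s} is never visited")
        policy.append([mass / state_mass for mass in row])
    return policy


# Toy example (made-up numbers): two states, two actions.
d = [[0.3, 0.1],
     [0.15, 0.45]]
pi = policy_from_visitation(d)
```

The state marginals alone (here roughly 0.4 and 0.6) discard how the mass is split across actions, which is the information the division above relies on.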