# All Posts


# Wednesday, October 21st 2020

Shortform
**Alex Turner**

From unpublished work.

The answer to this seems obvious in isolation: shaping helps with credit assignment, while rescaling doesn't (and might complicate certain methods, along the lines of the advantage vs. Q-value distinction). But maybe there's an important interaction here that could inform a mathematical theory of how a reward signal guides learners through model space?
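Here is a minimal sketch of the two reward transformations. It rests on assumptions the post does not state: "shaping" is taken to mean potential-based shaping, F(s, a, s') = γΦ(s') - Φ(s), and "rescaling" is taken to mean multiplying every reward by a positive constant c. The hypothetical helper `q_value_iteration` computes Q* on a small random MDP with made-up transitions, rewards, and potential Φ. The sketch then checks standard identities. Shaping shifts Q* by -Φ(s) and leaves advantages A = Q - V unchanged. Rescaling multiplies both Q* and A by c. Neither changes the greedy policy. It does not model learning dynamics, which is where any credit-assignment benefit would appear.

```python
import numpy as np


def q_value_iteration(P, R, gamma, iters=2000):
    """Compute Q* for a finite discounted MDP by value iteration (hypothetical helper).

    P: transition probabilities P(s, a, s'), shape (S, A, S)
    R: rewards R(s, a, s'), shape (S, A, S)
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)  # greedy state values, shape (S,)
        # Bellman optimality backup: Q(s,a) = sum_s' P(s,a,s') * [R(s,a,s') + gamma * V(s')]
        Q = np.sum(P * (R + gamma * V[None, None, :]), axis=2)
    return Q


def advantage(Q):
    # A(s, a) = Q(s, a) - V(s), with V(s) = max_a Q(s, a) under the greedy policy
    return Q - Q.max(axis=1, keepdims=True)


# A small random MDP (all values made up for illustration)
rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into probability distributions
R = rng.normal(size=(S, A, S))
Phi = rng.normal(size=S)  # hypothetical potential function over states

Q = q_value_iteration(P, R, gamma)

# Potential-based shaping: add gamma * Phi(s') - Phi(s) to every transition's reward
R_shaped = R + gamma * Phi[None, None, :] - Phi[:, None, None]
Q_shaped = q_value_iteration(P, R_shaped, gamma)

# Rescaling: multiply every reward by a positive constant
c = 10.0
Q_scaled = q_value_iteration(P, c * R, gamma)

print(np.allclose(Q_shaped, Q - Phi[:, None]))           # shaping shifts Q* by -Phi(s)
print(np.allclose(advantage(Q_shaped), advantage(Q)))    # shaping leaves advantages unchanged
print(np.allclose(Q_scaled, c * Q))                      # rescaling scales Q* by c
print(np.allclose(advantage(Q_scaled), c * advantage(Q)))  # and scales advantages by c
```

Because both transformations leave the greedy policy of Q* intact, any difference in how they guide a learner has to show up in the learning trajectory rather than in the fixed point itself.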