With thanks to Rohin Shah.

Dear LessWrongers, this is an opportunity to make money and help with AI alignment.

We're looking for references on a specific AI capability; has anyone published on the following subject:

  • Generating multiple reward functions or policies from the same set of challenges. Have there been designs, for deep learning or similar, in which the agent produces multiple independent reward functions (or policies) to explain the same reward signal or behaviour?

For example, in CoinRun, the agent must get to the end of the level, on the right, to collect the coin. It only gets the reward for collecting the coin.

That is the "true" reward. But since the coin is always at the far right, as far as the agent knows, "go to the far right of the level" could just as well have been the true reward.

We'd want some design that generated both these reward functions (and, in general, generated multiple reward functions whenever there are several independent candidates). Alternatively, it might generate two independent policies; we could test these by putting the coin in the middle of the level and seeing what each policy decided to do.
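To make that test concrete, here's a rough sketch of what we have in mind, assuming a gym-style CoinRun variant that lets us place the coin; the `collected_coin` and `reached_right_wall` info keys are made up for illustration and stand in for whatever the modified environment actually reports:

```python
def compare_policies(policies, env, n_episodes=100):
    """Roll each candidate policy out on levels where the coin has been moved
    to the middle, and count how often it grabs the coin vs. runs to the far
    right wall. The two info keys below are placeholders for whatever the
    modified environment actually reports."""
    results = {}
    for name, policy in policies.items():
        got_coin, went_right = 0, 0
        for _ in range(n_episodes):
            obs = env.reset()
            done, info = False, {}
            while not done:
                obs, reward, done, info = env.step(policy.act(obs))
            got_coin += bool(info.get("collected_coin"))
            went_right += bool(info.get("reached_right_wall"))
        results[name] = (got_coin / n_episodes, went_right / n_episodes)
    return results
```

If one policy mostly ends up at the coin and the other mostly ends up at the right wall, we'd count that as the design having successfully separated the two candidate rewards.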

We're not interested in a Bayesian approach that lists a bunch of reward functions and then updates to include just those two (that's trivially easy to do). Nor are we interested in an IRL-style approach that lists "features", including the coin and the right-hand side of the level.

What we'd want is some neural-net-style design that generates the coin reward and the move-right reward just from the game data, without any prior knowledge of the setting.
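To illustrate the kind of design we mean (not a method we know to exist, hence this post), here's a minimal PyTorch sketch of one possible shape: a shared encoder with several reward heads, all trained to fit the rewards observed in the training levels, plus a penalty that pushes the heads to disagree on states the training data doesn't pin down (e.g. frames where the coin isn't at the far right). The architecture and the diversity term are our own assumptions, purely for illustration:

```python
import torch
import torch.nn as nn

class RewardEnsemble(nn.Module):
    """A shared CNN encoder feeding several independent reward heads."""
    def __init__(self, n_heads: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.heads = nn.ModuleList([nn.LazyLinear(1) for _ in range(n_heads)])

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)
        return torch.cat([head(z) for head in self.heads], dim=-1)  # (batch, n_heads)

def ensemble_loss(model, obs, observed_reward, novel_obs, diversity_weight=0.1):
    # Every head has to explain the rewards actually seen during training...
    fit = ((model(obs) - observed_reward.unsqueeze(-1)) ** 2).mean()
    # ...but the heads are encouraged to disagree on out-of-distribution frames,
    # so "collect the coin" and "go far right" can come apart as separate hypotheses.
    disagreement = model(novel_obs).var(dim=-1).mean()
    return fit - diversity_weight * disagreement
```

Whether the disagreement term should act on random rollouts, adversarially chosen states, or held-out levels is exactly the sort of design detail we'd hope a published reference pins down.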

So, does anyone know any references for that kind of work?

We will pay $50 for the first relevant reference submitted, and $100 for the best reference.

Thanks!

Comments

What we'd want is some neural-net-style design that generates the coin reward and the move-right reward just from the game data, without any prior knowledge of the setting.

So you're looking for curriculum design/exploration in meta-reinforcement-learning? Something like Enhanced POET/PLR/REPAIRED, but where it's not just moving right but a complicated environment with arbitrary reward functions (e.g. using randomly initialized CNNs to map state to 'reward')? Or would hindsight or successor methods count, as they relabel rewards for executed trajectories? Would relatively complex generative games like Alchemy or LIGHT count? Self-play, like robotics self-play?

Hey there! Sorry for the delay. $50 awarded to you for the fastest good reference. PM me your bank details.