Inverse reinforcement learning on self, pre-ontology-change

Stuart_Armstrong


18th Nov 2015


Inverse reinforcement learning is the challenge of constructing a value system that "explains" the behaviour of another agent. Part of the idea is to have algorithms deduce human preferences from human behaviours.

It struck me that this could be used by an agent on itself. Imagine we had a diamond-maximising agent, who believed something like classical Greek "science", and acted to accumulate the maximal amount of these shiny crystals. Then they undergo an ontology change, and learn quantum physics. This completely messes up their view of what a "diamond" is.

However, what if they replayed their previous behaviour, and tried to deduce what possible utility function, in a quantum world, could explain what they had done? They would be trying to fit a quantum-world-aware utility to the decisions of a non-quantum-world-aware being.
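As a rough illustration of what "fitting a utility to replayed behaviour" could look like, here is a minimal sketch. Everything in it is an assumption made for the example and not part of the original idea: the agent's past decisions are taken to have been re-described in two hypothetical new-ontology features (amount of carbon lattice, amount of glass), the utility is assumed linear in those features, and the old agent is modelled as noisily (Boltzmann) rational. The weights are fitted by plain gradient ascent on the log-likelihood of the recorded choices.

```python
import math

def choice_probabilities(weights, options):
    """Softmax (Boltzmann-rational) probability of picking each option."""
    utilities = [sum(w * f for w, f in zip(weights, feats)) for feats in options]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(weights, decisions):
    """Log-probability of the recorded choices under the choice model."""
    return sum(
        math.log(choice_probabilities(weights, options)[chosen])
        for options, chosen in decisions
    )

def fit_utility(decisions, n_features, steps=500, lr=0.1):
    """Fit linear utility weights by gradient ascent on the log-likelihood."""
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for options, chosen in decisions:
            probs = choice_probabilities(weights, options)
            for k in range(n_features):
                expected = sum(p * feats[k] for p, feats in zip(probs, options))
                # Gradient: chosen option's feature minus the model's expectation.
                grad[k] += options[chosen][k] - expected
        weights = [w + lr * g / len(decisions) for w, g in zip(weights, grad)]
    return weights

# Hypothetical replay log. Each entry is (options, index of the option chosen);
# each option is described as (carbon_lattice_amount, glass_amount), i.e. in
# the NEW ontology, even though the choice was made under the old one.
decisions = [
    ([(3.0, 0.0), (1.0, 2.0)], 0),
    ([(0.5, 3.0), (2.0, 0.0)], 1),
    ([(1.0, 0.0), (0.0, 4.0)], 0),
    ([(2.0, 1.0), (1.0, 0.0)], 0),
]

learned = fit_utility(decisions, n_features=2)
print("weights (carbon lattice, glass):", learned)
```

On this toy log the fitted weight on carbon lattice ends up above the weight on glass, which is the sense in which the new utility "explains" the old diamond-seeking behaviour. The hard part is hidden in the assumption that past options can be re-described in new-ontology features at all.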

This could possibly result in a useful extension of the original motivation to the new setup (at least, it would guarantee similar behaviour in similar circumstances). There are many challenges - most especially that a quantum-aware being has far more knowledge about how to affect the world, and thus far more options - but they seem to be the usual sort of inverse reinforcement learning challenges (partial knowledge, noise, etc.).

2 comments

Jessica Taylor

It seems like this is moving the complexity of the ontology mapping problem into the behavior model. In order to explain the agent's strangely correlated errors, the behavior model will probably need to say something about the agent's original ontology and how that relates to their goals in the new ontology.


Stuart Armstrong

I'm hoping there would be at least some gain that is an extension of the old preferences and not just a direct translation into the old ontology.
