Inverse reinforcement learning on self, pre-ontology-change

by Stuart_Armstrong
18th Nov 2015

Inverse reinforcement learning is the challenge of constructing a value system that "explains" the behaviour of another agent. Part of the idea is to have algorithms deduce human preferences from human behaviour.
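
For concreteness, here is a minimal sketch of that setup (my illustration, not from the post): a toy one-dimensional gridworld, hypothetical demonstrations, and a brute-force search over candidate reward functions, scoring each candidate by how well the optimal policy it induces reproduces the observed behaviour. All names and numbers are invented.

```python
# Minimal IRL sketch on a hypothetical 1-D gridworld (illustrative only).
# Candidate reward functions are scored by how well the optimal policy
# they induce matches the demonstrated (state, action) pairs.

N_STATES = 5        # states 0..4 in a row
ACTIONS = (-1, +1)  # step left or step right
GAMMA = 0.9         # discount factor

def step(s, a):
    """Deterministic transition: move, clipped to the grid."""
    return min(max(s + a, 0), N_STATES - 1)

def value_iteration(reward, n_iters=100):
    """Optimal state values for a given reward-per-state vector."""
    V = [0.0] * N_STATES
    for _ in range(n_iters):
        V = [max(reward[step(s, a)] + GAMMA * V[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return V

def greedy_policy(reward):
    """The optimal action in each state under the given reward."""
    V = value_iteration(reward)
    return {s: max(ACTIONS, key=lambda a: reward[step(s, a)] + GAMMA * V[step(s, a)])
            for s in range(N_STATES)}

def fit_reward(demos):
    """IRL by enumeration: return the candidate reward whose optimal
    policy best matches the demonstrated (state, action) pairs.
    Candidates are one-hot: 'the agent values exactly one state'."""
    best_reward, best_score = None, -1
    for goal in range(N_STATES):
        reward = [1.0 if s == goal else 0.0 for s in range(N_STATES)]
        policy = greedy_policy(reward)
        score = sum(policy[s] == a for s, a in demos)
        if score > best_score:
            best_reward, best_score = reward, score
    return best_reward

# An agent that always walks right is best "explained" by reward at state 4.
demos = [(0, +1), (1, +1), (2, +1), (3, +1)]
print(fit_reward(demos))  # -> [0.0, 0.0, 0.0, 0.0, 1.0]
```

Real IRL algorithms replace the brute-force enumeration with something smarter (linear programming, maximum-entropy methods, Bayesian inference), but the shape of the problem - behaviour in, candidate value system out - is the same.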

It struck me that this could be used by an agent on itself. Imagine we had a diamond-maximising agent who believed something like classical Greek "science", and who acted so as to accumulate the maximal amount of those shiny crystals. However, they then undergo an ontology change and learn quantum physics. This completely messes up their view of what a "diamond" is.

However, what if they replayed their previous behaviour, and tried to deduce what possible utility function, in a quantum world, could explain what they had done? They would be trying to fit a quantum-world-aware utility to the decisions of a non-quantum-world-aware being.
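
As a hedged sketch of what that replay might look like (one crude approach among many; all feature names and data below are invented): re-describe each option the pre-change agent faced using the new ontology's features, then fit a utility over those features so that the options the agent actually chose come out preferred. A simple logistic (Bradley-Terry) fit to the recorded choices does this.

```python
import math

# Hedged sketch: fit a utility over new-ontology features to choices recorded
# under the old ontology. Each recorded decision is a (chosen, rejected) pair,
# with both options re-described by invented quantum-ontology features, e.g.
# for "diamonds": (carbon atoms, tetrahedral-lattice fraction, shininess).

def fit_utility(decisions, n_features, lr=0.1, epochs=500):
    """Logistic (Bradley-Terry) fit: make w . chosen > w . rejected likely."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in decisions:
            diff = [c - r for c, r in zip(chosen, rejected)]
            score = sum(wi * di for wi, di in zip(w, diff))
            p = 1.0 / (1.0 + math.exp(-score))  # P(observed choice | w)
            # gradient ascent on the log-likelihood of the observed choice
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w

# Replayed old-ontology behaviour, re-described: this agent always picked the
# option with more lattice-structured carbon, regardless of mere shininess.
decisions = [
    ((100, 0.9, 0.2), (100, 0.1, 0.9)),
    ((500, 0.8, 0.1), (500, 0.2, 0.8)),
    ((50, 0.95, 0.3), (50, 0.05, 0.95)),
]
w = fit_utility(decisions, n_features=3)
print(w)  # lattice weight positive, shininess negative, carbon count ties at 0
```

The fitted weights play the role of the "quantum-world-aware utility" in the thought experiment: a function of the new ontology that reproduces the old decisions, and which can then be evaluated on options the old agent never faced.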

This could possibly result in a useful extension of the original motivation to the new setup (at the least, it would guarantee similar behaviour in similar circumstances). There are many challenges - most especially that a quantum-aware being has far more knowledge about how to affect the world, and thus far more options - but these seem to be the usual sorts of inverse reinforcement learning challenges (partial knowledge, noise, etc.).

Comments (2)

jessicata:

It seems like this is moving the complexity of the ontology mapping problem into the behavior model. In order to explain the agent's strangely correlated errors, the behavior model will probably need to say something about the agent's original ontology and how that relates to their goals in the new ontology.

Stuart_Armstrong:

I'm hoping there would be at least some gain that is an extension of the old preferences and not just a direct translation into the old ontology.