Robert Miles

Robert Miles's Shortform

Learning Extensible Human Concepts Requires Human Values

[Based on conversations with Alex Flint, and also John Wentworth and Adam Shimi]

One of the design goals of the ELK proposal is to sidestep the problem of learning human values and settle instead for learning human concepts. A system that can answer questions about human concepts allows for schemes in which we humans learn all the relevant information about proposed plans and decide about them ourselves, using our own values.

So, we have some process in which we consider lots of possible scenarios and collect a dataset of questions about those scenarios, along with the true answers to those questions. Importantly, these are all 'objective' or 'value-neutral' questions - things like "Is the diamond on the pedestal?" rather than "Should we go ahead with this plan?". This hopefully allows the system to pin down our concepts, and thereby truthfully answer our objective questions about prospective plans, without needing to consider our values.
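
As a toy sketch (not the actual ELK setup; the scenarios and questions here are made up for illustration), such a dataset might just be a collection of (scenario, question, answer) triples in which every question is value-neutral:

```python
# Toy sketch, illustrative only: a dataset of value-neutral questions about
# concrete scenarios, paired with the true answers a human would give.
from dataclasses import dataclass

@dataclass
class Example:
    scenario: str   # description of a possible world / plan outcome
    question: str   # an 'objective' question about that scenario
    answer: bool    # the true answer, as judged by a human

dataset = [
    Example("camera shows the pedestal; diamond clearly present",
            "Is the diamond on the pedestal?", True),
    Example("camera shows the pedestal; diamond swapped for a fake",
            "Is the diamond on the pedestal?", False),
    # Deliberately absent: questions like "Should we go ahead with this plan?",
    # which depend on human values rather than human concepts.
]
```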

One potential difficulty is that the plans may be arbitrarily complex, and may ask us to consider very strange situations in which our ontology breaks down. In the worst case, we have to deal with wacky science fiction scenarios in which our fundamental concepts are called into question.

We claim that, using a dataset of only objective questions, it is not possible to extrapolate our ontology out to situations far from the range of scenarios in the dataset. 

An argument for this is that humans, when presented with sufficiently novel scenarios, will update their ontology, and *the process by which these updates happen depends on human values*, which are (by design) not represented in the dataset. Accurately learning the current human concepts is not sufficient to predict how those concepts will be updated or extended to novel situations, because the update process is value-dependent.
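
A minimal way to see the underdetermination (a hypothetical sketch, not a proof; the scenario fields are invented): two rules for extending a concept can agree on every scenario in the dataset and still disagree about a far-out-of-distribution scenario, and the dataset alone can't say which extension is the one humans would endorse.

```python
# Hypothetical sketch: two extensions of the concept "the diamond is on the
# pedestal" that agree on familiar scenarios but diverge on a science-fiction
# edge case the dataset never covers.

def extension_a(scenario: dict) -> bool:
    # Extension A: the concept tracks the original physical object.
    return scenario["original_atoms_on_pedestal"]

def extension_b(scenario: dict) -> bool:
    # Extension B: the concept also counts a perfect functional duplicate.
    return scenario["original_atoms_on_pedestal"] or scenario["perfect_duplicate_on_pedestal"]

# In-distribution: the original diamond is simply there or not, so A and B agree.
familiar = {"original_atoms_on_pedestal": True, "perfect_duplicate_on_pedestal": False}
assert extension_a(familiar) == extension_b(familiar)

# Far out of distribution: the diamond was destroyed and perfectly reconstructed.
# Which answer is "correct" depends on how humans would choose to update the
# concept - a value-laden choice the objective dataset does not determine.
teleported = {"original_atoms_on_pedestal": False, "perfect_duplicate_on_pedestal": True}
assert extension_a(teleported) != extension_b(teleported)
```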

Alex Flint is working on a post that will move towards proving some related claims.



Disentangling Corrigibility: 2015-2021

Note that the way Paul phrases it in that post is much clearer and more accurate:

> "I believe this concept was introduced in the context of AI by Eliezer and named by Robert Miles"

Disentangling Corrigibility: 2015-2021

Yeah, I definitely wouldn't say I 'coined' it; I just suggested the name.

AI Alignment Open Thread August 2019

Yeah, nuclear power is a better analogy than weapons, but I think the two are linked, and the link itself may be a useful analogy, because risk/coordination is affected by the dual-use nature of some of the technologies.

One thing that makes non-proliferation difficult is that nations legitimately want nuclear facilities because they want to use nuclear power, but 'rogue states' that want to acquire nuclear weapons will also claim that this is their only goal. How do we know who really just wants power plants?

And power generation comes with its own risks. Can we trust everyone to take the right precautions, and if not, can we paternalistically restrict some organisations or states that we deem not capable enough to be trusted with the technology?

AI coordination probably has these kinds of problems to an even greater degree.