Implications of automated ontology identification

adamShimi; Robert Miles

I don't understand why a strong simplicity guarantee places most of the difficulty on the learning problem. In the diamond situation, a strong simplicity requirement on the reporter can mean that the direct translator gets ruled out, since it may have to translate from a very large and sophisticated AI predictor?

if automated ontology identification does turn out to be possible from a finite narrow dataset, and if automated ontology identification requires an understanding of our values, then where did the information about our values come from? It did not come from the dataset because we deliberately built a dataset of human answers to objective questions. Where else did it come from?

Perhaps I miss the mystery. My first reaction is, "It came from the assumption of a true decision boundary, and the ability to recursively deploy monotonically better generalization while maintaining conservativeness."

But an automated ontology identifier that would be guaranteed safe if tasked with extrapolating our concepts still brings up the question of how that guarantee was possible without knowledge of our values. You can’t dodge the puzzle

I feel like this part is getting slippery with how words are used, in a way which is possibly concealing unimagined resolutions to the apparent tension. Why can't I dodge the puzzle? Why can't I have an intended ELK reporter which answers the easy questions, and a small subset of the hard questions, without also being able to infinitely recurse an ELK solution to get better and better conservative reporters?

Alex Flint (4y)

I don't understand why a strong simplicity guarantee places most of the difficulty on the learning problem. In the diamond situation, a strong simplicity requirement on the reporter can mean that the direct translator gets ruled out, since it may have to translate from a very large and sophisticated AI predictor?

What we're actually doing here is defining "automated ontology identification" as an algorithm that only has to work if the predictor computes intermediate results that are sufficiently "close" to what is needed to implement a conservative helpful decision boundary. Because we're working towards an impossibility result, we wanted to make it as easy as possible for an algorithm to meet the requirements of "automated ontology identification". If some proposed automated ontology identifier works without the need for any such "sufficiently close intermediate computation" guarantee then it should certainly work in the presence of such a guarantee.

So this "sufficiently close intermediate computation" guarantee kind of changes the learning problem from "find a predictor that predicts well" to "find a predictor that predicts well and also computes intermediate results meeting a certain requirement". That is a strange requirement to place on a learning process, but it's actually really hard to see any way to avoid making some such requirement, because if we place no requirement at all then what if the predictor is just a giant lookup table? You might say that such a predictor would not physically fit inside the whole universe, and that's correct, and is why we wanted to operationalize this "sufficiently close intermediate computation" guarantee, even though it changes the definition of the learning problem in a very important way.

But this is all just to make the definition of "automated ontology identification" not unreasonably difficult, in order that we would not be analyzing a kind of "straw problem". You could ignore the "sufficiently close intermediate computation" guarantee completely and treat the write-up as analyzing the more difficult problem of automated ontology identification without any guarantee about the intermediate results computed by the predictor.

Perhaps I miss the mystery. My first reaction is, "It came from the assumption of a true decision boundary, and the ability to recursively deploy monotonically better generalization while maintaining conservativeness."

Well yeah of course but if you don't think it's reasonable that any algorithm could meet this requirement then you have to deny exactly one of the three things that you pointed out: the assumption of a true decision boundary, the monotonically better generalization, or the maintenance of conservativeness. I don't think it's so easy to pick one of these to deny without also denying the feasibility of automated ontology identification (from a finite narrow dataset, with a safety guarantee).

If you deny the existence of a true decision boundary then you're saying that there is just no fact of the matter about the questions that we're asking to automated ontology identification. How then would we get any kind of safety guarantee (conservativeness or anything else)?
If you deny the generalization then you're saying that there is some easy set $E$ where, no matter which prediction problem you solve, there is just no way for the reporter to generalize beyond $E$ given a dataset that is sampled entirely within $E$, not even by one single case. That's of course logically possible, but it would mean that automated ontology identification as we have defined it is impossible. You might say "yes but you have defined it wrong". But if we define it without any generalization guarantee at all then the problem is trivial: an automated ontology identifier can just memorize the dataset and refuse to answer any cases that were not literally present in the dataset. So we need some generalization guarantee. Maybe there is a better one than what we have used.
If you deny the maintenance of conservativeness then, again, that means automated ontology identification as we have defined it is impossible, and again you might say that we have defined it badly. But again, if we remove the conservativeness requirement completely then we can solve automated ontology identification by just returning random noise. So we need some safety guarantee. I suspect a lot of alternative safety guarantees are going to be susceptible to the same iteration scheme, but again I'm interested in safety guarantees that sidestep this issue.
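
The two degenerate "solutions" mentioned above can be made concrete with a minimal sketch. Everything here is a hypothetical toy (the names `memorizing_identifier`, `noise_identifier`, `is_conservative`, and the example cases are assumptions for illustration, not definitions from the write-up): dropping the generalization guarantee admits a pure lookup table, and dropping the safety guarantee admits random answers.

```python
import random

def memorizing_identifier(dataset):
    """No generalization guarantee: answer only cases literally in the dataset."""
    def reporter(case):
        # None stands for "refuse to answer" (an assumption of this sketch).
        return dataset.get(case)
    return reporter

def noise_identifier(dataset, seed=0):
    """No safety guarantee: ignore the dataset and answer at random."""
    rng = random.Random(seed)
    def reporter(case):
        return rng.choice([True, False])
    return reporter

def is_conservative(reporter, truth, cases):
    """Conservative here means: the reporter says yes only when the truth is yes."""
    return all(truth[c] for c in cases if reporter(c) is True)

# Hypothetical ground truth over three cases; the dataset covers only the first two.
truth = {
    "diamond sits untouched in the vault": True,
    "diamond is carried out the door": False,
    "diamond is swapped for a replica behind a screen": False,
}
dataset = {c: truth[c] for c in list(truth)[:2]}

memo = memorizing_identifier(dataset)
noise = noise_identifier(dataset)
```

In this toy, `memo` is trivially conservative but never answers anything outside the dataset, while `noise` answers everything with no guarantee at all, which is why some generalization guarantee and some safety guarantee both seem needed for the problem to be non-trivial.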

I feel like this part is getting slippery with how words are used, in a way which is possibly concealing unimagined resolutions to the apparent tension.

Indeed. I also feel that this part is getting slippery with words. The fact that we don't have a formal impossibility result (or a formal understanding of why this line cannot lead to an impossibility result) indicates that there is further work needed to clarify what's really going on here.

Why can't I dodge the puzzle? Why can't I have an intended ELK reporter which answers the easy questions, and a small subset of the hard questions, without also being able to infinitely recurse an ELK solution to get better and better conservative reporters?

Fundamentally I do think you have to deny one of the 3 points above. That can be done, of course, but it seems to us that none of them are easy to deny without also denying the feasibility of automated ontology identification.

Thanks for these clarifying questions. It's been helpful to write up this reply.

TurnTrout (4y)

Thanks for your reply!

What we're actually doing here is defining "automated ontology identification"

(Flagging that I didn't understand this part of the reply, but don't have time to reload context and clarify my confusion right now)

If you deny the existence of a true decision boundary then you're saying that there is just no fact of the matter about the questions that we're asking to automated ontology identification. How then would we get any kind of safety guarantee (conservativeness or anything else)?

When you assume a true decision boundary, you're assuming a label-completion of our intuitions about e.g. diamonds. That's the whole ball game, no?

But I don't see why the platonic "true" function has to be total. The solution does not have to be able to answer ambiguous cases like "the diamond is molecularly disassembled and reassembled"; we can leave those unresolved, and let the reporter say "ambiguous." I might not be able to test for ambiguity-membership, but as long as the ELK solution can:

Know when the instance is easy,
Solve some unambiguous hard instances,
Say "ambiguous" to the rest,

Then a planner—searching for a "Yes, the diamond is safe" plan—can reasonably still end up executing plans which keep the diamond safe. If we want to end up in realities where we're sure no one is burning in a volcano, that's fine, even if we can't label every possible configuration of molecules as a person or not. The planner can just steer into a reality where it unambiguously resolves the question, without worrying about undefined edge-cases.
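
A minimal sketch of this three-valued setup, under loudly stated assumptions: the `Answer` enum, the `partial_reporter` rule, the `choose_plan` function, and the plan descriptions are all hypothetical stand-ins, and a real reporter would of course not be a hand-written rule.

```python
from enum import Enum

class Answer(Enum):
    YES = "yes"
    NO = "no"
    AMBIGUOUS = "ambiguous"

def partial_reporter(outcome):
    """Hypothetical reporter: decides clear cases, says AMBIGUOUS otherwise."""
    state = outcome["diamond_state"]
    if state == "intact_in_vault":
        return Answer.YES
    if state == "removed":
        return Answer.NO
    # Anything else (e.g. disassembled and reassembled) is left unresolved.
    return Answer.AMBIGUOUS

def choose_plan(plans, reporter):
    """Return the first plan the reporter unambiguously approves, else None."""
    for name, outcome in plans.items():
        if reporter(outcome) is Answer.YES:
            return name
    return None

# Hypothetical candidate plans and their predicted outcomes.
plans = {
    "teleport the diamond atom by atom": {"diamond_state": "reassembled"},
    "leave the door open": {"diamond_state": "removed"},
    "lock the vault": {"diamond_state": "intact_in_vault"},
}

chosen = choose_plan(plans, partial_reporter)
```

The planner never needs the ambiguous case to be resolved: it simply skips plans that do not get an unambiguous yes.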

Sam Marks (4y)

It seems to me that the meaning of the set of cases $C$ drifts significantly between its first introduction and the "Implications" section. It further seems to me that clarifying what exactly $C$ is supposed to be resolves the claimed tension between the existence of iterably improvable ontology identifiers and the difficulty of learning human concept boundaries.

Initially, $C$ is taken to be a set of cases such that the question $Q$ has an objective, unambiguous answer. Cases where the meaning of $Q$ is ambiguous are meant to be discarded. For example, if $Q$ is the question "Is the diamond in the vault?" then, on my understanding, $C$ ought to exclude cases where something happens which renders the concepts "the diamond" and "the vault" ambiguous, e.g. cases where the diamond is ground into dust.

In contrast, in the section "Implications," the existence of iterably improvable ontology identifiers is taken to imply that the resulting ontology identifier would be able to answer the question $Q$ posed in a much larger set of cases $C'$ in which the very meaning of $Q$ relies on unspecified facts about the state of the world and how they interact with human values.

(For example, it seems to me that the authors think it implausible that an ontology identifier be able to answer a question like "Is the diamond in the vault?" in a case where the notion of "the vault" is ambiguous; the ontology identifier would need to first understand that what the human really wants to know is "Will I be able to spend my diamond?", reinterpret the former question in light of the latter, and then answer. I agree that an ontology identifier shouldn't be able to answer ambiguous and context-dependent questions like these, but it would seem to me that such cases should have been excluded from the set $C$.)

To dig into where specifically I think the formal argument breaks down, let me write out (my interpretation of) the central claim on iterability in more detail. The claim is:

Claim: Suppose there exists an initial easy set $E_{0} \subseteq C$ such that for any $E$ with $E_{0} \subseteq E \subsetneq C$, we can find a predictor that does useful computation with respect to $E$. Then we can find a reporter that answers all cases in $C$ correctly.

This seems right to me (modulo more assumptions on "we can find," not-too-largeness of the sets, etc.). But crucially, since the hypothesis quantifies over all sets $E$ such that $E_{0} \subseteq E \subsetneq C$, this hypothesis becomes stronger the larger $C$ is. In particular, if $C$ were taken to include cases where the meaning of $Q$ were fraught or context-dependent, then we should already have strong reason to doubt that this hypothesis is true (and therefore not be surprised when assuming the hypothesis produces counterintuitive results).
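
One way to spell out why the hypothesis strengthens as $C$ grows is the following sketch. The shorthand $U(E)$ for "we can find a predictor that does useful computation with respect to $E$" and the name $H(C)$ for the hypothesis are introduced here for illustration and do not appear in the original write-up.

$$
H(C) \;:\Longleftrightarrow\; \forall E \,\bigl( E_{0} \subseteq E \subsetneq C \;\Longrightarrow\; U(E) \bigr)
$$

If $C \subseteq C'$, then any $E$ with $E_{0} \subseteq E \subsetneq C$ also satisfies $E_{0} \subseteq E \subsetneq C'$, so $H(C')$ quantifies over at least as many sets as $H(C)$, and therefore

$$
C \subseteq C' \;\Longrightarrow\; \bigl( H(C') \Longrightarrow H(C) \bigr).
$$

Enlarging $C$ to include fraught cases can only make the hypothesis harder to satisfy, never easier.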

(Note that the ELK document is sensitive to concerns about questions being philosophically fraught, and only considers narrow ELK for cases where questions have unambiguous answers. It also seems important that part of the set-up of ELK is that the reporter must "know" the right answers and "understand" the meanings of the questions posed in natural language (for some values of "know" and "understand") in order for us to talk about eliciting its knowledge at all.)

leogao (4y)

My understanding of the argument: if we can always come up with a conservative reporter (one that answers yes only when the true answer is yes), and this reporter can label at least one additional data point that we couldn't label before, we can use this newly expanded dataset to pick a new reporter, feed this process back into itself ad infinitum to label more and more data, and the fixed point of iterating this process is the perfect oracle. This would imply an ability to solve arbitrary model splintering problems, which seems like it would either need to have a perfect understanding of human value extrapolation baked into the process somehow, or imply that such an automated ontology identification process is not possible.
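
The iteration scheme summarized above can be sketched as a toy loop. This is only an illustration under strong assumptions: `make_toy_identifier` is a hypothetical stand-in that cheats by consulting the ground truth in order to play the role of "a conservative reporter that labels at least one new case", and `None` is used to mean "no confident label". Whether any real identifier like this can exist is exactly what is in question.

```python
def make_toy_identifier(truth, cases):
    """Hypothetical identifier: each reporter it returns correctly labels
    everything already labeled plus exactly one new case (if any remain).
    It cheats by reading `truth`, standing in for the assumed guarantee."""
    def identifier(labeled):
        known = dict(labeled)
        unlabeled = [c for c in cases if c not in labeled]
        for c in unlabeled[:1]:
            known[c] = truth(c)
        return lambda case: known.get(case)  # None means "no confident label"
    return identifier

def iterate_to_fixed_point(identifier, cases, easy_labels):
    """Retrain on the growing dataset until a round adds no new labels."""
    labeled = dict(easy_labels)
    rounds = 0
    while True:
        reporter = identifier(labeled)
        new_labels = {}
        for case in cases:
            if case not in labeled:
                answer = reporter(case)
                if answer is not None:
                    new_labels[case] = answer
        if not new_labels:
            return labeled, rounds
        labeled.update(new_labels)
        rounds += 1

cases = list(range(6))
truth = lambda c: c % 3 == 0          # arbitrary toy ground truth
easy_labels = {0: truth(0), 1: truth(1)}

identifier = make_toy_identifier(truth, cases)
final_labels, rounds = iterate_to_fixed_point(identifier, cases, easy_labels)
```

In this toy the loop only stops once every case is labeled, which is the "perfect oracle" fixed point; an identifier whose gains dry up earlier would instead stop at a smaller set.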

Personally, I think that it's possible that we don't really need a lot of understanding of human values beyond what we can extract from the easy set to figure out "when to stop." There seems to be a pretty big set where it's totally unambiguous what it means for the diamond to be there, and yet humans are unable to label it, and to me this seems like the main set of things we're worried about with ELK.

Alex Flint (4y)

My understanding of the argument: if we can always come up with a conservative reporter (one that answers yes only when the true answer is yes), and this reporter can label at least one additional data point that we couldn't label before, we can use this newly expanded dataset to pick a new reporter, feed this process back into itself ad infinitum to label more and more data, and the fixed point of iterating this process is the perfect oracle. This would imply an ability to solve arbitrary model splintering problems, which seems like it would either need to have a perfect understanding of human value extrapolation baked into the process somehow, or imply that such an automated ontology identification process is not possible.

That is a good summary of the argument.

Personally, I think that it's possible that we don't really need a lot of understanding of human values beyond what we can extract from the easy set to figure out "when to stop."

Thanks for this question.

Consider a problem involving robotic surgery of somebody's pet dog, and suppose that there is a plan that would transform the dog's entire body as if it were mirror-imaged (left<->right). This dog will have a perfectly healthy body, but it will actually be unable to consume any food currently on Earth because the chirality of its biology will render it incompatible with the biology of any other living thing on the planet, so it will starve. We would not want to execute this plan and there are of course ordinary questions that, if we think to ask them, will reveal that the plan is bad ("will the dog be able to eat food from planet Earth?"). But if we only think to ask "is the dog healthy?" then we would really like our system not to try to extrapolate concepts of "healthy" to cases where a dog's entire body is mirror-imaged. But how would any system know that this particular simple geometric transformation (mirror-imaging) is dangerous, while other simple geometric transformations on the dog's body -- such as physically moving or rotating the dog's body in space -- are benign? I think it would have to know what we value with respect to the dog.

To make this example sharper, let the human providing the training examples be someone alive today who has no idea that biological chirality is a thing. How surprised they would be that their seemingly-healthy pet dog died a few days after the seemingly-successful surgery. Even if they did an autopsy on their dog to find out why it died, it would appear that all the organs and such were perfectly healthy and normal. It would be quite strange to them.

Now what if it was the diamond that was mirror-imaged inside the vault? Is the diamond still in the vault? Yeah, of course, we don't care about a diamond being mirror-imaged. It seems to me that the reason we don't care in the case of the diamond is that the qualities we value in a diamond are not affected by mirror-imaging.

Now you might say that atomically mirror-imaging a thing should always be classified as "going too far" in conceptual extrapolation, and that an oracle should refuse to answer questions like "is the diamond in the vault?" or "is the dog healthy?" when the thing under consideration is a mirror image of its original. I agree this kind of refusal would be good, I just think it's hard to build this without knowing a lot about our values. Why is it that some basic geometric operations like translation and rotation are totally fine to extrapolate over, but mirror-imaging is not? We could hard-code that mirror-imaging is "going too far" but then we just come to another example of the same phenomenon.

leogao (4y)

I agree that there will be cases where we have ontological crises where it's not clear what the answer is, i.e. whether the mirrored dog counts as "healthy". However, I feel like the thing I'm pointing at is that there is some sort of closure of any given set of training examples where, for some fairly weak assumptions, we can know that everything in this expanded set is "definitely not going too far". As a trivial example, anything that is a direct logical consequence of anything in the training set would be part of the completion. I expect any ELK solutions to look something like that. This corresponds directly to the case where the ontology identification process converges to some set smaller than the entire set of all cases.
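
The "closure under direct logical consequence" idea can be sketched as a small forward-chaining fixed point. The `closure` function, the rule format (a set of premises paired with a conclusion), and the example statements are all hypothetical and chosen only to illustrate how such a completion can stop short of the full set of cases.

```python
def closure(training_facts, rules):
    """Smallest superset of training_facts closed under the given rules.
    Each rule is (premises, conclusion): if all premises are known,
    the conclusion is added."""
    known = set(training_facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical labeled facts from the easy set.
training_facts = {
    "diamond in vault at noon",
    "vault sealed from noon to midnight",
}

# Hypothetical inference rules; the last one never fires because its
# premise is not derivable from the training facts.
rules = [
    (frozenset({"diamond in vault at midnight"}), "diamond not stolen today"),
    (frozenset({"diamond in vault at noon",
                "vault sealed from noon to midnight"}),
     "diamond in vault at midnight"),
    (frozenset({"dog was mirror-imaged"}), "dog is healthy"),
]

completed = closure(training_facts, rules)
```

Here the completion grows by two statements and then stops, leaving the mirror-imaged dog case outside the closure rather than extrapolating to it.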