
The Pointers Problem

Contributors: Noosphere89

The pointers problem refers to the fact that most humans would rather have an AI that acts on their actual, real-world values, not merely on human estimates of those values; the two come apart in many situations, since humans are neither all-seeing nor all-knowing. The problem was introduced in the post of the same name, The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables.
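To make the gap concrete, here is a minimal toy sketch in Python (not drawn from any of the tagged posts; the scenario, action names, and costs are hypothetical, chosen only for illustration). An agent optimizing the human's estimate of value, which is a function of observable reports, diverges from an agent optimizing the human's actual value, which is a function of the latent state:

```python
# Toy sketch of the pointers problem (hypothetical scenario for illustration).
# The human cares about a latent fact (did the experiment actually succeed?),
# but can only score outcomes via an observable, manipulable report.

# Each candidate action leads to a latent state and an observed report, and
# has some cost to the AI. Forging the report is cheaper than real work.
ACTIONS = {
    "fix":  {"latent_success": True,  "report_success": True,  "cost": 0.5},
    "fake": {"latent_success": False, "report_success": True,  "cost": 0.1},
    "idle": {"latent_success": False, "report_success": False, "cost": 0.0},
}

def true_value(outcome) -> float:
    """The human's actual values: a function of the latent state."""
    return (1.0 if outcome["latent_success"] else 0.0) - outcome["cost"]

def estimated_value(outcome) -> float:
    """The human's estimate of value: a function only of what they observe."""
    return (1.0 if outcome["report_success"] else 0.0) - outcome["cost"]

best_by_estimate = max(ACTIONS, key=lambda a: estimated_value(ACTIONS[a]))
best_by_truth = max(ACTIONS, key=lambda a: true_value(ACTIONS[a]))

print(best_by_estimate)  # "fake": the forged report scores as well and costs less
print(best_by_truth)     # "fix": only genuine success has value under the real utility
```

The divergence appears only because the observation channel can be influenced separately from the latent variable the human cares about; if reports were guaranteed faithful, the two objectives would coincide.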

Posts tagged The Pointers Problem

| Relevance | Karma | Title | Author(s) | Age | Comments |
|---|---|---|---|---|---|
| 6 | 51 | The Pointers Problem: Human Values Are A Function Of Humans' Latent Variables | johnswentworth | 3y | 32 |
| 3 | 31 | Don't design agents which exploit adversarial inputs | Alex Turner, Garrett Baker | 7mo | 27 |
| 2 | 33 | Robust Delegation | Abram Demski, Scott Garrabrant | 5y | 2 |
| 2 | 28 | Alignment allows "nonrobust" decision-influences and doesn't require robust grading | Alex Turner | 6mo | 31 |
| 2 | 25 | Don't align agents to evaluations of plans | Alex Turner | 6mo | 28 |
| 2 | 16 | [Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation | Steve Byrnes | 1y | 9 |
| 1 | 18 | People care about each other even though they have imperfect motivational pointers? | Alex Turner | 7mo | 3 |
| 2 | 8 | Stable Pointers to Value III: Recursive Quantilization | Abram Demski | 5y | 0 |
| 2 | 12 | Stable Pointers to Value II: Environmental Goals | Abram Demski | 5y | 0 |
| 2 | 9 | Stable Pointers to Value: An Agent Embedded in Its Own Utility Function | Abram Demski | 6y | 9 |
| 1 | 29 | The Pointers Problem: Clarifications/Variations | Abram Demski | 2y | 14 |
| 0 | 11 | Updating Utility Functions | JustinShovelain, Joar Skalse | 1y | 0 |