When I say an AI A is aligned with an operator H, I mean:
A is trying to do what H wants it to do.
The “alignment problem” is the problem of building powerful AI systems that are aligned with their operators.
This is significantly narrower than some other definitions of the alignment problem, so it seems important to clarify what I mean.
In particular, this is the problem of getting your AI to try to do the right thing, not the problem of figuring out which thing is right. An aligned AI would try to figure out which thing is right, and like a human it may or may not succeed.
Consider a human... (Read more)
[Epistemic status: Argument by analogy to historical cases. Best-case scenario, it's just one argument among many.]
I have on several occasions heard people say things like this:
The original Bostrom/Yudkowsky paradigm envisioned a single AI built by a single AI project, undergoing intelligence explosion all by itself and attaining a decisive strategic advantage as a result. However, this is very unrealistic. Discontinuous jumps in technological capability are very rare, and it is very implausible that one project could produce more innovations than the rest of the world combined. Instead... (Read more)
The View from 2018
In April of last year, I wrote up my confusions with Paul’s agenda, focusing mostly on approval-directed agents. I mostly have similar opinions now; the main thing I noticed on rereading it was that I talked about ‘human-sized’ consciences, whereas now I would describe them as larger than human-sized (since moral reasoning depends on cultural accumulation, which is larger than any one human). But on the meta level, I think those confusions are less relevant to Paul’s agenda than I thought then; I was confused about how Paul’s argument for alignment worked. (I do think my objections were correct object... (Read more)
This post is motivated by the research intuition that better formalisms in consciousness research contribute to agent foundations in more ways than just the value loading problem. Epistemic status: speculative.
David Marr's levels of analysis is the idea that any analysis of a system involves analyzing it at multiple, distinct levels of abstraction. These levels are the computational level, which describes what the system is trying to do; the algorithmic level, which describes which algorithms the system instantiates to accomplish that goal; and the implementation level, describing the har... (Read more)
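To make the three levels concrete, here is a minimal sketch (my own toy example, not from the post) showing how a single computational-level specification, sorting, admits multiple algorithmic-level realizations, with the implementation level as a further, separate question:

```python
# Toy illustration of Marr's levels: one computational-level goal,
# two distinct algorithmic-level realizations.

def spec(xs, ys):
    # Computational level: WHAT the system does, independent of HOW.
    return ys == sorted(xs)

def insertion_sort(xs):
    # Algorithmic level, realization 1: O(n^2) comparison sort.
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def merge_sort(xs):
    # Algorithmic level, realization 2: O(n log n) divide and conquer.
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

data = [3, 1, 2]
# Both algorithms satisfy the same computational-level spec.
assert spec(data, insertion_sort(data)) and spec(data, merge_sort(data))
# The implementation level (CPython bytecode on silicon here, neurons
# in a brain) is a third, separate question.
```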
In this post, I clarify how far we are from a complete solution to decision theory, and the way in which high-level philosophy relates to the mathematical formalism. I’ve personally been confused about this in the past, and I think it could be useful to people who casually follow the field. I also link to some less well-publicized approaches.
The first disagreement you might encounter when reading about alignment-related decision theory is that between Causal Decision Theory (CDT), Evidential Decision Theory (EDT), and the various logical decision theories emerging from MIRI and less... (Read more)
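To see where CDT and EDT come apart, here is a toy expected-utility calculation for Newcomb's problem (my own sketch, using the standard illustrative payoffs of $1,000,000 and $1,000 and an assumed predictor accuracy; none of these numbers come from the post):

```python
# Newcomb's problem: a highly reliable predictor fills an opaque box with
# $1,000,000 iff it predicts you will take only that box; a transparent
# box always holds $1,000.
ACCURACY = 0.99  # assumed predictor reliability (illustrative)

def edt_value(action):
    # EDT conditions on the action as evidence about the prediction.
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    base = 1_000 if action == "two-box" else 0
    return base + p_full * 1_000_000

def cdt_value(action, p_full):
    # CDT holds the (already-made) prediction fixed: the box's contents
    # are causally independent of the choice, whatever p_full is.
    base = 1_000 if action == "two-box" else 0
    return base + p_full * 1_000_000

print(edt_value("one-box"), edt_value("two-box"))  # ~990000 vs ~11000: EDT one-boxes
for p in (0.0, 0.5, 1.0):
    # Two-boxing dominates for every fixed p_full, so CDT two-boxes.
    assert cdt_value("two-box", p) > cdt_value("one-box", p)
```

Roughly, the logical decision theories differ from both by treating the predictor as instantiating the agent's own decision algorithm, which is part of why cases like this motivate them.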