Rob B's Shortform Feed

[-]Rob Bensinger3y120

Collecting all of the quantitative AI predictions I know of MIRI leadership making on Arbital (let me know if I missed any):

Aligning an AGI adds significant development time: Eliezer 95%
Almost all real-world domains are rich: Eliezer 80%
Complexity of value: Eliezer 97%, Nate 97%
Distant superintelligences can coerce the most probable environment of your AI: Eliezer 66%
Meta-rules for (narrow) value learning are still unsolved: Eliezer 95%
Natural language understanding of "right" will yield normativity: Eliezer 10%
Relevant powerful agents will be highly optimized: Eliezer 75%
Some computations are people: Eliezer 99%, Nate 99%
Sufficiently optimized agents appear coherent: Eliezer 85%

Some caveats:

Arbital predictions range from 1% to 99%.
I assume these are generally ~5 years old. Views may have shifted.
By default, I assume that the standard caveats for probabilities like these apply: I treat these as off-the-cuff ass numbers unless stated otherwise, products of 'thinking about the problem on and off for years and then querying my gut about what it expects to actually see', more so than of building Guesstimate models or trying too hard to make sure all the probabilities are perfectly coherent.

Inconsistencies are flags that 'something is wrong here', but ass numbers are vague and unreliable enough that some inconsistency is to be expected. Similarly, ass numbers are often unstable hour-to-hour and day-to-day.

[-]Rob Bensinger3y80

On my model, the point of ass numbers isn't to demand perfection of your gut (e.g., of the sort that would be needed to avoid multiple-stage fallacies when trying to conditionalize a lot), but to:

Communicate with more precision than English-language words like 'likely' or 'unlikely' allow. Even very vague or uncertain numbers will, at least some of the time, be a better guide than natural-language terms that weren't designed to cover the space of probabilities (and that can vary somewhat in meaning from person to person).
At least very vaguely and roughly bring your intuitions into contact with reality, and with each other, so you can more readily notice things like 'I'm miscalibrated', 'reality went differently than I expected', 'these two probabilities don't make sense together', etc.
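
As a rough illustration of the second point, here is a minimal Python sketch of two such checks: whether a stated conjunction probability fits with its parts, and how often things assigned a given probability actually happened. The function names and all the numbers are hypothetical, made up for illustration; none of them come from the Arbital predictions above.

```python
from collections import defaultdict


def conjunction_incoherent(p_a, p_b, p_a_and_b):
    """Return True if P(A and B) is stated as larger than P(A) or P(B).

    A conjunction can never be more probable than either of its parts,
    so this is one cheap 'these two probabilities don't make sense
    together' check.
    """
    return p_a_and_b > min(p_a, p_b)


def calibration_table(forecasts):
    """Group (stated_probability, happened) pairs by stated probability.

    Returns {stated_probability: (count, observed_frequency)}.
    A large gap between a stated probability and its observed frequency
    is a hint of miscalibration, given enough forecasts in that group.
    """
    buckets = defaultdict(list)
    for p, happened in forecasts:
        buckets[p].append(1 if happened else 0)
    return {p: (len(v), sum(v) / len(v)) for p, v in sorted(buckets.items())}


# Hypothetical gut numbers: the conjunction is rated above one of its parts.
print(conjunction_incoherent(0.6, 0.5, 0.55))

# Hypothetical track record of resolved forecasts.
record = [(0.9, True), (0.9, True), (0.9, False), (0.9, True),
          (0.1, False), (0.1, False)]
print(calibration_table(record))
```

With only a handful of resolved forecasts per group, the observed frequencies are themselves very noisy, so a table like this is a prompt to look closer rather than a verdict.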

It may still be a terrible idea to spend too much time generating ass numbers, since "real numbers" are not the native format human brains compute probability with, and spending a lot of time working in a non-native format may skew your reasoning.

(Maybe there's some individual variation here?)

But they're at least a good tool to use sometimes, for the sake of crisper communication, calibration practice (so you can generate non-awful future probabilities when you need to), etc.

[-]Rob Bensinger2y20

In the context of a conversation with Balaji Srinivasan about my AI views snapshot, I asked Nate Soares what sorts of alignment results would impress him, and he said:

example thing that would be relatively impressive to me: specific, comprehensive understanding of models (with the caveat that that knowledge may lend itself more (and sooner) to capabilities before alignment). demonstrated e.g. by the ability to precisely predict the capabilities and quirks of the next generation (before running it)
i'd also still be impressed by simple theories of aimable cognition (i mostly don't expect that sort of thing to have time to play out any more, but if someone was able to come up with one after staring at LLMs for a while, i would at least be impressed)
fwiw i don't myself really know how to answer the question "technical research is more useful than policy research"; like that question sounds to me like it's generated from a place of "enough of either of these will save you" whereas my model is more like "you need both"
tho i'm more like "to get the requisite technical research, aim for uploads" at this juncture
if this was gonna be blasted outwards, i'd maybe also caveat that, while a bunch of this is a type of interpretability work, i also expect a bunch of interpretability work to strike me as fake, shallow, or far short of the bar i consider impressive/hopeful
(which is not itself supposed to be any kind of sideswipe; i applaud interpretability efforts even while thinking it's moving too slowly etc.)
