Vladimir Slepnev

Wiki Contributions


Why I'm co-founding Aligned AI

Can you describe what changed / what made you start feeling that the problem is solvable / what your new attack is, in short?

Inferring utility functions from locally non-transitive preferences

There's a bit of math directly relevant to this problem: Hodge decomposition of graph flows, for the discrete case, and vector fields, for the continuous case. Basically if you have a bunch of arrows, possibly loopy, you can always decompose it into a sum of two components: a "pure cyclic" one (no sources or sinks, stuff flowing in cycles) and a "gradient" one (arising from a utility function). No neural network needed, the decomposition is unique and can be computed explicitly. See this post, and also the comments by FactorialCode and me.

Reply to Eliezer on Biological Anchors

With these two points in mind, it seems off to me to confidently expect a new paradigm to be dominant by 2040 (even conditional on AGI being developed), as the second quote above implies. As for the first quote, I think the implication there is less clear, but I read it as expecting AGI to involve software well over 100x as efficient as the human brain, and I wouldn’t bet on that either (in real life, if AGI is developed in the coming decades—not based on what’s possible in principle.)

I think this misses the point a bit. The thing to be afraid of is not an all-new approach to replace neural networks, but rather new neural network architectures and training methods that are much more efficient than today's. It's not unreasonable to expect those, and not unreasonable to expect that they'll be much more efficient than humans, given how easy it is to beat humans at arithmetic for example, and given fast recent progress to superhuman performance in many other domains.

Considerations on interaction between AI and expected value of the future

To me it feels like alignment is a tiny target to hit, and around it there's a neighborhood of almost-alignment, where enough is achieved to keep people alive but locked out of some important aspect of human value. There are many aspects such that missing even one or two of them is enough to make life bad (complexity and fragility of value). You seem to be saying that if we achieve enough alignment to keep people alive, we have >50% chance of achieving all/most other aspects of human value as well, but I don't see why that's true.

Considerations on interaction between AI and expected value of the future

These involve extinction, so they don't answer the question what's the most likely outcome conditional on non-extinction. I think the answer there is a specific kind of near-miss at alignment which is quite scary.

Considerations on interaction between AI and expected value of the future

I think alignment is finicky, and there's a "deep pit around the peak" as discussed here.

General alignment plus human values, or alignment via human values?

There are very “large” impacts to which we are completely indifferent (chaotic weather changes, the above-mentioned change in planetary orbits, the different people being born as a consequence of different people meeting and dating across the world, etc.) and other, smaller, impacts that we care intensely about (the survival of humanity, of people’s personal wealth, of certain values and concepts going forward, key technological innovations being made or prevented, etc.)

I don't think we are indifferent to these outcomes. We leave them to luck, but that's a fact about our limited capabilities, not about our values. If we had enough control over "chaotic weather changes" to steer a hurricane away from a coastal city, we would very much care about it. So if a strong AI can reason through these impacts, it suddenly faces a harder task than a human: "I'd like this apple to fall from the table, and I see that running the fan for a few minutes will achieve that goal, but that's due to subtly steering a hurricane and we can't have that".

Considerations on interaction between AI and expected value of the future

I think the default non-extinction outcome is a singleton with near miss at alignment creating large amounts of suffering.

Soares, Tallinn, and Yudkowsky discuss AGI cognition

Yeah, I had a similar thought when reading that part. In agent-foundations discussions, the idea often came up that the right decision theory should quantify not over outputs or input-output maps, but over successor programs to run and delegate I/O to. Wei called it "UDT2".

Soares, Tallinn, and Yudkowsky discuss AGI cognition

“Though many predicted disaster, subsequent events were actually so slow and messy, they offered many chances for well-intentioned people to steer the outcome and everything turned out great!” does not sound like any particular segment of history book I can recall offhand.

I think the ozone hole and the Y2K problem fit the bill. Though of course that doesn't mean the AI problem will go the same way.

Load More