David Krueger


AGI safety from first principles: Control

Which previous arguments are you referring to?

Radical Probabilism [Transcript]

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

One way you can imagine this working is that you treat “I don't have a good hypothesis” as a special hypothesis that is not required to normalize to 1.  
For instance, it could say that observing any particular real number, r, has probability epsilon > 0.
So now it "makes predictions", but this doesn't just collapse to including another hypothesis and using Bayes rule.

You can also imagine updating this special hypothesis (which I called a "Socratic hypothesis" in comments on the original blog post on Radical Probabilism) in various ways. 

[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

Regarding ARCHES, as an author:

  • I disagree with Critch that we should expect single/single delegation(/alignment) to be solved "by default" because of economic incentives.  I think economic incentives will not lead to it being solved well-enough, soon enough (e.g. see:
     https://www.lesswrong.com/posts/DmLg3Q4ZywCj6jHBL/capybaralet-s-shortform?commentId=wBc2cZaDEBX2rb4GQ)  I guess Critch might put this in the "multi/multi" camp, but I think it's more general (e.g. I attribute a lot of the risk here to human irrationality/carelessness)
  • RE: "I find the argument less persuasive because we do have governance, regulations, national security etc. that would already be trying to mitigate issues that arise in multi-multi contexts, especially things that could plausibly cause extinction"... 1) These are all failing us when it comes to, e.g. climate change.  2) I don't think we should expect our institutions to keep up with rapid technological progress (you might say they are already failing to...).  My thought experiment from the paper is: "imagine if everyone woke up 1000000x smarter tomorrow."  Our current institutions would likely not survive the day and might or might not be improved quickly enough to keep ahead of bad actors / out-of-control conflict spirals.
[AN #118]: Risks, solutions, and prioritization in a world with many AI systems

these usually don’t assume “no intervention from longtermists”

I think the "don't" is a typo?

Why GPT wants to mesa-optimize & how we might change this

By managing incentives I expect we can, in practice, do things like: "[telling it to] restrict its lookahead to particular domains"... or remove any incentive for control of the environment.

I think we're talking past each other a bit here.

Why GPT wants to mesa-optimize & how we might change this

My intuitions on this matter are:
1) Stopping mesa-optimizing completely seems mad hard.
2) Managing "incentives" is the best way to deal with this stuff, and will probably scale to something like 1,000,000x human intelligence. 
3) On the other hand, it's probably won't scale forever.

To elaborate on the incentive management thing... if we figure that stuff out and do it right and it has the promise that I think it does... then it won't restrict lookahead to particular domains, but it will remove incentives for instrumental goal seeking.  

If we're still in a situation where the AI doesn't understand its physical environment and isn't incentivized to learn to control it, then we can do simple things like use a fixed dataset (as opposed to data we're collecting online) in order to make it harder for the AI to learn anything significant about its physical environment. 

Learning about the physical environment and using it to improve performance is not necessarily bad/scary absent incentives for control.  However, I worry that having a good world model makes an AI much more liable to infer that it should try to control and not just predict the world.

Why GPT wants to mesa-optimize & how we might change this

I didn't read the post (yet...), but I'm immediately skeptical of the claim that beam search is useful here ("in principle"), since GPT-3 is just doing next step prediction (it is never trained on its own outputs, IIUC). This means it should always just match the conditional P(x_t | x_1, .., x_{t-1}). That conditional itself can be viewed as being informed by possible future sequences, but conservation of expected evidence says we shouldn't be able to gain anything by doing beam search if we already know that conditional. Now it's true that efficiently estimating that conditional using a single forward pass of a transformer might involve approximations to beam search sometimes.

At a high level, I don't think we really need to be concerned with this form of "internal lookahead" unless/until it starts to incorporate mechanisms outside of the intended software environment (e.g. the hardware, humans, the external (non-virtual) world).

Why GPT wants to mesa-optimize & how we might change this

Seq2seq used beam search and found it helped (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43155.pdf). It was standard practice in the early days of NMT; I'm not sure when that changed.

This blog post gives some insight into why beam search might not be a good idea, and is generally very interesting: https://benanne.github.io/2020/09/01/typicality.html

Radical Probabilism

This blog post seems superficially similar, but I can't say ATM if there are any interesting/meaningful connections:


Load More