All of ryan_b's Comments + Replies

Generalizing the Power-Seeking Theorems

The single-agent MDP setting resolves my confusion; now it is just a curiosity with respect to directions future work might go. The "action varies with discount rate" result is essentially what interests me, so refocusing in the context of the single-agent case: what do you think of the discount rate being discontinuous?

To be clear, there isn't an obvious motivation for this, so my guess for the answer is something like "Don't know and didn't check because it cannot change the underlying intuition."

1 Alex Turner 1y
Discontinuous with respect to what? The discount rate just is, and there just is an optimal policy set for each reward function at a given discount rate, so it doesn't make sense to talk about discontinuity without specifying what it's discontinuous with respect to. Like, teleportation would be positionally discontinuous with respect to time. You can talk about other quantities being continuous with respect to change in the discount rate, however, and the paper proves the continuity of e.g. POWER and optimality probability with respect to γ∈[0,1].
Generalizing the Power-Seeking Theorems

I have a question about this conclusion:

When , you're strictly more likely to navigate to parts of the future which give you strictly more options (in a graph-theoretic sense). Plus, these parts of the future give you strictly more power.

What about the case where agents have different time horizons? My question is inspired by one of the details of an alternative theory of markets, the Fractal Market Hypothesis. The relevant detail is an investment horizon, which is how long an investor keeps the asset. To oversimplify, the theory argues tha...

2 Alex Turner 1y
What do you mean by "agents have different time horizons"? To answer my best guess of what you meant: this post used "most agents do X" as shorthand for "action X is optimal with respect to a large-measure set over reward functions", but the analysis only considers the single-agent MDP setting, and how, for a fixed reward function or reward function distribution, optimal action for an agent tends to vary with the discount rate. There aren't multiple formal agents acting in the same environment.
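
To make the "optimal action varies with the discount rate" point concrete, here is a minimal sketch on a made-up four-state deterministic MDP. Everything in it (the states, the rewards, and the names `TRANSITIONS`, `REWARD`, `value_iteration`, `optimal_action`) is a hypothetical illustration and not taken from the post or the paper. It assumes state-based rewards and uses plain value iteration for a single agent with a fixed reward function.

```python
# Hypothetical toy MDP: from state 0 the agent either goes "left" to a
# small reward it can collect forever (state 1), or "right" through an
# unrewarded state (2) to a larger reward it can collect forever (state 3).
TRANSITIONS = {
    0: {"left": 1, "right": 2},
    1: {"stay": 1},
    2: {"go": 3},
    3: {"stay": 3},
}
REWARD = {0: 0.0, 1: 1.0, 2: 0.0, 3: 1.5}


def value_iteration(gamma, tol=1e-10, max_iters=100_000):
    """Return optimal state values for a discount rate 0 <= gamma < 1."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(max_iters):
        new_values = {
            s: REWARD[s] + gamma * max(values[n] for n in TRANSITIONS[s].values())
            for s in TRANSITIONS
        }
        if max(abs(new_values[s] - values[s]) for s in TRANSITIONS) < tol:
            return new_values
        values = new_values
    return values


def optimal_action(state, gamma):
    """Pick the action whose successor state has the highest optimal value."""
    values = value_iteration(gamma)
    actions = TRANSITIONS[state]
    return max(actions, key=lambda a: values[actions[a]])


for gamma in (0.1, 0.5, 0.9):
    print(gamma, optimal_action(0, gamma))
```

In this toy example the same reward function prefers "left" at low discount rates and "right" at high ones; the switch happens where the one-step delay stops outweighing the larger reward (here, at γ = 2/3). There is still only one agent, so this says nothing about multiple agents with different horizons sharing an environment.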
Avoiding Side Effects in Complex Environments

avoided side effects by penalizing shifts in the ability to achieve randomly generated goals.

Does this correspond to making the agent preserve general optionality (in the more colloquial sense, in case it is a term of art here)?

Does that mean that some specification of random goals would serve as an approximation of optionality?

It occurs to me that preserving the ability to pursue randomly generated goals doesn't necessarily preserve the ability of other agents to preserve goals. If I recall, that is kind of the theme of the instrumental power paper; as a ...

2 Alex Turner 2y
I think that intuitively, preserving value for a high-entropy distribution over reward functions should indeed look like preserving optionality. This assumes away a lot of the messiness that comes with deep non-tabular RL, however, and so I don't have a theorem linking the two yet. Yes, you're basically letting reward functions vote on how "big of a deal" an action is, where "big of a deal" inherits the meaning established by the attainable utility theory of impact [https://www.lesswrong.com/s/7CdoznhJaLEKHwvJW/p/C74F7QTEAYSTGAytJ]. Yup, that's very much true. I see this as the motivation for corrigibility [https://www.lesswrong.com/posts/Xts5wm3akbemk4pDa/non-obstruction-a-simple-concept-motivating-corrigibility]: if the agent preserves its own option value and freely lets us wield it to extend our own influence over the world, then that should look like preserving our option value.
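
As a rough illustration of "letting reward functions vote", here is a tabular sketch of a penalty that averages, over randomly generated auxiliary reward functions, how much an action shifts attainable value relative to doing nothing. This is an assumption-laden toy: the three-state world, the uniform random rewards, the choice of 200 auxiliary functions, and the names `NEXT`, `optimal_values`, `q_value`, and `penalty` are all hypothetical, and it sidesteps the deep non-tabular RL messiness mentioned above.

```python
import random

GAMMA = 0.9

# Hypothetical toy world: 0 = hub, 2 = side room (reversible move),
# 1 = "broken" state that can never be left (irreversible).
NEXT = {
    0: {"noop": 0, "move": 2, "break": 1},
    1: {"noop": 1},
    2: {"noop": 2, "move": 0},
}


def optimal_values(reward, gamma=GAMMA, sweeps=500):
    """Value iteration with a fixed number of sweeps (state-based reward)."""
    values = {s: 0.0 for s in NEXT}
    for _ in range(sweeps):
        values = {
            s: reward[s] + gamma * max(values[n] for n in NEXT[s].values())
            for s in NEXT
        }
    return values


def q_value(reward, values, state, action, gamma=GAMMA):
    """Value of taking `action` in `state`, then acting optimally for `reward`."""
    return reward[state] + gamma * values[NEXT[state][action]]


def penalty(state, action, aux_rewards, gamma=GAMMA):
    """Mean absolute shift in attainable value versus the no-op action."""
    total = 0.0
    for reward in aux_rewards:
        values = optimal_values(reward, gamma)
        total += abs(
            q_value(reward, values, state, action, gamma)
            - q_value(reward, values, state, "noop", gamma)
        )
    return total / len(aux_rewards)


# Randomly generated auxiliary goals: one uniform reward per state.
rng = random.Random(0)
AUX_REWARDS = [{s: rng.random() for s in NEXT} for _ in range(200)]

for action in NEXT[0]:
    print(action, round(penalty(0, action, AUX_REWARDS), 3))
```

In this toy world the irreversible "break" action draws a larger penalty than the reversible "move" action, which matches the colloquial reading of preserving optionality. Note that the penalty only tracks the agent's own attainable values; nothing here measures what other agents can still achieve.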
What are some non-purely-sampling ways to do deep RL?

This doesn't strike directly at the sampling question, but it is related to several of your ideas about incorporating the differentiable function: Neural Ordinary Differential Equations.

This is being exploited most heavily in the Julia community. The broader pitch is that they have formalized the relationship between differential equations and neural networks. This allows things like:

  • applying differential equation tricks to computing the outputs of neural networks
  • using neural networks to solve pieces of differential equations
  • using differential equatio...
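
For readers who have not seen the idea, here is a minimal sketch of the "network as differential equation" view in plain Python (not Julia, and not any particular library's API). The hidden state h follows dh/dt = tanh(W h + b) and the output is obtained by integrating that equation. The fixed-step Euler solver, the tanh dynamics, and the names `dynamics` and `euler_solve` are assumptions chosen for brevity; real neural ODE implementations typically use adaptive solvers and differentiate through the solve.

```python
import math


def dynamics(h, weights, bias):
    """Hypothetical learned vector field: dh/dt = tanh(W h + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(weights, bias)
    ]


def euler_solve(h0, weights, bias, t0=0.0, t1=1.0, steps=100):
    """Integrate the hidden state from t0 to t1 with fixed-step Euler."""
    h = list(h0)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        dh = dynamics(h, weights, bias)
        h = [x + dt * d for x, d in zip(h, dh)]
    return h


W = [[0.0, -1.0], [1.0, 0.0]]  # made-up weights for illustration
b = [0.0, 0.0]
print(euler_solve([1.0, 0.0], W, b))
```

With `steps=1` and a unit time interval, the update is h + tanh(W h + b), which is exactly one residual block; taking more, smaller steps is what turns the stack of residual layers into a differential equation that standard solver tricks can be applied to.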
2 Evan Hubinger 3y
This is really neat; thanks for the pointer!
Seeking Power is Often Convergently Instrumental in MDPs

Strong upvote, this is amazing to me. On the post:

  • Another example of explaining the intuitions for formal results less formally. I strongly support this as a norm.
  • I found the graphics helpful, both in style and content.

Some thoughts on the results:

  • This strikes at the heart of AI risk, and to my inexpert eyes the lack of anything rigorous to build on or criticize as a mechanism for the flashiest concerns has been a big factor in how difficult it was and is to get engagement from the rest of the AI field. Even if the formalism fails due to a critical flaw, t...
Soft takeoff can still lead to decisive strategic advantage

I claim that 1939 Germany would not be able to conquer western Europe. There are two reasons for this: first, 1939 Germany did not have reserves in fuel, munitions, or other key industrial inputs to complete the conquest when they began (even allowing for the technical disparities); second, the industrial base of 1910 Europe wasn't able to provide the volume or quality of inputs (particularly fuel and steel) needed to keep the war machine running. Europe would fall as fast as 1939 German tanks arrived, but I expect those tanks to literally run out of ...

2 Daniel Kokotajlo 3y
Hmmm, well maybe you are right. I am not a historian, just an armchair general. I look forward to thinking and learning more about this in the future. I like your point about DSA being potentially multiple & simultaneous.
Soft takeoff can still lead to decisive strategic advantage

I broadly agree that Decisive Strategic Advantage is still plausible under a slow takeoff scenario. That being said:

Objection to Claim 1A: transporting 1939 Germany back in time to 1910 is likely to cause a sudden and near-total collapse of their warmaking ability because 1910 lacked the international trade and logistical infrastructure upon which 1939 Germany relied. Consider the Blockade of Germany, and that Czarist Russia would not be able to provide the same trade goods as the Soviet Union did until 1941 (nor could they be invaded for them, like 1941-1...

I disagree about 1939 Germany. Sure, their economy would collapse, but they'd be able to conquer western Europe before it collapsed and use the resources and industry set up there. Even if they couldn't do that, they would be able to reorient their economy in a year or two and then conquer the world.

I agree about the Afghanistan case but I'm not sure what lessons to draw from it for the AGI scenario in particular.

Decision Theory

It was not until reading this that I really understood that I am in the habit of reasoning about myself as just a part of the environment.

The kicker is that we don't reason directly about ourselves as such, we use a simplified model of ourselves. And we're REALLY GOOD at using that model for causal reasoning, even when it is reflective, and involves multiple levels of self-reflection and counterfactuals - at least when we bother to try. (We try rarely because explicit modelling is cognitively demanding, and we usually use defaults / conditioned reasoning. Sometimes that's OK.)

Example: It is 10PM. A 5-page report is due in 12 hours, at 10AM.

Default: Go to sleep at 1AM, set ala...