The single-agent MDP setting resolves my confusion; now it is just a curiosity about the directions future work might go. The "actions vary with the discount rate" result is essentially what interests me, so refocusing in the context of the single-agent case: what do you think of the discount rate being discontinuous?

To be clear, there isn't an obvious motivation for this, so my guess is that the answer is something like "I don't know and didn't check, because it cannot change the underlying intuition."

I have a question about this conclusion:

When γ is close to 1, you're strictly more likely to navigate to parts of the future which give you strictly more options (in a graph-theoretic sense). Plus, these parts of the future give you strictly more power.

What about the case where agents have different time horizons? My question is inspired by one of the details of an alternative theory of markets, the Fractal Market Hypothesis. The relevant detail is the investment horizon, which is how long an investor keeps the asset. To oversimplify, the theory argues that markets function normally when there are many investors with different investment horizons; when uncertainty increases, investors shorten their horizons, and when everyone's horizons get very short we have a panic.

I thought this might be represented by a step function in the discount rate, but reviewing the paper it looks like γ is continuous. It also occurs to me that this should be similar in terms of computation to setting γ = 1 and running it over fewer turns, but this doesn't seem like it would work as well for the case of modelling different discount rates on the same MDP.
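To make that concrete for myself, here is a toy sketch (my own construction, nothing from the paper): a two-action choice where the optimal action flips as the discount rate changes, and a step-function "discount" that is just γ = 1 run for a fixed number of turns.

```python
# Toy example (my own construction, not from the paper): the optimal action
# flips as the discount rate changes, and a step-function "discount" behaves
# like gamma = 1 run for a fixed number of turns.

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over a fixed reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def step_return(rewards, horizon):
    """Step-function discount: weight 1 up to the horizon, 0 afterwards."""
    return sum(rewards[:horizon])

rewards_a = [1.0, 0.0, 0.0, 0.0]  # action A: small reward now
rewards_b = [0.0, 0.0, 0.0, 2.0]  # action B: larger reward later

for gamma in (0.5, 0.9, 0.99):
    best = "A" if discounted_return(rewards_a, gamma) > discounted_return(rewards_b, gamma) else "B"
    print(f"gamma = {gamma}: optimal action is {best}")   # A at 0.5, B at 0.9 and 0.99

for horizon in (2, 4):
    best = "A" if step_return(rewards_a, horizon) > step_return(rewards_b, horizon) else "B"
    print(f"horizon = {horizon}: optimal action is {best}")  # A at 2, B at 4
```

The step-function version is what I imagine an investment horizon looking like: a hard cutoff rather than a smooth decay.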

avoided side effects by penalizing shifts in the ability to achieve randomly generated goals.

Does this correspond to making the agent preserve general optionality (in the more colloquial sense, in case it is a term of art here)?

Does that mean that some specification of random goals would serve as an approximation of optionality?
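To check my understanding, here is a rough sketch of what I have in mind (my own construction, not the paper's actual penalty; "ability to achieve a goal" is crudely approximated by discounted reachability in a made-up toy transition graph):

```python
# Rough sketch: penalize shifts in the ability to achieve randomly sampled
# goals. "Ability" is crudely approximated by discounted reachability of a
# random goal state in a small, made-up deterministic transition graph.

import random
from collections import deque

# Hypothetical toy environment: each state lists the states its actions reach.
transitions = {
    0: [1, 2],
    1: [3, 0],
    2: [2],      # absorbing state: entering it destroys most options
    3: [4, 5],
    4: [4],
    5: [5],
}

def shortest_distance(src, dst):
    """BFS over the action graph; None if dst is unreachable from src."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        state, dist = frontier.popleft()
        if state == dst:
            return dist
        for nxt in transitions[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None

def attainable(src, goal, gamma=0.9):
    """Crude attainable utility for the goal 'reach state `goal`'."""
    dist = shortest_distance(src, goal)
    return 0.0 if dist is None else gamma ** dist

def random_goal_penalty(state, next_state, n_goals=50, seed=0):
    """Average shift in the ability to achieve randomly sampled goals."""
    rng = random.Random(seed)
    goals = [rng.choice(list(transitions)) for _ in range(n_goals)]
    return sum(abs(attainable(next_state, g) - attainable(state, g))
               for g in goals) / n_goals

# Moving into the absorbing state cuts off most random goals, so it is
# penalized much more heavily than a move that keeps options open.
print(random_goal_penalty(0, 1))  # small shift in optionality
print(random_goal_penalty(0, 2))  # large shift in optionality
```

If something like this is the right picture, then the random goals are doing the work of standing in for "options in general."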

It occurs to me that preserving the ability to pursue randomly generated goals doesn't necessarily preserve the ability of other agents to pursue their goals. If I recall, that is kind of the theme of the instrumental power paper; as a concrete example of how they would combine, it feels like:

  • Add value to get money to advance goal X.
  • Don't destroy your ability to get money just to advance goal X a little faster, in case you want to pursue randomly generated goal Y.

This preserves the ability to pursue goal Y (Z, A, B...) but it does not imply that other agents should be allowed to add value and get money.

How closely does this map, I wonder? It feels like including other agents in the randomly generated goals somehow would help, but that only accounts for the agents themselves and not for the agents' goals.

Does a tuple of [goal(preserve agent),goal(preserve object of agent's goal)] do a good job of preserving the other agent's ability to pursue that goal? Can that be generalized?

...now to take a crack at the paper.

This doesn't strike directly at the sampling question, but it is related to several of your ideas about incorporating the differentiable function: Neural Ordinary Differential Equations.

This is being exploited most heavily in the Julia community. The broader pitch is that they have formalized the relationship between differential equations and neural networks. This allows things like:

  • applying differential equation tricks to computing the outputs of neural networks
  • using neural networks to solve pieces of differential equations
  • using differential equations to specify the weighting of information

The last one is the most intriguing to me, mostly because it addresses the problem of machine learning models having to start from scratch even when structural information about the environment is already known. For example, you can provide it with Maxwell's Equations and then it "knows" electromagnetism.
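For a flavor of what that looks like in code, here is a minimal sketch of the "known physics plus learned residual" idea. I am using PyTorch with the torchdiffeq package (the reference implementation from the Neural ODE paper) rather than the Julia libraries, just to keep it self-contained; the particular oscillator and network size are arbitrary illustrative choices.

```python
# Minimal sketch of the "known physics + learned residual" idea: the ODE
# right-hand side is a known linear term (a damped oscillator, chosen
# arbitrarily here) plus a small neural network for the unknown dynamics.
# Uses PyTorch with the torchdiffeq package (pip install torchdiffeq).

import torch
import torch.nn as nn
from torchdiffeq import odeint

class HybridDynamics(nn.Module):
    """dy/dt = known physics + neural correction."""
    def __init__(self):
        super().__init__()
        # Known structure: damped harmonic oscillator in (position, velocity).
        self.A = torch.tensor([[0.0, 1.0], [-1.0, -0.1]])
        # Unknown residual dynamics, to be learned from trajectory data.
        self.residual = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 2))

    def forward(self, t, y):
        return y @ self.A.T + self.residual(y)

dynamics = HybridDynamics()
y0 = torch.tensor([1.0, 0.0])
t = torch.linspace(0.0, 5.0, 50)

# Integrate the hybrid ODE. Gradients flow back through the solver, so
# `dynamics.residual` can be trained against observed trajectories while
# the known term carries the structure you already trust.
trajectory = odeint(dynamics, y0, t)
print(trajectory.shape)  # torch.Size([50, 2])
```

The appeal is that the network only has to learn what the known equations leave out, instead of rediscovering the physics from scratch.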

There is a blog post about the paper and using it with the DifferentialEquations.jl and Flux.jl libraries. There is also a good talk by Christopher Rackauckas about the approach.

It is mostly about using ML in the physical sciences, which seems to be going by the name Scientific ML now.

Strong upvote, this is amazing to me. On the post:

  • Another example of explaining the intuitions for formal results less formally. I strongly support this as a norm.
  • I found the graphics helpful, both in style and content.

Some thoughts on the results:

  • This strikes at the heart of AI risk: to my inexpert eyes, the lack of anything rigorous underpinning the flashiest concerns, something to build on or criticize, has been a big factor in how difficult it was and is to get engagement from the rest of the AI field. Even if the formalism fails due to a critical flaw, the ability to spot such a flaw is a big step forward.
  • The formalism of average attainable utility, and the explicit distinction from number of possibilities, provides powerful intuition even outside the field. This includes areas like warfare and business. I realize it isn't the goal, but I have always considered applicability outside the field as an important test because it would be deeply concerning for thinking about goal-directed behavior to mysteriously fail when applied to the only extant things which pursue goals.
  • I find the result aesthetically pleasing. This is not important, but I thought I would mention it.

I claim that 1939 Germany, transported to 1910, would not be able to conquer western Europe. There are two reasons for this: first, 1939 Germany did not have the reserves of fuel, munitions, or other key industrial inputs to complete the conquest when it began (even allowing for the technical disparities); second, the industrial base of 1910 Europe wasn't able to provide the volume or quality of inputs (particularly fuel and steel) needed to keep the war machine running. Europe would fall as fast as 1939 German tanks arrived - but I expect those tanks to literally run out of gas. Of course, if I am wrong about either of those two core arguments I would have to update.

I am not sure what lessons to draw about the AGI scenario in particular either; mostly I am making the case for extreme caution in the assumptions we make when modelling the problem. The Afghanistan example shows that capability and goals can't be disentangled the way we usually assume. Another particularly common assumption is perfect information. As an example, my current expectation in a slow takeoff scenario is multiple AGIs which each have Decisive Strategic Advantage windows at different times but do not act on them, for reasons of uncertainty. Strictly speaking, I don't see any reason why two different entities could not have Decisive Strategic Advantage simultaneously, in the same way the United States and the Soviet Union both had extinction-grade nuclear arsenals.

I broadly agree that Decisive Strategic Advantage is still plausible under a slow takeoff scenario. That being said:

Objection to Claim 1A: transporting 1939 Germany back in time to 1910 is likely to cause a sudden and near-total collapse of its war-making ability, because 1910 lacked the international trade and logistical infrastructure upon which 1939 Germany relied. Consider the Blockade of Germany, and that Czarist Russia would not be able to provide the same trade goods the Soviet Union did until 1941 (nor could it be invaded for them, as in 1941-1945). In general I expect this objection to hold for any industrialized country or other entity.

The intuition I am pointing to with this objection is that strategic advantage, including Decisive Strategic Advantage, is fully contextual; what appear to be reasonable simplifying assumptions are really deep changes to the nature of the thing being discussed.

To reinforce this, consider that the US invasion of Afghanistan is a very close approximation of the 30-year gap you propose. At the time the invasion began, the major source of serious weapons in the country was the Soviet-Afghan War, which ended in 1989; the weapons had either been provided by the US covert alliance or captured from the Soviets. You would expect at least a local strategic advantage vis-a-vis Afghanistan. Despite this, and despite the otherwise overwhelming disparities between the US and Afghanistan, the invasion was a political defeat for the US.

It was not until reading this that I really understood that I am in the habit of reasoning about myself as just a part of the environment.