Stuart Armstrong


Subagents and impact measures

If I were a well-intentioned AI...


Dynamic inconsistency of the inaction and initial state baseline

Why do the absolute values cancel?

Because , so you can remove the absolute values.

Tradeoff between desirable properties for baseline choices in impact measures

I also think the pedestrian example illustrates why we need more semantic structure: "pedestrian alive" -> "pedestrian dead" is bad, but "pigeon on road" -> "pigeon in flight" is fine.

Models, myths, dreams, and Cheshire cat grins

Thanks! Good insights there. Am reproducing the comment here for people less willing to click through:

I haven't read the literature on "how counterfactuals ought to work in ideal reasoners" and have no opinion there. But the part where you suggest an empirical description of counterfactual reasoning in humans, I think I basically agree with what you wrote.

I think the neocortex has a zoo of generative models, and a fast way of detecting when two are compatible, and if they are, snapping them together like Legos into a larger model.

For example, the model of "falling" is incompatible with the model of "stationary"—they make contradictory predictions about the same boolean variables—and therefore I can't imagine a "falling stationary rock". On the other hand, I can imagine "a rubber wine glass spinning" because my rubber model is about texture etc., my wine glass model is about shape and function, and my spinning model is about motion. All 3 of those models make non-contradictory predictions (mostly because they're issuing predictions about non-overlapping sets of variables), so the three can snap together into a larger generative model.

So for counterfactuals, I suppose that we start by hypothesizing some core of a model ("a bird the size of an adult blue whale") and then searching out more little generative model pieces that can snap onto that core, growing it out as much as possible in different ways, until you hit the limits where you can't snap on any more details without making it unacceptably self-contradictory. Something like that...

Cortés, Pizarro, and Afonso as Precedents for Takeover

...which also means that they didn't have an empire to back them up?

Cortés, Pizarro, and Afonso as Precedents for Takeover

Thanks for your research, especially the Afonso stuff. One question for that: were these empires used to gaining/losing small pieces of territory? ie did they really dedicate all their might to getting these ports back, or did they eventually write them off as minor losses not worth the cost of fighting (given Portuguese naval advantages)?

Cortés, Pizarro, and Afonso as Precedents for Takeover

Based on what I recall reading about Pizzaro's conquest, I feel you might be underestimating the importance of horses. It took centuries for European powers to figure out how to break a heavy cavalry charge with infantry; the amerindians didn't have the time to figure it out (see various battles where small cavalry forces routed thousands of troops). Once they had got more used to horses, later Inca forces (though much diminished) were more able to win open battles against the Spanish.

Maybe this was the problem for these empires: they were used to winning open battles, but were presented with a situation where only irregular warfare or siege defences could win. They reacted as an empire, when they should have been reacting as a recalcitrant province.

Reward functions and updating assumptions can hide a multitude of sins

My main note is that my comment was just about the concept of rigging a learning process given a fixed prior over rewards. I certainly agree that the general strategy of "update a distribution over reward functions" has lots of as-yet-unsolved problems.

Ah, ok, I see ^_^ Thanks for making me write this post, though, as it has useful things for other people to see, that I had been meaning to write up for some time.

On your main point: if the prior and updating process are over things that are truly beyond the AI's influence, then there will be no rigging (or, in my terms: uninfluenceable->unriggable). But there are many things that look like this, that are entirely riggable. For example, "have a prior 50-50 on cake and death, and update according to what the programmer says". This seems to be a prior-and-update combination, but it's entirely riggable.

So, another way of seeing my paper is "this thing looks like a prior-and-update process. If it's also unriggable, then (given certain assumptions) it's truly beyond the AI's influence".

Load More