I don't see why the gradient with respect to x_0 ever changes, so I'm confused about why the search would ever stop increasing in the x_0 direction.
Looks like the splotch functions are each a random mixture of sinusoids in each direction - so each splotch_j will have some variation along x_0. The argument of splotch_j is all of x, not just x_j.
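For concreteness, here's a minimal sketch of that construction - not the exact code from the post, just one hypothetical way to build a splotch_j as a random mixture of sinusoids over all of x, which shows that its gradient along x_0 does change from point to point:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_splotch(dim, n_waves=10):
    """Hypothetical splotch: a random mixture of sinusoids.

    Each sinusoid gets a random frequency vector over ALL coordinates
    of x (plus a random phase and amplitude), so the splotch varies
    along every direction, including x_0.
    """
    freqs = rng.normal(size=(n_waves, dim))
    phases = rng.uniform(0, 2 * np.pi, size=n_waves)
    amps = rng.normal(size=n_waves)
    def splotch(x):
        return amps @ np.sin(freqs @ x + phases)
    return splotch

splotch = make_splotch(dim=3)
e0 = np.array([1.0, 0.0, 0.0])
eps = 1e-5

# Finite-difference gradient along x_0, evaluated at two different points:
x1 = np.zeros(3)
x2 = np.array([0.0, 1.0, 2.0])
g_here = (splotch(x1 + eps * e0) - splotch(x1 - eps * e0)) / (2 * eps)
g_there = (splotch(x2 + eps * e0) - splotch(x2 - eps * e0)) / (2 * eps)
```

Because the frequency vectors mix all coordinates, moving along the other x_j's changes the x_0-slope too - which is why gradient descent can stop climbing in the x_0 direction.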
Very nice work. The graphs in particular are quite striking.
I sat down and thought for a bit about whether that objective function is actually a good model for the behavior we're interested in. Twice I thought I saw an issue, then looked back at the definition and realized you'd set up the function to avoid that issue. Solid execution; I think you have actually constructed a demonic environment.
In the ball example, it's the selection process that's interesting - the ball ending up rolling alongside one bump or another, and bumps "competing" in the sense that the ball will eventually end up rolling along at most one of them (assuming they run in different directions).
Couldn't you say a local minimum involves a secondary optimizing search process that has that minimum as its objective?
Only if such a search process is actually taking place. That's why it's key to look at the process, rather than the bumps and valleys themselves.
To use your ball analogy, what exactly is the difference between these twisty demon hills and a simple crater-shaped pit?
There isn't inherently any important difference between those two. That said, there are some environments in which "bumps" which effectively steer a ball will tend to continue to do so in the future, and other environments in which the whole surface is just noise with low spatial correlation. The latter would not give rise to demons (I think), while the former would. This is part of what I'm still confused about - what, quantitatively, are the properties of the environment necessary for demons to show up?
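To make the spatial-correlation distinction a bit more concrete, here's a toy illustration (my own construction, not anything from the post): a "bumpy" surface built from a few long-wavelength sinusoids stays correlated with itself over a distance, so a bump which steers the ball here still exists a little further along; white noise does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
t = np.linspace(0, 10, n)

# Surface A: smooth bumps - a few long-wavelength sinusoids.
# A feature that deflects the ball at one point persists nearby.
smooth = sum(np.sin(f * t + p)
             for f, p in zip(rng.uniform(0.5, 2.0, 5),
                             rng.uniform(0, 2 * np.pi, 5)))

# Surface B: white noise - essentially zero spatial correlation,
# so no bump lasts long enough to keep steering anything.
noise = rng.normal(size=n)

def autocorr(x, lag):
    """Sample autocorrelation at a given lag (in samples)."""
    x = x - x.mean()
    return (x[:-lag] @ x[lag:]) / (x @ x)

corr_smooth = autocorr(smooth, 10)  # close to 1
corr_noise = autocorr(noise, 10)    # close to 0
```

The conjecture, roughly, is that something like the first kind of surface is needed for demons: bumps must persist (correlate) along the direction of travel in order to "compete" for the ball.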
Does that help clarify, or should I take another stab at it?
I love the example, I'd never heard of that project before.
I'm agnostic on demonic intelligence. I think the key point is not the demons themselves but the process which produces them. Somehow, an imperfect optimizing search process induces a secondary optimizer, and it's that secondary optimizer which produces the demons. For instance, in the metabolism example, evolution is the secondary optimizer, and its goals are (often) directly opposed to the original optimizer - it wants to conserve free energy, in order to "trade" with the free energy optimizer later. The demons themselves (i.e. cells/enzymes in the metabolism example) are inner optimizers of the secondary optimizer; I expect that Risks From Learned Optimization already describes the secondary optimizer <-> demon relationship fairly well, including when the demons will be more/less intelligent.
The interesting/scary point is that the secondary optimizer is consistently opposed to the original optimizer; the two are basically playing a game where the secondary tries to hide information from the original.
Updated the long paragraph in the fable a bit, hopefully that will help somewhat. It's hard to make it really concrete when I don't have a good mathematical description of how these things pop up; I'm not sure which aspects of the environment make it happen, so I don't know what to emphasize.
Intuition: "It seems like a good idea to keep one's rules/policies simple."
Me: "Ok, why? What are some prototypical examples where keeping one's rules/policies simple would be useful?"
Intuition: "Well, consider legal codes - a complex criminal code turns into de-facto police discretion. A complex civil code turns into patent trolls and ambulance chasers, people who specialize in understanding the complexity and leveraging it against people who don't. On a personal level, in order for others to take your precommitments seriously, those precommitments need to be both easily communicated and clearly delineate lines which must not be crossed - otherwise people will plead ignorance or abuse grey areas, respectively. On an internal level, your rules necessarily adjust as the world throws new situations at you - new situations just aren't covered by old rules, unless we keep those old rules very simple."
Me: "Ok, trying to factor out the common denominator there... complexity implies grey areas. And when we try to make precommitments (i.e. follow general policies), grey areas can be abused by bad actors."
Intuition: <mulls for a minute> "Yeah, that sounds about right. Complexity is bad because grey areas are bad, and complexity creates grey areas."
Me: "The problem with grey areas is simple enough; Thomas Schelling already offers good models of that. But why does complexity necessarily imply grey areas in the policy? Is that an inherent feature of complexity in general, or is it specific to the kinds of complexity we're imagining?"
Intuition: "A complex computer program might not have any grey areas, but as a policy that would have other problems..."
Me: "Like what?"
Intuition: "Well, my knee-jerk is to say it won't actually be a policy we want, but when I actually picture it... it's more like, the ontology won't match the real world."
Me: "And a simple policy would match real-world ontology better than a complex one? That's not what I usually hear people say..."
Intuition: "Ok, imagine I draw a big circle on the ground, and say 'stay out of this circle'. But in some places there are patches of grass or a puddle or whatever, so the boundary isn't quite clear - grey areas. Now instead of a circle, I make it some complicated fractal shape. Then there will be more grey areas."
Me: "Why would there be more - oh wait, I see. Surface area. More surface area means more grey areas. That makes sense."
Intuition: "Right, exactly. Grey areas, in practice, occur in proportion to surface area. More complexity means more surface means more grey areas means more abuse of the rules."
Thanks for the pointer, sounds both relevant and useful. I'll definitely look into it.
Very interesting, thank you for the link!
Main difference between what they're doing and what I'm doing: they're using explicit utility & maximization nodes; I'm not. It may be that this doesn't actually matter. The representation I'm using certainly allows for utility maximization - a node downstream of a cloud can just be a maximizer for some utility on the nodes of the cloud-model. The converse question is less obvious: can any node downstream of a cloud be represented by a utility maximizer (with a very artificial "utility")? I'll probably play around with that a bit; if it works, I'd be able to re-use the equivalence results in that paper. If it doesn't work, then that would demonstrate a clear qualitative difference between "goal-directed" behavior and arbitrary behavior in these sorts of systems, which would in turn be useful for alignment - it would show a broad class of problems where utility functions do constrain.