Whether or not details (and lots of specific detail arguments) matter hinges on the sensitivity argument (which is an argument about basins?) in general, so I'd like to see that addressed directly. What are the arguments for high sensitivity worlds other than anthropics? What is the detailed anthropic argument?
Rambling/riffing: Boundaries typically need holes in order to be useful. Depending on the level of abstraction, different things can be thought of as holes. One way to think of a boundary is a place where a rule is enforced consistently, and this probably involves pushing what would be a continuous condition into a condition with a few semi discrete modes (in the simplest case enforcing a bimodal distribution of outcomes). In practice, living systems seem to have settled on stacking a bunch of one dimensional gate keepers together as presumably the modularity of such a thing was easier to discover in the search space than things with higher path dependencies due to entangled condition measurement. This highlights the similarity between boolean circuit analysis and a biological boundary. In a boolean circuit, the configurations of 'cheap' energy flows/gradients can be optimized for benefit, while the walls to the vast alternative space of other configurations can be artificially steepened/shored up (see: mitigation efforts to prevent electron tunneling in semiconductors).
Many proposals seem doomed to me because they involve one or multiple steps where they assume a representation, then try to point to robust relations in the representation and hope they'll hold in the territory. This wouldn't be so bad on its own but when pointed to it seems like handwaving happens rather than something more like conceptual engineering. I am relatively more hopeful about John's approach as being one that doesn't fail to halt and catch fire at these underspecified steps in other plans. In other areas like math and physics we try to get the representation to fall out of the model by sufficiently constraining the model. I would prefer to try to pin down a doomed model than stay in hand wave land because at least in the process of pinning down the doomed model you might get reusable pieces for an eventual non doomed model. Was happy about eg quantilizers for basically the same reason.
This is exactly what I was thinking about though, this idea of monitoring every human on earth seems like a failure of imagination on our part. I'm not safe from predators because I monitor the location of every predator on earth. I admit that many (overwhelming majority probably) of scenarios in this vein are probably pretty bad and involve things like putting only a few humans on ice while getting rid of the rest.
I guess the threat model relies on the overhang. If you need x compute for powerful ai, then you need to control more than all the compute on earth minus x to ensure safety, or something like that. Controlling the people probably much easier.
New-to-me thought I had in response to the kill all humans part. When predators are a threat to you, you of course shoot them. But once you invent cheap tech that can control them you don't need to kill them anymore. The story goes that the AI would kill us either because we are a threat or because we are irrelevant. It seems to me that (and this imports a bunch of extra stuff that would require analysis to turn this into a serious analysis, this is just an idle thought), the first thing I do if I am superintelligent and wanting to secure my position is not take over the earth, which isn't in a particularly useful spot resource wise and instead launch my nanofactory beyond the reach of humans to mercury or something. Similarly, in the nanomachines in everyone's blood that can kill them instantly class of ideas, why do I need at that point to actually pull the switch? I.e. the kill all humans scenario is emotionally salient but doesn't actually clearly follow the power gradients that you want to climb for instrumental convergence reasons?
I would summarize a dimension of the difficulty like this. There are the conditions that give rise to intellectual scenes, intellectual scenes being necessary for novel work in ambiguous domains. There are the conditions that give rise to the sort of orgs that output actions consistent with something like Six Dimensions of Operational Adequacy. The intersection of these two things is incredibly rare but not unheard of. The Manhattan Project was a Scene that had security mindset. This is why I am not that hopeful. Humans are not the ones building the AGI, egregores are, and spending egregore sums of money. It is very difficult for individuals to support a scene of such magnitude, even if they wanted to. Ultra high net worth individuals seem much poorer relative to the wealth of society than in the past, where scenes and universities (a scene generator) could be funded by individuals or families. I'd guess this is partially because the opportunity cost for smart people is much higher now, and you need to match that (cue title card: Baumol's cost disease kills everyone). In practice I expect some will give objections along various seemingly practical lines, but my experience so far is that these objections are actually generated by an environment that isn't willing to be seen spending gobs of money on low status researchers who mostly produce nothing. i.e. funding the 90%+ percent of a scene that isn't obviously contributing to the emergence of a small cluster that actually does the thing.
As recent experience has shown, exponential processes don't need to be smarter than us to utterly upend our way of life. They can go from a few problems here and there to swamping all other considerations in a span of time too fast to react to, if preparations aren't made and those knowledgeable don't have the leeway to act. We are in the early stages of an exponential increase in the power of AI algorithms over human life, and people who work directly on these problems are sounding the alarm right now. It is plausible that we will soon have processes that can escape the lab just as a virus can, and we as a species are pouring billions into gain-of-function research for these algorithms, with little concomitant funding or attention paid to the safety of such research.
I particularly appreciate the questions that ask one to look at a way that a problem was reified/specified/ontologized in a particular domain and asks for alternative such specifications. I thought Superintelligence (2014) might be net harmful because it introduced a lot of such specifications that I then noticed were hard to think around. I think there are a subset of prompts from the online course/book Framestorming that might be useful there, I'll go see if I can find them.
This seems similar to the SR model of scientific explanation.