This is exactly what I was thinking about, though: this idea of monitoring every human on earth seems like a failure of imagination on our part. I'm not safe from predators because I monitor the location of every predator on earth. I admit that many (probably the overwhelming majority) of the scenarios in this vein are pretty bad and involve things like putting only a few humans on ice while getting rid of the rest.

I guess the threat model relies on the overhang. If you need x compute for powerful AI, then you need to control more than all the compute on earth minus x to ensure safety, or something like that. Controlling the people is probably much easier.
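
Spelling that inequality out, as a rough sketch: only x comes from the comment above; C (all the compute on earth) and c (the compute you control) are labels introduced here for illustration. The compute left outside your control is C - c, and the condition is that this remainder stays below the threshold x:

$$
C - c < x \quad\Longleftrightarrow\quad c > C - x
$$

On this reading, the bigger the overhang (the smaller x is relative to C), the closer the required share c/C gets to 1.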

A new-to-me thought I had in response to the kill-all-humans part. When predators are a threat to you, you of course shoot them. But once you invent cheap tech that can control them, you don't need to kill them anymore. The story goes that the AI would kill us either because we are a threat or because we are irrelevant. It seems to me (and this imports a bunch of extra stuff that would need analysis before it became a serious argument; this is just an idle thought) that the first thing I do if I am superintelligent and want to secure my position is not to take over the earth, which isn't in a particularly useful spot resource-wise, but to launch my nanofactory beyond the reach of humans, to Mercury or something. Similarly, in the "nanomachines in everyone's blood that can kill them instantly" class of ideas, why do I need at that point to actually pull the switch? I.e., the kill-all-humans scenario is emotionally salient but doesn't clearly follow the power gradients that you want to climb for instrumental convergence reasons?

I would summarize a dimension of the difficulty like this. There are the conditions that give rise to intellectual scenes, intellectual scenes being necessary for novel work in ambiguous domains. There are the conditions that give rise to the sort of orgs that output actions consistent with something like Six Dimensions of Operational Adequacy. The intersection of these two things is incredibly rare but not unheard of. The Manhattan Project was a Scene that had security mindset. This is why I am not that hopeful. Humans are not the ones building the AGI; egregores are, and they are spending egregore sums of money. It is very difficult for individuals to support a scene of such magnitude, even if they wanted to. Ultra-high-net-worth individuals seem much poorer relative to the wealth of society than in the past, when scenes and universities (a scene generator) could be funded by individuals or families. I'd guess this is partially because the opportunity cost for smart people is much higher now, and you need to match that (cue title card: Baumol's cost disease kills everyone). In practice I expect some will raise objections along various seemingly practical lines, but my experience so far is that these objections are actually generated by an environment that isn't willing to be seen spending gobs of money on low-status researchers who mostly produce nothing, i.e. funding the 90%+ of a scene that isn't obviously contributing to the emergence of a small cluster that actually does the thing.

[$20K in Prizes] AI Safety Arguments Competition

As recent experience has shown, exponential processes don't need to be smarter than us to utterly upend our way of life. They can go from a few problems here and there to swamping all other considerations in a span of time too short to react to, if preparations aren't made and those knowledgeable don't have the leeway to act. We are in the early stages of an exponential increase in the power of AI algorithms over human life, and people who work directly on these problems are sounding the alarm right now. It is plausible that we will soon have processes that can escape the lab just as a virus can, and we as a species are pouring billions into gain-of-function research for these algorithms, with little concomitant funding or attention paid to the safety of such research.

Alignment research exercises

I particularly appreciate the questions that ask one to look at the way a problem was reified/specified/ontologized in a particular domain and then ask for alternative specifications. I thought Superintelligence (2014) might be net harmful because it introduced a lot of such specifications that I then noticed were hard to think around. I think there is a subset of prompts from the online course/book Framestorming that might be useful there; I'll go see if I can find them.

Abstractions as Redundant Information

This seems similar to the statistical-relevance (SR) model of scientific explanation.

Counterexamples to some ELK proposals

It seems like the frame of some of the critique is that humans are the authority on human values and want to ensure that the AI doesn't escape that authority in illegible ways. To me it seems like the frame is more like we know that the sensors we have are only goodhartedly entangled with the things we care about and would ourselves prefer the less goodharted hypothetical sensors if we knew how to construct them. And that we'd want the AI to be inhabiting the same frame as us since, to take a page from Mad Investor Chaos, we don't know how lies will propagate through an alien architecture.

I don't know how 'find less goodharted sensors' is instantiated on natural hardware, or how toy versions might be implemented algorithmically; it seems like it would be worth trying to figure out. In a conversation, John mentioned a type of architecture that is forced through an information bottleneck to find a minimal representation of the space. That seemed like a similar direction.

Morality is Scary

You may not be interested in mutually exclusive compression schemas, but mutually exclusive compression schemas are interested in you. One nice thing is that, given that the schemas use an arbitrary key to handshake with, there is hope that they can all be convinced to get on the same arbitrary key without loss of useful structure.

Biology-Inspired AGI Timelines: The Trick That Never Works

Spoiler tags are borked the way I'm using them.

anyway, another place to try your hand at calibration:

Humbali: No. You're expressing absolute certainty in your underlying epistemology and your entire probability distribution

no he isn't, why?

Humbali is asking Eliezer to double-count evidence. Consilience is hard if you don't do your homework on the provenance of each heuristic and instead just naively count up outputs from sources that also didn't do their homework.

Or in other words: "Do not cite the deep evidence to me, I was there when it was written"

And another place to take a whack at:

I'm not sure how to lead you into the place where you can dismiss that thought with confidence.

The particular cited example of statusy aliens seems like extreme hypothesis privileging, which often arises from reference class tennis.

