Chris van Merwijk

Wiki Contributions


AGI Ruin: A List of Lethalities

Here is my partial honest reaction, just two points I'm somewhat dissatisfied with (not meant to be exhaustive):
2. "A cognitive system with sufficiently high cognitive powers, given any medium-bandwidth channel of causal influence, will not find it difficult to bootstrap to overpowering capabilities independent of human infrastructure." I would like there to be an argument for this claim that doesn't rely on nanotech, and solidly relies on actually existing amounts of compute. E.g. if the argument relies on running intractable detailed simulations of proteins, then it doesn't count. (I'm not disagreeing with the nanotech example by the way, or saying that it relies on unrealistic amounts of compute, I'd just like to have an argument for this that is very solid and minimally reliant on speculative technology, and actually shows that it is).
6. "We need to align the performance of some large task, a 'pivotal act' that prevents other people from building an unaligned AGI that destroys the world.". You name "burn all GPU's" as an "overestimate for the rough power level of what you'd have to do", but it seems to me that it would be too weak of a pivotal act? Assuming there isn't some extreme change in generally held views, people would consider this an extreme act of terrorism, and shut you down, put you in jail, and then rebuild the GPU's and go on with what they were planning to do. Moreover, now there is probably an extreme taboo on anything AI safety related. (I'm assuming here that law enforcement finds out that you were the one who did this). Maybe the idea is to burn all GPU's indefinitely and forever (i.e. leave nanobots that continually check for GPU's and burn them when they are created), but even this seems either insufficient or undesirable long term depending on what is counted as a GPU. Possibly I'm not getting what you mean, but it just seems completely too weak as an act. 

Comments on CAIS

Responding to this very late, but: If I recall correctly, Eric has told me in personal conversation that CAIS is a form of AGI, just not agent-like AGI. I suspect Eric would agree broadly with Richard's definition.

Optimal play in human-judged Debate usually won't answer your question

"I talk about consequentialists, but not rational consequentialists", ok this was not the impression I was getting. 

Optimal play in human-judged Debate usually won't answer your question

Reading this post a while after it was written: I'm not going to respond to the main claim (which seems quite likely) but just to the specific arguments, which seems suspicious to me. Here are some points:

  • In my model of the standard debate setup with human judge, the human can just use both answers in whichever way it wants, independently of which it selects as the correct answer. The fact that one answer provides more useful information than "2+2=?" doesn't imply a "direct" incentive for the human judge to select that as the correct answer. Upon introspection, I myself would probably say that "4" is the correct answer, while still being very interested in the other answer (the answer on AI risk). I don't think you disagreed with this?
  • At a later point you say that the real reason for why the judge would nevertheless select the QIA as the correct answer is that the judge wants to train the system to do useful things. You seem to say that a rational consequentialist would make this decision. Then at a later point you say that this is probably/plausibly (?) a bad thing: "Is this definitely undesirable? I'm not sure, but probably". But if it really is a bad thing and we can know this, then surely a rational judge would know this, and could just decide not to do it? If you were the judge, would you select the QIA, despite it being "probably undesirable"? 
  • Given that we are talking about optimal play and the human judge is in fact not rational/safe, the debater could manipulate the judge, and so the previous argument doesn't in fact imply that judges won't select QIA's. The debater could deceive and manipulate the judge into (incorrectly) thinking that it should select the QIA, even if you/we currently believe that this would be bad. I agree this kind of deception would probably happen in optimal play (if that is indeed what you meant), but it relies on the judge being irrational or manipulable, not on some argument that "it is rational for a consequentialist judge to select answers with the highest information value".

It seems to me that either we think there is no problem with selecting QIA's as answers, or we think that human judges will be irrational and manipulated, but I don't see the justification in this post for saying "rational consequentialist judges will select QIA's AND this is probably bad".

Finite Factored Sets: Conditional Orthogonality

I think a subpartition of S can be thought of as a partial function on S, or equivalently, a variable on S that has the possible value "Null"/"undefined".

Finite Factored Sets: Orthogonality and Time

Can't you define  for any set  of partitions of , rather than  w.r.t. a specific factorization , simply as  iff ? If so, it would seem to me to be clearer to define  that way (i.e. make 7 rather than 2 from proposition 10 the definition), and then basically proposition 10 says "if  is a subset of factors of a partition then here are a set of equivalent definitions in terms of chimera". Also I would guess that proposition 11 is still true for  rather than just for , though I haven't checked that 11.6 would still work, but it seems like it should.