All of Adele Lopez's Comments + Replies

Adele Lopez's Shortform

[Epistemic status: very speculative]

One ray of hope that I've seen discussed is that we may be able to do some sort of acausal trade with even an unaligned AGI, such that it will spare us (e.g. it would give us a humanity-aligned AGI control of a few stars, in exchange for us giving it control of several stars in the worlds we win).

I think Eliezer is right that this wouldn't work.

But I think there are possible trades which don't have this problem. Consider the scenario in which we Win, with an aligned AGI taking control of our future light-cone. Assuming t... (read more)

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

It seems relatively plausible that you could use a Limited AGI to build a nanotech system capable of uploading a diverse assortment of living tissue (non-brain, or maybe with only very small brains) without damaging it, and that this system would learn how to upload tissue in a general way. Then you could use the system (not the AGI) to upload humans (tested first on increasingly complex animals). It would be a relatively inefficient emulation, but it doesn't seem obviously doomed to me.

Probably too late once hardware is available to do this though.

AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

So in a "weird experiment", the infrabayesian starts by believing only one branch exists, and then at some point starts believing in multiple branches?

3 · Vanessa Kosoy · 1mo
Multiple branches can only exist transiently during the weird experiment (i.e. neither before nor after). Naturally, if the agent knows in advance that the experiment is going to happen, then it anticipates those branches appearing.
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

If there aren't other branches, then shouldn't that be impossible? Not just in practice but in principle.

2 · Vanessa Kosoy · 1mo
The wavefunction has other branches, because it's the same mathematical object governed by the same equations. Only, the wavefunction doesn't exist physically; it's just an intermediate variable in the computation. The things that exist (corresponding to the Φ variable in the formalism) and the things that are experienced (corresponding to some function of the 2^Γ variable in the formalism) only have one branch.
AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy

You can get some weird things if you are doing some weird experiment on yourself where you are becoming a Schrödinger cat and doing some weird stuff like that, you can get a situation where multiple copies of you exist. But if you’re not doing anything like that, you’re just one branch, one copy of everything.

Why does it matter that you are doing a weird experiment, versus the universe implicitly doing the experiment for you via decoherence? If someone else did the experiment on you without your knowledge, does infrabayesianism expect one copy or multiple copies?

3 · Vanessa Kosoy · 1mo
By "weird experiment" I mean things like reversing decoherence. That is, something designed to cause interference between branches of the wavefunction containing minds that remember different experiences[1]. This obviously requires levels of technology we are nowhere near reaching[2]. As long as decoherence happens as usual, there is only one copy.

1. Of course, it requires erasing their contradicting memories, among other things.
2. There is a possible "shortcut" though, namely simulating minds on quantum computers. Naturally, in this case only the quantum-uploaded minds can have multiple copies.
Shah and Yudkowsky on alignment failures

If being versed in cryptography were enough, then I wouldn't expect Eliezer to claim to be one of the last living descendants of this lineage.

Why would Zen help (and why do you think that)?

Shah and Yudkowsky on alignment failures

This may, perhaps, be confounded by the phenomenon where I am one of the last living descendants of the lineage that ever knew how to say anything concrete at all.

I've previously noticed this weakness in myself. What lineage did Eliezer learn this from? I would appreciate any suggestions or advice on how to become stronger at this.

This came up with Aysajan about two months ago. An exercise which I recommended for him: first, pick a technical academic paper. Read through the abstract and first few paragraphs. At the end of each sentence (or after each comma, if the authors use very long sentences), pause and write/sketch a prototypical example of what you currently think they're talking about. The goal here is to get into the habit of keeping a "mental picture" (i.e. prototypical example) of what the authors are talking about as you read.

Other good sources on which to try this exerci... (read more)

CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:

  • Rationalist taboo
  • A group version of rationalist taboo where an instructor holds an everyday object and asks the class to describe it in concrete terms.
  • The Monday-Tuesday game
  • A role-playing game where the instructor plays a management consultant whose advice is impressive-sounding but contentless bullshit, and where the class has to force the consultant to be specific and concrete enough to be either wrong or trivial.
  • People were encouraged t
... (read more)
2 · kyleherndon · 3mo
Cryptography was mentioned in this post in a relevant manner, though I don't have enough experience with it to advocate for it with certainty. Some lineages of physics (EY points to Feynman) try to evoke this, though its pervasiveness has decreased. You may have some luck with Zen. Generally speaking, I think if you look at the Sequences, the themes of physics, security mindset, and Zen are invoked for a reason.
Adele Lopez's Shortform

[I may try to flesh this out into a full-fledged post, but for now the idea is only partially baked. If you see a hole in the argument, please poke at it! Also I wouldn't be very surprised if someone has made this point already, but I don't remember seeing such. ]

Dissolving the paradox of useful noise

A perfect bayesian doesn't need randomization.

Yet in practice, randomization seems to be quite useful.

How to resolve this seeming contradiction?

I think the key is that a perfect bayesian (Omega) is logically omniscient. Omega can always fully update on all o... (read more)
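A toy illustration of the point about bounded (non-logically-omniscient) agents benefiting from noise; this setup is my own, not from the original shortform. A predictable agent playing matching pennies against an adversary that models it gets exploited, while an agent with access to randomness cannot be:

```python
import random

def exploiter(history):
    # Adversary: guess the agent's most common past move and match it
    # (in matching pennies, the adversary wins by matching).
    if not history:
        return 0
    return max(set(history), key=history.count)

def play(agent, rounds=500, seed=0):
    rng = random.Random(seed)
    history, wins = [], 0
    for _ in range(rounds):
        adversary_move = exploiter(history)
        move = agent(history, rng)
        wins += (move != adversary_move)  # agent wins on a mismatch
        history.append(move)
    return wins / rounds

always_one = lambda history, rng: 1                  # deterministic, exploitable
mixed      = lambda history, rng: rng.randint(0, 1)  # mixed (randomized) strategy

print(play(always_one))  # near 0: the modeler wins almost every round
print(play(mixed))       # near 0.5: randomization caps the exploitation
```

A logically omniscient agent could instead out-model the adversary directly; randomization is the cheap substitute available to bounded agents.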

Biology-Inspired AGI Timelines: The Trick That Never Works

You're missing the point!

Your arguments apply mostly toward arguing that brains are optimized for energy efficiency, but the important quantity in question is computational efficiency! You even admit that neurons are "optimizing hard for energy efficiency at the expense of speed", but don't seem to have noticed that this fact makes almost everything else you said completely irrelevant!

Biology-Inspired AGI Timelines: The Trick That Never Works

Going to try answering this one:

Humbali: I feel surprised that I should have to explain this to somebody who supposedly knows probability theory. If you put higher probabilities on AGI arriving in the years before 2050, then, on average, you're concentrating more probability into each year that AGI might possibly arrive, than OpenPhil does. Your probability distribution has lower entropy. We can literally just calculate out that part, if you don't believe me. So to the extent that you're wrong, it should shift your probability distributions in the d

... (read more)
Visible Thoughts Project and Bounty Announcement

This plausibly looks like an existing collection of works which seem to be annotated in a similar way: https://www.amazon.com/Star-Wars-Screenplays-Laurent-Bouzereau/dp/0345409817

Christiano, Cotra, and Yudkowsky on AI progress

That seems a bit uncharitable to me. I doubt he rejects those heuristics wholesale. I'd guess that he thinks that e.g. recursive self improvement is one of those things where these heuristics don't apply, and that this is foreseeable because of e.g. the nature of recursion. I'd love to hear more about what sort of knowledge about "operating these heuristics" you think he's missing!

Anyway, it seems like he expects things to seem more-or-less gradual up until FOOM, so I think my original point still applies: I think his model would not be "shaken out" of his fast-takeoff view due to successful future predictions (until it's too late).

5 · Paul Christiano · 6mo
He says things like AlphaGo or GPT-3 being really surprising to gradualists, suggesting he thinks that gradualism only works in hindsight. I agree that after shaking out the other disagreements, we could just end up with Eliezer saying "yeah but automating AI R&D is just fundamentally unlike all the other tasks to which we've applied AI" (or "AI improving AI will be fundamentally unlike automating humans improving AI") but I don't think that's the core of his position right now.
Christiano, Cotra, and Yudkowsky on AI progress

It seems like Eliezer is mostly just more uncertain about the near future than you are, so it doesn't seem like you'll be able to find (ii) by looking at predictions for the near future.

It seems to me like Eliezer rejects a lot of important heuristics like "things change slowly" and "most innovations aren't big deals" and so on. One reason he may do that is because he literally doesn't know how to operate those heuristics, and so when he applies them retroactively they seem obviously stupid. But if we actually walked through predictions in advance, I think he'd see that actual gradualists are much better predictors than he imagines.

Matthew Barnett's Shortform

I lean toward the foom side, and I think I agree with the first statement. The intuition for me is that it's kinda like p-hacking (there are very many possible graphs, and some percentage of those will look gradual), or like using a log-log plot (which makes everything look like a nice straight line, but which actually corresponds to very broad predictions once uncertainty is properly accounted for). Not sure if I agree with the addendum or not yet, and I'm not sure how much of a crux this is for me yet.
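A quick numerical gloss on the log-log point (my own illustration, not from the thread): a prediction that misses by a full factor of ten barely registers on an axis spanning six orders of magnitude.

```python
import math

actual, predicted = 1e6, 1e5   # prediction off by a factor of 10
decades_shown = 6.0            # log axis spans 10^0 .. 10^6

# Visual distance between prediction and reality on the log axis:
residual = abs(math.log10(actual) - math.log10(predicted))  # 1 decade
print(residual / decades_shown)  # ~0.17: the 10x miss covers ~17% of the axis
```

So a "nice straight line" on a log-log plot is compatible with point predictions that are off by large multiplicative factors.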

Yudkowsky and Christiano discuss "Takeoff Speeds"

Spending money on R&D is essentially the expenditure of resources in order to explore and optimize over a promising design space, right? That seems like a good description of what natural selection did in the case of hominids. I imagine this still sounds silly to you, but I'm not sure why. My guess is that you think natural selection isn't relevantly similar because it didn't deliberately plan to allocate resources as part of a long bet that it would pay off big.

4 · Paul Christiano · 6mo
I think natural selection has lots of similarities to R&D, but (i) there are lots of ways of drawing the analogy, (ii) some important features of R&D are missing in evolution, including some really important ones for fast takeoff arguments (like the existence of actors who think ahead). If someone wants to spell out why they think the evolution of hominids means takeoff is fast, then I'm usually happy to explain why I disagree with their particular analogy. I think this happens in the next Discord log between me and Eliezer.
Ngo and Yudkowsky on AI capability gains

There's more than just differential topology going on, but it's the thing that unifies it all. You can think of differential topology as being about spaces you can divide into cells, and the boundaries of those cells. Conservation laws are naturally expressed here as constraints that the net flow across the boundary must be zero. This makes conserved quantities into resources, whose use is convergently minimized. Minimal structures under such constraints are thus led to form the same network-like shapes, obeying the same sorts of laws. (See... (read more)

Ngo and Yudkowsky on AI capability gains

I think "deep fundamental theory" is deeper than just "powerful abstraction that is useful in a lot of domains".

Part of what makes a Deep Fundamental Theory deeper is that it is inevitably relevant for anything existing in a certain way. For example, Ramón y Cajal (discoverer of the neuronal structure of brains) wrote:

Before the correction of the law of polarization, we have thought in vain about the usefulness of the referred facts. Thus, the early emergence of the axon, or the displacement of the soma, appeared to us as unfavorable arrangements acting

... (read more)
4 · Adam Shimi · 6mo
Can you go into more detail here? I have done a decent amount of maths but always had trouble in physics due to my lack of physical intuition, so it might be completely obvious, but I'm not clear on what "that same thing" is or how it explains all your examples. Is it about shortest paths? What aspect of differential topology (a really large field) captures it? (Maybe you literally can't explain it to me without me seeing the deep theory, which would be frustrating, but I'd want to know if that was the case.)
Ngo and Yudkowsky on AI capability gains

"you can't make an engine more efficient than a Carnot engine."

That's not what it predicts. It predicts you can't make a heat engine more efficient than a Carnot engine.
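For reference, the bound in question (standard thermodynamics, not specific to this dialogue): a heat engine drawing heat $Q_h$ from a hot reservoir at temperature $T_h$ and rejecting waste heat to a cold reservoir at $T_c$ satisfies

```latex
\eta \;=\; \frac{W}{Q_h} \;\le\; \eta_{\text{Carnot}} \;=\; 1 - \frac{T_c}{T_h}
```

Devices that are not heat engines (fuel cells, for instance) are simply outside the theorem's scope, which is the point of the correction.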

Adele Lopez's Shortform

Haha yeah, I'm not surprised if this ends up not working, but I'd appreciate hearing why.

Adele Lopez's Shortform

Elitzur-Vaidman AGI testing

One thing that makes AI alignment super hard is that we only get one shot.

However, it's potentially possible to get around this (though probably still very difficult).

The Elitzur-Vaidman bomb tester is a protocol (using quantum weirdness) by which a bomb may be tested, with arbitrarily little risk. Its interest comes from the fact that it works even when the only way to test the bomb is to try detonating it. It doesn't matter how the bomb works, as long as we can set things up so that it will allow/block a photon based on wheth... (read more)
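A minimal amplitude-level sketch of the interferometer logic behind the bomb tester (my own illustration, using one common beam-splitter phase convention; not code from the post):

```python
# Mach-Zehnder interferometer with a possible bomb in the lower arm.
# The photon state is an (upper, lower) pair of complex amplitudes.

def beam_splitter(u, l):
    s = 2 ** -0.5
    # 50/50 splitter; the reflected component picks up a factor of i.
    return s * (u + 1j * l), s * (1j * u + l)

# Dud bomb: both paths stay coherent, and destructive interference
# keeps one detector (the "dark" port) silent.
u, l = beam_splitter(*beam_splitter(1, 0))
p_dark_dud = abs(u) ** 2   # 0: the dark detector never fires for a dud

# Live bomb: it measures which path the photon took.
u1, l1 = beam_splitter(1, 0)
p_boom = abs(l1) ** 2      # 1/2: photon takes the lower arm and detonates it
u2, l2 = beam_splitter(1, 0)  # surviving (upper-arm) branch hits splitter 2
p_dark_live = abs(u1) ** 2 * abs(u2) ** 2  # 1/4: dark detector fires
```

A click at the dark detector is only possible when the bomb is live, so it certifies a working bomb that was never detonated; iterating the idea (the quantum Zeno version) pushes the detonation risk arbitrarily low.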

3 · Matthew "Vaniver" Graves · 6mo
IMO this is an 'additional line of defense' boxing strategy rather than a simplification. Note that in the traditional version, the 'dud' bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn't distinguishable from a bomb that absorbs the photon and then doesn't explode (because of an error deeper in the bomb).

But let's suppose the quantum computing folks can come up with something like this, where we keep some branches entangled and run analysis of the AI code in only one branch, causing an explosion there but affecting the total outcome in all branches. [It seems pretty implausible to me that you manage to maintain entanglement despite that much impact on the external world, but maybe it's possible.] Then 1) as you point out, we need to ensure that the AI doesn't realize what it needs to output in that branch, and 2) we need some sort of way to evaluate "did the AI pass our checks or not?". But 2 is "the whole problem"!
2 · Pattern · 6mo
I think we get enough things referencing quantum mechanics that we should probably explain why that doesn't work (if it doesn't) rather than just downvoting and moving on.
Some Existing Selection Theorems

Zurek's einselection seems like perhaps another instance of this, or at least related. The basic idea is (very roughly) that the preferred basis in QM is preferred because persistence of information selects for it.

Some Existing Selection Theorems

I think Critch's "Futarchy" theorem counts as a (very nice) selection theorem.

Another (outer) alignment failure story

How bad is the ending supposed to be? Are just people who fight the system killed, and otherwise, humans are free to live in the way AI expects them to (which might be something like keep consuming goods and providing AI-mediated feedback on the quality of those goods)? Or is it more like once humans are disempowered no machine has any incentive to keep them around anymore, so humans are not-so-gradually replaced with machines?

The main point of intervention in this scenario that stood out to me would be making sure that (during the paragraph beginning with... (read more)

6 · Paul Christiano · 1y
I think that most likely either humans are killed incidentally as part of the sensor-hijacking (since that's likely to be the easiest way to deal with them), or else AI systems reserve a negligible fraction of their resources to keep humans alive and happy (but disempowered) based on something like moral pluralism or being nice or acausal trade (e.g. the belief that much of their influence comes from the worlds in which they are simulated by humans who didn't mess up alignment and who would be willing to exchange a small part of their resources in order to keep the people in the story alive and happy). I don't think this is infeasible. It's not the intervention I'm most focused on, but it may be the easiest way to avoid this failure (and it's an important channel for advance preparations to make things better / important payoff for understanding what's up with alignment and correctly anticipating problems).
MIRI comments on Cotra's "Case for Aligning Narrowly Superhuman Models"

My guess is that a "clean" algorithm is still going to require multiple conceptual insights in order to create it. And typically, those insights are going to be found before we've had time to strip away the extraneous ideas in order to make it clean, which requires additional insights. Combine this with the fact that at least some of these insights are likely to be public knowledge and relevant to AGI, and I think Eliezer has the right idea here.

3 · Daniel Kokotajlo · 1y
OK, fair enough.
Utility Maximization = Description Length Minimization

This gives a nice intuitive explanation for the Jeffrey-Bolker rotation, which is basically a way of interpreting a belief as a utility, and vice versa.

Some thoughts:

  • What do probabilities mean without reference to any sort of agent? Presumably it has something to do with the ability to "win" De Finetti games in expectation. For avoiding subtle anthropomorphization, it might be good to think of this sort of probability as being instantiated in a bacterium's chemical sensor, or something like that. And in this setting, it's clear it wouldn't mean anything w
... (read more)
4 · Alex Mennen · 1y
I don't see the connection to the Jeffrey-Bolker rotation? There, to get the shouldness coordinate, you need to start with the epistemic probability measure, and multiply it by utility; here, utility is interpreted as a probability distribution without reference to a probability distribution used for beliefs.
Problems Involving Abstraction?

Not quite sure how specifically this connects, but I think you would appreciate seeing it.

As a good example of the kind of gains we can get from abstraction, see this exposition of the HashLife algorithm, used to (perfectly) simulate Conway's Game of Life at insane scales.

Earlier I mentioned I would run some nontrivial patterns for trillions of generations. Even just counting to a trillion takes a fair amount of time for a modern CPU; yet HashLife can run the breeder to one trillion generations, and print its resulting population of 1,302,083,334,180,208,337,404 in less than a second.
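The core trick, as I understand it, is memoizing the future of repeated sub-blocks so identical regions are only ever simulated once. A toy 1D analogue of that idea (my own sketch, using rule 90 rather than Life): a block of width 2n, advanced n/2 steps, yields its center n cells, computed recursively from memoized half-size results.

```python
from functools import lru_cache

# 1D cellular automaton, rule 90: each cell becomes the XOR of its neighbors.
# result(block) advances a block of width 2n by n/2 steps and returns the
# center n cells. Memoization means each distinct block is simulated at most
# once -- the HashLife idea in miniature.

@lru_cache(maxsize=None)
def result(block):
    n = len(block) // 2
    if n == 2:  # width-4 base case: one step, keep the center 2 cells
        return (block[0] ^ block[2], block[1] ^ block[3])
    h = n // 2
    left, mid, right = block[:n], block[h:h + n], block[n:]
    r1, r2, r3 = result(left), result(mid), result(right)  # n/4 steps each
    return result(r1 + r2) + result(r2 + r3)               # n/4 more steps

def naive(block, steps):
    # Direct simulation for comparison; the block shrinks by 2 each step.
    cells = list(block)
    for _ in range(steps):
        cells = [cells[i - 1] ^ cells[i + 1] for i in range(1, len(cells) - 1)]
    return tuple(cells)

print(result((1,) + (0,) * 15))  # matches naive(block, 4)
```

For a width-16 block, `result` agrees with naively stepping 4 times; the payoff appears when large patterns contain many identical sub-blocks, which is exactly the regime where HashLife reaches trillions of generations.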

1 · johnswentworth · 2y
Ooh, good one. If I remember the trick to the algorithm correctly, it can indeed be cast as abstraction.
Problems Involving Abstraction?

Entropy and temperature inherently require the abstraction of macrostates from microstates. Recommend reading this: http://www.av8n.com/physics/thermo/entropy.html if you haven't seen this before (or just want an unconfused explanation).

1 · johnswentworth · 2y
At some point I need to write a post on purely Bayesian statistical mechanics, in a general enough form that it's not tied to the specifics of physics. I can probably write a not-too-long explanation of how abstraction works in this context. I'll see what I can do.
Forecasting Thread: AI Timelines

Roughly my feelings: https://elicit.ought.org/builder/trBX3uNCd

Reasoning: I think lots of people have updated too much on GPT-3, and that the current ML paradigms are still missing key insights into general intelligence. But I also think enough research is going into the field that it won't take too long to reach those insights.

Adele Lopez's Shortform

It seems that privacy potentially could "tame" a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn't be able to think of anything like that, and might resort to fulfilling the request in a more "normal" way. This doesn't seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.

Adele Lopez's Shortform

Privacy as a component of AI alignment

[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]

What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind and use it to phrase things so you do exactly what I want, that seems to have crossed an important line.

The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates... (read more)

Adele Lopez's Shortform

Half-baked idea for low-impact AI:

As an example, imagine a board that's lodged directly into the wall (no other support structures). If you make it twice as wide, it will be twice as stiff, but if you make it twice as thick, it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.

In a similar way, different action parameters will have scaling exponents (or more generally, functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling expo... (read more)
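Those exponents can be sanity-checked with the standard cantilever formula (my own sketch; it assumes an end-loaded rectangular beam, k = 3EI/L³ with second moment of area I = wt³/12):

```python
def stiffness(width, thickness, length, E=1.0):
    """End-load stiffness of a rectangular cantilever: k = 3*E*I / L**3,
    with second moment of area I = width * thickness**3 / 12."""
    I = width * thickness ** 3 / 12
    return 3 * E * I / length ** 3

base = stiffness(1, 1, 1)
print(stiffness(2, 1, 1) / base)  # 2x wider   -> 2x stiffer
print(stiffness(1, 2, 1) / base)  # 2x thicker -> 8x stiffer
print(stiffness(1, 1, 2) / base)  # 2x longer  -> 1/8 the stiffness
```

Reading the exponents straight off the formula: linear in width, cubic in thickness, inverse-cubic in length.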

Topological metaphysics: relating point-set topology and locale theory

Another way to make it countable would be to instead go to the category of posets. Then the rational interval basis is a poset with a countable number of elements, and by the Alexandroff construction corresponds to the real line (or at least something very similar). But this construction gives a full and faithful embedding of the category of posets into the category of spaces (which basically means you get all and only the continuous maps from monotonic functions).

I guess the ontology version in this case would be the category of prosets. (Personally, I'm not sure that ontology of the universe isn't a type error).

Soft takeoff can still lead to decisive strategic advantage

Yeah, I think the engineer intuition is the bottleneck I'm pointing at here.

Thoughts from a Two Boxer

I think people make decisions based on accurate models of other people all the time. I think of Newcomb's problem as the limiting case where Omega has extremely accurate predictions, but the solution is still relevant even when "Omega" is only 60% likely to guess correctly. A fun illustration of a computer program capable of predicting (most) humans this accurately is the Aaronson oracle.
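For concreteness, here is a minimal sketch of an Aaronson-oracle-style predictor (my own reconstruction, not Aaronson's code): it tracks what followed each recent k-press context and predicts the majority continuation.

```python
from collections import defaultdict

class AaronsonOracle:
    """Guess a person's next 0/1 keypress from their last k presses,
    by tracking what followed each length-k context so far."""
    def __init__(self, k=3):
        self.k = k
        self.history = []
        self.counts = defaultdict(lambda: [0, 0])  # context -> [#zeros, #ones]

    def predict(self):
        zeros, ones = self.counts[tuple(self.history[-self.k:])]
        return 1 if ones > zeros else 0

    def observe(self, bit):
        if len(self.history) >= self.k:
            self.counts[tuple(self.history[-self.k:])][bit] += 1
        self.history.append(bit)

# Demo: a "random-feeling" but patterned stream is quickly predicted.
oracle = AaronsonOracle()
stream = [0, 0, 1] * 10
hits = 0
for bit in stream:
    hits += (oracle.predict() == bit)
    oracle.observe(bit)
print(hits / len(stream))  # well above the 0.5 chance level
```

Against human keypresses the real demo typically lands well above chance too, because people are bad at generating uniform randomness; against a genuinely uniform source it can do no better than 50%.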

Soft takeoff can still lead to decisive strategic advantage

This post has caused me to update my probability of this kind of scenario!

Another issue related to the information leakage: in the industrial revolution era, 30 years was plenty of time for people to understand and replicate leaked or stolen knowledge. But if the slower team managed to obtain the leading team's source code, it seems plausible that 3 years, or especially 0.3 years, would not be enough time to learn how to use that information as skillfully as the leading team can.

4 · Lukas Finnveden · 3y
Hm, my prior is that speed of learning how stolen code works would scale along with general innovation speed, though I haven't thought about it a lot. On the one hand, learning the basics of how the code works would scale well with more automated testing, and a lot of finetuning could presumably be automated without intimate knowledge. On the other hand, we might be in a paradigm where AI tech allows us to generate lots of architectures to test, anyway, and the bottleneck is for engineers to develop an intuition for them, which seems like the thing that you're pointing at.
Topological Fixed Point Exercises

Awesome! I was hoping that there would be a way to do this!

Diagonalization Fixed Point Exercises

Currying doesn't preserve surjectivity. As a simple example, you can easily find a surjective function $\mathbb{N} \times \mathbb{N} \to \mathbb{N}$, but there are no surjective functions $\mathbb{N} \to (\mathbb{N} \to \mathbb{N})$.
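A sketch of the standard diagonal argument behind this, illustrating with functions on $\mathbb{N}$:

```latex
\text{Suppose } g : \mathbb{N} \to (\mathbb{N} \to \mathbb{N}) \text{ were surjective.}\\
\text{Define } d : \mathbb{N} \to \mathbb{N} \text{ by } d(n) = g(n)(n) + 1.\\
\text{Then } d \ne g(n) \text{ for every } n, \text{ since they differ at input } n,\\
\text{so } g \text{ misses } d, \text{ a contradiction.}
```

By contrast, pairing tricks (e.g. interleaving digits) give surjections out of product types easily, which is why currying is where surjectivity breaks.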

2 · Czynski · 3y
Ah, yes, that makes sense. I got distracted by the definition of Cont(A,B)'s topology.
Iteration Fixed Point Exercises

Ex 6:

If at any point $f^{k+1}(\bot) = f^k(\bot)$, then we're done. So assume that we get a strict increase each time up to $f^n(\bot)$. Since there are only $n$ elements in the entire poset, and $f$ is monotone, $f^{n+1}(\bot)$ has to equal $f^n(\bot)$.

Ex 7:

For a limit ordinal $\lambda$, define $f^\lambda(\bot)$ as the least upper bound of $f^\alpha(\bot)$ for all $\alpha < \lambda$. If $\kappa > |P|$, then the set of ordinals $\alpha < \kappa$ is a set of size $\kappa$ that maps into a set of size $|P|$ by taking the value of the element $f^\alpha(\bot)$. Since there are no injections between these sets, there must be two ordinals such that $f^\alpha(\bot) = f^\beta(\bot)$. Since ... (read more)

Diagonalization Fixed Point Exercises

Ex 8: (in Python, using a reversal function)

def f(s):
    return s[::-1]

dlmt = '"""'
code = """def f(s):
    return s[::-1]

dlmt = '{}'
code = {}{}{}
code = code.format(dlmt, dlmt, code, dlmt)
print(f(code))"""
code = code.format(dlmt, dlmt, code, dlmt)
print(f(code))

Topological Fixed Point Exercises

You can use 1d-Sperner to deal with all the cases effectively.

Topological Fixed Point Exercises

Found a nice proof for Sperner's lemma (#9):

First some definitions. Call a d-simplex whose vertices are colored in (d+1) different colors chromatic. Call the parity of the number of chromatic simplices the chromatic parity.

It's easier to prove the following generalization: Take a complex of d-simplices that form a d-sphere: then any (d+1)-coloring of the vertices will have even chromatic parity.

Proof by induction on d:

Base d=-1: vacuously true.

Assume true for d-1: Say you have an arbitrary complex of d-simplices forming a d-sphere, with an arbitra... (read more)

Thanks! I find this approach more intuitive than the proof of Sperner's lemma that I found in Wikipedia. Along with nshepperd's comment, it also inspired me to work out an interesting extension that requires only minor modifications to your proof:

d-spheres are orientable manifolds, hence so is a decomposition of a d-sphere into a complex K of d-simplices. So we may arbitrarily choose one of the two possible orientations for K (e.g. by choosing a particular simplex P in K, ordering its vertices from 1 to d + 1, and declaring it to be the prototypi... (read more)