Adele Lopez


Problems Involving Abstraction?

Not quite sure how specifically this connects, but I think you would appreciate seeing it.

As a good example of the kind of gains we can get from abstraction, see this exposition of the HashLife algorithm, used to (perfectly) simulate Conway's Game of Life at insane scales.

Earlier I mentioned I would run some nontrivial patterns for trillions of generations. Even just counting to a trillion takes a fair amount of time for a modern CPU; yet HashLife can run the breeder to one trillion generations, and print its resulting population of 1,302,083,334,180,208,337,404 in less than a second.

Problems Involving Abstraction?

Entropy and temperature inherently require the abstraction of macrostates from microstates. Recommend reading this: if you haven't seen this before (or just want an unconfused explanation).

Forecasting Thread: AI Timelines

Roughly my feelings:

Reasoning: I think lots of people have updated too much on GPT-3, and that the current ML paradigms are still missing key insights into general intelligence. But I also think enough research is going into the field that it won't take too long to reach those insights.

Adele Lopez's Shortform

It seems that privacy potentially could "tame" a not-quite-corrigible AI. With a full model, the AGI might receive a request, deduce that activating a certain set of neurons strongly would be the most robust way to make you feel the request was fulfilled, and then design an electrode set-up to accomplish that. Whereas the same AI with a weak model wouldn't be able to think of anything like that, and might resort to fulfilling the request in a more "normal" way. This doesn't seem that great, but it does seem to me like this is actually part of what makes humans relatively corrigible.

Adele Lopez's Shortform

Privacy as a component of AI alignment

[realized this is basically just a behaviorist genie, but posting it in case someone finds it useful]

What makes something manipulative? If I do something with the intent of getting you to do something, is that manipulative? A simple request seems fine, but if I have a complete model of your mind, and use it phrase things so you do exactly what I want, that seems to have crossed an important line.

The idea is that using a model of a person that is *too* detailed is a violation of human values. In particular, it violates the value of autonomy, since your actions can now be controlled by someone using this model. And I believe that this is a significant part of what we are trying to protect when we invoke the colloquial value of privacy.

In ordinary situations, people can control how much privacy they have relative to another entity by limiting their contact with them to certain situations. But with an AGI, a person may lose a very large amount of privacy from seemingly innocuous interactions (we're already seeing the start of this with "big data" companies improving their advertising effectiveness by using information that doesn't seem that significant to us). Even worse, an AGI may be able to break the privacy of everyone (or a very large class of people) by using inferences based on just a few people (leveraging perhaps knowledge of the human connectome, hypnosis, etc...).

If we could reliably point to specific models an AI is using, and have it honestly share its model structure with us, we could potentially limit the strength of its model of human minds. Perhaps even have it use a hardcoded model limited to knowledge of the physical conditions required to keep it healthy. This would mitigate issues such as deliberate deception or mindcrime.

We could also potentially allow it to use more detailed models in specific cases, for example, we could let it use a detailed mind model to figure out what is causing depression in a specific case, but it would have to use the limited model in any other contexts or for any planning aspects of it. Not sure if that example would work, but I think that there are potentially safe ways to have it use context-limited mind models.

Adele Lopez's Shortform

Half-baked idea for low-impact AI:

As an example, imagine a board that's lodged directly by the wall (no other support structures). If you make it twice as wide, then it will be twice as stiff, but if you make it twice as thick, then it will be eight times as stiff. On the other hand, if you make it twice as long, it will be eight times more compliant.

In a similar way, different action parameters will have scaling exponents (or more generally, functions). So one way to decrease the risk of high-impact actions would be to make sure that the scaling exponent is bounded above by a certain amount.

Anyway, to even do this, you still need to make sure the agent's model is honestly evaluating the scaling exponent. And you would still need to define this stuff a lot more rigorously. I think this idea is more useful in the case where you already have an AI with high-level corrigible intent and want to give it a general "common sense" about the kinds of experiments it might think to try.

So it's probably not that useful, but I wanted to throw it out there.

Topological metaphysics: relating point-set topology and locale theory

Another way to make it countable would be to instead go to the category of posets, Then the rational interval basis is a poset with a countable number of elements, and by the Alexandroff construction corresponds to the real line (or at least something very similar). But, this construction gives a full and faithful embedding of the category of posets to the category of spaces (which basically means you get all and only continuous maps from monotonic function).

I guess the ontology version in this case would be the category of prosets. (Personally, I'm not sure that ontology of the universe isn't a type error).

Soft takeoff can still lead to decisive strategic advantage

Yeah, I think the engineer intuition is the bottleneck I'm pointing at here.

Thoughts from a Two Boxer

I think people make decisions based on accurate models of other people all the time. I think of Newcomb's problem as the limiting case where Omega has extremely accurate predictions, but that the solution is still relevant even when "Omega" is only 60% likely to guess correctly. A fun illustration of a computer program capable of predicting (most) humans this accurately is the Aaronson oracle.

Soft takeoff can still lead to decisive strategic advantage

This post has caused me to update my probability of this kind of scenario!

Another issue related to the information leakage: in the industrial revolution era, 30 years was plenty of time for people to understand and replicate leaked or stolen knowledge. But if the slower team managed to obtain the leading team's source code, it seems plausible that 3 years, or especially 0.3 years, would not be enough time to learn how to use that information as skillfully as the leading team can.

Load More