Donald Hobson

Cambridge maths student dph39@cam.ac.uk

Donald Hobson's Comments

What are the most plausible "AI Safety warning shot" scenarios?

I agree that these aren't very likely options. However, given two examples of an AI suddenly stopping when it discovers something, there are probably more for facts that are harder to discover. In the Pascal's mugging example, the agent would only stop working once it can deduce what potential muggers might want it to do, which is much harder than merely noticing the phenomenon. The myopic agent has little incentive to make a non-myopic version of itself: if dedicating a fraction of its resources to making a copy of itself reduced the chance of the missile hack working from 94% to 93%, we get a near miss.


One book, probably not. A bunch of books and articles over years, maybe.

What are the most plausible "AI Safety warning shot" scenarios?
A "AI safety warning shot" is some event that causes a substantial fraction of the relevant human actors (governments, AI researchers, etc.) to become substantially more supportive of AI research and worried about existential risks posed by AI.

A really well-written book on AI safety, or some other public outreach campaign, could have this effect.

Many events, such as a self-driving car crashing, might be used as evidence in an argument about AI risk.

On powerful AI systems causing harm: I agree that your reasoning applies to most AIs, but there are a few designs that would behave differently. Myopic agents are ones with lots of time discounting in their utility function. If you have a full superintelligence that wants to do X as quickly as possible, and the fastest way to do X also destroys it, that might be survivable. Consider an AI set to maximize the probability that its own computer case is damaged within the next hour. The AI could bootstrap molecular nanotech, but that would take several hours; the AI thinks that time travel is likely impossible, so by that point all the mass in the universe can't help it. Instead it can hack a nuke and target itself. Much better by its utility function: nearly max utility. If it can, it might also upload a copy of its code to some random computer (there is some tiny chance that time travel is possible, or that its clock is wrong), so we only get a near miss if the AI doesn't have enough spare bandwidth or compute to do both. This is assuming that it can't hack reality in a microsecond.
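As an illustrative formalisation (my notation, not taken from the original discussion), a myopic agent's objective is something like

$$U = \sum_{t} \gamma^{t} r_{t}, \qquad \gamma \ll 1,$$

so that reward arriving a few hours from now, via nanotech or via a surviving copy, is worth almost nothing compared to damaging the case within the hour.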

There are a few other scenarios, for instance impact-minimising agents. Some agent designs are restricted to have a "small" effect on the future as a safety measure, where impact is measured by the difference between what actually happens and what would happen if the agent did nothing. When this design understands chaos theory, it will find that every other action has too large an effect, and do nothing. It might do a lot of damage before this point, depending on circumstances. I think the AI discovering some fact about the universe that causes it to stop optimising effectively is a plausible behaviour mode. Another example of this would be Pascal's mugging: the agent acts dangerously, and then starts outputting gibberish as it capitulates to a parade of fanciful Pascal's muggers.
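A minimal way to write down the impact measure described above (again my own notation): let $o(a)$ be the outcome of taking action $a$ and $o(\varnothing)$ the outcome of doing nothing, and have the agent solve

$$\max_{a} \; U(a) \quad \text{subject to} \quad D\big(o(a),\, o(\varnothing)\big) \le \varepsilon.$$

Once the agent models chaotic divergence, $D(o(a), o(\varnothing))$ exceeds $\varepsilon$ for every nontrivial $a$, and only the null action remains feasible.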

Positive Feedback -> Optimization?

Consider a pencil balanced on its point. It has multiple positive feedback loops (different directions to fall in), and falling far in one direction prevents falling in the others. But once it has fallen, it just sits there. That said, evolution can settle into a strong local minimum and just sit there.

Positive Feedback -> Optimization?

Consider the differential equation $\dot{x} = Ax$, where $A$ has many positive eigenvalues. This is the simplest case of "a dynamical system containing multiple instabilities (i.e. positive feedback loops)".

Where is the selection? It isn't there. You have multiple independent exponential growth rates.
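To make that concrete, here is a small numerical sketch (my own illustration): a diagonal $A$ with two positive eigenvalues gives two feedback loops that each grow at their own rate without interacting.

import numpy as np

# dx/dt = A x with two positive eigenvalues: two independent "positive feedback loops"
A = np.diag([1.0, 2.0])          # eigenvalues 1 and 2
x = np.array([1e-3, 1e-3])       # small initial perturbation
dt = 0.001

for _ in range(int(5 / dt)):     # integrate to t = 5 with forward Euler
    x = x + dt * (A @ x)

print(x)  # roughly [1e-3 * e^5, 1e-3 * e^10]: each component grows at its own rate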

Consider a chaotic system $\dot{x} = h(x)$, like a double pendulum. Fix $f(t)$ to a particular typical solution.

Now consider $\dot{y} = h(f(t) + y) - h(f(t))$ as a differential equation in $y$. Here $y$ represents the difference between $f$ and some other solution to $\dot{x} = h(x)$. If you start at $y = 0$ then $y$ stays at $0$. However, small variations will grow exponentially. After a while, you just get the difference between 2 arbitrary chaotic paths.

I can't see a way of meaningfully describing these as optimizing processes with competing subagents. Arguably $\dot{x} = Ax$ could be optimising $|x|$. However, this doesn't seem canonical: for any invertible $B$, the change of variables $y = Bx$ gives $\dot{y} = BAB^{-1}y$, which describes an exactly isomorphic system but doesn't preserve the modulus. This isomorphism does preserve the eigenvalues of $A$. Those could be the thing being optimised.
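A quick check of that invariance claim (my own sketch): the eigenvalues of $A$ and $BAB^{-1}$ agree, while norms of corresponding states do not.

import numpy as np

A = np.array([[1.0, 0.3], [0.0, 2.0]])
B = np.array([[2.0, 1.0], [0.0, 0.5]])           # any invertible change of coordinates
A2 = B @ A @ np.linalg.inv(B)

print(np.sort(np.linalg.eigvals(A)))             # ~[1, 2]
print(np.sort(np.linalg.eigvals(A2)))            # ~[1, 2] -- eigenvalues are preserved

x = np.array([1.0, 1.0])
print(np.linalg.norm(x), np.linalg.norm(B @ x))  # norms differ: |x| is not preserved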

Zoom In: An Introduction to Circuits

Is Conway's Life, with a random starting state, interpretable? If you zoom in on any single square, it is trivial to predict what it will do. Zoom out, and you need a lot of compute. There is no obvious way to predict whether a cell will be on in 1,000,000 timesteps without brute-force simulating the whole thing (at least its past light cone). What would an interpretability tool for Conway's Life look like?
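For concreteness, here is the brute-force approach referred to above, as a minimal sketch (my own illustration, using the standard Life rules with wrap-around edges for simplicity): the only known general way to answer "is this cell alive at step N?" is to run all N steps.

import numpy as np

def life_step(grid):
    # One step of Conway's Life on a 2D 0/1 array (toroidal edges for simplicity).
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(64, 64))

# Predicting cell (10, 10) at step 1000: nothing obviously cheaper than simulating every step.
for _ in range(1000):
    grid = life_step(grid)
print(grid[10, 10])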

Robustness to fundamental uncertainty in AGI alignment

When it is all over, we will either have succeeded or failed. (The pay-off set is strongly bimodal.)

The magnitude of the pay-off is irrelevant to the optimal strategy. Suppose research program X has a 1% chance of producing FAI in 10 years, a 1% chance of producing UFAI in 10 years, and a 98% chance of nothing. Is it a good option? That depends on our P(FAI | no AI in 10 years). If FAI would probably arrive in 15 years anyway, X is bad; if UFAI would arrive in 15 years, X may be good. Endorse only those research programs for which P(FAI | that research program makes AI) > P(FAI | no one else makes AI before the research program has a chance to). Actually, this assumes unlimited research talent.
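A small worked version of this comparison (the counterfactual numbers are invented purely for illustration):

# Program X: over 10 years, 1% FAI, 1% UFAI, 98% nothing happens.
p_fai_x, p_ufai_x = 0.01, 0.01

# If X produces an AI at all, P(FAI | X makes AI) = 0.01 / 0.02 = 0.5.
p_fai_given_x_makes_ai = p_fai_x / (p_fai_x + p_ufai_x)

# Counterfactual: whoever would build AI first if X did nothing (illustrative numbers).
p_fai_if_careful_project_gets_there_first = 0.8
p_fai_if_reckless_project_gets_there_first = 0.2

print(p_fai_given_x_makes_ai > p_fai_if_careful_project_gets_there_first)  # False: X is bad here
print(p_fai_given_x_makes_ai > p_fai_if_reckless_project_gets_there_first) # True: X helps here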

Avoiding avenues with small chances of UFAI corresponds to optimism about the outcome.

I think that it should be fairly clear whether an AI project would produce UFAI before it is run: P(friendly) < 5% or > 80% usually, on serious assessment by competent researchers. So say that a future MIRI can tell whether any given design is friendly within a few months. If some serious safety people study the design and become confident that it's FAI, they run it. If they think it's UFAI, they won't run it. If someone with limited understanding, and a lack of knowledge of their own ignorance, manages to build an AI, it's UFAI, they don't know that, and so they run it. I am assuming that there are people who don't realise the problem is hard, those who know they can't solve it, and those who can solve it, in order of increasing expertise. Most of the people reading this will be in the middle category (not counting 10^30 posthuman historians ;-) ). People in the middle category won't build any ASI. Those in the first category will usually produce nothing, but might produce a UFAI; those in the third might produce an FAI.

Trace: Goals and Principles

Pick an arbitrary set $X$, with a (transitive) group action from a group $G$.

For each element of $X$, you have a copy of all the nodes.

Each symbol now has multiple pointers.

A = Symbol(func=const(True))             # a symbol whose function always returns True
pa = Pointer(A, g)                       # a pointer to A, offset by a group element g
C = Symbol(func=xor, parents=[pa, pb])   # pb: a second pointer, defined analogously to pa

In your example, $G = \mathbb{Z}$, and the $g$ being used in the pointers are $-1, 0, 1$. As you are only using a finite subset of $G$, you have the structure of a Cayley graph.

The graphs you might want include various lattices and infinite trees.

Your evaluation is a function from pairs $(s, x)$, where $s$ is a Symbol and $x \in X$, to values.
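To make the construction concrete, here is a minimal sketch of how I would implement it. The names Symbol and Pointer follow the snippet above, but everything else (the field names, the evaluate function, and using the integers for both $X$ and $G$) is my own assumption, not the original post's API.

from dataclasses import dataclass, field

@dataclass
class Pointer:
    symbol: "Symbol"
    g: int                      # group element; here G = the integers under addition

@dataclass
class Symbol:
    func: callable
    parents: list = field(default_factory=list)

def evaluate(s, x, cache=None):
    # Value of symbol s at position x in X (here X = the integers, acted on by G).
    cache = {} if cache is None else cache
    if (id(s), x) in cache:
        return cache[(id(s), x)]
    # Each pointer is followed at the shifted position g . x = x + p.g
    args = [evaluate(p.symbol, x + p.g, cache) for p in s.parents]
    value = s.func(*args)
    cache[(id(s), x)] = value
    return value

# The example from above: A is constantly True, B constantly False; C xors two shifted copies.
A = Symbol(func=lambda: True)
B = Symbol(func=lambda: False)
pa, pb = Pointer(A, -1), Pointer(B, +1)
C = Symbol(func=lambda a, b: a ^ b, parents=[pa, pb])

print(evaluate(C, 0))   # True ^ False == True, independent of x since A and B are constant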

How Low Should Fruit Hang Before We Pick It?

Suppose the AI finds a plan with 10^50 impact and 10^1000 utility. I don't want that plan to be run. It's probably a plan that involves taking over the universe and then doing something really high utility. I think a constraint is better than a scaling factor.
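A toy illustration of the difference (the penalty weight and impact cap are arbitrary numbers of my own):

utility = 10**1000          # the "take over the universe, then do the really good thing" plan
impact = 10**50
safe_utility = 10**6        # an ordinary low-impact plan
safe_impact = 10**2

LAMBDA = 10**9              # scaling factor on impact
CAP = 10**10                # hard constraint on impact

# Scaled penalty: the huge utility swamps any fixed penalty weight, so the bad plan wins.
print(utility - LAMBDA * impact > safe_utility - LAMBDA * safe_impact)   # True

# Hard constraint: the bad plan is simply excluded, whatever its utility.
print(impact <= CAP)        # False -- rejected
print(safe_impact <= CAP)   # True  -- allowed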

Will AI undergo discontinuous progress?

I am not convinced that the distinction between continuous and discontinuous approaches is a feature of the territory. Zoom in with sufficient detail, and you see the continuous wavefunction of an electron interacting with the continuous wavefunction of a silicon atom. Zoom out to evolutionary timescales, and the jump from hominids with pointy sticks to ASI is almost instant. The mathematical definition of continuity relies on your function being mathematically formal. Is the distance from the Earth to the Moon an even number of Planck lengths? There are a huge number of slightly different measurements you could make; depending on when you measure, exactly what points you measure between, and how you deal with relativistic length contraction, the answer will be different. In a microsecond in which the code that is a fooming AI is doing garbage collection, is AI progress happening? You have identified an empirical variable called AI progress, but whether or not it is continuous depends on exactly how you fill in the details.

Imagine superintelligence has happened and we are discussing the details afterwards. We were in a world basically like this one, and then someone ran some code, which gained total cosmic power within a millisecond. Someone tries to argue that this was a continuous improvement, just a very fast one. What evidence would convince you one way or the other on this?

Will AI undergo discontinuous progress?

Suppose that different tasks require different levels of AI capability before an AI can do them better than humans.

First AI can do arithmetic, then play chess, then drive cars, etc. Let's also assume that AI is much faster than humans. So imagine that AI research ability rises from almost nothing to superhuman over the course of a year. A few months in, it's inventing things like linear regression: impressive, but not as good as current human work on AI. There are a few months where the AI is worse than a serious team of top researchers, but better than an intern. So if you have a niche use for AI, that can be automatically automated: the AI-research AI designs a widget-building AI. The humans could have made a widget-building AI themselves, but so few widgets are produced that it wasn't worth it.

Then the AI becomes as good as a top human research team, and FOOM. How crazy the world gets before the foom depends on how much other stuff is automated first. Is it easier to make an AI teacher, or an AI AI-researcher? Also remember that bureaucratic delays are a thing: there is a difference between having an AI that does medical diagnosis in a lab, and it being used in every hospital.
