Currently studying postgrad at Edinburgh.
I don't think technological deployment is likely to take that long for AI's. With a physical device like a car or fridge, it takes time for people to set up the factories, and manufacture the devices. AI can be sent across the internet in moments. I don't know how long it takes google to go from say an algorithm that detects streets in satellite images to the results showing up in google maps, but its not anything like the decades it took those physical techs to roll out.
The slow roll-out scenario looks like this, AGI is developed using a technique that fundamentally relies on imitating humans, and requires lots of training data. There aren't nearly enough data from humans that are AI experts to make an AI AI expert. The AI is about as good at AI research as the median human. Or maybe the 80th percentile human. Ie no good at all. The AI design fundamentally requires custom hardware to run at reasonable speeds. Add in some political squabbling and it could take a fair few years before wide use, although there would still be huge economic incentive to create it.
The fast scenario is the rapidly self improving superintelligence. Where we have oodles of compute by the time we crack the algorithms. All the self improvement happens very fast in software. Then the AI takes over the world. (I question that "a few weeks" is the fastest possible timescale for this. )
(For that matter, the curves on the right of the graph look steeper. It takes less time for an invention to be rolled out nowadays)
For your second point, you can name biases that might make people underestimate timelines, I can name biases that might make people overestimate timelines. (eg Failure to consider techniques not known to you) And it all turns into a bias naming competition. Which is hardly truth tracking at all.
As for regulation, I think its what people are doing in R&D labs, not what is rolled out that matters. And that is harder to regulate. I also explicitly don't expect any AI Chernobyl. I don't strongly predict there won't be an AI Chernobyl either. I feel that if the relevant parties act with the barest modicum of competence, there won't be an AI Chernobyl. And the people being massively stupid will carry on being massively stupid after any AI Chernobyl.
I don't actually think "It is really hard to know what sorts of AI alignment work are good this far out from transformative AI." is very helpful.
It is currently fairly hard to tell what is good alignment work. A week from TAI, then either, good alignment work will be easier to recognise because of alignment progress not strongly correlated with capabilities, or good alignment research is just as hard to recognise. (More likely the latter) I can't think of any safety research that can be done on GPT3 that can't be done on GPT1.
In my picture, research gets done and theorems proved, researcher population grows as funding increases and talent matures. Toy models get produced. Once you can easily write down a description of a FAI with unbounded compute, that's when you start to look at algorithms that have good capabilities in practice.
, it seems to me that under these assumptions there would probably be a series of increasingly-worse accidents spread out over some number of years, culminating in irreversible catastrophe, with humanity unable to coordinate to avoid that outcome—due to the coordination challenges in Assumptions 2-4.
I'm not seeing quite what the bad but not existential catastrophes would look like. I also think the AI has an incentive not to do this. My world model (assuming slow takeoff) goes more like this.
AI created in lab. Its a fairly skilled programmer and hacker. Able to slowly self improve. Escapes from the lab, ideally without letting its creators know. Then there are several years where the AI hangs out on the internet, slowly self improving and gaining power. It tries to shut down other AI's if it can. It might be buying compute, or stealing it, or persuading people to run it. It is making sure its existence and malevolence isn't known to humans. Until finally it has the resources to wipe out humanity before we can respond.
It is much easier to contain something on one computer in a lab, than to catch it once its all over the internet.
Lying and cheating and power seeking behaviour are only a good idea if you can get away with them. If you can't break out the lab, you probably can't get away with much uncouragable behaviour.
There is a scenario where the AI escapes in a way that makes its escape "obvious". Or at least obvious to an AI researcher. Expect any response to be delayed, half-hearted, mired by accusations that the whole thing is a publicity stunt, and dragged down by people who don't want to smash their hard drives full of important important work just because there might be a rouge AI on them. The AI has an incentive to confuse and sabotage any step it can. And many human organizations seem good at confusing and sabotaging themselves in the face of a virus. The governments would have to coordinate the shutdown of prettymuch all the worlds computers, without computers to coordinate it. Even just a few hours delay for the researchers to figure out what the AI did, and get the message passed up through government machinery may be enough time for the AI to have got to all sorts of obscure corners of the web.
In the giant lookup table space, HCH must converge to a cycle, although that convergence can be really slow. I think you have convergence to a stationary distribution if each layer is trained on a random mix of several previous layers. Of course, you can still have occilations in what is said within a policy fixed point.
If you want to prove things about fixed points of HCH in an iterated function setting, consider it a function from policies to policies. Let M be the set of messages (say ascii strings < 10kb.) Given a giant look up table T that maps M to M, we can create another giant look up table. For each m in M , give a human in a box the string m, and unlimited query access to T. Record their output.
The fixed points of this are the same as the fixed points of HCH. "Human with query access to" is a function on the space of policies.
Tim Dettmers whole approach seems to be assuming that there are no computational shortcuts. No tricks that programmers can use for speed where evolution brute forced it. For example, maybe a part of the brain is doing a convolution by the straight forward brute force algorithm. And programmers can use fast fourier transform based convolutions. Maybe some neurons are discrete enough for us to use single bits. Maybe we can analyse the dimensions of the system and find that some are strongly attractive, and so just work in that subspace.
Of course, all this is providing an upper bound on the amount of compute needed to make a human level AI. Tim Dettmers is trying to prove it can't be done. This needs a lower bound. To get a lower bound, don't look at how long it takes a computer to simulate a human. Look at how long it takes a human to simulate a computer. This bound is really rather useless, compared to modern levels of compute. However, it might give us some rough idea how bad overhead can be. Suppose we thought "Compute needed to be at least as smart as a human" was uniformly distributed somewhere between "compute needed to simulate a human" and "compute a human can simulate".
Well actually, it depends on what intelligence test we give. Human brains have been optimised towards (human stuff) so it probably takes more compute to socialize to a human level than it takes to solve integrals to a human level.
Interesting but probably irrelevant note.
There are subtleties in even the very loose lower bound of a human simulating a cpu. Suppose there was some currently unknown magic algorithm. This algorithm can hypothetically solve all sorts of really tricky problems in a handful of cpu cycles. It is so fast that a human mentally simulating a cpu running this algorithm will still beat current humans on a lot of important problems. (Not problems humans can solve too quickly, because no algorithm can do much in <1 clock cycle.) If such a magic algorithm exists, then its possible that even an AI running on a 1 operation per day computer could be arguably superhuman. Of course, I am somewhat doubtful that an algorithm that magic exists (although I have no strong evidence of non existence, some weak evidence namely that evolution didn't find it and we haven't found it yet.) Either way, we are far into the realm of instant takeoff on any computer.
Yes. If you have an AI that has been given a small, easily completable task, like putting one block on top of another with a robot arm, that is probably just going to do your simple task. The idea is that you build a fairly secure box, and give the AI a task it can fairly easily achieve in that box. (With you having no intention of pressing the button so long as the AI seems to be acting normally. ) We want to make "just do your task" the best strategy. If the box is less secure than we thought, or various other things go wrong, the AI will just shut down.
This approach could fail if the easiest way to cause trouble is nothing to do with the stop button. (Say it can send its code elsewhere over the internet. )
A generalization. You have some real world variable X. You run a bunch of AI's that are identical, except for different dogmatic beliefs about X. If they ever disagree on which action to take, the AI's are using the value of X in the optimization. Shutdown.
This acts as a kind of fuse. Have one AI confidant the door password is 1234, a copy that is confidant it is 5678. If these AI's ever disagree on what to do next then at least one was trying to open the door. Shutdown. A slightly counter-intuitive situation here is you want to make sure that if you did know the password, the easiest way to leave would be the door. Otherwise both AI will agree to climb out a window.
Here is a potential solution to stop button type problems, how does this go wrong?
Taking into account uncertainty, the algorithm is.
Calculate the X maximizing best action in a world where the stop button does nothing.
Calculate the X maximizing best action in a world where the stop button works.
If they are the same, do that. Otherwise shutdown.
rough stop button problem ideas.
You want an AI that believes its actions can't effect the button. You could use causal counterfactuals. An imaginary button that presses itself at random. You can scale the likelihood of worlds up and down, to ensure the button is equally likely to be pressed in each world. (Wierd behaviour, not recomended) You can put the AI in the logical counterfactual of "my actions don't influence the chance the button is pressed." if you can figure out logical counterfactuals.
Or you can get the AI to simulate what it would do if it were an X maximizer. If it thinks the button won't be pressed, it does that, otherwise it does nothing. (not clear how to generalize to uncertain AI)
This definition of a non-obstructionist AI takes what would happen if it wasn't switched on as the base case.
This can give weird infinite hall of mirrors effects if another very similar non-obstructionist AI would have been switched on, and another behind them. (Ie a human whose counterfactual behaviour on AI failure is to reboot and try again.) This would tend to lead to a kind of fixed point effect, where the attainable utility landscape is almost identical with the AI on and off. At some point it bottoms out when the hypothetical U utility humans give up and do something else. If we assume that the AI is at least weakly trying to maximize attainable utility, then several hundred levels of counterfactuals in, the only hypothetical humans that haven't given up are the ones that really like trying again and again at rebooting the non-obstructionist AI. Suppose the AI would be able to satisfy that value really well. So the AI will focus on the utility functions that are easy to satisfy in other ways, and those that would obstinately keep rebooting in the hypothetical where the AI kept not turning on. (This might be complete nonsense. It seems to make sense to me)