Highlights

Challenges to Christiano’s capability amplification proposal (Eliezer Yudkowsky): A list of challenges faced by iterated distillation and amplification. First, a collection of aligned agents interacting does not necessarily lead to aligned behavior. (Paul's response: That's not the reason for optimism; it's more that there is no optimization pressure to be unaligned.) Second, it's unclear that, even with high-bandwidth oversight, a collection of agents could reach arbitrary levels of capability. For example, how could agents with an understanding of arithmetic invent Hessian-free optimization? (Paul's response: This is an empirical disagreement; hopefully it can be resolved with experiments.) Third, while it is true that exact imitation of a human would avoid the issues of RL, it is harder to create exact imitation than to create superintelligence, and as soon as you have any imperfection in your imitation of a human, you very quickly get back the problems of RL. (Paul’s response: He's not aiming for exact imitation; he wants to deal with this problem by having a strong overseer, aka informed oversight, and by having techniques that optimize worst-case performance.) Fourth, since Paul wants to use big unaligned neural nets to imitate humans, we have to worry about the possibility of adversarial behavior. He has suggested using large ensembles of agents and detecting and pruning the ones that are adversarial. However, this would require millions of samples per unaligned agent, which is prohibitively expensive. (Paul's response: He's no longer optimistic about ensembles and instead prefers the techniques in this post, but he could see ways of reducing the sample complexity further.)

My opinion: Of all of these, I'm most worried about the second and third problems. I definitely have a weak intuition that there are many important tasks that we care about that can't easily be decomposed, but I'm optimistic that we can find out with experiments. For the point about having to train a by-default unaligned neural net to imitate aligned agents, I'm somewhat optimistic about informed oversight with strong interpretability techniques, but I become a lot less optimistic if we think that won't be enough and need to use other techniques like verification, which seem unlikely to scale that far. In any case, I'd recommend reading this post for a good explanation of common critiques of IDA.

AI and Compute (Dario Amodei et al): Since 2012, when the deep learning revolution began with AlexNet, the amount of compute used in the largest-scale experiments has been doubling every 3.5 months. Initially, people started to use GPUs to scale up, but there wasn't a huge amount of interest. In 2014-16, as interest in deep learning really began to take off, people started to use a lot of compute to get good results -- but parallelism stopped helping beyond a certain point (~100 GPUs) because the parameter updates from the data were becoming too stale. Since then, we've had algorithmic improvements that allow us to take advantage of more parallelism (huge batch sizes, architecture search, expert iteration), and this has let us scale up the amount of compute thrown at the problem.

My opinion: I did know that the amount of compute used was growing fast, but a 3.5-month doubling time for 6 years running is huge, and there's no reason to expect that it will stop now. It's also interesting to see what made it onto the graph -- there's image classification, machine translation, and neural architecture search (all of which have clear economic incentives), but some of the largest ones are by projects aiming to build AGI (AlphaGo Zero, AlphaZero, and Dota). Notably, deep reinforcement learning just barely makes it onto the graph, with DQN two orders of magnitude lower than any other point on the graph. I'm really curious what deep RL could solve given AlphaGo levels of compute.
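
To get a feel for what a 3.5-month doubling time implies, here is a rough back-of-the-envelope sketch. It takes "6 years" literally as 72 months, which is an assumption; the real figure depends on which experiments are used as endpoints. The 24-month comparison line is a hypothetical slower doubling time added purely for contrast and does not come from the post.

```python
def growth_factor(elapsed_months, doubling_months):
    """Total multiplicative growth after elapsed_months of steady doubling."""
    return 2 ** (elapsed_months / doubling_months)

elapsed = 6 * 12  # assumption: "6 years running" taken as exactly 72 months

fast = growth_factor(elapsed, 3.5)   # doubling time quoted in the summary
slow = growth_factor(elapsed, 24.0)  # hypothetical slower doubling, for contrast only

print(f"{elapsed / 3.5:.1f} doublings at 3.5 months -> about {fast:,.0f}x")
print(f"{elapsed / 24.0:.1f} doublings at 24 months -> about {slow:,.0f}x")
```

Under these assumptions the fast trend compounds to a factor in the millions, while the slower hypothetical trend gives a single-digit factor over the same period.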

80K podcast with Allan Dafoe (Allan Dafoe and Rob Wiblin): A long interview with Allan Dafoe about the field of AI policy, strategy and governance. It discusses challenges for AI policy that haven't arisen before (primarily because AI is a dual use technology), the rhetoric around arms races, and autonomous weapons as a means to enable authoritarian regimes, to give a small sampling. One particularly interesting tidbit (to me) was that Putin has said that Russia will give away its AI capabilities to the world, because an arms race would be dangerous.

My opinion: Overall this is a great introduction to the field; I'd probably recommend that people interested in the area read this before any of the more typical published papers. I do have one disagreement -- Allan claims that even if we stopped Moore's law, and stopped algorithmic scientific improvement in AI, there could be some extreme systematic risks that emerge from AI -- mass labor displacement, creating monopolies, mass surveillance and control (through robot repression), and strategic stability. I would be very surprised if current AI systems were able to lead to mass labor displacement and/or control through robot repression. We are barely able to get machines to do anything in the real world right now -- something has to improve quite drastically, and if it's neither compute nor algorithms, then I don't know what it would be. The other worries seem plausible from the technical viewpoint.

Technical AI alignment

Iterated distillation and amplification

Challenges to Christiano’s capability amplification proposal (Eliezer Yudkowsky): Summarized in the highlights!

Forecasting

Why Is the Human Brain So Efficient? (Liqun Luo): The overall point for this audience is that, despite how slow and imprecise neuron signals are, the human brain beats computers because of how massively parallel it is.

Field building

Critch on career advice for junior AI-x-risk-concerned researchers (Andrew Critch, via Rob Bensinger): A common piece of advice for aspiring AI x-risk researchers is to work on AI capabilities research in order to skill up so they can later contribute to safety. However, Critch is worried that such researchers will rationalize their work as being "relevant to safety", leading to a false sense of security since AI researchers are now surrounded by people who are "concerned about safety", but aren't actually doing safety research. Note that Critch would still advise young researchers to get into grad school for AI, but to be aware of this effect and not feel any pressure to do safety research and to avoid rationalizing whatever research they are doing.

My opinion: I feel pretty unqualified to have an opinion on how strong this effect is -- it's pretty far outside of my experience. At the very least it's a consideration we should be aware of, and Critch supports it better in the full post, so I'd recommend you read it.

Near-term concerns

Fairness and bias

Delayed Impact of Fair Machine Learning (Lydia T. Liu et al): Consider a bank that has to choose which loan applications should be approved based on a credit score. Typically, fairness in this setting is encoded by saying that there should be some sort of parity between groups (and different criteria have been proposed for what actually should be the same). However, if you model the actual outcomes that come from the decision (namely, profit/loss to the bank and changes in credit score to the applicant), you can see that standard fairness criteria lead to suboptimal outcomes. As a result, in general you want to look at the delayed impact of ML models.

My opinion: This actually feels quite related to the value alignment problem -- in general, we care about things besides fairness, and if we try to optimize directly for fairness, then we'll be giving up good outcomes on other dimensions. It's another case of Goodhart's law, where "fairness" was a proxy for "good for disadvantaged groups".
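
As a toy illustration of why modeling outcomes matters in the loan example, the sketch below computes the expected change in a group's average credit score under different approval thresholds. This is not the paper's model: the linear repayment probability, the score gain and penalty, and the list of scores are all hypothetical numbers chosen only to make the effect visible.

```python
def repay_probability(score):
    # Hypothetical assumption: repayment chance rises linearly on a 0-100 score scale.
    return score / 100

def expected_score_change(score, gain=10, penalty=30):
    # Hypothetical assumption: repaying raises the score by `gain`,
    # defaulting lowers it by `penalty`.
    p = repay_probability(score)
    return p * gain - (1 - p) * penalty

def mean_delayed_impact(scores, threshold):
    """Expected average score change for a group when loans go to scores >= threshold."""
    approved = [s for s in scores if s >= threshold]
    return sum(expected_score_change(s) for s in approved) / len(scores)

group_scores = [30, 40, 50, 60, 70, 80, 90]  # made-up applicant scores

for threshold in (80, 60, 40):
    impact = mean_delayed_impact(group_scores, threshold)
    print(f"threshold {threshold}: expected mean score change {impact:+.2f}")
```

With these made-up numbers, lowering the threshold approves more applicants but pushes the group's expected score change from positive toward negative, which is the kind of delayed effect that a parity constraint alone does not capture.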

Machine ethics

Tech firms move to put ethical guard rails around AI (Tom Simonite): A description of the ethics boards that tech companies are putting up.

AI strategy and policy

80K podcast with Allan Dafoe (Allan Dafoe and Rob Wiblin): Summarized in the highlights!

Policy Researcher (OpenAI): There is a job opportunity at OpenAI as a policy researcher, which does not seem to have any formal requirements.

My opinion: It seems like a lot of the best policy work is happening at OpenAI (see for example the OpenAI charter), so I strongly encourage people to apply!

AI capabilities

Reinforcement learning

Reward Estimation for Variance Reduction in Deep Reinforcement Learning (Joshua Romoff et al)

Critiques (Capabilities)

To Build Truly Intelligent Machines, Teach Them Cause and Effect (Kevin Hartnett interviewing Judea Pearl): An interview with Judea Pearl about causality, deep learning, and where the field is going.

My opinion: This is fairly superficial; if you've read any of the other things that Pearl himself has written about deep learning, you'll know all of this already.

Miscellaneous (Capabilities)

AI and Compute (Dario Amodei et al): Summarized in the highlights!

How artificial intelligence is changing science (Nathan Collins): AI is being used in many different projects across many different fields at Stanford. This post has a list of a whole bunch of scientific projects that AI is helping with.
