Alignment Newsletter is a weekly publication with recent content relevant to AI alignment around the world. Find all Alignment Newsletter resources here. In particular, you can look through this spreadsheet of all summaries that have ever been in the newsletter.

Audio version here (may not be up yet).

Please note that while I work at DeepMind, this newsletter represents my personal views and not those of my employer.


Could Advanced AI Drive Explosive Economic Growth? (Tom Davidson) (summarized by Rohin): Some (AN #121) previous (AN #105) work (AN #145) has suggested that by 2100 there is a non-trivial chance that AI could lead to explosive growth, that is, a growth rate of 30% (i.e. a doubling time of 2-3 years), 10x the current growth rate of ~3%. What does economics have to say about the matter?

This report investigates the following three stories:

1. Ignorance story: In this story, we don’t know how growth is determined, and attempts to forecast it based on models of how growth works are likely to be wrong. Note that this is perfectly compatible with explosive growth. We know that the growth rate has increased by orders of magnitude over the past millennia; so on an ignorance story we certainly shouldn’t rule out that the growth rate could increase by an order of magnitude again.

2. Standard story: This story focuses on the last ~century of growth, noting that the growth rate has stayed relatively constant at 2-3% per year, and thus predicting that future growth will be exponential (i.e. a constant growth rate), or possibly subexponential.

3. Explosive story: This story focuses on growth models with positive feedback loops, in which increased output leads to increased inputs which leads to even larger outputs, resulting in superexponential (and explosive) growth.

The author is interested in whether explosive growth is plausible, and so is most interested in arguments that argue for the standard story and against the ignorance or explosive stories, or vice versa. The main empirical facts we have are that the growth rate increased (maybe continuously, maybe not, it’s hard to tell) until about a century ago, when it plateaued at the current level of 2-3%. What can we then learn from economic growth models?

1. Ideas-based models of economic growth suggest that growth in output is driven primarily by the rate at which we get ideas (leading to technological improvement), which in turn is driven by population size, which in turn is driven by output (completing the positive feedback cycle). This predicts increases in the growth rate as long as population growth rate is increasing. A century ago, we underwent the “demographic transition” where, as we produced more output, instead of having more kids we became richer, breaking the positive feedback loop and preventing population size from growing. This fits our empirical facts well, and if we now assume that AI can also generate ideas, then the feedback loop is reestablished and we should expect explosive growth.

2. Economists have tried to find growth models that robustly predict exponential growth alongside a slowly growing population, but have mostly not found such models, suggesting that our current exponential growth might be an anomaly that will eventually change. The best explanations of exponential growth imply that future growth will be sub-exponential given that population growth is predicted to slow down.

3. Most economic growth models, including the ones in the previous point, predict explosive growth if you add in an assumption that AI systems can replace human workers.

Thus, it seems that economic growth theory suggests that explosive growth is probable, conditional on the assumption that we develop AI systems that can replace arbitrary human workers.

You could object to these arguments on several grounds. The ones that the author finds partially convincing are:

1. We don’t see any trends of explosive growth right now -- this suggests that we at least won’t see explosive growth in the next couple of decades (though it’s harder to make claims all the way out to 2100).

2. If there are a few key “bottleneck” tasks that (a) are crucial for growth and (b) can’t be automated by AI, then those tasks may limit growth.

3. There may be physical limits on growth that we haven’t yet encountered: for example, growth may be bottlenecked on running experiments in the real world, extracting and transporting raw materials, delays for humans to adjust to new technology, etc.

Another objection is that ideas are getting harder to find, which would surely prevent explosive growth. The author is not convinced by this objection, because the growth models predicting explosive growth already take this into account, and still predict explosive growth. (Roughly, the superexponential increase in the inputs “overpowers” the exponential increase in the difficulty of finding good ideas.)

Read more: Blog post

Rohin's opinion: I find the economic take on AI to be particularly interesting because it makes the “automation” frame on AI the default one, as opposed to the “superintelligent goal-directed agent” frame that we often work with in AI alignment. The critical assumption needed in this automation frame is that AI systems can automate ~every task that a human worker could do. This is what enables the positive feedback loop to work (which is the automation version of recursive self-improvement).

I generally prefer the automation frame for thinking about and predicting how AI systems are integrated into the world, while preferring the agent frame for thinking about how AI systems might cause alignment problems (i.e. ignoring misuse and structural risks (AN #46)). Many of my disagreements with CAIS (AN #40) feel like cases where I think it is appropriate to use the agent frame rather than the automation frame. I would classify several newer alignment risk (AN #50) stories (AN #146) as taking the same agent-based cause of alignment failure as in (say) Superintelligence, but then telling a story in which the deployment of the misaligned AI system is automation-based.

I think it is generally worth spending some time meditating on the growth models explored in this post, and what implications they would have for AI development (and thus for AI alignment). For example, some models emphasize that there are many different tasks and suggest (not conclusively) that we’ll have different AI systems for different tasks. In such a world, it doesn’t seem very useful to focus on teaching AI systems about humanity’s true values, as they are going to be asked to do particular tasks that are pretty divorced from these “true values”.

Note that I am not an economist. This means that there’s a higher chance than usual that I’ve accidentally inserted an erroneous claim into this summary and opinion. It is also the reason why I don’t usually summarize econ papers that are relevant to AI -- I’ve summarized this one because it’s explained at a level that I can understand. If you’re interested in this area, other papers include Economic Growth Given Machine Intelligence and Economic growth under transformative AI.



AXRP Episode 8 - Assistance Games (Daniel Filan and Dylan Hadfield-Menell) (summarized by Rohin): As with most other podcasts, I will primarily link you to my past summaries of the papers discussed in the episode. In this case they were all discussed in the special issue AN #69 on Human Compatible and the various papers relevant to it. Some points that I haven’t previously summarized:

1. The interviewee thinks of assistance games as an analytical tool that allows us to study the process by which humans convey normative information (such as goals) to an AI system. Normally, the math we write down takes the objective as given, whereas an assistance game uses math that assumes there is a human with a communication channel to the AI system. We can thus talk mathematically about how the human communicates with the AI system.

2. This then allows us to talk about issues that might arise. For example, assistive bandits (AN #70) considers the fact that humans might be learning over time (rather than starting out as optimal).

3. By using assistance games, we build the expectation that our AI systems will have ongoing oversight and adaptation directly into the math, which seems significantly better than doing this on an ad hoc basis (as is currently the case). This should help both near-term and long-term systems.

4. One core question is how we can specify a communication mechanism that is robust to misspecification. We can operationalize this as: if your AI system is missing some relevant features about the world, how bad could outcomes be? For example, it seems like demonstrating what you want (i.e. imitation learning) is more robust than directly saying what the goal is.

5. One piece of advice for deep learning practitioners is to think about where the normative information for your AI system is coming from, and whether it is sufficient to convey what you want. For example, large language models have trillions of parameters, but only hundreds of decisions inform the choice of what data to train them on -- is that enough? The language we train on has lots of normative content -- does that compensate?

6. Dylan says: “if you’re interested in doing this type of work and you thought this conversation was fun and you’d like to have more conversations like it with me, I’ll invite you to apply to MIT’s EECS PhD program next year and mention me in your application.”

Rohin's opinion: I’m a big fan of thinking about how normative information is transferred from us to our agents -- I frequently ask myself questions like “how does the agent get the information to know X”, where X is something normative like “wireheading is bad”.

In the case of large neural nets, I generally like assistance games as an analysis tool for thinking about how such AI systems should behave at deployment time, for the reasons outlined in the podcast. It’s less clear what the framework has to say about what should be done about training time, when we don’t expect to have a human in the loop (or we expect that to be a relative minority of our training data).

To be clear, this should be taken as an endorsement of thinking about assistance games: my point is just that (according to me) it is best to think of them in relation to deployment, not training. A framework doesn’t have to apply to everything in order to be well worth thinking about.


Parameter counts in Machine Learning (Jaime Sevilla et al) (summarized by Rohin): This post presents a dataset of the parameter counts of 139 ML models from 1952 to 2021. The resulting graph is fairly noisy and hard to interpret, but suggests that:

1. There was no discontinuity in model size in 2012 (the year that AlexNet was published, generally acknowledged as the start of the deep learning revolution).

2. There was a discontinuity in model size for language in particular some time between 2016-18.

Rohin's opinion: You can see my thoughts on the trends in model size in this comment.

Deep limitations? Examining expert disagreement over deep learning (Carla Zoe Cremer) (summarized by Rohin): This paper reports on the results of a qualitative survey of 25 experts, conducted in 2019 and early 2020, on the possibility of deep learning leading to high-level machine intelligence (HLMI), defined here as an “algorithmic system that performs like average adults on cognitive tests that evaluate the cognitive abilities required to perform economically relevant tasks”. Experts disagreed strongly on whether deep learning could lead to HLMI. Optimists tended to focus on the importance of scale, while pessimists tended to emphasize the need for additional insights.

Based on the interviews, the paper gives a list of 40 limitations of deep learning that some expert pointed to, and a more specific list of five areas that both optimists and pessimists pointed to as in support of their views (and thus would likely be promising areas to resolve disagreements). The five areas are (1) abstraction; (2) generalization; (3) explanatory, causal models; (4) emergence of planning; and (5) intervention.


Truth, Lies, and Automation: How Language Models Could Change Disinformation (Ben Buchanan et al) (summarized by Rohin): Ever since the publication of GPT-2 (AN #46), the research community has worried about the use of such language models for disinformation campaigns. Disinformation campaigns have happened before: Russia produced thousands of pieces of such content leading up to the 2016 US presidential election. That campaign used large numbers of human workers. Could a future campaign become significantly more effective through the use of large language models?

This report notes that for this threat model, it is primarily worrying if GPT-3 can be used to enable significantly better results, because the monetary cost of hiring humans is not typically a bottleneck for major actors. While GPT-3 by itself is not likely to achieve this, perhaps it can serve as an effective tool for humans, such that the human-machine team can get better results than either one individually.

The authors perform several tests of their own to establish a lower bound on how well human-machine teams can perform currently. They investigate six types of disinformation tasks and find that either GPT-3 can do them easily, or only some human effort is needed to get results that are perceived as high quality by humans, suggesting that this could be a real risk. Unfortunately, it is hard to tell what aspects are actually important for successful disinformation, and this was not something they could ethically check, so it is hard to draw confident conclusions from the study about whether GPT-3 would be useful for disinformation campaigns in practice. (Although their one study on Mechanical Turk did find that GPT-3-generated arguments on international issues like sanctions on China were found to be persuasive and led to significant changes in the proportion of people with the given position.)

One particularly worrying aspect is that the authors found it easier to get GPT-3 to generate extremist content because providing an extremist headline makes it easy to “locate” the appropriate tone and style; whereas with a more moderate headline, GPT-3 might not correctly infer the desired tone or style because the moderate headline could be consistent with lots of tones and styles.

Rohin's opinion: The most interesting part of this report for me was the example outputs that the authors gave in the report, which showcase how GPT-3 can be used to “argue” in support or against a variety of topics, in a manner meant to be persuasive to a specific group of people (for example, arguing to Jews that they should vote Democratic / vote Republican / not vote).

(I put “argue” in quotation marks because the outputs hardly feel like what I would call “arguments” for a position, instead simply appealing to something the group agrees with and stating with barely any argument / evidence that this implies the position to be argued for. However, I also have the same critique of most “arguments” that I see on Twitter -- I don’t think I could distinguish the GPT-3 generated arguments from real human tweets.)



Beijing Academy of Artificial Intelligence announces 1,75 trillion parameters model, Wu Dao 2.0 (summarized by Rohin): There’s a good chance you’ve heard of the new Wu Dao 2.0 language model, with over 1 trillion parameters. Unfortunately, as far as I know there is no technical writeup describing this model, so I’m going to refrain from commenting on it. You can see other people’s takes in the linked LessWrong post, on ChinAI, and on


I'm always happy to hear feedback; you can send it to me, Rohin Shah, by replying to this email.


An audio podcast version of the Alignment Newsletter is available. This podcast is an audio version of the newsletter, recorded by Robert Miles.

New Comment