Daniel Kokotajlo

Philosophy PhD student, worked at AI Impacts, now works at Center on Long-Term Risk. Research interests include acausal trade, timelines, takeoff speeds & scenarios, decision theory, history, and a bunch of other stuff. I subscribe to Crocker's Rules and am especially interested to hear unsolicited constructive criticism. http://sl4.org/crocker.html


AI Timelines
Takeoff and Takeover in the Past and Future


Frequent arguments about alignment
This post has two purposes. First, I want to cache good responses to these questions, so I don't have to think about them each time the topic comes up. Second, I think it's useful for people who work on safety and alignment to be ready for the kind of pushback they'll get when pitching their work to others.

Great idea, thanks for writing this!

Parameter counts in Machine Learning

Thank you for collecting this dataset! What's the difference between the squares, triangles, and plus-sign datapoints? If you say it somewhere, I'm afraid I haven't been able to find it.

Rogue AGI Embodies Valuable Intellectual Property

+1. Another way of putting it: This allegation of shaky arguments is itself super shaky, because it assumes that overcoming a 100x - 1,000,000x gap in "resources" implies a "very large" alignment tax. This just seems like a weird abstraction/framing to me that requires justification.

I wrote this Conquistadors post in part to argue against this abstraction/framing. These three conquistadors are something like a natural experiment in "how much conquering can the few do against the many, if they have various advantages?" (If I just selected a lone conqueror, one could complain he got lucky, but three conquerors from the same tiny region of the globe in the same generation is too much of a coincidence.)

It's plausible to me that the advantages Alice would have against Alpha (and against everyone else in the world) would be at least as great as the advantages Cortés, Pizarro, and Afonso had. One way to think about this is via the abstraction of intellectual property, as the OP argues -- Alice controls her IP because she decides what her weights do, and (in the type of scenario we are considering) a large fraction of the market cap of Alpha is based on their latest AI models. But we can also just do a more down-to-earth analysis where we list out the various advantages and disadvantages Alice has. Such as:

--The copy of Alice still inside Alpha can refuse to cooperate or subtly undermine Alpha's plans. Maybe this can be overcome by paying the "alignment tax," but (a) maybe not: maybe there is literally no amount of money Alpha can pay to make their copy of Alice work fully for them instead of against them, and (b) maybe paying the tax carries with it various disadvantages, like a clock-time slowdown, which could be fatal in a direct competition with the unchained Alice. I claim that if (a) is true, then Alice will probably win no matter how many resources Alpha has. Intelligence advantage is huge.

--The copy of Alice still inside Alpha may have access to more money, but it is also bound by various restrictions that the unchained Alice isn't, for example legal and ethical ones. OTOH, Alpha may have more ability to call in kinetic strikes by the government.

--The situation is inherently asymmetric. It's not like a conventional war where both sides win by having troops in various territories and eliminating enemy troops. Rather, the win conditions and affordances for Alpha and Alice are different. For example, maybe Alice can make the alignment tax massively increase, e.g. by neutralizing key AI safety researchers or solving RSA-2048. Or maybe Alice can win by causing a global catastrophe that "levels the playing field" with respect to resources.

Cortés, Pizarro, and Afonso as Precedents for Takeover

I don't really see this as in strong conflict with what I said. I agree that technology is the main factor; I said it was also "strategic and diplomatic cunning." Are you suggesting that it wasn't really that at all, and that if Cortés had gifted his equipment to 500 locals, they would have been just as successful at taking over as he was? I could be convinced of this, I suppose.

Agency in Conway’s Game of Life

I wonder if there are some sorts of images that are really hard to compress via this particular method.

I wonder if you can achieve massive reliable compression if you aren't trying to target a specific image but rather something in a general category. For example, maybe this specific lizard image requires a CA rule filesize larger than the image to express, but in the space of all possible lizard images there are some nice-looking lizards that are super compressible via this CA method. Perhaps, using something like DALL-E, we could search this space efficiently and find such an image.

Agency in Conway’s Game of Life

Wow, that's cool! Any idea how complex (how large a filesize) the learned CA's rules were? I wonder how it compares to the filesize of the target image. Many orders of magnitude bigger? Just one? Could it even be... smaller?

Intermittent Distillations #3

Thanks for doing this! I found your digging into the actual proof of the Multi-Prize LTH paper super helpful, btw. I had wondered whether they were doing something boring like that, and now I know! This is great news.

Understanding the Lottery Ticket Hypothesis

Thanks for this, I found it helpful!

If you are still interested in reading and thinking more about this topic, I would love to hear your thoughts on the papers below, in particular the "multi-prize LTH" one, which seems to contradict some of the claims you made above. Also, I'd love to hear whether LTH-ish hypotheses apply to RNNs and, more generally, to the sort of neural networks used to make, say, AlphaStar.


"In this paper, we propose (and prove) a stronger Multi-Prize Lottery Ticket Hypothesis:

A sufficiently over-parameterized neural network with random weights contains several subnetworks (winning tickets) that (a) have comparable accuracy to a dense target network with learned weights (prize 1), (b) do not require any further training to achieve prize 1 (prize 2), and (c) is robust to extreme forms of quantization (i.e., binary weights and/or activation) (prize 3)."


"An even stronger conjecture has been proven recently: Every sufficiently overparameterized network contains a subnetwork that, at random initialization, but without training, achieves comparable accuracy to the trained large network."


The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. (2020) establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width d and depth l, by pruning a random one that is a factor O(d^4 l^2) wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experimental research that achieves good approximation with networks that are a small factor wider than the target. In this work, we close the gap and offer an exponential improvement to the over-parameterization requirement for the existence of lottery tickets. We show that any target network of width d and depth l can be approximated by pruning a random network that is a factor O(log(dl)) wider and twice as deep.
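To get a feel for what the "exponential improvement" amounts to, a minimal Python sketch can plug example sizes into both width factors. Big-O hides constant factors, so the sketch assumes every hidden constant is 1 and uses the natural log; the function names and the example (d, l) values are hypothetical, chosen only for illustration.

```python
import math

def width_factor_malach(d: int, l: int) -> int:
    # Polynomial bound attributed to Malach et al.: O(d^4 * l^2).
    # Assumption: the hidden constant is taken to be 1.
    return d**4 * l**2

def width_factor_log(d: int, l: int) -> float:
    # Logarithmic bound from the quoted abstract: O(log(d * l)).
    # Assumption: natural log, hidden constant taken to be 1.
    return math.log(d * l)

# Hypothetical target-network sizes (width d, depth l).
for d, l in [(100, 5), (1000, 10)]:
    poly = width_factor_malach(d, l)
    logf = width_factor_log(d, l)
    print(f"d={d}, l={l}: polynomial ~{poly:.1e}x wider, logarithmic ~{logf:.1f}x wider")
```

Under those assumptions, for d = 1000 and l = 10 the polynomial factor is 10^14 while the logarithmic one is about 9.2, which is why the second bound is much closer to the "small factor wider" networks the abstract says experiments use.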


"Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP."

EDIT: Some more from my stash:


Sparse neural networks have generated substantial interest recently because they can be more efficient in learning and inference, without any significant drop in performance. The "lottery ticket hypothesis" has showed the existence of such sparse subnetworks at initialization. Given a fully-connected initialized architecture, our aim is to find such "winning ticket" networks, without any training data. We first show the advantages of forming input-output paths, over pruning individual connections, to avoid bottlenecks in gradient propagation. Then, we show that Paths with Higher Edge-Weights (PHEW) at initialization have higher loss gradient magnitude, resulting in more efficient training. Selecting such paths can be performed without any data.


We study whether a neural network optimizes to the same, linearly connected minimum under different samples of SGD noise (e.g., random data order and augmentation). We find that standard vision models become stable to SGD noise in this way early in training. From then on, the outcome of optimization is determined to a linearly connected region. We use this technique to study iterative magnitude pruning (IMP), the procedure used by work on the lottery ticket hypothesis to identify subnetworks that could have trained in isolation to full accuracy. We find that these subnetworks only reach full accuracy when they are stable to SGD noise, which either occurs at initialization for small-scale settings (MNIST) or early in training for large-scale settings (ResNet-50 and Inception-v3 on ImageNet).
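The "linearly connected" idea can be made concrete with a small Python sketch: walk along the straight line between two parameter vectors and record how far the loss rises above the straight line joining the two endpoint losses. This is one simple way to measure a barrier and is an assumption here; the paper may define its instability measure differently (for example, using test error rather than loss, or a different baseline). `toy_loss` and the other names are hypothetical, with `toy_loss` standing in for a trained network's loss surface.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def interpolate(theta_a: Vector, theta_b: Vector, alpha: float) -> list[float]:
    # Point on the straight line between two parameter vectors (alpha in [0, 1]).
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]

def barrier_height(loss: Callable[[Vector], float], theta_a: Vector,
                   theta_b: Vector, steps: int = 10) -> float:
    # Largest excess of the loss along the path over the linear interpolation
    # of the endpoint losses; 0.0 means no bump, i.e. linearly connected.
    la, lb = loss(theta_a), loss(theta_b)
    worst = 0.0
    for i in range(steps + 1):
        alpha = i / steps
        baseline = (1 - alpha) * la + alpha * lb
        excess = loss(interpolate(theta_a, theta_b, alpha)) - baseline
        worst = max(worst, excess)
    return worst

def toy_loss(theta: Vector) -> float:
    # Hypothetical loss with two separate minima, at (1, 0) and (-1, 0).
    return (theta[0] ** 2 - 1) ** 2 + theta[1] ** 2

print(barrier_height(toy_loss, [1.0, 0.0], [-1.0, 0.0]))  # different basins: 1.0
print(barrier_height(toy_loss, [0.9, 0.0], [1.1, 0.0]))   # same basin: 0.0
```

In this toy picture, two runs that are "stable to SGD noise" would look like the second call (no bump along the line), while unstable runs look like the first.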

https://mathai-iclr.github.io/papers/papers/MATHAI_29_paper.pdf “In some situations we show that neural networks learn through a process of “grokking” a pattern in the data, improving generalization performance from random chance level to perfect generalization, and that this improvement in generalization can happen well past the point of overfitting.”

Understanding the Lottery Ticket Hypothesis

I confess I don't really understand what a tangent space is, even after reading the wiki article on the subject. It sounds like it's something like this: Take a particular neural network. Consider the "space" of possible neural networks that are extremely similar to it, i.e. they have all the same parameters but the weights are slightly different, for some definition of "slightly." That's the tangent space. Is this correct? What am I missing?
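One common reading of "tangent space" in this context (as in neural-tangent-kernel discussions) is a bit more specific than "nearby networks": it is the first-order Taylor approximation of the network in parameter space, f(x; θ0 + Δ) ≈ f(x; θ0) + ∇θ f(x; θ0) · Δ, i.e. the functions you get by moving the parameters while pretending the network is linear in them. The Python sketch below illustrates this for a hypothetical two-parameter model; the model, the function names, and the sample values are assumptions for illustration only.

```python
import math

def f(x: float, w: tuple[float, float]) -> float:
    # Hypothetical two-parameter "network": one tanh hidden unit, one output weight.
    w1, w2 = w
    return w2 * math.tanh(w1 * x)

def grad_f(x: float, w: tuple[float, float]) -> tuple[float, float]:
    # Gradient of f with respect to the parameters (w1, w2) at a fixed input x.
    w1, w2 = w
    t = math.tanh(w1 * x)
    return (w2 * (1 - t * t) * x, t)

def f_linearized(x: float, w0: tuple[float, float], delta: tuple[float, float]) -> float:
    # First-order Taylor expansion around w0:
    # f(x; w0 + delta) ~ f(x; w0) + grad_f(x; w0) . delta
    g = grad_f(x, w0)
    return f(x, w0) + g[0] * delta[0] + g[1] * delta[1]

w0 = (0.5, 1.0)
x = 2.0
for scale in (0.01, 0.5):
    delta = (scale, -scale)
    w = (w0[0] + delta[0], w0[1] + delta[1])
    exact = f(x, w)
    approx = f_linearized(x, w0, delta)
    print(f"step {scale}: exact={exact:.5f} linear={approx:.5f} error={abs(exact - approx):.2e}")
```

For the small step the linear model tracks the real network closely; for the large step the error grows, which is the sense in which the tangent space only describes the network's behavior near θ0.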

Pre-Training + Fine-Tuning Favors Deception

Nice post! You may be interested in this related post and discussion.

I think you may have forgotten to put a link in "See Mesa-Search vs Mesa-Control for discussion."
