With apologies for the belated response: I think greghb makes a lot of good points here, and I agree with him on most of the specific disagreements with Daniel. In particular:
Overall, I think that datasets/environments are plausible as a major blocker to transformative AI, and I think Bio Anchors would be a lot stronger if it had more to say about this.
I am sympathetic to Bio Anchors's bottom-line quantitative estimates despite this, though (and to be clear, I held all of these positions at the time Bio Anchors was drafted). It's not easy for me to explain all of where I'm coming from, but a few intuitions:
Hopefully that gives a sense of where I'm coming from. Overall, I think this is one of the most compelling objections to Bio Anchors; I find it stronger than the points Eliezer focuses on above (unless you are pretty determined to steel-man any argument along the lines of "Brains and AIs are different" into a specific argument about the most important difference).
I don't think I am following the argument here. You seem focused on the comparison with evolution, which is only a minor part of Bio Anchors, and used primarily as an upper bound. (You say "the number is so vastly large (and actually unknown due to the 'level of details' problem) that it's not really relevant for timelines calculations," but actually Bio Anchors still estimates that the evolution anchor implies a ~50% chance of transformative AI this century.)
Generally, I don't see "A and B are very different" as a knockdown counterargument to "If A required ___ amount of compute, my guess is that B will require no more." I'm not sure I have more to say on this point that hasn't already been said - I acknowledge that the comparisons being made are not "tight" and that there's a lot of guesswork, and the Bio Anchors argument doesn't go through without some shared premises and intuitions, but I think the needed intuitions are reasonable and an update from Bio-Anchors-ignorant starting positions is warranted.
Noting that I commented on this (and other topics) here: https://www.cold-takes.com/replies-to-ancient-tweets/#20220331
The Bio Anchors report is intended as a tool for making debates about AI timelines more concrete, for those who find some bio-anchor-related bound helpful (e.g., some think we should lower bound P(AGI) at some reasonably high number for any year in which we expect to hit a particular kind of "biological anchor"). Ajeya's work lengthened my own timelines, because it helped me understand that some bio-anchor-inspired arguments for shorter timelines didn't have as much going for them as I'd thought; but I think it may have shortened some other folks'.
(The presentation of the report in the Most Important Century series had a different aim. That series is aimed at making the case that we could be in the most important century, to a skeptic.)
I don't personally believe I have a high-enough-quality estimate using another framework that I'd be justified in ignoring bio-anchors-based reasoning, but I don't think it's wild to think someone else might have such an estimate.
I agree with this. I often default to acting as though we have ~10-15 years, partly because I think leverage is especially high conditional on timelines in that rough range.
I'm not sure why this isn't a very general counterexample. Once we've decided that the human imitator is simpler and faster to compute, don't all further approaches (e.g., penalizing inconsistency) involve a competitiveness hit along these general lines? Aren't they basically designed to drag the AI away from a fast, simple human imitator toward a slow, complex reporter? If so, why is that better than dragging the AI from a foreign ontology toward a familiar ontology?
Can you explain this: "In Section: specificity we suggested penalizing reporters if they are consistent with many different reporters, which effectively allows us to use consistency to compress the predictor given the reporter." What does it mean to "use consistency to compress the predictor given the reporter" and how does this connect to penalizing reporters if they are consistent with many different predictors?
Here are a couple of hand-wavy "stub" proposals that I sent over to ARC. They thought existing counterexamples were broadly intended to address these. I'm posting them here so they can respond and clarify why these don't qualify.
*Proposal 1: force ontological compatibility*
On page 34 of the ELK gdoc, the authors talk about the possibility that training an AI hard enough produces a model that has deep mismatches with human ontology - that is, it has its own "vocabulary of basic concepts" (or nodes in a Bayes net) that differs from the ones humans can build understanding of (via doing science on the compute budget available). Because of this, even AI assistance can't help humans understand everything the SmartVault AI understands. This is central to the challenge that most of the writeup is contending with - if not for the mismatch, "AIs explaining things to humans" could ensure that the trickery we're worried about doesn't happen.
The proposal here is to include a term in the loss function that incentivizes the AI to have a human-compatible ontology. For a cartoonish example, imagine that the term works this way: "The AI model gets a higher score to the degree that people doing 'digital neuroscience' would have an easier time, and find more interesting things, probing its 'digital brain.'" So an AI with neurons corresponding to diamonds, robbers, sensors, etc. would outscore an AI whose neurons can't easily be seen to correspond to any human-familiar concepts.
With this term included, you can "train the AI hard" without having it diverge from human-friendly ontology.
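To make the cartoon version a bit more concrete, here is a minimal Python sketch of what such a loss term could look like. Everything in it is an assumption for illustration: `interpretability_score` stands in for a measurement nobody currently knows how to make, and `weight` and the simple subtraction are arbitrary choices of mine, not anything from the ELK writeup.

```python
def total_loss(prediction_loss: float,
               interpretability_score: float,
               weight: float = 0.1) -> float:
    """Toy combined objective (hypothetical, for illustration only).

    prediction_loss: the ordinary training loss (lower is better).
    interpretability_score: assumed score for how easily people doing
        "digital neuroscience" can map the model's internals onto
        human-familiar concepts (higher is better).
    weight: assumed trade-off between the two terms.
    """
    # Subtracting the score rewards models with human-legible internals.
    return prediction_loss - weight * interpretability_score


# Two models that predict equally well but differ in legibility.
legible_model_loss = total_loss(prediction_loss=1.0, interpretability_score=0.8)
opaque_model_loss = total_loss(prediction_loss=1.0, interpretability_score=0.2)
```

With equal prediction loss, the model whose internals score as more legible ends up with the lower total loss, which is the whole intended effect of the extra term.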
*Proposal 2: "mini-model"*
In this proposal, we construct a head for the AI model whose role is to construct a small AI model that performs as well as possible on the same (predictor, reporter) tasks. In each round of training, the master AI model can modify the mini-model; it also receives a loss based on how well the *previous round's* mini-model performs on this set of examples. Crucially, the mini-model is sufficiently small that we're confident it is "significantly less intelligent" than the (perhaps AI-assisted) human supervisor.
The basic hope here is that a "human imitator" model would construct a small model that also tries to imitate the human supervisor - but that this model would perform very badly, because trying to imitate the *reasoning process of a larger model* is a lot more difficult than simply trying to reason about the world and translate concepts. Or, the "human imitator" model could build a "mini-model" based on entirely different principles, but the hope is that this makes things harder for it compared to the direct translator, which is just compressing what it has already built.
There are lots of potential failure modes here, e.g. maybe it's just not that hard to have a mini-model that successfully imitates the human. I didn't get that far with this one, but it was apparently enough for ARC to think it's already counterexampled by existing counterexamples :)
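The timing in Proposal 2 (the master model is scored on the *previous round's* mini-model) is easy to misread, so here is a rough Python sketch of the round structure. The function names, the callables, and the choice to simply add the two losses are all assumptions for illustration; the proposal itself does not pin any of this down.

```python
from typing import Callable, List, TypeVar

Mini = TypeVar("Mini")  # stand-in type for whatever represents a mini-model


def run_rounds(num_rounds: int,
               initial_mini: Mini,
               master_loss: Callable[[int], float],
               mini_loss: Callable[[Mini], float],
               update_mini: Callable[[Mini, int], Mini]) -> List[float]:
    """Return the loss charged to the master model in each round."""
    losses: List[float] = []
    previous_mini = initial_mini
    for round_index in range(num_rounds):
        # The master pays its own loss plus the loss of the mini-model
        # it produced in the previous round (assumed to be a plain sum).
        losses.append(master_loss(round_index) + mini_loss(previous_mini))
        # The master then modifies the mini-model; this new version is
        # only scored in the next round.
        previous_mini = update_mini(previous_mini, round_index)
    return losses


# Toy run: the mini-model is just a number equal to its own loss,
# and each update halves it.
toy_losses = run_rounds(
    num_rounds=3,
    initial_mini=4.0,
    master_loss=lambda round_index: 1.0,
    mini_loss=lambda mini: mini,
    update_mini=lambda mini, round_index: mini / 2,
)
```

In the toy run the master's loss falls from round to round only because the mini-model it built earlier performs better, which is the incentive the proposal is relying on.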
The bad reporter needs to specify the entire human model, how to do inference, and how to extract observations. But the complexity of this task depends only on the complexity of the human’s Bayes net.
If the predictor's Bayes net is fairly small, then this may be much more complex than specifying the direct translator. But if we make the predictor's Bayes net very large, then the direct translator can become more complicated — and there is no obvious upper bound on how complicated it could become. Eventually direct translation will be more complex than human imitation, even if we are only trying to answer a single narrow category of questions.
This isn't clear to me, because "human imitation" here refers (I think) to "imitation of a human that has learned as much as possible (on the compute budget we have) from AI helpers." So as we pour more compute into the predictor, that also increases (right?) the budget for the AI helpers, which I'd think would make the imitator have to become more complex.
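A toy calculation may help separate the quoted argument from my question about it. The sketch below is Python with made-up linear forms: `per_node_cost`, `helper_growth`, and the idea that either complexity scales linearly with predictor size are assumptions of mine, not claims from the ELK report. Setting `helper_growth=0` reproduces the quoted claim that the imitator's complexity depends only on the human's Bayes net; a positive value models the worry that a bigger predictor also means better AI helpers and so a more complex human to imitate.

```python
from typing import Optional


def direct_translator_complexity(predictor_size: float,
                                 per_node_cost: float = 1.0) -> float:
    # Assumed: translation cost grows linearly with the predictor's Bayes net.
    return per_node_cost * predictor_size


def human_imitator_complexity(human_net_size: float,
                              predictor_size: float,
                              helper_growth: float = 0.0) -> float:
    # helper_growth = 0: complexity depends only on the human's Bayes net.
    # helper_growth > 0: the AI-assisted human gets more complex as the
    # predictor (and so the helper budget) grows.
    return human_net_size + helper_growth * predictor_size


def crossover_size(human_net_size: float,
                   per_node_cost: float = 1.0,
                   helper_growth: float = 0.0) -> Optional[float]:
    """Predictor size beyond which direct translation is the more complex
    option, or None if that never happens under these assumptions."""
    gap_rate = per_node_cost - helper_growth
    if gap_rate <= 0:
        return None  # the imitator grows at least as fast as the translator
    return human_net_size / gap_rate
```

Under these assumptions, `crossover_size(100.0)` gives a finite crossover, `crossover_size(100.0, helper_growth=0.5)` pushes it further out, and `crossover_size(100.0, helper_growth=1.0)` returns `None`. So whether "eventually direct translation will be more complex" holds seems to turn on how fast the imitated human's complexity grows relative to the translator's.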
In the following section, you say something similar to what I say above about the "computation time" penalty ("If the human simulator had a constant time complexity then this would be enough for a counterexample. But the situation is a little bit more complex, because the human simulator we’ve described is one that tries its best at inference.") I'm not clear on why this applies to the "computation time" penalty and not the complexity penalty. (I also am not sure whether the comment on the "computation time" penalty is saying the same thing I'm saying; the meaning of "tries its best" is unclear to me.)