Ethan Perez

I'm a Ph.D. student doing research on Natural Language Processing.

My research focuses on developing question-answering methods that generalize to harder questions than we have supervision for. Learning from human examples (supervised learning) won't scale to these kinds of questions, so I am investigating other paradigms that recursively break down harder questions into simpler ones (i.e., Debate and Iterated Amplification). Check out my website for more information about me/my research:

Wiki Contributions


[Link] A minimal viable product for alignment

What do you (or others) think is the most promising, soon-possible way to use language models to help with alignment? A couple of possible ideas:

  1. Using LMs to help with alignment theory (e.g., alignment forum posts, ELK proposals, etc.)
  2. Using LMs to run experiments (e.g., writing code, launching experiments, analyzing experiments, and repeat)
  3. Using LMs as research assistants (what Ought is doing with Elicit)
  4. Something else?
[Link] A minimal viable product for alignment

I understand that deceptive models won't show signs of deception :) That's why I made the remark of models not showing signs of prerequisites to scary kinds of deception. Unless you think there are going to be no signs of deception or any prerequisites, for any models before we get deceptive ones?

It also seems at least plausible that models will be imperfectly deceptive before they are perfectly deceptive, in which case we will see signs (e.g., in smaller models)

AMA Conjecture, A New Alignment Startup

I'm curious why you believe that having products will be helpful? A few particular considerations I would be interested to hear your take on:

  1. There seems to be abundant EA donor funding available from sources like FTX without the need for a product / for attracting non-EA investors
  2. Products require a large amount of resources to build/maintain
  3. Profitable products also are especially prone to accelerating race dynamics
AMA Conjecture, A New Alignment Startup

Why did you decide to start a separate org rather than joining forces with an existing org? I'm especially curious since state-of-the-art models are time-consuming/compute-intensive/infra-intensive to develop, and other orgs with safety groups already have that infrastructure. Also, it seems helpful to have high communication bandwidth between people working on alignment, in a way that is impaired by having many different orgs (especially if the org plans to be non-disclosure by default). Curious to hear how you are thinking about these things!

AMA Conjecture, A New Alignment Startup

Are you planning to be in-person or have some folks working remotely? Other similar safety orgs don't seem that flexible with in-person requirements, so it'd be nice to have a place for alignment work for those outside of {SF, London}

[Link] A minimal viable product for alignment

What are people's timelines for deceptive alignment failures arising in models, relative to AI-based alignment research being useful?

Today's language models are on track to become quite useful, without showing signs of deceptive misalignment or its eyebrow-raising pre-requisites (e.g., awareness of the training procedure), afaik. So my current best guess is that we'll be able to get useful alignment work from superhuman sub-deception agents for 5-10+ years or so. I'm very curious if others disagree here though

AI x-risk reduction: why I chose academia over industry

What are your thoughts for subfields of ML where research impact/quality depends a lot on having lots of compute?

In NLP, many people have the view that almost all of the high impact work has come from industry over the past 3 years, and that the trend looks like it will continue indefinitely. Even safety-relevant work in NLP seems much easier to do with access to larger models with better capabilities (Debate/IDA are pretty hard to test without good language models). Thus, safety-minded NLP faculty might end up in a situation where none of their direct work is very impactful, but all of the expected impact is by graduating students who end up going to work in industry labs in particular.  How would you think about this kind of situation?

Forecasting Thread: AI Timelines

Here is my Elicit Snapshot.

I'll follow the definition of AGI given in this Metaculus challenge, which roughly amounts to a single model that can "see, talk, act, and reason." My predicted distribution is a weighted sum of two component distributions described below:

  1. Prosaic AGI (25% probability). Timeline: 2024-2037 (Median: 2029): We develop AGI by scaling and combining existing techniques. The most probable paths I can foresee loosely involves 3 stages: (1) developing a language model with human-level language ability, then (2) giving it visual capabilities (i.e., talk about pictures and videos, solve SAT math problems with figures), and then (3) giving it capabilities to intelligently act in the world (i.e., trade stocks or navigate webpages). Below are my timelines for the above stages:
    1. Human-level Language Model: 1.5-4.5 years (Median: 2.5 years). We can predictably improve our language models by increasing model size (parameter count), which we can do in the following two ways:
      1. Scaling Language Model Size by 1000x relative to GPT3. 1000x is pretty feasible, but we'll hit difficult hardware/communication bandwidth constraints beyond 1000x as I understand.
      2. Increasing Effective Parameter Count by 100x using modeling tricks (Mixture of Experts, Sparse Tranformers, etc.)
    2. +Visual Capabilities: 2-6 extra years (Median: 4 years). We'll need good representation learning techniques for learning from visual input (which I think we mostly have). We'll also need to combine vision and language models, but there are many existing techniques for combining vision and language models to try here, and they generally work pretty well. A main potential bottleneck time-wise is that the language+vision components will likely need to be pretrained together, which slows the iteration time and reduces the number of research groups that can contribute (especially for learning from video, which is expensive). For reference, Language+Image pretrained models like ViLBERT came out 10 months after BERT did.
    3. +Action Capabilities: 0-6 extra years (Median: 2 years). GPT3-style zero-shot or few-shot instruction following is the most feasible/promising approach to me here; this approach could work as soon as we have a strong, pretrained vision+language model. Alternatively, we could use that model within a larger system, e.g. a policy trained with reinforcement learning, but this approach could take a while to get to work.
  2. Breakthrough AGI (75% probability). Timeline: Uniform probability over the next century: We need several, fundamental breakthroughs to achieve AGI. Breakthroughs are hard to predict, so I'll assume a uniform distribution that we'll hit upon the necessary breakthroughs at any year <2100, with 15% total probability mass after 2100 (a rough estimate); I'm estimating 15% roughly based on a 5% probability that we won't find the right insights by 2100, 5% probability that we have the right insights but not enough compute by 2100, and 5% probability to account for planning fallacy, unknown unknowns, and the fact that a number of top AI researchers believe that we are very far from AGI.

My probability for Prosaic AGI is based on an estimated probability of each of the 3 stages of development working (described above):

P(Prosaic AGI) = P(Stage 1) x P(Stage 2) x P(Stage 3) = 3/4 x 2/3 x 1/2 = 1/4


Updates/Clarification after some feedback from Adam Gleave:

  • Updated from 5% -> 15% probability that AGI won't happen by 2100 (see reasoning above). I've updated my Elicit snapshot appropriately.
  • There are other concrete paths to AGI, but I consider these fairly low probability to work first (<5%) and experimental enough that it's hard to predict when they will work. For example, I can't think of a good way to predict when we'll get AGI from training agents in a simulated, multi-agent environment (e.g., in the style of OpenAI's Emergent Tool Use paper). Thus, I think it's reasonable to group such other paths to AGI into the "Breakthrough AGI" category and model these paths with a uniform distribution.
  • I think you can do better than a uniform distribution for the "Breakthrough AGI" category, by incorporating the following information:
    • Breakthroughs will be less frequent as time goes on, as the low-hanging fruit/insights are picked first. Adam suggested an exponential decay over time / Laplacian prior, which sounds reasonable.
    • Growth of AI research community: Estimate the size of the AI research community at various points in time, and estimate the pace of research progress given that community size. It seems reasonable to assume that the pace of progress will increase logarithmically in the size of the research community, but I can also see arguments for why we'd benefit more or less from a larger community (or even have slower progress).
    • Growth of funding/compute for AI research: As AI becomes increasingly monetizable, there will be more incentives for companies and governments to support AI research, e.g., in terms of growing industry labs, offering grants to academic labs to support researchers, and funding compute resources - each of these will speed up AI development.