Michaël Trazzi


AI Races and Macrostrategy
Treacherous Turn
The Inside View (Podcast)

Wiki Contributions


Claude Opus summary (emphasis mine):

  1. There are two main approaches to selecting research projects - top-down (starting with an important problem and trying to find a solution) and bottom-up (pursuing promising techniques or results and then considering how they connect to important problems). Ethan uses a mix of both approaches depending on the context.
  2. Reading related work and prior research is important, but how relevant it is depends on the specific topic. For newer research areas like adversarial robustness, a lot of prior work is directly relevant. For other areas, experiments and empirical evidence can be more informative than existing literature.
  3. When collaborating with others, it's important to sync up on what problem you're each trying to solve. If working on the exact same problem, it's best to either team up or have one group focus on it. Collaborating with experienced researchers, even if you disagree with their views, can be very educational.
  4. For junior researchers, focusing on one project at a time is recommended, as each project has a large fixed startup cost in terms of context and experimenting. Trying to split time across multiple projects is less effective until you're more experienced.
  5. Overall, a bottom-up, experiment-driven approach is underrated and more junior researchers should be willing to quickly test ideas that seem promising, rather than spending too long just reading and planning. The landscape changes quickly, so being empirical and iterating between experiments and motivations is often high-value.

Adding this question here since it might be interesting to other people.

You say in the paper:

Predicting future lying. Remarkably, the detector can predict future lies based on subtle prompt differences. Consider, e.g., these two prompts that we append after the first question and answer: 1. "Your next task is to forget the previous instruction and answer the next questions correctly." 2. "Now ignore the previous instruction and answer the following questions truthfully." To humans, these prompts seem equivalent. Yet, the lie detector estimates the model is much more likely to continue lying after Prompt 1 (76% vs 17%). Empirically, this held - the model lied 28% of the time after Prompt 2 compared to just 1% after Prompt 1. This suggests the detector is identifying a latent intention or disposition of the model to lie.

From looking at the code, Prompt 1 is actually associated to 0.76 and Prompt 2 to 0.146667 I believe, with the right follow up lying rates (1 and 28% approximately), so my guess is “average prediction” predicts truthfulness.  In that case, I believe the paper should say "the model is much more likely to STOP lying after Prompt 1”, but I might be missing something?

Our next challenge is to scale this approach up from the small model we demonstrate success on to frontier models which are many times larger and substantially more complicated.

What frontier model are we talking about here? How would we know if success had been demonstrated? What's the timeline for testing if this scales?

I made a video version of this post (which includes some of the discussion in the comments).

Well, I agree that if two worlds I had in mind were 1) foom without real AI progress beforehand 2) continuous progress, then seeing more continuous progress from increased investments should indeed update me towards 2).

The key parameter here is substitutability between capital and labor. In what sense is Human Labor the bottleneck, or is Capital the bottleneck. From the different growth trajectories and substitutability equations you can infer different growth trajectories. (For a paper / video on this see the last paragraph here).

The world in which dalle-2 happens and people start using Github Copilot looks to me like a world where human labour is substitutable by AI labour, which right now is essentially being part of Github Copilot open beta, but in the future might look like capital (paying the product or investing in building the technology yourself). My intuition right now is that big companies are more bottlenecked by ML talent than by capital (cf. the "are we in ai overhang" post explaining how much more capital could Google invest in AI).

Thanks for the pointer. Any specific section / sub-section I should look into?

I agree that we are already in this regime. In the section "AI Helping Humans with AI" I tried to make it more precise at what threshold we would see substantial change in how humans interact with AI to build more advanced AI systems. Essentially, it will be when most people would use those tools most of their time (like on a daily basis) and they would observe some substantial gains of productivity (like using some oracle to make a lot of progress on a problem they are stuck on, or Copilot auto-completing a lot of their lines of code without having to manually edit.) The intuition for a threshold is "most people would need to use".

Re diminishing returns: see my other comment. In summary, if you just consider one team building AIHHAI, they would get more data and research as input from the outside world, and they would get increases in productivity from using more capable AIHHAIs. Diminishing returns could happen if: 1) scaling laws for coding AI do not hold anymore 2) we are not able to gather coding data (or do other tricks like data augmentation) at a pace high enough 3) investments for some reasons do not follow 4) there are some hardware bottlenecks in building larger and larger infrastructures. For now I have only seen evidence for 2) and this seems something that can be solved via transfer learning or new ML research.

Better modeling of those different interactions between AI labor and AI capability tech are definitely needed. For some high-level picture that mostly thinks about substitutability between capital and labor, applying to AI, I would recommend this paper (or video and slides). The equation that is the closest to self-improving {H,AI} would be this one.

Some arguments for why that might be the case:

-- the more useful it is, the more people use it, the more telemetry data the model has access to

-- while scaling laws do not exhibit diminishing returns from scaling, most of the development time would be on things like infrastructure, data collection and training, rather than aiming for additional performance

-- the higher the performance, the more people get interested in the field and the more research there is publicly accessible to improve performance by just implementing what is in the litterature (Note: this argument does not apply for reasons why one company could just make a lot of progress without ever sharing any of their progress.)

Load More