All of stuhlmueller's Comments + Replies

Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns

Rohin has created his posterior distribution! Key differences from his prior are at the bounds:

  • He now assigns 3% rather than 0.1% to the majority of AGI researchers already agreeing with safety concerns.
  • He now assigns 40% rather than 35% to the majority of AGI researchers agreeing with safety concerns after 2100 or never.

Overall, Rohin’s posterior is a bit more optimistic than his prior and more uncertain.

Ethan Perez’s snapshot wins the prize for the most accurate prediction of Rohin’s posterior. Ethan kept a similar distribution shape w...

Ought: why it matters and ways to help

Thanks for this post, Paul!

NOTE: Response to this post has been even greater than we expected. We received more applications for experiment participants than we currently have the capacity to manage, so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks to everyone who has expressed interest - we're hoping to get back to you and work with you soon.

Factored Cognition

What I'd do differently now:

  • I'd talk about RL instead of imitation learning when I describe the distillation step. Imitation learning is easier to explain, but ultimately you probably need RL to be competitive.
  • I'd be more careful when I talk about internal supervision. The presentation mixes up three related ideas:
    • (1) Approval-directed agents: We train an ML agent to interact with an external, human-comprehensible workspace using steps that an (augmented) expert would approve.
    • (2) Distillation: We train an ML agent to implement a function fro
...