AI ALIGNMENT FORUM

Nikola Jurkovic

contact: jurkovich.nikola@gmail.com

Posts


Wikitag Contributions

Comments

nikola's Shortform (1y)
METR: Measuring AI Ability to Complete Long Tasks
Nikola Jurkovic, 4mo

This has been one of the most important results for my personal timelines to date. It was a big part of the reason why I recently updated from a ~3-year median to a ~4-year median for AI that can automate >95% of remote jobs as they existed in 2022, and why my overall distribution has become narrower (less probability on really long timelines).

Scenario Forecasting Workshop: Materials and Learnings
Nikola Jurkovic, 1y

Romeo Dean and I ran a slightly modified version of this format for members of AISST, and we found it a very useful and enjoyable activity!

We first gathered to do 2 hours of reading and discussing, and then we spent 4 hours switching between silent writing and discussing in small groups.

The main changes we made are:

  1. We removed the part where people estimate probabilities of ASI and doom happening by the end of each other’s scenarios.
  2. We added a formal benchmark forecasting part for 7 benchmarks using private Metaculus questions (forecasting values as of Jan 31, 2025):
    1. GPQA
    2. SWE-bench
    3. GAIA
    4. InterCode (Bash)
    5. WebArena
    6. Number of METR tasks completed
    7. Elo on LMSys Arena relative to GPT-4-1106

We think the first change made it better, but in hindsight we should have reduced the number of benchmarks to around 3 (GPQA, SWE-bench and LMSys Elo) or given participants much more time.

Inflection.ai (2y)
Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast] (3mo)