Logan Zoellner


Let's take a concrete example.  

Assume you have an AI that could score 100% on every Putnam exam. Would it be reasonable to expect that such an AI would also display superhuman performance at solving the Yang-Mills Mass Gap problem?

This doesn't include working out advances in fundamental physics, or designing a fusion reactor, or making breakthroughs in AI research.

Why don't all of these fall into the self-play category?  Physics, software and fusion reactors can all be simulated.  

I would be mildly surprised if a sufficiently large language model couldn't solve all of Project Euler, the Putnam, and the MATH dataset.

I strongly doubt we live in a data-limited AGI timeline.

  1. Humans are trained on much less data than Chinchilla
  2. We haven't even begun to exploit forms of media other than text (YouTube alone is >2 OOM bigger)
  3. Self-play allows for literally limitless amounts of data
  4. Regularization methods mean data constraints aren't nearly as important as claimed
  5. In the domains where we have exhausted the available data, ML models are already weakly superhuman
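A rough back-of-envelope for point 1 (all figures here are loose estimates of my own, not from the comment): Chinchilla was trained on roughly 1.4 trillion tokens, while even a generous estimate of a human's lifetime linguistic exposure comes out several orders of magnitude smaller:

```python
# Back-of-envelope comparison; every constant below is a loose assumption.
CHINCHILLA_TOKENS = 1.4e12      # training tokens reported for Chinchilla
WORDS_PER_DAY = 20_000          # generous estimate of words a person hears/reads daily
HUMAN_DAYS = 20 * 365           # ~20 years of linguistic exposure

human_tokens = WORDS_PER_DAY * HUMAN_DAYS
ratio = CHINCHILLA_TOKENS / human_tokens
print(f"Human lifetime exposure: ~{human_tokens:.1e} words")
print(f"Chinchilla used roughly {ratio:,.0f}x more data")
```

Even if the per-day estimate is off by an order of magnitude in either direction, the gap remains in the thousands.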

I’m not quite sure what you mean here.


In the standard picture of a reinforcement learner, suppose you get to specify the reward function and I get to specify the "agent". No matter what reward function you choose, I claim I can make an agent that both: 1) gets a huge reward compared to some baseline implementation, and 2) destroys the world. In fact, I think most "superintelligent" systems have this property for any reward function you could specify using current ML techniques.

Now switch the order: I design the agent first, and you supply an arbitrary reward function. I claim that there exist architectures which: 1) are useful, given the correct reward function, and 2) never, under any circumstances, destroy the world.
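As a toy illustration of the second claim (the names and structure here are my own, not from the comment): an agent whose action space is a closed whitelist fixed in advance. Whatever reward function you plug in, the agent can only ever emit one of the pre-approved actions, so catastrophic actions are structurally unavailable to it:

```python
# Hypothetical sketch: a "bounded" agent that optimizes any supplied reward
# function, but only over a fixed, pre-approved action set.
from typing import Callable

SAFE_ACTIONS = ["answer_question", "summarize_document", "do_nothing"]

def bounded_agent(reward: Callable[[str], float]) -> str:
    """Pick the highest-reward action from a closed, pre-approved set."""
    return max(SAFE_ACTIONS, key=reward)

# Even an adversarially chosen reward still yields a whitelisted action:
print(bounded_agent(lambda action: len(action)))  # prints "summarize_document"
```

The safety property here comes from the architecture (the closed action set), not from any property of the reward function, which is the shape of the claim above.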

What loss function(s), when sent into a future AI’s brain-like configuration of neocortex / hippocampus / striatum / etc.-like learning algorithms, will result in an AGI that is definitely not trying to literally exterminate humanity?


Specifying a correct loss function is not the right way to think about the Alignment Problem. A system's architecture matters much more than its loss function for determining whether or not it is dangerous. In fact, there probably isn't even a well-defined loss function that would remain aligned under infinite optimization pressure.

I think you're confounding two questions:

  1. Does AIHHAI accelerate AI?
  2. If I observe AIHHAI, does this update my priors towards Fast or Slow Takeoff?


I think it's pretty clear that AIHHAI accelerates AI development (without Copilot, I would have to write all those lines myself).


However, I think that observing AIHHAI should actually update your priors towards Slow Takeoff (or at least Moderate Takeoff). One reason is that humans are inherently slower than machines, and as Amdahl's law reminds us, if a process is composed of a slow part and a fast part, it cannot go faster than the slow part.
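A minimal sketch of that Amdahl's-law point (the function name and numbers are my own illustration): if the human accounts for a serial fraction s of the development loop, then no matter how fast the AI part becomes, the overall speedup is capped at 1/s:

```python
def amdahl_speedup(serial_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only the non-serial part is accelerated."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / parallel_speedup)

# Suppose the human does 20% of the loop: even an infinitely fast AI
# helper cannot push the overall speedup past 1 / 0.2 = 5x.
for ai_speedup in (10, 100, 1_000_000):
    print(f"AI part {ai_speedup:>9}x faster -> overall "
          f"{amdahl_speedup(0.2, ai_speedup):.3f}x")
```

As long as a human remains in the loop, takeoff speed is bounded by the human's share of the work.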


The other reason is that AIHHAI should cause you to lower your credence in a threshold effect. The original argument for Foom went something like: "if computers can think like humans, and one thing humans can do is make better computers, then once computers are as smart as humans, computers will make even better computers... ergo Foom." In other words, Foom relies on the belief that there is a critical threshold which leads to an intelligence explosion. However, observing AIHHAI is direct evidence against such a critical threshold, since it is an example of a sub-human intelligence helping a human-level intelligence advance AI.


The alternative model to Foom is something like this: "AI development is much like other economic growth: the more resources you have, the faster it goes." AIHHAI is a specific example of such an economic input, where spending more on it helps us go faster.
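The contrast between the two models can be sketched as a toy simulation (entirely my own illustration, with made-up parameters): under the economic model capability compounds smoothly with resources, while under the Foom model little happens below a critical threshold and growth explodes once it is crossed:

```python
def smooth_step(capability: float, resources: float) -> float:
    # Economic model: steady compounding returns to resources invested.
    return capability * (1.0 + 0.05 * resources)

def threshold_step(capability: float, critical: float = 1.0) -> float:
    # Foom model: slow crawl below the threshold, explosive doubling above it.
    return capability * (2.0 if capability >= critical else 1.01)

cap_smooth = cap_foom = 0.95
for _ in range(10):
    cap_smooth = smooth_step(cap_smooth, resources=1.0)
    cap_foom = threshold_step(cap_foom)

print(f"economic model after 10 steps:  {cap_smooth:.2f}")   # gradual growth
print(f"threshold model after 10 steps: {cap_foom:.2f}")     # discontinuous jump
```

Observing AIHHAI is evidence that the smooth curve, not the threshold one, describes our world: sub-threshold systems are already contributing to progress rather than sitting idle until some critical level is reached.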