Wiki Contributions

Comments

Shah and Yudkowsky on alignment failures

CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:

  • Rationalist taboo
  • A group version of rationalist taboo where an instructor holds an everyday object and asks the class to describe it in concrete terms.
  • The Monday-Tuesday game
  • A role-playing game where the instructor plays a management consultant whose advice is impressive-sounding but contentless bullshit, and where the class has to force the consultant to be specific and concrete enough to be either wrong or trivial.
  • People were encouraged to make a habit of saying "can you give an example?" in everyday conversation. I practiced it a lot.

IIRC, Eliezer taught the class in May 2012? He talks about the relevant skills here and here. And then I ran it a few times, and then CFAR dropped it; I don't remember why.

My take on higher-order game theory

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

My take on higher-order game theory

No, I just took a look. The spin glass stuff looks interesting!

My take on higher-order game theory

I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.

If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.

I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to .

Yudkowsky and Christiano discuss "Takeoff Speeds"

I apologize, I shouldn't have leapt to that conclusion.

Yudkowsky and Christiano discuss "Takeoff Speeds"

it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.

By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.

This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.

Yudkowsky and Christiano discuss "Takeoff Speeds"

The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets

No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".

Rounding off the slow takeoff hypothesis to "lots and lots of little innovations adding up to every key AGI threshold, which lots of actors are investing $10 million in at a time" seems like black-and-white thinking, demanding that the future either be perfectly Thielien or perfectly anti-Thielien. The real question is a quantitative one — how lumpy will takeoff be?

Yudkowsky and Christiano discuss "Takeoff Speeds"

"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.

(Of course, if one predictably won't learn anything from an article, there's not much point in reading it.)

[This comment is no longer endorsed by its author]Reply
Confusions re: Higher-Level Game Theory

I feel excited about this framework! Several thoughts:

I especially like the metathreat hierarchy. It makes sense because if you completely curry it, each agent sees the foe's action, policy, metapolicy, etc., which are all generically independent pieces of information. But it gets weird when an agent sees an action that's not compatible with the foe's policy.

You hinted briefly at using hemicontinuous maps of sets instead of or in addition to probability distributions, and I think that's a big part of what makes this framework exciting. Maybe if one takes a bilimit of Scott domains or whatever, you can have an agent that can be understood simultaneously on multiple levels, and so evade commitment races. I haven't thought much about that.

I think you're right that the epiphenomenal utility functions are not good. I still think using reflective oracles is a good idea. I wonder if the power of Kakutani fixed points (magical reflective reasoning) can be combined with the power of Kleene fixed points (iteratively refining commitments).

Reflexive Oracles and superrationality: prisoner's dilemma

See also this comment from 2013 that has the computable version of NicerBot.

Load More