Raymond Arnold

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Comments

That's not really what I had in mind, but I had in mind something less clear than I thought. The spirit is about "can the AI come up with novel concepts".

I think one reason I think the current paradigm is "general enough, in principle" is that I don't think "novel concepts" is really The Thing. I think creativity / intelligence is mostly about combining concepts; it's just that really smart people:

a) are faster in raw horsepower and can handle more complexity at a time

b) have a better set of building blocks to combine or apply to make new concepts (which includes building blocks for building better building blocks)

c) have a more efficient search for useful/relevant building blocks (both metacognitive and object-level).

Maybe you believe this, and think "well yeah, it's the efficient search that's the important part, which we still don't actually have a real working version of"?

It seems like the current models have basically all the tools a moderately smart human has with regards to generating novel ideas, and the things they're missing are something like "having a good metacognitive loop such that they notice when they're doing a fake/dumb version of things, and course-correct" and "persistently pursuing plans over long time horizons." And they don't seem to have zero of either of those, just not enough to get over some hump.

I don't see what's missing that a ton of training on a ton of diverse, multimodal tasks + scaffolding + data flywheel isn't going to figure out.

(My guess is you took more like 15-25 minutes per question? Hard to tell from my notes; you may have finished early, but I don't recall it being crazy early.)

(This seems like more time than Buck was taking – the goal was to not get any wrong, so it wasn't like people were trying to crank through them in 7 minutes.)

The problems I gave were (as listed in the csv for the diamond problems) 

  • #1 (Physics) (1 person got right, 3 got wrong, 1 didn't answer)
  • #2 (Organic Chemistry) (John got right, I think 3 people didn't finish)
  • #4 (Electromagnetism) (John and one other got right, 2 got wrong)
  • #8 (Genetics) (3 got right including John)
  • #10 (Astrophysics) (5 people got right)

I at least attempted to filter the problems I gave you for GPQA diamond, although I am not very confident that I succeeded.

(Update: yes, the problems John did were GPQA diamond. I gave 5 problems to a group of 8 people, and gave them two hours to complete however many they thought they could complete without getting any wrong)

Note: I plan to extend the Nomination phase through ~Monday; I didn't mean for it to end partway through the weekend.

I haven't had much success articulating why.

I'd be interested in a more in-depth review where you take another pass at this.

A thing unclear to me: is it worth hiding the authors from the Voting page?

On the first LessWrong Review, we deliberately hid authors and randomized the order of the voting results. A few years later, we've mostly shifted towards "help people efficiently sort through the information" rather than "making sure the presentation is random/fair." It's not like people don't know who the posts are by once they start reading them.

Curious what people think.

I would find this post easier to remember and link to if it were called "Serial vs Parallel Research Time", or something like that which points more at the particular insight the post provides.

Yeah, the LW team has been doing this sort of thing internally; it's still in the experimental phase. I don't know if we've used all the tricks listed here yet.

I think two major cruxes for me here are:

  • is it actually tractable to affect DeepMind's culture and organizational decisionmaking?
  • how close to the threshold is Anthropic for having a good enough safety culture?

My current best guess is that Anthropic is still under the threshold for a good enough safety culture (despite seeming better than I expected in a number of ways), and meanwhile that DeepMind is just intractably far gone.

I think people should be hesitant to work at any scaling lab, but I think it might be possible to make Anthropic "the one actually good scaling lab", and I don't currently expect that to be tractable at DeepMind. I think "having at least one" seems good for the world (although it's a bit hard for me to articulate why at the moment).

I am interested in hearing details about DeepMind that anyone thinks should change my mind about this.

This viewpoint is based on having spent at least tens of hours trying to learn about and influence both orgs' cultures, at various times.

In both cases, I don't get the sense that people at the orgs really have a visceral sense that "decisionmaking processes can be fake." I think those processes will be fake by default, that the orgs are better modeled as following general incentives, and that DeepMind has too many moving people and moving parts, at low enough density, that it doesn't seem possible to fix. For me to change my mind about this, I would need someone there to look me in the eye and explain that they do have a visceral sense of how organizational decisionmaking processes can be fake, and why they nonetheless think DeepMind is tractable to fix. I assume @Rohin Shah and @Neel Nanda can't really say anything publicly that's capable of changing my mind, for various confidentiality and political reasons, but, like, that's my crux.

(Convincing me in more general terms that "Ray, you're too pessimistic about org culture" would hypothetically somehow work, but you have a lot of work to do given how thoroughly those pessimistic predictions came true about OpenAI.)

I think Anthropic also has this problem, but it's close enough to the threshold of almost-aligned leadership and actually-pretty-aligned people that it feels at least possible to me for them to fix it. The main things that would persuade me that they are over the critical threshold are if they publicly spent social capital on clearly spelling out why the x-risk problem is hard, and made explicit plans to not merely pause for a bit when they hit an RSP threshold, but (at least in some circumstances) advocate strongly for a global, government-enforced shutdown for something like 20+ years.
