AI ALIGNMENT FORUM

Raymond Arnold

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Posts


Wikitag Contributions

Comments

johnswentworth's Shortform
Raymond Arnold · 11d · 22

Yeah, it's less that I'd bet it works now; it's more that whenever it DOES start working, it seems likely it'd be through this mechanism.

⚖ If Thane Ruthenis thinks there are AI tools that can meaningfully help with Math by this point, did they first have a noticeable period (> 1 month) where it was easier to get work out of them via working in lean-or-similar? (Raymond Arnold: 25% & 60%)

(I had a bit of an epistemic rollercoaster making this prediction. I updated to "by the time someone makes an actually worthwhile Math AI, even if Lean was an important part of its training process, it's probably not that hard to do additional fine-tuning that gets it to output stuff in a more standard mathy format." But, then, it seemed like it was still going to be important to quickly check it wasn't blatantly broken as part of the process.)

johnswentworth's Shortform
Raymond Arnold · 11d · 20

Nod.

[disclaimer: not a math guy, only barely know what I'm talking about; if this next thought is stupid I'm interested to learn more]

I don't expect this to fix it right now, but one thing I don't think you listed is doing the work in Lean or some other proof assistant that lets you check results immediately? I expect LLMs to first be able to do math in that format, because it's the format you can actually do a lot of training in. And it'd mean you can verify results more quickly.

My current vague understanding is that Lean is normally too cumbersome to be reasonable to work in, but that's the sort of thing that could change with LLMs in the mix.
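
(For concreteness, here's a minimal, purely illustrative Lean 4 sketch of what "check results immediately" means; the lemma names below are standard library ones, and nothing here is from this thread. The point is just that the compiler verifies every proof, so a broken or hand-wavy step fails at compile time rather than needing a human reviewer to catch it.)

```lean
-- Minimal illustration (hypothetical example): Lean checks each proof
-- mechanically, so a wrong step is rejected immediately.

theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- If a model emitted a bogus proof term here, Lean would refuse to compile it.
example (n : Nat) : 0 + n = n :=
  Nat.zero_add n
```

That's the sense in which the format gives you both a lot of checkable training signal and quick verification.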

johnswentworth's Shortform
Raymond Arnold · 11d · 96

I've updated marginally towards this (as a guy pretty focused on LLM-augmentation: I anticipated LLM brain rot, but it was still more pernicious/fast than I expected).

I do still think some-manner-of-AI-integration is going to be an important part of "moving forward" but probably not whatever capitalism serves up.

I have tried out using them pretty extensively for coding. The speedup is real, and I expect it to get more real. Right now it's like having a pretty junior employee that I get to infinitely micromanage. But it definitely does lull me into a lower-agency state where, instead of trying to solve problems myself, I'm handing them off to LLMs much of the time to see if they can handle them.

During work hours, I try to actively override this, i.e. have the habit "send the LLM off, and then go back to thinking about some kind of concrete thing (although often a higher-level strategy)." But, this becomes harder to do as it gets later in the day and I get more tired.

One of the benefits of LLMs is that you can do moderately complex cognitive work* while tired (*that a junior engineer could do). But, that means by default a bunch of time is spent specifically training the habit of using LLMs in a stupid way. 

(I feel sort of confused about how people who don't use it for coding are doing. With coding, I can feel the beginnings of a serious exoskeleton that can build structures around me with thought. Outside of that, I don't know of it being more than a somewhat better Google.)

I currently mostly avoid interactions that treat the AI like a person I'm talking to. That way seems the most madness-inducing.

Foom & Doom 1: “Brain in a box in a basement”
Raymond Arnold · 16d · 10

I'm curious which argument felt most like "oh."

Self-fulfilling misalignment data might be poisoning our AI models
Raymond Arnold · 3mo · -1-4

I think I understood your article, and was describing which points/implications seemed important. 

I think we probably agree on predictions for near-term models (i.e. that including this training data makes it more likely for them to deceive); I just don't think it matters very much if sub-human-intelligence AIs deceive.

Reframing AI Safety as a Neverending Institutional Challenge
Raymond Arnold · 4mo · 39

I do periodically think about this and feel kind of exhausted at the prospect, but it does seem pretty plausibly correct. Good to have a writeup of it.

It particularly seems likely to be the right mindset if you think survival right now depends on getting some kind of longish pause (at least on the sort of research that'd lead to RSI + takeoff).

Self-fulfilling misalignment data might be poisoning our AI models
Raymond Arnold · 4mo* · 62

My current guess is:

1. This is more relevant for up to the first couple of generations of "just barely superintelligent" AIs.

2. I don't really expect it to be the deciding factor after many iterations of end-to-end RSI that get you to the "able to generate novel scientific or engineering insights much faster than a human or institution could" stage.

I do think it's plausible that the initial bias towards "evil/hackery AI" could start it off in a bad basin of attraction, but a) even if you completely avoided that, I would still basically expect it to rediscover this on its own as it gained superhuman levels of competence, and b) one of the things I most want to use a slightly-superhuman AI to do is to robustly align massively superhuman AI, and I don't really see how to do that without directly engaging with the knowledge of the failure modes there.

I think there are other plans that route more through "use STEM AI to build an uploader or bioenhancer, and then have an accelerated human psyche do the technical philosophy necessary to handle the unbounded alignment case." I could see that being the right call, and I could imagine the bias from "already knows about deceptive alignment etc." being large-magnitude enough to matter in the initial process. [edit: In those cases I'd probably want to filter out a lot more than just "unfriendly AI strategies"]

But, basically, how this applies depends on what it is you're trying to do with the AI, and what stage/flavor of AI you're working with and how it's helping.

Views on when AGI comes and on strategy to reduce existential risk
Raymond Arnold · 5mo · 1-1

That's not really what I had in mind, but I had in mind something less clear than I thought. The spirit of it is "can the AI come up with novel concepts?"

I think one reason I think the current paradigm is "general enough, in principle" is that I don't think "novel concepts" is really The Thing. I think creativity/intelligence is mostly about combining concepts; it's just that really smart people:

a) are faster in raw horsepower and can handle more complexity at a time

b) have a better set of building blocks to combine or apply to make new concepts (which includes building blocks for building better building blocks)

c) have a more efficient search for useful/relevant building blocks (both metacognitive and object-level).

Maybe you believe this, and think "well yeah, it's the efficient search that's the important part, and we still don't actually have a real working version of that"?

It seems like the current models have basically all the tools a moderately smart human has with regard to generating novel ideas, and the things they're missing are something like "having a good metacognitive loop such that they notice when they're doing a fake/dumb version of things, and course-correcting" and "persistently pursuing plans over long time horizons." And they don't seem to have zero of either of those, just not enough to get over some hump.

I don't see what's missing that a ton of training on a ton of diverse, multimodal tasks + scaffolding + data flywheel isn't going to figure out.
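
(To gesture at what I mean by "scaffolding" plus a metacognitive loop, here's a minimal, purely hypothetical sketch. `call_model` is a stand-in for whatever LLM API you'd actually use; none of the names here come from a real library.)

```python
# Hypothetical sketch: a scaffold that drafts an answer, asks the model to
# critique its own draft, and revises. Not a real library interface.

from dataclasses import dataclass


@dataclass
class Attempt:
    answer: str
    critique: str


def call_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns a canned string so the sketch runs."""
    return f"[model output for: {prompt[:40]}...]"


def solve_with_self_check(task: str, max_rounds: int = 3) -> Attempt:
    """Draft an answer, then have the model critique and revise its own draft.

    The 'metacognitive loop' here is just: check whether the draft is a
    fake/superficial version of the task, and course-correct before returning.
    """
    answer = call_model(f"Solve: {task}")
    critique = ""
    for _ in range(max_rounds):
        critique = call_model(
            f"Task: {task}\nDraft answer: {answer}\n"
            "Is this a real solution or a superficial one? List concrete flaws."
        )
        if "no flaws" in critique.lower():
            break  # the self-check passed; stop revising
        answer = call_model(
            f"Task: {task}\nDraft: {answer}\nFlaws: {critique}\nRevise the draft."
        )
    return Attempt(answer=answer, critique=critique)


if __name__ == "__main__":
    print(solve_with_self_check("prove that the sum of two even numbers is even"))
```

Obviously the hard part is whether the critique step actually catches the fake/dumb versions, which is the part I'm claiming current models only partly have.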

johnswentworth's Shortform
Raymond Arnold · 6mo · 30

(my guess is you took more like 15-25 minutes per question? Hard to tell from my notes, you may have finished early but I don't recall it being crazy early)

johnswentworth's Shortform
Raymond Arnold · 6mo · 10

(This seems like more time than Buck was taking – the goal was to not get any wrong so it wasn't like people were trying to crank through them in 7 minutes)

The problems I gave were (as listed in the CSV for the diamond problems):

  • #1 (Physics) (1 person got right, 3 got wrong, 1 didn't answer)
  • #2 (Organic Chemistry) (John got right; I think 3 people didn't finish)
  • #4 (Electromagnetism) (John and one other got right, 2 got wrong)
  • #8 (Genetics) (3 got right including John)
  • #10 (Astrophysics) (5 people got right)
Guide to the LessWrong Editor
3mo
Guide to the LessWrong Editor
3mo
Guide to the LessWrong Editor
3mo
Guide to the LessWrong Editor
3mo
(+317)
Sandbagging (AI)
3mo
Sandbagging (AI)
3mo
(+88)
AI "Agent" Scaffolds
4mo
AI "Agent" Scaffolds
4mo
(+340)
AI Products/Tools
4mo
(+121)
Language Models (LLMs)
4mo
27 · The 2023 LessWrong Review: The Basic Ask
7mo
6
27 · Review AI Alignment posts to help figure out how to make a proper AI Alignment review
3y
14
63 · What's Up With Confusingly Pervasive Goal Directedness?
3y
0
28 · Introducing the AI Alignment Forum (FAQ)
7y
0
23 · Announcing AlignmentForum.org Beta
7y
17
0 · Raemon's Shortform
8y
0