I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality. (Longer bio.)
I don't want to double the comment count I submit to Recent Discussion, so I'll just update this comment with the things I've cut.
12/06/2023 Comment on Originality vs. Correctness
It's fun to take the wins of one culture and apply them to the other; people are very shocked that you found some hidden value to be had (though it often isn't competitive value / legible to the culture). And if you manage to avoid some terrible decision, people speak about how wise you are to have noticed.
(Those are the best cases; often, of course, people are like "this is odd, I'm going to pretend I didn't see this" and then move on.)
For too long, I have erred on the side of writing too much.
The first reason I write is in order to find out what I think.
This often leaves my writing long and not very defensible.
However, editing the whole thing is so much extra work after I already did all the work figuring out what I think.
Sometimes it goes well if I just scrap the whole thing and concisely write my conclusion.
But typically I don't want to spend the marginal time.
Another reason my writing is too long is because I have extra thoughts I know most people won't find useful.
But I've picked up a heuristic that says it's good to share actual thinking because sometimes some people find it surprisingly useful, so I hit publish anyway.
Nonetheless, I endeavor to write shorter.
So I think I shall experiment with cutting the bits off of comments that represent me thinking aloud, but aren't worth the space in the local conversation.
And I will put them here, as the dregs of my cognition. I shall hopefully gather data over the next month or two and find out whether they are in fact worthwhile.
I just gave this a re-read, I forgot what a trip it is to read the thoughts of Eliezer Yudkowsky. It continues to be some of my favorite stuff in recent years written on LessWrong.
It's hard to relate to the world with the level of mastery over basic ideas that Eliezer has. I don't mean with this to vouch that his perspective is certainly correct, but I believe it is at least possible, and so I think he aspires to a knowledge of reality that I rarely if ever aspire to. Reading it inspires me to really think about how the world works, and really figure out what I know and what I don't. +9
(And the smart people dialoguing with him here are good sports for keeping up their side of the argument.)
They are not being treated worse than foot soldiers, because they do not have an enemy army attempting to murder them during the job. (Unless 'foot soldiers' is itself more commonly used as a metaphor for 'grunt work' and I'm not aware of that.)
I am surprised to see the Open Philanthropy network taking all of the powerful roles here.
The initial Trustees are:

- Jason Matheny: CEO of the RAND Corporation
- Kanika Bahl: CEO & President of Evidence Action
- Neil Buddy Shah: CEO of the Clinton Health Access Initiative (Chair)
- Paul Christiano: Founder of the Alignment Research Center
- Zach Robinson: Interim CEO of Effective Ventures US
In case it's not apparent:
And, as a reminder for those who've forgotten, Holden Karnofsky's wife Daniela Amodei is President of Anthropic (and so Holden is the brother-in-law of Dario Amodei).
I think the argument here basically implies that language models will not produce any novel, useful concepts that get substantial adoption in any existing industry or research field (e.g. >10% of people use it, or a widely cited paper) in the next 3 years, and that if one did, then the end would be nigh (or much nigher).
To be clear, you might get new concepts from language models about language if you nail some Chris Olah style transparency work, but the language model itself will not output ones that aren't about language in the text.
It's... possible this is actually the single best example of a public doublecrux writeup that I know of?
This sentence was confusing to me given that the post does not mention 'double crux', but I mentioned it to someone and they said to think of it as the mental motion and not the explicit format, and that makes more sense to me.
And if you block any one path to the insight that the earth is round, in a way that somehow fails to cripple it, then it will find another path later, because truths are interwoven. Tell one lie, and the truth is ever-after your enemy.
In case it's of any interest, I'll mention that when I "pump this intuition", I find myself thinking it essentially impossible to expect we could ever build a general agent that didn't notice that the world was round, and I'm unsure why (if I recall correctly) I sometimes read Nate or Eliezer write that they think it's quite doable in principle, just much harder than the effort we'll be giving it.
This perspective leaves me inclined to think that we ought to only build very narrow intelligences and give up on general ones, rather than attempt to build a fully general intelligence but with a bunch of reliably self-incorrecting beliefs about the existence or usefulness of deception (and/or other things).
(I say this in case perhaps Nate has a succinct and motivating explanation of why he thinks a solution does exist and is not actually that impossibly difficult to find in theory, even while humans-on-earth may never do so.)
the AGI was NOT exercising its intelligence & reason & planning etc. towards an explicit, reflectively-endorsed desire for “I am being helpful / I am being docile / I am acting with integrity / blah blah”.
I am naively more scared about such an AI. That AI sounds like, if I say "you're not being helpful, please stop", it will respond "actually I thought about it, I disagree, I'm going to continue doing what I think is helpful".
And these are both real obstacles. But there are deeper obstacles, that seem to me more central, and that I haven't observed others to notice on their own.
I brainstormed some possible answers. This list is a bit long. I'm publishing this comment because it's not worth the half hour to make it concise, yet it seemed worth trying the exercise before reading the post, and possibly others will find my quick attempt worth seeing.
I think the last two bullets are probably my best guesses. Nonetheless here is my list:
After writing this, I am broadly unclear whether I am showing how deception is still a problem, or showing how other problems still exist if you solve the obvious problems of deception.
Added: Wow, this post is so much richer than my guesses. I think I was on some okay lines, but I suspect it would take like 2 months to 2 years of actively trying before I would be able to write something this detailed. Not to mention that ~50% of the work is knowing which question to ask in the first place, and I did not generate this question.
Added2: The point made in Footnote 3 is pretty similar to my last two bullets.