I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.
Longer bio: www.lesswrong.com/posts/aG74jJkiPccqdkK3c/the-lesswrong-team-page-under-construction#Ben_Pace___Benito
I've felt like the problem of counterfactuals is "mostly settled" for about a year, but I don't think I've really communicated this online.
Wow that's exciting! Very interesting that you think that.
The rules say we must use consequentialism, but good people are deontologists, and virtue ethics is what actually works.
—Eliezer Yudkowsky, Twitter
You're welcome. Yeah "invented the concept" and "named the concept" are different (and both important!).
Here it is: https://www.facebook.com/yudkowsky/posts/10152443714699228?comment_id=10152445126604228
Rob Miles (May 2014):
Ok, I've given this some thought, and I'd call it:
"Corrigible Reasoning"
using the definition of corrigible as "capable of being corrected, rectified, or reformed". (And of course AIs that don't meet this criterion are "Incorrigible")
I'm 94% confident it came from a Facebook thread where you blegged for help naming the concept and Rob suggested it. I'll have a look now to find it and report back.
Edit: having a hard time finding it, though note that Paul repeats the claim at the top of his post on corrigibility in 2017.
Curated. This post gives me a lot of context on your prior writing (unaligned benchmark, strategy stealing assumption, iterated amplification, imitative generalization), it helps me understand your key intuitions behind the plausibility of alignment, and it helps me understand where your research is headed.
When I read Embedded Agency, I felt like I then knew how to think productively about the main problems MIRI is working on by myself. This post leaves me feeling similarly about the problems you've been working on for the past 6+ years.
So thanks for that.
I'd like to read a version of this post where each example is 10x the length and analyzes it more thoroughly... I could just read all of your previous posts on each subject, though they're fairly technical. (Perhaps Mark Xu will write such a post; he did a nice job previously on Solomonoff Induction.)
I'd also be pretty interested in people writing more posts presenting arguments for/against plausible stories for the failure of Imitative Generalization, or fleshing out the details of a plausible story such that we can see more clearly if the story is indeed plausible. Basically, making contributions in the ways you outline.
Aside: Since the post was initially published, some of the heading formatting was lost in an edit, so I fixed that before curating it.
Edit: Removed the line "After reading it I have a substantially higher probability of us solving the alignment problem." Understanding Paul's research is a big positive, but I'm not actually sure I stand by it leading to a straightforward change in my probability.
This post gives great insight into your research methodology, thanks for writing it.
After that point I think you are mostly in applied world, and I think that applied investments are likely to ultimately dwarf the empirical investments by orders of magnitude even if it turns out that we found a really good algorithm on paper.
You contrast ‘applied’ and ‘empirical’ here, but they sound the same to me. Is it a typo and you meant ‘applied’ and ‘theoretical’? That would make sense to me.
This was a very solid post and I've curated it. Here are some of the reasons:
The main hesitation I have around curating this is that I'm kind of scared of all recommendations for "interesting ideas for using machine learning that might be really important for AGI". This part of me feels like everything has the potential to be used for capabilities, and that someone pursuing this line of work may do so quite inadvertently (and it will not be possible to "put the ball back in the urn"), or just end up proving their competence at building scalably useful models and then getting hired by a research lab to do work that will give them lots of money to do stuff with big models.
I am scared about most everything in this space, yet I don't think I endorse "no action", and this does seem to me like one of the most promising and careful posts and approaches. For me the most cruxy sections were "Isn't this not neglected because lots of people want useful AI" and "Will this cause harm by increasing investment in scaling AI?". For the first, I think if I believed the claims Ajeya makes as strongly as I think Ajeya does, I'd feel notably more relieved overall about encouraging this kind of work. For the second I didn't feel persuaded by the arguments. I think that there are few people who are respected remotely on these sorts of questions or who are thinking strategically about them (especially in public). I think the x-risk and alignment communities have in the past had 100:1 outsized impact with actions taken and research directions pursued.
In sum, I continue to be generally terrified, but this was an excellent post and one of the relatively least terrifying things. Thank you very much for the post.
Woop! I'm very pleased for you. I occasionally plot about how to help people (including you-in-particular) to do this, and I'm very pleased that it has happened without my intervention! Beth continues to do good things, my appreciation to her.
I look forward greatly to your proportional increase in insights being posted to LessWrong :D
For me, this is the paper where I learned to connect ideas about delegation to machine learning. The paper sets up simple ideas of mesa-optimizers, and shows a number of constraints and variables that will determine how the mesa-optimizers will be developed – in some environments you want to do a lot of thinking in advance then delegate execution of a very simple algorithm to do your work (e.g. this simple algorithm Critch developed that my group house uses to decide on the rent for each room), and in some environments you want to do a little thinking and then delegate a very complex algorithm to figure out what to do (e.g. evolution is very stupid and then makes very complex brains to figure out what to do in lots of situations that humans encountered in the EEA).
Seeing this more clearly in ML shocked me with the level of inadequacy that ML has for being able to do this with much direction whatsoever. It just doesn't seem like something that we have much control of. Of course I may be wrong, and there are some simple proposals (though none that have worked so far). Nonetheless, it's a substantial step forward in discussing delegation in modern ML systems. It discusses lots of related ideas very clearly.
Definitely should be included in the review. I expect to vote on this with something like +5 to +8.
I don't do research in this area; I expect others like Daniel Filan and Adam Shimi will have more detailed opinions of the sequence's strengths and weaknesses. (Nonetheless I stand by my assessment and will vote accordingly.)