Glad it was helpful!
Let me first acknowledge that your write-up is significantly more thorough than pretty much all content on LessWrong, and that I found the particular examples interesting. I also appreciated that you included a related work section in your write-up. The reason I commented on this post and not others is that it's one of the few ML posts on LessWrong that seemed like it might teach me something, and I wish I had made that clearer before posting critical feedback (I was thinking of the feedback as directed at Oliver / Raemon's moderation norms, ...
Thanks so much, I really appreciate this comment. I think it'll end up improving this post/the upcoming paper.
(I might reply later to specific points)
I'll just note that I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.
I'm not sure what you mean about whether the post was "missing something important", but I do think that you should be pretty worried about LessWrong's collective epistemics that Dan H is the only one bringing this important point up, and that rather than being rewarded for doing so or engaged w...
> I, like Dan H, find it pretty hard to engage with this post because I can't tell whether it's basically the same as the Ludwig Schmidt paper (my current assumption is that it is). The paragraph the authors added didn't really help in this regard.
The answer is: No, our work is very different from that paper. Here's the paragraph in question:
> Editing Models with Task Arithmetic explored a "dual" version of our activation additions. That work took vectors between weights before and after finetuning on a new task, and then added or subtracted task-specific weig
Yup, I agree with this, and think the argument generalizes to most alignment work (which is why I'm relatively optimistic about our chances compared to some other people, e.g. something like 85% p(success), mostly because most things one can think of doing will probably be done).
It's possibly an argument that work is most valuable in cases of unexpectedly short timelines, although I'm not sure how much weight I actually place on that.
Yup! That sounds great :)
Thanks Ruby! Now that the other posts are out, would it be easy to forward-link them (by adding links to the italicized titles in the list at the end)?
Finding the min-max solution might be easier, but what we actually care about is an acceptable solution. My point is that the min-max solution, in most cases, will be unacceptably bad.
And in fact, since $\min_x f(\theta, x) \le \mathbb{E}_x[f(\theta, x)]$, any solution that is acceptable in the worst case is also acceptable in the average case.
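A minimal numeric sketch of the two claims above, using made-up payoffs: two hypothetical parameter settings `theta_A` and `theta_B` scored on four hypothetical environments, with higher f meaning better and a hypothetical acceptability bar of 0.5. It checks that the worst case never exceeds the average case, and shows how the max-min choice can still be unacceptable on average.

```python
from statistics import mean

# Hypothetical payoffs f(theta, x); higher is better. Values are illustrative only.
payoffs = {
    "theta_A": [0.30, 0.30, 0.30, 0.30],  # uniformly mediocre
    "theta_B": [0.25, 0.95, 0.95, 0.95],  # slightly worse worst case, much better typical case
}
ACCEPTABLE = 0.5  # hypothetical bar for an "acceptable" solution


def worst_case(scores):
    """min_x f(theta, x)"""
    return min(scores)


def average_case(scores):
    """E_x[f(theta, x)] under a uniform distribution over the listed environments."""
    return mean(scores)


# min_x f <= E_x f holds for every theta, so clearing the bar in the
# worst case implies clearing it on average.
for scores in payoffs.values():
    assert worst_case(scores) <= average_case(scores)
    if worst_case(scores) >= ACCEPTABLE:
        assert average_case(scores) >= ACCEPTABLE

# Max-min picks the best worst case; the average-case objective can pick differently.
minmax_choice = max(payoffs, key=lambda t: worst_case(payoffs[t]))
average_choice = max(payoffs, key=lambda t: average_case(payoffs[t]))

for name, theta in [("max-min", minmax_choice), ("average", average_choice)]:
    avg = average_case(payoffs[theta])
    print(f"{name}: {theta}, average payoff {avg:.3f}, acceptable: {avg >= ACCEPTABLE}")
```

With these made-up numbers, max-min selects `theta_A`, whose average payoff (0.3) is below the bar, while the average-case objective selects `theta_B`.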
Thanks! I appreciated these distinctions. The worst-case argument for modularity came up in a past argument I had with Eliezer, where I argued that this was a reason for randomization (even though Bayesian decision theory implies you should never randomize). See section 2 here: The Power of Noise.
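A standard toy illustration of why randomization helps under worst-case assumptions (matching pennies; this is my own sketch, not the construction from the linked paper). The agent scores 1 if its coin matches the adversary's call. An adversary who knows the agent's strategy (but not the coin flip) drives every deterministic strategy to 0, while a fair coin guarantees 0.5. Against a fixed prior over the adversary's move, by contrast, expected payoff is linear in the probability of heads, so some deterministic choice is always at least as good as any randomized one.

```python
def expected_payoff(p_heads, adversary_move):
    """Probability of a match when we play heads with probability p_heads."""
    return p_heads if adversary_move == "H" else 1 - p_heads


def worst_case_payoff(p_heads):
    """Adversary sees p_heads (not the flip) and picks the move that hurts us most."""
    return min(expected_payoff(p_heads, move) for move in ("H", "T"))


def bayes_payoff(p_heads, prior_heads):
    """Expected payoff if the adversary's move is drawn from a fixed prior instead."""
    return (prior_heads * expected_payoff(p_heads, "H")
            + (1 - prior_heads) * expected_payoff(p_heads, "T"))


strategies = [0.0, 0.5, 1.0]  # always tails, fair coin, always heads
for p in strategies:
    print(f"p_heads={p}: worst case {worst_case_payoff(p):.2f}, "
          f"Bayes (prior 0.7) {bayes_payoff(p, 0.7):.2f}")

best_worst_case = max(strategies, key=worst_case_payoff)
best_bayes = max(strategies, key=lambda p: bayes_payoff(p, 0.7))
```

The prior of 0.7 on heads is an arbitrary assumption; any fixed prior gives a payoff that is linear in `p_heads`, so the Bayes-optimal strategy sits at an endpoint, while the worst-case-optimal strategy is the fair coin.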
Re: 50% vs. 10% vs. 90%. I liked this illustration, although I don't think your argument actually implies 50% specifically. For instance if it turns out that everyone else is working on the 50% worlds and no one is working on the 90% worlds, you should probably wo...
I think this probably depends on the field. In machine learning, solving problems under worst-case assumptions is usually impossible because of the no free lunch theorem. You might assume that a particular facet of the environment is worst-case, which is a totally fine thing to do, but I don't think it's correct to call it the "second-simplest solution", since there are many choices of what facet of the environment is worst-case.
One keyword for this is "partial specification", e.g. here is a paper I wrote that makes a minimal set of statistical assumptions...
Cool paper! One brief comment is that this seems closely related to performative prediction, and it seems worth discussing the relationship.
Edit: just realized this is a review, not a new paper, so my comment is a bit less relevant. Although it does still seem like a useful connection to make.
My basic take is that there will be lots of empirical examples where increasing model size by a factor of 100 leads to nonlinear increases in capabilities (and perhaps to qualitative changes in behavior). As a median estimate, I'd guess we'll see at least 2 such examples in 2022 and at least 100 by 2030.
At the point where there's a "FOOM", such examples will be commonplace and happening all the time. Foom will look like one particularly large phase transition (maybe 99th percentile among examples so far) that chains into more and more. It seems possible (though not c...
Thanks. For time/brevity, I'll just say which things I agree / disagree with:
> sufficiently capable and general AI is likely to have property X as a strong default [...]
I generally agree with this, although for certain important values of X (such as "fooling humans for instrumental reasons") I'm probably more optimistic than you that there will be a robust effort to get not-X, including by many traditional ML people. I'm also probably more optimistic (but not certain) that those efforts will succeed.
[inside view, modest epistemology]: I don't have...
I'm not (retroactively in imaginary prehindsight) excited by this problem because neither of the 2 possible answers (3 possible if you count "the same") had any clear-to-my-model relevance to alignment, or even AGI. AGI will have better OOD generalization on capabilities than current tech, basically by the definition of AGI; and then we've got less-clear-to-OpenPhil forces which cause the alignment to generalize more poorly than the capabilities did, which is the Big Problem. Bigger models generalizing better or worse doesn't say anything obvio...
Not sure if this helps, and haven't read the thread carefully, but my sense is your framing might be eliding distinctions that are actually there, in a way that makes it harder to get to the bottom of your disagreement with Adam. Some predictions I'd have are that:
* For almost any experimental result, a typical MIRI person (and you, and Eliezer) would think it was less informative about AI alignment than I would.
* For almost all experimental results, you would think they were so much less informative as to not be worthwhile.
* There's a sma...
I would agree with you that "MIRI hates all experimental work" / etc. is not a faithful representation of this state of affairs, but I think there is nevertheless an important disagreement MIRI has with typical ML people, and that the disagreement is primarily about what we can learn from experiments.
Ooh, that's really interesting. Thinking about it, I think my sense of what's going on is (and I'd be interested to hear how this differs from your sense):
Thanks, sounds good to me!
Actually, another issue is that unsupervised translation isn't "that hard" relative to supervised translation. I think that you can get pretty far with simple heuristics, such that I'd guess making the model 10x bigger matters more than making the objective more aligned with getting the answer right (and that this will be true for at least a couple more 10x increases in model size, although at some point the objective will matter more).
This might not matter as much if you're actually outputting explanations and not just translating from one language to another. Although it is probably true that for tasks that are far away from the ceiling, "naive objective + 10x larger model" will outperform "correct objective".
Thanks Paul, I generally like this idea.
Aside from the potential concerns you bring up, here is the most likely way I could see this experiment failing to be informative: rather than having checks and question marks in your tables above, really the model's ability to solve each task is a question of degree, so each table entry will be a real number between 0 and 1. For, say, tone, GPT-3 probably doesn't have a perfect model of tone, and would get <100% performance on a sentiment classification task, especially if done few-shot.
The issue, then, is that the ...
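To make the "question of degree" point concrete, here is a minimal sketch with placeholder per-task scores in [0, 1] (hypothetical numbers, not measurements of GPT-3 or any other model) and a hypothetical `to_marks` helper that collapses them into the check / question-mark format. The resulting table changes with the threshold, which is one way the binary framing can hide the graded information the experiment is meant to surface.

```python
# Placeholder task scores in [0, 1]; NOT measurements of any real model.
scores = {"task_a": 0.92, "task_b": 0.71, "task_c": 0.48}


def to_marks(task_scores, threshold):
    """Hypothetical helper: collapse graded scores into a check / question-mark table."""
    return {task: ("✓" if s >= threshold else "?") for task, s in task_scores.items()}


# The same underlying scores yield different tables depending on the cutoff.
for threshold in (0.5, 0.8):
    print(threshold, to_marks(scores, threshold))
```

Reporting the raw scores alongside any marks avoids tying the conclusion to one arbitrary cutoff.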
This doesn't seem so relevant to capybaralet's case, given that he was choosing whether to accept an academic offer that was already extended to him.