Productive Mistakes, Not Perfect Answers

Mostly I'd agree with this, but I think there needs to be a bit of caution and balance around:

How do we get more streams of evidence? By making productive mistakes. By attempting to leverage weird analogies and connections, and iterating on them. We should obviously recognize that most of this will be garbage, but you’ll be surprised how many brilliant ideas in the history of science first looked like, or were, garbage.

Do we want variety? Absolutely: worlds where things work out well likely correlate strongly with finding a variety of approaches.

However, there's some risk in Do(increase variety). The ideal is that we get many researchers thinking about the problem in a principled way, and variety happens. If we intentionally push too much for variety, we may end up with a lot of wacky approaches that abandon too much principled thinking too early. (I think I've been guilty of this at times.)

That said, I fully agree with the goal of finding a variety of approaches. It's just rather less clear to me how much an individual researcher should be thinking in terms of boosting variety. (It's very clear that there should be spaces that provide support for finding different approaches, so I'm entirely behind that; currently it's much more straightforward to work on existing ideas than to work on genuinely new ones.)

Certainly many great ideas initially looked like garbage - but I'll wager a lot of garbage initially looked like garbage too. I'd be interested in knowing more about the hidden-greatness-garbage: did it tend to have any common recognisable qualities at the time? Did it tend to emerge from processes with common recognisable qualities? In environments with shared qualities?...

Logan Riggs, 4y

It’s also clear when reading these works and interacting with these researchers that they all get how alignment is about dealing with unbounded optimization, they understand fundamental problems and ideas related to instrumental convergence, the security mindset, the fragility of value, the orthogonality thesis…

I bet Adam will argue that this (or something similar) is the minimum we want for a research idea, because I agree with your idea that we shouldn’t expect a solution to alignment to fall out of the marketing program for Oreos. We want to constrain it to at least “has a plausible story on reducing x-risk” and maybe what’s mentioned in the quote as well.

Joe Collman, 4y

For sure I agree that the researcher knowing these things is a good start - so getting as many potential researchers to grok these things is important.

My question is about which ideas researchers should focus on generating/elaborating given that they understand these things. We presumably don't want to restrict thinking to ideas that may overcome all these issues - since we want to use ideas that fail in some respects, but have some aspect that turns out to be useful.

Generating a broad variety of new ideas is great, and we don't want to be too quick in throwing out those that miss the target. The thing I'm unclear about is something like:

What target(s) do I aim for if I want to generate the set of ideas with greatest value?

I don't think that "Aim for full alignment solution" is the right target here.
I also don't think that "Aim for wacky long-shots" is the right target - and of course I realize that Adam isn't suggesting this.
(we might find ideas that look like wacky long-shots from outside, but we shouldn't be aiming for wacky long-shots)

But I don't have a clear sense of what target I would aim for (or what process I'd use, what environment I'd set up, what kind of people I'd involve...), if my goal were specifically to generate promising ideas (rather than to work on them long-term, or to generate ideas that I could productively work on).

Another disanalogy with previous research/invention... is that we need to solve this particular problem. So in some sense a history of:
[initially garbage-looking-idea] ---> [important research problem solved] may not be relevant.

What we need is: [initially garbage-looking-idea generated as attempt to solve x] ---> [x was solved]
It's not good enough if we find ideas that are useful for something; they need to be useful for this.

I expect the kinds of processes that work well to look different from those used where there's no fixed problem.

I’m not discussing applied alignment research here, like the work of Redwood, but I also find this part crucial and productive. It’s just that such work is less about “formulating a solution” and more about “exploring the models and the problems experimentally”, which fits well with the model I’m drawing here.

I’m currently finishing a sequence arguing for more pluralism in alignment and providing an abstraction of the alignment problem that I find particularly good for generating new approaches and understanding how all the different takes and perspectives relate.

The range where many short timelines put the bulk of their probability mass.
