Adam Shimi

Full-time independent alignment researcher. Instead of having a concrete research agenda, I mostly study epistemic strategies (https://www.alignmentforum.org/s/LLEJJoaYpCoS5JYSY) and help other alignment researchers by extracting their processes, modelling them, and explaining their positions and disagreements across shorter inferential distances. (Also a PhD in the theory of distributed computing.)

Sequences

Epistemic Cookbook for Alignment
Reviews for the Alignment Forum
AI Alignment Unwrapped
Deconfusing Goal-Directedness

Comments

Intuitions about solving hard problems

I like this pushback, and I'm a fan of productive mistakes. I'll have a think about how to rephrase to make that clearer. Maybe there's just a communication problem, where it's hard to tell the difference between people claiming "I have an insight (or proto-insight) which will plausibly be big enough to solve the alignment problem", versus "I have very little traction on the alignment problem but this direction is the best thing I've got". If the only effect of my post is to make a bunch of people say "oh yeah, I meant the second thing all along", then I'd be pretty happy with that.

When phrased like that, I agree with you. I am personally relatively suspicious of claims by a bunch of people to have found a path to alignment, but actually excited by some of their productive mistakes (as discussed a bit in my post).

I also fully agree that I want people to use the second framing, and my "history of alignment" research direction aims at concretely teasing out the productive mistakes and revealed bits of evidence without falling for either "this is obviously a solution" or "this is obviously not a solution and thus useless".

Why do I care about this? It has uncomfortable tinges of status regulation, but I think it's important because there are so many people reading about this research online, and trying to find a way into the field, and often putting the people already in the field on some kind of intellectual pedestal. Stating clearly the key insights of a given approach, and their epistemic status, will save them a whole bunch of time. E.g. it took me ages to work through my thoughts on myopia in response to Evan's posts on it, whereas if I'd known it hinged on some version of the insight I mentioned in this post, I would have immediately known why I disagreed with it.

+1000. More generally, teasing out the assumptions, the insights, and the new parts of works and approaches is, I think, super necessary, and it's on my research agenda. That's also part of the reason why I feel asking newcomers to be distillers is not necessarily a great idea: good distillation of the type we're discussing requires, IMO, quite a deep understanding of the landscape, the problem, and the underlying ideas. Otherwise you get at best a decent summary, and we need more than that.

As an example of (I claim) doing this right, see the disclaimer on my "shaping safer goals" sequence: "Note that all of the techniques I propose here are speculative brainstorming; I'm not confident in any of them as research directions, although I'd be excited to see further exploration along these lines." Although maybe I should make this even more prominent.

I haven't reread your sequence in quite some time, but I think the value of such an exploratory sequence is to make the intuitions underlying the direction clearer, even if they haven't yet led to productive mistakes. So I like your disclaimer, but I think an even better way of doing this is to clarify, for different posts and ideas, which intuitions you're building on and where the current formalisms/descriptions/analogies fail to capture them.

Lastly, I don't think I'm actually comparing Darwin and Einstein's mature theories to Turing's incomplete theory. As I understand it, their big insights required months or years of further work before developing into mature theories (in Darwin's case, literally decades).

This might also be a bit of miscommunication, but I felt like your discussion of Turing could also have applied to Darwin, especially since his initial insight required a lot of additional pieces and clarification to become a clean and ordered theory that you can actually defend. Generally, I was pointing at the risk of hindsight bias: the fact that the insight is clean and powerful once the full theory is known doesn't mean it was so compelling at the time it was first thought of. (Which is also a general empirical claim about the history of scientific progress, one to explore ;) )

Intuitions about solving hard problems

Thanks for the answer.

One thing I would say in response to your comment, Adam, is that I don't usually see the message of your linked post as being incompatible with Richard's main point. I think one usually does have or does need productive mistakes that don't necessarily or obviously look like they are robust partial progress. But still, often when there actually is a breakthrough, I think it can be important to look for this "intuitively compelling" explanation. So one thing I have in mind is that I think it's usually good to be skeptical if a claimed breakthrough seems to just 'fall out' of a bunch of partial work without there being a compelling explanation after the fact.

Hum, I'm not sure I'm following your point. Do you mean that you can have both productive mistakes and intuitively compelling explanations once the final (or even an intermediary) breakthrough is reached? Then I totally agree. My point was more that if you only use Richard's heuristic, I expect you not to reach the breakthrough, because you would have nipped in the bud many productive mistakes that actually led the way there.

There's also a very Kuhnian thing here that I didn't really mention in my previous comment (except in the Galileo part): the compellingness of an answer is often stronger after the fact, when you work in the paradigm that it led to. That's another aspect of productive mistakes, and even of breakthroughs: they don't necessarily look right or predict more from the start, and evaluating their consequences is not necessarily obvious.

Intuitions about solving hard problems

I like that you're proposing an explicit heuristic, inspired by the history of science, for judging research directions and approaches, and that you acknowledge it leads to conclusions that are counterintuitive to my Richard-model (pushing for Agent Foundations, for example), so AFAIK you're not just retrofitting your own conclusions. I also like that you're applying it to object-level directions in alignment: that's something I'm working on at the moment for my own research, based on your pushback.

That being said, my prediction/retrodiction is that this is too strong a criterion, for reasons already discussed in this post. Basically, I expect that for most if not all of the great scientific solutions you mention, if you back up enough (sometimes you don't need to back up that far), you will find a step, an idea, or an insight that proved crucial down the line but didn't look like the right type. Even in the post there's a sort of weird double standard, where you implicitly discuss Darwin and Einstein after they had matured their theories, whereas you talk about Turing before he proved a non-trivial result or designed a non-trivial algorithm. The extension of my prediction is that during the long process these thinkers (and other examples) went through to arrive at their insights, they built on models and ideas that revealed bits of evidence but were janky and incorrect, and that were eventually used as scaffolding and then thrown away.

Note that this is a rather empirical prediction about the history of science, and I'm curious about any counterexamples you or anybody else can find.

Another issue I see is that insights often redefine the rules of the game. Galileo, for example (in the Feyerabend interpretation at least), redefines "rest state" and "movement" to be consistent with a moving earth. You could say that these insights are compelling, but from the perspective of that time in history, it looks more like changing the rules of the game (what a theory of the stars has to deal with) by changing the natural interpretations associated with it.

Refine: An Incubator for Conceptual Alignment Research Bets

Sorry to make you work more, but happy to fill a much-needed niche. ^^

Refine: An Incubator for Conceptual Alignment Research Bets

Thanks! Yes, this is very much an experiment, and even if it fails, I expect it to be a productive mistake we can learn from. ;)

Takeoff speeds have a huge effect on what it means to work on AI x-risk

I disagree, so I'm curious: what, for you, are great examples of good alignment research that is not done by x-risk-motivated people? (Not being dismissive, I'm genuinely curious, and discussing specifics sounds more promising than downvoting you to oblivion and not having a conversation at all.)

What to include in a guest lecture on existential risks from AI?

I have a framing of AI risk scenarios that I think is more general and more powerful than most available online, and that might be a good frame before going into examples. It's not posted yet (I'm finishing the sequence now), but I could send something to you if you're interested. ;)

AMA Conjecture, A New Alignment Startup

(I will be running the Incubator at Conjecture)

The goal for the incubator is to foster new conceptual alignment research bets that could go on to become full-fledged research directions, either at Conjecture or elsewhere. We're thus planning to select mostly on the qualities we expect in a very promising independent conceptual researcher, that is, proactivity (see Paul Graham's Relentlessly Resourceful post) and some interest in or excitement about not-fully-tapped streams of evidence (see this recent post).

Although experience with alignment could help, it might also prove a problem if it comes with too strong an ontological commitment that limits exploration of unusual research directions and ideas. The start of the program will include a lot of discussion, and I (Adam) will share the map of alignment and mental moves that I have been building over the last few months, so this should bring people up to speed to do productive research.

If you have any more questions about this, feel free to reach me either on LW or at my Conjecture email.

Reply to Eliezer on Biological Anchors

Thanks for the answer!

Unfortunately, I don't have the time at the moment to answer in detail and have more of a conversation, as I'm fully focused on writing a long sequence about pushing for pluralism in alignment and extracting the core problem out of all the implementation details and additional assumptions. I plan on going back to analyzing timeline research in the future, and will probably give better answers then.

That being said, here are some quick-fire thoughts:

  • I used the evolution case because I consider it the most obvious/straightforward case, in that it sounds so large that everyone instantly assumes it gives you an upper bound.
  • My general impression about this report (and one I expect Yudkowsky to share) is that it didn't make me update at all. I had already updated from GPT and GPT-3, and I didn't find new bits of evidence in the report and the discussions around it, despite its length. My current impression (please bear in mind that I haven't taken the time to study the report from that angle, so I might change my stance) is that this report, much like a lot of timeline work, takes a lot of assumptions as input and gives far less as output than was assumed. It's the opposite of compression: a lot of assumptions are needed to conclude things that aren't that strong or constraining.

Job Offering: Help Communicate Infrabayesianism

Great idea!

I was talking with someone just today about the need for something like that. I also expect that InfraBayes is one of the alignment approaches that needs a lot of translation effort to become palatable to proponents of other approaches.

Additional point, though: I feel like the applicant should probably also have a decent understanding of alignment, as a lot of the value of such communication and translation (for me at least) would come from understanding its value for alignment.

Load More