Introduction: a new hope
If you’ve read anything I wrote in the last 6 months, you know that I’m into the epistemology of alignment. More precisely, I’m into figuring out how to leverage epistemic tools and insights for helping both researchers make progress and newcomers find their footing, as well as proposing new directions for interesting research bets.
Yet I’ve been quite miserable these last few weeks, even months, because of a situation I felt trapped in: I saw all these fires that I thought epistemic tools could put out, yet I spent so much time on them that I couldn’t learn and build the tools needed in the first place. “How can I possibly take the time to get stronger as an epistemologist?”, I asked myself.
Did you notice? This is what Mark Manson calls a VCR question: a question that is trivial and easy to answer from the outside, but looks impossible to deal with from the inside. When looking at it objectively, the answer is clearly
That’s what multiple friends in alignment made me realize these last few days. I should and can just dedicate most of my time to that training for the foreseeable future. Moreover, it sounds like a far better idea than trying to put out fire after fire with my current metaphorical glass of water.
So I’ve decided to spend the next 2 months focusing only on that training. I’ll still do a couple of mentoring and field-building activities, but all my research time will be dedicated to finding, learning, building and honing new epistemic tools.
This post is a short explanation of what this will look like, and what I will post on LW as a consequence of this training.
Thanks to Connor, John, Alex, Rob, and Flora for convincing me to just do it.
Scholarship, with applications
What I have in mind amounts to reading and studying a bunch of papers, books and other resources. It’s not a training with particular exercises, but instead a search for the tools of my trade, be they already forged or in need of crafting.
But what exactly will I be reading? That’s the fun part: almost anything works, as long as it points towards interesting epistemology. Philosophy of science of course, but also physics and engineering, science-fiction predictions, evolutionary biology, economic criticism, formalization of sociological theories, principles of animation, and many more. I have a loooooong list of resources to pick from, and will definitely shift topics for variety and thoroughness.
Still, don’t I risk just reading random stuff instead of learning the sort of tools I care about? That’s where the applications come in. I’ve listed a bunch of concrete applications for the tools I’m learning, and keeping these in mind will help me dismiss irrelevant parts of my readings and focus on the important questions. These are:
- (Deconfusion) As I’ve written before, I’m interested both in building a mental library of examples of good deconfusion (both constructive and critical) and in learning abstract approaches and techniques to deconfuse complex topics.
- (Indirect Experiment Design) One topic I’m interested in contributing to in the future is how to design applied alignment experiments that actually tell you something about AGI and alignment, instead of just being random ML experiments. This has been done, but no one knows how to do it reproducibly AFAIK, and I hope that looking at creative and indirect approaches to experiment design will give me tools for thinking about that.
- (Revealing Hidden Bits of Evidence) This one is slightly harder to unpack; it’s basically about the techniques and strategies one can use to extract more evidence from the already available information. The archetypal example, presented in the sequences, is Einstein leveraging the shape of known physical laws themselves for finding General Relativity. This is also related to hitting a small target in a large space. By looking at both examples of such unlikely successes, and at more philosophical analyses of approaches to extract evidence, I hope to get the toolkit necessary to justify the epistemic strategies of alignment, find issues where they hide, and propose new approaches.
- (Making Interesting Mistakes) I’ve been thinking quite a lot lately about mistakes, and how it seems that making interesting mistakes (mistakes that lead to interesting places and ideas) is a massive part of intellectual progress. So this category is about looking at mistakes in science and other fields that actually helped in showing the way and making progress. I also hope to extract some rules of thumb for making more interesting mistakes.
- (Thinking about Historical, Moral and Social Processes) Relatively straightforward. I feel like I’m missing a bunch of frames for thinking about many of the more human processes, and I expect that through studying resources for the previous categories, I’ll pick up a bunch of ways of thinking about history, morality and society.
- (Paradigm analysis and design) I’m more and more convinced that Kuhn’s paradigms are not the right way of thinking about alignment, which means I’m looking for alternative perspectives and variants.
Most resources on my list probably contain insights about more than one application; I haven’t decided yet if I’ll present all of these insights together or in separate posts. Because yes, I intend to write a bunch of posts during my training.
Actually writing stuff up
In the last 6 months, I wrote 14 research-related posts, which is roughly one every 13 days on average. Not really impressive. Even less so because I’ve been thinking about a lot of concepts and ideas during these 6 months that I never ended up writing about, because it would have taken too much effort to polish them.
So I’ve decided to channel my inner John on this project, and write quickly while maintaining epistemic rigor. That is, I’ll write multiple posts per week about my readings and studies, explaining why I was initially attracted to the resources or topic, what I got out of them, and what I would want to explore further given the time.
To be clear, I don’t plan on writing reference posts or incredibly detailed analyses; the goal here is to share my ideas and realizations without letting my natural editor censor me too much. I expect that you might find some of these posts stimulating, but their goal is far more to focus and sharpen my thinking than to make the strongest case for the idea.
Conclusion: wake up and take a step
From now on and for the next two months, every post I make will be part of my epistemology training. That’s something I’ve felt was needed for a long time, but refused to let myself focus on. I’m honestly incredibly excited, and the little that I already did today reassures me that there are many fruitful avenues to explore for this project.
With that said, see you in the next post.