Someone asked me about this, so here are my quick thoughts.

Although I've learned a lot of math over the last year and a half, it still isn't my comparative advantage. What I do instead is:

Find a problem

that seems plausibly important to AI safety (low impact), or a phenomenon that's secretly confusing but not really explored (instrumental convergence). If you're looking for a problem, corrigibility strikes me as another thing that meets these criteria, and is still mysterious.

Think about the problem

Stare at the problem on my own, ignoring any existing thinking as much as possible. Just think about what the problem is, what's confusing about it, what a solution would look like. In retrospect, this has helped me avoid anchoring myself. Also, my prior for existing work is that it's confused and unhelpful, and I can do better by just thinking hard. I think this is pretty reasonable for a field as young as AI alignment, but I wouldn't expect this to be true at all for e.g. physics or abstract algebra. I also think this is likely to be true in any field where philosophy is required, where you need to find the right formalisms instead of working from axioms.

Therefore, when thinking about whether "responsibility for outcomes" has a simple core concept, I nearly instantly concluded it didn't, without spending a second glancing over the surely countless philosophy papers wringing their hands (yup, papers have hands) over this debate. This was the right move. I just trusted my own thinking. Lit reviews are just proxy signals of your having gained comprehension and coming to a well-considered conclusion.

Concrete examples are helpful: at first, thinking about vases in the context of impact measurement was helpful for getting a grip on low impact, even though it was secretly a red herring. I like to be concrete because we actually need solutions - I want to learn more about the relationship between solution specifications and the task at hand.

Make simplifying assumptions wherever possible. Assume a ridiculous amount of stuff, and then pare it down.

Don't formalize your thoughts too early - you'll just get useless mathy sludge out on the other side, the product of your confusion. Don't think for a second that having math representing your thoughts means you've necessarily made progress - for the kind of problems I'm thinking about right now, the math has to sing with the elegance of the philosophical insight you're formalizing.

Forget all about whether you have the license or background to come up with a solution. When I was starting out, I was too busy being fascinated by the problem to remember that I, you know, wasn't allowed to solve it.

Obviously, there are common-sense exceptions to this, mostly revolving around trying to run without any feet. It would be pretty silly to think about logical uncertainty without even knowing propositional logic. One of the advantages of immersing myself in a lot of math isn't just knowing more, but knowing what I don't know. However, I think it's rare to secretly lack the basic skills to even start on the problem at hand. You'll probably know if you do, because all your thoughts keep coming back to the same kind of confusion about a formalism, or something. Then, you look for ways to resolve the confusion (possibly by asking a question on LW or in the MIRIx Discord), find the thing, and get back to work.

Stress-test thoughts

So you've had some novel thoughts, and an insight or two, and the outlines of a solution are coming into focus. It's important not to become enamored with what you have, because it stops you from finding the truth and winning. Therefore, think about ways in which you could be wrong, situations in which the insights don't apply or in which the solution breaks. Maybe you realize the problem is a bit ill-defined, so you refactor it.

The process here is: break the solution, deeply understand why it breaks, and repeat. Don't get stuck with patches; there's a rhythm you pick up on in AI alignment, where good solutions have a certain flavor of integrity and compactness. It's OK if you don't find it right away. The key thing to keep in mind is that you aren't trying to pass the test cases, but rather to find brick after brick of insight to build a firm foundation of deep comprehension. You aren't trying to find the right equation, you're trying to find the state of mind that makes the right equation obvious. You want to understand new pieces of the world, and maybe one day, those pieces will make the difference.

ETA: I think a lot of these skills apply more broadly. Emotional trust in one's own ability to think seems important for taking actions that aren't e.g. prescribed by an authority figure. Letting myself just think lets me be light on my mental feet, and bold in where those feet lead me.

ETA 2: Apparently simulating drop-caps:

Like this

isn't the greatest idea. Formatting edit.

