This work was done while at Conjecture.

Epistemic Status: Palimpsest

Better epistemology should make you stronger.

Which is why Conjecture's epistemology team is so adamant about improving our models of knowledge-production: this feels like the key to improving alignment research across the board, given the epistemological difficulties of the field.

Yet we have failed until now to make our theory of impact legible, both to ourselves and to well-meaning external reviewers. The most sorely missing piece is the link between better models of knowledge-production and quick improvements of alignment research, on the shorter timelines that we expect at Conjecture.

Following interviews that we conducted with a handful of alignment researchers (John Wentworth, Vanessa Kosoy, Evan Hubinger, Abram Demski, Steve Byrnes, Conjecture researchers, and Refine participants), we want to present our current best guess for framing how our epistemology research can make alignment research stronger: revealing, analyzing, and expanding or replacing what we call "Cached Methodologies" — patterns that encode how research is supposed to proceed in a given context, for example the idea that we need to prove a statement to learn if it's true or not. Given that this involves bringing to light and questioning cached thoughts about methodology, we dub this approach methodological therapy.

Note that we definitely also want to leverage better models of knowledge-production to help field-builders and newcomers; our current focus on explicit applications for established researchers comes from two key factors: we're currently only three people in the epistemology team, which means we have to prioritize; and we expect that models and tools useful to established researchers will prove instrumental in building ones for field-builders and newcomers.

On credit: this idea emerged from discussions within the epistemology team (composed of Adam Shimi, Lucas Teixeira, and Daniel Clothiaux); Adam contributed the overall perspective, the unification and the therapy metaphor (from Bachelard); Lucas contributed the focus on actual cached patterns in researchers' minds and interviews; Daniel contributed the expansion of the idea to encompass more alternatives (for example, more ways of doing experiments) as well as many insights from the literature. This post is written by Adam; "we" means that the opinion is that of the whole team, and "I" means it's only Adam's opinion.

Thanks to Clem von Stengel and Andrea Motti for feedback on drafts.

How Do Researchers Think About Research?

Our starting point was conducting interviews with researchers, in order to improve the product-market fit of our epistemological research by better understanding and targeting the needs of researchers (and in the future of field-builders and newcomers).

We ended up asking explicitly about the kind of bottlenecks researchers dealt with, but also more general probes about how they were spending their time. Some interviewees articulated bottlenecks by themselves, while others didn't really see explicit bottlenecks but identified particularly costly activities in their research.

In particular, four research activities were often highlighted as difficult and costly (here in order of decreasing frequency of mention):

  • Running experiments
  • Formalizing intuitions
  • Unifying disparate insights into a coherent frame
  • Proving theorems

I don't know what your first reaction to this list is, but for us, it was something like: "Oh, none of these activities seems strictly speaking necessary in knowledge-production." Indeed, a quick look at history presents us with cases where each of those activities was bypassed:

What these examples highlight is the classic failure when searching for customers' needs: anchoring too much on what people ask for explicitly, instead of what they actually need. The same applies when getting feedback on writing, as Neil Gaiman pointed out so well:

(Writing Advice, Neil Gaiman, 2012)

Remember: when people tell you something’s wrong or doesn’t work for them, they are almost always right. When they tell you exactly what they think is wrong and how to fix it, they are almost always wrong.

So it's incredibly relevant that researchers highlight such activities as particularly costly: those are clearly the places where they spend much of their time or "hard thinking". Yet it might be a mistake to conclude that the best way of helping their research is to accelerate these particular activities. After all, what if there exist cheaper alternatives? Or what if the way the activity is conceptualized in the researcher's mind misses other ways of going about the same activity — like believing that experiments are only for falsification might hide the value of exploratory experiments?

All this requires a deeper analysis of the methodologies (implicit or explicit) used by researchers to go about producing knowledge and solving problems.

Psychoanalysis of Cached Methodologies

Obviously, methodologies (patterns about what research should look like and when to do a certain activity like running an experiment) can prove incredibly productive: someone with a model of how proof or experiments work is able to do more than someone completely lacking this knowledge. These methodologies amplify our agency by giving us more powerful tools to optimize for knowledge-production and problem-solving.

And yet there is a clear failure mode, which almost always happens: the methodology becomes cached. Where before it was an explicit model, an acknowledged part of the map, it now feels integral to the territory. And thus, unless it fails in a spectacular fashion, it ossifies.

This story reappears again and again in the history of science and technology, and in the philosophy of science too. For example, almost every science that took form after the early successes of physics tried to model itself on physics; it inherited the cached methodologies of physicists in contexts where they no longer worked. Or look at the so-called scientific method, which actually hides away both the generation of good hypotheses, and the full role experiments can play (including in hypothesis generation). Both of these are valuable maps (in some contexts) that became confused with the territory: cached methodologies.

Gaston Bachelard, the criminally underread (in the anglophone world) French philosopher of science from the early 20th century, believed that progress in science depended on the bringing to light and the reshaping of such cached assumptions and methodologies in order to make them fit for the current needs of scientific exploration. He called such episodes epistemological ruptures[1] and he even coined the expression "Philosophy of Non"[2] to describe this approach to science that focuses on the correction of cached patterns in general.

An example of such progress through epistemological rupture is Einstein's relativity, which required unearthing multiple cached patterns (notably about simultaneity) before becoming possible. Similarly, Faraday's and Maxwell's work on electromagnetism required them to reveal and correct the cached pattern of action-at-a-distance that had been integral to mathematical physics since Newton. Both also relied on methodologies different from those other physicists were using at the time.

Bachelard's favorite metaphor for this process is psychoanalysis, where the cached pattern (Bachelard would say epistemological obstacle) is brought to the fore, and then overcome and reshaped to better fit the needs of the moment.

We follow his trail and frame our approach as methodological therapy: unearthing the cached methodologies that govern how actual researchers go about their work, and working with them, together with the larger perspective we get from the history and philosophy of science, to improve these methodologies and adapt them to the context at hand (here the difficult epistemic circumstances of alignment).

Still, before moving forward, let's remember Eliezer's warning against thinking that such improvement of cached methodologies is easy. (Here he's talking about Einstein figuring out General Relativity without much focus on raw experimental data.)

(Einstein's Speed, Eliezer Yudkowsky, 2008)

I will not say, "Don't try this at home." I will say, "Don't think this is easy."  We are not discussing, here, the victory of casual opinions over professional scientists.  We are discussing the sometime historical victories of one kind of professional effort over another.  Never forget all the famous historical cases where attempted armchair reasoning lost.

Framing Questions For Methodological Therapy

The metaphor of therapy highlights a key insight: you need to stay in conversation with actual people. Without that, the therapy becomes an academic exercise of sharing models that might be vaguely related to the problem at hand. This direct interaction is guided by two framing questions:

  • (Epistemic Aims) What are the researchers trying to accomplish?
  • (Cached Methodologies) What cached methodologies are they following?

On the other hand, a therapist who merely listens and understands, but cannot take a wider view, will only be minimally helpful. We thus still need to build better models of knowledge-production, but geared towards the actual needs of researchers in the field. This vindicates some of our intuitions about the irrelevance of some traditional questions in philosophy of science, like the realism/anti-realism debate. And it prompts the following framing questions:

  • (Alternatives for Epistemic Aims) How did historical scientists and inventors fulfill the epistemic aims of the researchers we're helping? What alternative methodologies did they follow?
  • (History of Methodologies) Which methodologies were used through the history of science and technology? What did they allow thinkers to do? What were their limitations and failure modes (when cached)?

Last but not least, therapy can easily fuck people up if it's done badly. We thus want to be careful about replacing a methodology with another without taking into account the value and usefulness of the initial one; otherwise we might slow down research rather than accelerate it.

  • (Value of Cached Methodology) What purpose do the cached methodologies serve?
  • (Risks of Methodological Therapy) How can unearthing a cached methodology and/or replacing it lead to adverse effects?

We're excited about this framing because it structures and grounds the interaction between our abstract research into models of knowledge-production and the concrete needs of researchers. These concrete epistemic aims and cached methodologies turn our vague wonderings about the different ways of revealing and producing evidence, into applications at the heart of alignment research.

Robustness To Scaling Down

Even if you buy into this line of research, we are still missing one key ingredient for our short-timelines theory of change: how much therapy is needed before it becomes concretely useful to alignment researchers? Or phrased differently, is the value of this research robust to scaling down? After all, if to accelerate alignment research we need to wait until we fully solve all the questions above, we're probably not going to help a lot in a 5-to-15-year time frame.

We are not too worried about this, because answering even one or two of our framing questions will probably prove useful to researchers. If you're agentic enough, better understanding your aims and the cached methodologies you use already expands your range and your ability to adapt to different situations. It would obviously be more useful to have better methodologies to use, detailed models of when to use each alternative, or even a full algorithm of how to extract evidence in the relevant ways — but these massive accomplishments are not necessary for the epistemology to make you stronger.

We also expect partial progress towards better and better methodologies to give positive returns quickly, and to keep doing so until we reach the eventual optimal ones; this is based on the observation that big progress in science often emerges from relatively small changes in methodologies.

Feedback Loops With Reality

This post mainly takes the angle of "How epistemology can make alignment research stronger", but the opposite is also definitely part of the picture here: methodological therapy with actual researchers is bound to improve our epistemology. Already the interviews we conducted and the reflections that went into this post gave much needed structure to our many threads of exploration.

What Is Specific To Alignment Here?

Lastly, you might wonder why alignment was not much mentioned in the theory of change above. Sure, we anchored our framing in the needs of alignment researchers, but apart from that, all of our arguments seem to transfer to the rest of science (and Bachelard's model was about science in general).

We certainly believe that this work should also make us stronger at science in general. Yet there are two non-trivial ways in which it is heavily biased towards alignment:

  • The most obvious one is that we will interact directly with alignment researchers, and thus our methodological therapy will be tailored to the specific needs of such researchers. Said differently, while our more ambitious interventions should work across all of science, the more specific and short-term ones would transfer less easily, as they would be biased towards what is useful for concrete alignment researchers and their actual aims and cached methodologies.
  • The second, deeper reason is that we believe alignment to be fundamentally harder than the vast majority of science and technology, due to the range of simplifying assumptions which cannot be made for alignment. As such, we expect our work to have to tackle difficulties that most fields of science and technology (certainly ML) don't have to consider, or not all at once. This means that these other fields would be less incentivized to pick up the fruits of our research than the alignment researchers who need all the help they can get.

Conclusion: Becoming Stronger Through Epistemology

We believe that progress in epistemology should make you stronger: better models of knowledge-production should make you a better and faster researcher. Following interviews with researchers and Gaston Bachelard's work, we frame this improvement through methodological therapy, the revealing and analysis of epistemic aims and cached methodologies that can then be confronted with historical examples, expanded, and reshaped to better serve the needs of the researchers.

We then articulated framing questions guiding the process of methodological therapy:

  • (Epistemic Aims) What are the researchers trying to accomplish?
  • (Cached Methodologies) What cached methodologies are they following?
  • (Alternatives for Epistemic Aims) How did historical scientists and inventors fulfill the epistemic aims of the researchers we're helping? What alternative methodologies did they follow?
  • (History of Methodologies) Which methodologies were used through the history of science and technology? What did they allow thinkers to do? What were their limitations and failure modes (when cached)?
  • (Value of Cached Methodology) What purpose do the cached methodologies serve?
  • (Risks of Methodological Therapy) How can unearthing a cached methodology and/or replacing it lead to adverse effects?

In light of this perspective, we are currently focusing on working with the interpretability team at Conjecture and the Refine participants to iterate on this process of methodological therapy, while keeping in mind the framing questions related to risks and downsides. After this trial run, we expect to expand our efforts towards more researchers and also field-builders and newcomers.

  1. ^

    If that sounds similar to Kuhn's paradigm shifts and scientific revolutions, it definitely captures some of the same points. Now you might realize why French philosophers of science were not as excited or interested by Kuhn's book when it came out: the notion had been in their university courses for over 50 years at that point.

  2. ^

    It has been translated as "Philosophy of No" in English, but in my opinion that misses the double meaning that Bachelard aimed for: in French, the word for "no" and the word used to indicate negation in "non-Euclidean" are the same one, "non".


Glad to see you're working on this. It seems even more clearly correct (the goal, at least :)) for not-so-short timelines. Less clear how best to go about it, but I suppose that's rather the point!

A few thoughts:

  1. I expect it's unusual that [replace methodology-1 with methodology-2] will be a pareto improvement: other aspects of a researcher's work will tend to have adapted to fit methodology-1. So I don't think the creation of some initial friction is a bad sign. (also mirrors therapy - there's usually a [take things apart and better understand them] phase before any [put things back together in a more adaptive pattern] phase)
    1. It might be useful to predict this kind of thing ahead of time, to develop a sense of when to expect specific side-effects (and/or predictably unpredictable side effects).
  2. I do think it's worth interviewing at least a few carefully selected non-alignment researchers. I basically agree with your alignment-is-harder case. However, it also seems most important to be aware of things the field is just completely missing.
    1. In particular, this may be useful where some combination of cached methodologies is a local maximum for some context. Knowing something about other hills seems useful here.
      1. I don't expect it'd work to import full sets of methodologies from other fields, but I do expect there are useful bits-of-information to be had.
    2. Similarly, if thinking about some methodology x that most alignment researchers currently use, it might be useful to find and interview other researchers that don't use x. Are they achieving [things-x-produces] in other ways? What other aspects of their methodology are missing/different?
      1. This might hint both at how a methodology change may impact alignment researchers, and how any negative impact might be mitigated.
  3. Worth considering that there's less of a risk in experimenting (kindly, that is) on relative newcomers than on experienced researchers. It's a good idea to get a clear understanding of the existing process of experienced researchers. However, once we're in [try this and see what happens] mode there's much less downside with new people - even abject failure is likely to be informative, and the downside in counterfactual object-level research lost is much smaller in expectation.

Thanks for the kind words and useful devil's advocate! (I'm expecting nothing less from you ;p)

  1. I expect it's unusual that [replace methodology-1 with methodology-2] will be a pareto improvement: other aspects of a researcher's work will tend to have adapted to fit methodology-1. So I don't think the creation of some initial friction is a bad sign. (also mirrors therapy - there's usually a [take things apart and better understand them] phase before any [put things back together in a more adaptive pattern] phase)
    1. It might be useful to predict this kind of thing ahead of time, to develop a sense of when to expect specific side-effects (and/or predictably unpredictable side effects)

I agree that pure replacement of methodology is a massive step that is probably premature before we have a really deep understanding both of the researcher's approach and of the underlying algorithm for knowledge production. Which is why, in my model, this comes quite late; instead, the first steps are more about revealing the cached methodology to the researcher, and showing alternatives from History of Science (and Technology) to make more options and approaches credible for them.

Also looking at the "sins of the fathers" for philosophy of science (how methodologies have fucked up people across history) is part of our last set of framing questions. ;)
 

  1. I do think it's worth interviewing at least a few carefully selected non-alignment researchers. I basically agree with your alignment-is-harder case. However, it also seems most important to be aware of things the field is just completely missing.
    1. In particular, this may be useful where some combination of cached methodologies is a local maximum for some context. Knowing something about other hills seems useful here.
      1. I don't expect it'd work to import full sets of methodologies from other fields, but I do expect there are useful bits-of-information to be had.
    2. Similarly, if thinking about some methodology x that most alignment researchers currently use, it might be useful to find and interview other researchers that don't use x. Are they achieving [things-x-produces] in other ways? What other aspects of their methodology are missing/different?
      1. This might hint both at how a methodology change may impact alignment researchers, and how any negative impact might be mitigated.

Two reactions here:

  1. I agree with the need to find things that are missing and alternatives, which is where the history and philosophy of science works come to help. One advantage of it is that you can generally judge whether the methodology was successful or problematic in hindsight there, compared to interviews.
  2. I hadn't thought about interviewing other researchers. I expect it to be less efficient in a lot of ways than the HPS work, but I'm also now on the lookout for the option, so thanks!

  1. Worth considering that there's less of a risk in experimenting (kindly, that is) on relative newcomers than on experienced researchers. It's a good idea to get a clear understanding of the existing process of experienced researchers. However, once we're in [try this and see what happens] mode there's much less downside with new people - even abject failure is likely to be informative, and the downside in counterfactual object-level research lost is much smaller in expectation.

I see what you're pointing out. A couple related thoughts:

  1. One benefit of working with established researchers is that you have a historical record of what they did, which makes it easier to judge whether you're actually helping.
  2. I also expect helping established researchers to be easier on some dimensions, because they have more experience learning new models and leveraging them.
  3. Related to your first point, I don't worry too much about messing people up because the initial input will be far less invasive than wholesale replacement of methodologies. But we're still investigating the risks to be sure we're not doing something net negative.