Financial status: This is independent research, now supported by a grant. I welcome financial support.

Epistemic status: This is in-progress thinking.

This post is part of a sequence on the accumulation of knowledge. Our goal is to articulate what it means for knowledge to accumulate within a physical system.

The challenge is this: given a closed physical system, if I point to a region and tell you that knowledge is accumulating in this region, how would you test my claim? What are the physical characteristics of the accumulation of knowledge? What is it, exactly, about an artifact inscribed with instructions for building advanced technology that makes it so different from an ordinary rock, or from a video camera that has been travelling the cosmos recording data since the beginning of the universe? We are looking for a definition of knowledge at the level of physics.

The previous four posts explored four possible accounts of what the accumulation of knowledge consists of. Three of them viewed knowledge as a correspondence between map and territory, exploring different operationalizations of "correspondence". The fourth viewed the accumulation of knowledge as an increasing sensitivity of actions to the environment. We found significant counterexamples to each account, and I remain personally unsatisfied with all four.

This post will briefly review literature on this topic.

Constructive definitions of knowledge

One could view probability theory as a definition of knowledge. Probability theory would say that knowledge is a mapping from claims about the world to numerical credences in a way that obeys the laws of probability theory, and that the accumulation of knowledge happens when we condition our probabilities on more and more evidence. Probability theory is not usually presented as an account of what knowledge is, but it could be.
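To make "conditioning on more and more evidence" concrete, here is a minimal sketch of repeated Bayesian updating over two hypotheses. The hypotheses, the observation, and all of the numbers are hypothetical and chosen only for illustration. The function name `condition` is my own and does not refer to any particular library.

```python
from fractions import Fraction

def condition(prior, likelihood):
    """Return the posterior over hypotheses after one observation (Bayes' rule).

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(observation)
    return {h: p / evidence for h, p in unnormalized.items()}

# Hypothetical question: is there water at a given spot?
credences = {"water": Fraction(1, 2), "no_water": Fraction(1, 2)}

# Assumed probability of observing a shimmer under each hypothesis.
shimmer = {"water": Fraction(4, 5), "no_water": Fraction(1, 5)}

# Condition on three independent shimmer observations, one after another.
history = [credences["water"]]
for _ in range(3):
    credences = condition(credences, shimmer)
    history.append(credences["water"])

print(history)  # credence in "water": 1/2, 4/5, 16/17, 64/65
```

On this view, the "knowledge" is the `credences` mapping, and its accumulation is the sequence of updates. Note that the sketch presupposes that we already know which variables hold the credences, which is exactly what we lack when interpreting a system we did not design.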

However, probability theory is most useful when we are designing machines from scratch rather than when we are interpreting already-constructed machines such as those produced by large-scale search in machine learning. It might be possible to reverse-engineer a neural network and determine whether it encodes a program that approximately implements Bayesian inference, but it would take a lot of work to do that, and it would have to be done on a case-by-case basis. Probability theory is not really intended to give an account of what it means for a physical system to accumulate knowledge, just as an instruction manual for assembling some particular desktop computer is not on its own a very good definition of what computation is.

In the same way, we could view the various formal logical systems, as well as logical induction and other non-probability-theory systems of credences, as constructive accounts of knowledge. But these, like probability theory, do not help us much with our present task.

Philosophical analysis of knowledge

There is an extensive philosophical literature concerning the analysis of knowledge. This area received a great deal of attention during the second half of the 20th century. The basic shape of this literature seems to be a rejection of the view that knowledge is "justified true belief", followed by counterexamples to the "justified true belief" paradigm, followed by many attempts to augment the "justified true belief" definition of knowledge, followed by further counterexamples refuting each proposal, followed by whole books being written about general-purpose counterexample construction methods, followed by a general pessimism about coming to any satisfying definition of knowledge. My main source on this literature has been the SEP article on the topic, which of course gives me only a limited perspective on this enormous body of work. I hope that others better-versed in this literature will correct any mistakes that I have made in this section.

The "justified true belief" position, which much of the 20th-century literature was written to either refute or fix, is that a person knows a thing if they believe the thing, are justified in believing the thing, and the thing is in fact true.

The classic counterexample to this definition is the case where a person believes a thing due to sound reasoning but, unbeknownst to them, was misled in some way by their perceptions, and yet the thing itself turns out to be true by pure luck:

Imagine that we are seeking water on a hot day. We suddenly see water, or so we think. In fact, we are not seeing water but a mirage, but when we reach the spot, we are lucky and find water right there under a rock. Can we say that we had genuine knowledge of water? The answer seems to be negative, for we were just lucky. (Dreyfus 1997: 292)[1]

Counterexamples of this form are known as Gettier cases. There are many, many proposed remedies in the form of a fourth condition to add to the "justified true belief" paradigm. Some of them are:

  • The belief must not have been formed on the basis of any false premise

  • The belief must have been formed in a way that is sensitive to the actual state of the world

  • The thing itself must be true in other nearby worlds where the same belief was formed

  • The belief must have been formed by a reliable cognitive process

  • There must be a causal connection from the way the world is to the belief

The follow-up counterexamples that rebounded off these proposals eventually led to the following situation:

After some decades of such iterations, some epistemologists began to doubt that progress was being made. In her 1994 paper, "The Inescapability of Gettier Problems", Linda Zagzebski suggested that no analysis sufficiently similar to the JTB analysis could ever avoid the problems highlighted by Gettier’s cases. More precisely, Zagzebski argued, any analysans of the form JTB+X, where X is a condition or list of conditions logically independent from justification, truth, and belief, would be susceptible to Gettier-style counterexamples. She offered what was in effect a recipe for constructing Gettier cases:

(1) Start with an example of a case where a subject has a justified false belief that also meets condition X.

(2) Modify the case so that the belief is true merely by luck.

Zagzebski herself proposed "the belief must not be true merely by luck" as a possible fourth condition for the existence of knowledge, but even this fell to further counterexamples.

Overall I found this literature extremely disappointing as an attempt to really get at what knowledge is:

  • The literature seems to be extremely concerned with what certain words mean, particularly the words "knowledge" and "belief". It’s all about what is meant when someone says "I know X" or "I believe X". That is a reasonable question to ask, but it seems to me like just one way to investigate the actual phenomenon of knowledge out there in the world, and the literature seems unreasonably fixated on this word-centric style of investigation.

  • The literature seems to be centered around various subjects and their cognition. The definitions are phrased in terms of a certain subject believing something or having good reason for believing something, and this notion that there are subjects that have beliefs seems to be taken as primary by all writers. I suspect that an unexamined presupposition of the agent model is sneaking in premises that create the confusion that everyone seems to agree exists in the field. An alternative to taking subjects as primary is to take an underlying mechanistic universe as primary, as this sequence has done. There are pros and cons to this mechanistic-universe approach in comparison to the subject-cognition approach, and it is not that one approach is outright better than the other. But it is disappointing to see an entire field apparently stuck inside a single way of approaching the problem (the subject-cognition approach), with apparently no one considering alternatives.

  • The literature spends a lot of time mapping out various possible views and the relationship between them. It is frequently written that "internalists believe this", "state internalism implies this", "access internalism implies that", and so on. It’s good to map out all these views but sometimes the mapping out of views seems to become primary and the direct investigation of what knowledge actually is seems to take a back seat.

Overall I have not found this literature helpful as a means to determine whether the entities we are training with machine learning understand more about the world than we expected, or to work out how we might construct effective goal-directed entities in a world without Cartesian boundaries.

Eliezer on knowledge

Much of Eliezer’s writing in The Sequences concerns how it is that we acquire knowledge about the world, and what reason we have to trust our knowledge. He suggests this as the fundamental question of rationality:

I sometimes go around saying that the fundamental question of rationality is Why do you believe what you believe?

However, most of Eliezer’s writings about knowledge are closer to the constructive frame than the mechanistic frame. That is, they are about how we can acquire knowledge ourselves and how we can build machines that acquire knowledge, not about how we can detect the accumulation of knowledge in systems for which we have no design blueprints.

One exception to this is Engines of Cognition, which does address the question from a mechanistic frame, and is a significant inspiration for the ideas explored in this sequence (as well as for my previous writings on optimization). In the essay, Eliezer demonstrates, eloquently as ever, that the laws of thermodynamics prohibit the accumulation of knowledge in the absence of physical evidence:

"Forming accurate beliefs requires a corresponding amount of evidence" is a very cogent truth both in human relations and in thermodynamics: if blind faith actually worked as a method of investigation, you could turn warm water into electricity and ice cubes. Just build a Maxwell's Demon that has blind faith in molecule velocities.


So unless you can tell me which specific step in your argument violates the laws of physics by giving you true knowledge of the unseen, don't expect me to believe that a big, elaborate clever argument can do it either.

In the language of the present sequence, what Eliezer points out is that information is a necessary condition for knowledge. I agree that information is a necessary condition for knowledge, although I have argued in this sequence that it is not a sufficient condition.

Other writings

I suspect that this literature review has missed a lot of relevant writing both here on LessWrong and elsewhere. I would very much appreciate pointers to further resources.


Conclusion

This sequence has investigated the question of what it means for knowledge to accumulate in a region of a closed physical system. Our motivations were firstly to detect the accumulation of knowledge as a safety measure for entities produced by large-scale search, and secondly to investigate agency without imposing any agent/environment boundary.

The first three definitions we explored were variations on the theme of measuring mutual information between map and territory. We attempted to do this by looking at just a single configuration of the system, then by looking at a distribution over configurations, and then by looking at digital abstraction layers. The counterexamples given for each of these three proposals seem to show that none of these definitions are sufficient conditions for the accumulation of knowledge, though the second of these, and perhaps even the third, seems to be a necessary condition.
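As a rough illustration of the second variation (mutual information over a distribution of configurations), here is a minimal sketch that estimates mutual information from sampled configurations. Representing the "map" and "territory" regions as single bits, and the two sample sets themselves, are simplifying assumptions of mine and not the setup used in the earlier posts.

```python
import math
from collections import Counter

def mutual_information(samples):
    """Estimate I(map; territory) in bits from (map_state, territory_state) pairs."""
    n = len(samples)
    joint = Counter(samples)
    map_marginal = Counter(m for m, _ in samples)
    territory_marginal = Counter(t for _, t in samples)
    total = 0.0
    for (m, t), count in joint.items():
        p_joint = count / n
        # Probability the pair would have if map and territory were independent.
        p_indep = (map_marginal[m] / n) * (territory_marginal[t] / n)
        total += p_joint * math.log2(p_joint / p_indep)
    return total

# Hypothetical samples: a one-bit "map" region and a one-bit "territory" region.
perfect_copy = [(0, 0), (1, 1)] * 50               # map always matches territory
unrelated = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25  # map independent of territory

mi_copy = mutual_information(perfect_copy)    # 1 bit
mi_unrelated = mutual_information(unrelated)  # 0 bits
```

A region that merely copies the territory scores as highly here as one we would intuitively credit with knowledge, which is consistent with the conclusion that mutual information may be necessary but is not sufficient.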

We then explored a definition of knowledge accumulation as a coupling between map and territory that shows up as a sensitivity of the part of the system that has "knowledge" to the remainder of the system.

Overall, I am not sure that any of these definitions are really on the right track, but I am very enthusiastic about the question itself. I like the frame of "what would it mean at the level of physics for…" as a means to break out of the agent-centric frame that the philosophical literature on this subject often seems to be inside. I think that keeping in mind the practical goals of AI safety helps to keep the investigation from becoming a conceptual Rorschach test.

  1. Dreyfus, George B.J., 1997, Recognizing Reality: Dharmakirti’s Philosophy and its Tibetan Interpretations, Albany, NY: SUNY Press.
