G Gordon Worley III

If you are going to read just one thing I wrote, read The Problem of the Criterion.

More AI-related stuff is collected over at PAISRI.


Formal Alignment


Value extrapolation partially resolves symbol grounding

This doesn't really seem like solving symbol grounding, partially or not, so much as an argument that it's a non-problem for the purposes of value alignment.

$1000 USD prize - Circular Dependency of Counterfactuals

Agreed. That said, I don't think counterfactuals are in the territory. I think I said before that they were in the map, although I'm now leaning away from that characterisation as I feel that they are more of a fundamental category that we use to draw the map.

Yes, I think there is something interesting going on where human brains seem to operate in a way that makes counterfactuals natural. I actually don't think there's anything special about counterfactuals, though, just that the human brain is designed such that thoughts are not strongly tethered to sensory input vs. "memory" (internally generated experience). That's perhaps only subtly different from saying that counterfactuals, rather than something powering them, are a fundamental feature of how our minds work.

$1000 USD prize - Circular Dependency of Counterfactuals

I don't think they're really at odds. Zack's analysis cuts off at a point where the circularity exists below it. There's still the standard epistemic circularity that exists whenever you try to ground out any proposition, counterfactual or not, but there's a level of abstraction where you can remove the seeming circularity by shoving it lower or deeper into the reduction of the proposition towards grounding out in some experience.

Another way to put this is that we can choose what to be pragmatic about. Zack's analysis chooses to be pragmatic about counterfactuals at the level of making decisions, and this allows removing the circularity up to the purpose of making a decision. If we want to be pragmatic about, say, accurately predicting what we will observe about the world, then there's still some weird circularity in counterfactuals to be addressed if we try to ask questions like "why these counterfactuals rather than others?" or "why can we formulate counterfactuals at all?".

Also I guess I should be clear that there's no circularity outside the map. Circularity is entirely a feature of our models of reality rather than reality itself. That's why, for example, the analysis of epistemic circularity I offer is that we can ground things out in purpose, and thus the circularity was actually an illusion of trying to ground truth in itself rather than experience.

I'm not sure I've made this point very clearly elsewhere before, so sorry if that's a bit confusing. The point is that circularity is a feature of the relative rather than the absolute, so circularity exists in the map but not the territory. We only get circularity by introducing abstractions that can allow things in the map to depend on each other rather than the territory.

$1000 USD prize - Circular Dependency of Counterfactuals

I think A is solved, though I wouldn't exactly phrase it like that, more like counterfactuals make sense because they are what they are and knowledge works the way it does.

Zack seems to be making a claim to B, but I'm not expert enough in decision theory to say much about it.

$1000 USD prize - Circular Dependency of Counterfactuals

I mostly agree with Zack_M_Davis that this is a solved problem, although rather than talking about a formalization of causality I'd say this is a special case of epistemic circularity and thus an instance of the problem of the criterion. There's nothing unusual going on with counterfactuals other than that people sometimes get confused about what propositions are (e.g. they believe propositions have some sort of absolute truth beyond causality because they fail to realize epistemology is grounded in purpose rather than something eternal and external to the physical world) and then go on to get mixed up into thinking that something special must be going on with counterfactuals due to their confusion about propositions in general.

I don't know if I'll personally get around to explaining this in more detail, but I think this is low hanging fruit since it falls out so readily from understanding the contingency of epistemology caused by the problem of the criterion.

Integrating Three Models of (Human) Cognition

Thanks for this thorough summary. At this point the content has become spread over a book's worth of posts, so it's handy to have this high-level, if long, summary!

Drug addicts and deceptively aligned agents - a comparative analysis

Thanks for this interesting read.

I think there's similar work that can be done to find safety analogues from a large number of fields. Some that come to mind include organizational design, market analysis, and design of public institutions.

Daniel Kokotajlo's Shortform

Some of my own:

  • SSDs
  • laptops
  • CDs
  • digital cameras
  • modems
  • genome sequencing
  • automatic transmissions for cars that perform better than a moderately skilled human using a manual transmission can
  • cheap shipping
  • solar panels with reasonable power generation
  • breathable wrinkle-free fabrics that you can put in the washing machine
  • bamboo textiles
  • good virtual keyboards for phones
  • scissor switches
  • USB
  • GPS

Selection Theorems: A Program For Understanding Agents

Interesting. Selection theorems seem like a way of identifying the purposes or sources of goal-directedness in agents, which seem obvious to us yet are hard to pin down. Compare also the ground of optimization.

David Wolpert on Knowledge

I don't really have a whole picture that I think says more than what others have. I think there's something to knowing as the act of operationalizing information, by which I mean a capacity to act based on information.

To make this more concrete, consider a simple control system like a thermostat or a steam engine governor. These systems contain information in the physical interactions we abstract away to call "signal" that's sent to the "controller". If we had only signal there'd be no knowledge because that's information that is not used to act. The controller creates knowledge by having some response it "knows" to perform when it gets the signal.
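A minimal sketch of the thermostat case, assuming a simple on/off controller with a dead band. The class name, setpoint, and thresholds are all made up for illustration and don't come from any real device. The list of readings is signal only; the `respond` method is the part that "knows" what to do with it.

```python
from dataclasses import dataclass


@dataclass
class Thermostat:
    """Hypothetical on/off controller; all names and values are illustrative."""
    setpoint: float          # target temperature in degrees C (assumed)
    hysteresis: float = 0.5  # dead band around the setpoint (assumed)
    heater_on: bool = False

    def respond(self, temperature: float) -> bool:
        """Turn a signal (a temperature reading) into an action (heater state)."""
        if temperature < self.setpoint - self.hysteresis:
            self.heater_on = True
        elif temperature > self.setpoint + self.hysteresis:
            self.heater_on = False
        # Inside the dead band the previous heater state is kept.
        return self.heater_on


# Signal alone: information that nothing acts on.
readings = [18.0, 19.8, 20.3, 20.6, 20.2]

# Signal plus a controller: each reading now produces a response.
thermostat = Thermostat(setpoint=20.0)
actions = [thermostat.respond(t) for t in readings]
print(actions)
```

On this reading, the `readings` list by itself is just information, and whatever "knowledge" there is lives in the mapping from reading to heater state.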

This view then doesn't really distinguish knowledge from purpose in a cybernetic sense, and I think that seems reasonable at first blush. This lets us draw a hard line between "dead" information like words in a book and "live" information like words being read.

Of course this doesn't necessarily make all the distinctions we'd hope to make, since this makes no difference between a thermostat and a human when it comes to knowledge. Personally I think that's correct. There's perhaps some interesting extra thing to say about the dynamism of these two systems (the thermostat is an adaptation executor only, the human is that and something capable of changing itself intentionally), but I think that's separate from the knowledge question.

Obviously this all hinges on a particular sort of deflationary approach to these terms to have them make sense with the weakest possible assumptions and covering the broadest classes of systems. Whether or not this sort of "knowledge" I'm proposing here is useful for much is another question.
