G Gordon Worley III

If you are going to read just one thing I wrote, read The Problem of the Criterion.

More AI-related stuff collected over at PAISRI


Formal Alignment


Actually, I kind of forgot what ended up in the paper, but then I remembered, so I wanted to update my comment.

There was an early draft of this paper that talked about deontology, but because there are so many different forms of deontology it was hard to come up with arguments where there wasn't some version of deontological reasoning that broke the argument, so I instead switched to talking about the question of moral facts independent of ethical system. That said, the argument I make in the paper, suggesting that assuming moral realism is more dangerous than assuming moral antirealism or nihilism, is quite similar to the concerns with deontology. Namely, if an AI assumes an ethical system can be made up of rules, then it will fail in the case where no set of rules can capture the best ethics for humans, so that assumption poses a risk of false positives among deontological AI.

Hopefully the arguments about moral facts are still useful, and you might find the style of argumentation useful to your purposes.

I don't see it in the references so you might find this paper of mine (link is to Less Wrong summary, which links to full thing) interesting because within it I include an argument suggesting building AI that assumes deontology is strictly more risky than building one that does not.

If the mind becomes much more capable than the surrounding minds, it does so by being on a trajectory of creativity: something about the mind implies that it generates understanding that is novel to the mind and its environment.


I don't really understand this claim enough to evaluate it. Can you expand a bit on what you mean by it? I'm unsure about the rest of the post because it's unclear to me what the premise your top-line claim rests upon means.

to answer my own question:

Level of AI risk concern: high

General level of risk tolerance in everyday life: low

Brief summary of what you do in AI: first tried to formalize what alignment would mean; this led me to work on a program of deconfusing human values that reached the end of what I could do; now I have moved on to writing about epistemology that I think is critical to understand if we want to get alignment right

Anything weird about you: prone to anxiety, previously dealt with OCD, mostly cured it with meditation but still pops up sometimes

I think I disagree. Based on your presentation here, I think someone following a policy inspired by this post would be more likely to cause existential catastrophe by pursuing a promising false positive that actually destroys all future value in our Hubble volume. I've argued we need to focus on minimizing false positive risk rather than optimizing for max expected value, which is what I read this as proposing we do.

This post brought to mind a thought: I actually don't care very much about arguments about how likely doom is and how pessimistic or optimistic to be since they are irrelevant, to my style of thinking, for making decisions related to building TAI. Instead, I mostly focus on downside risks and avoiding them because they are so extreme, which makes me look "pessimistic" but actually I'm just trying to minimize the risk of false positives in building aligned AI. Given this framing, it's actually less important, in most cases, to figure out how likely something is, and more important to figure out how likely doom is if we are wrong, and carefully navigate the path that minimizes the risk of doom, regardless of what the assessment of doom is.

A good specific example of trying to pull this kind of shell game is perhaps HCH. I don't recall if someone made this specific critique of it before, but it seems like there's some real concern that it's just hiding the misalignment rather than actually generating an aligned system.

In classical Chinese philosophy there's the concept of shi-fei or "this not that". A key bit of the idea, among other things, is that all knowledge involves making distinctions, and those distinctions are judgments, and so if you want to have knowledge and put things into words you have to make this-not-that style judgments of distinction to decide what goes in what category.

More recently here on the forum, Abram has written about teleosemantics, which seems quite relevant to your investigations in this post.

The teleosemantic picture is that epistemic accuracy is a common, instrumentally convergent subgoal; and "meaning" (in the sense of semantic content) arises precisely where this subgoal is being optimized. 

I think this is exactly right. I often say things like "accurate maps are extremely useful to things like survival, so you and every other living thing have strong incentives to draw accurate maps, but this is contingent on the extent to which you care about e.g. survival".

So to see if I have this right, the difference is I'm trying to point at a larger phenomenon and you mean teleosemantics to point just at the way beliefs get constrained to be useful.

Cool. For what it's worth, I also disagree with many of my old framings. Basically anything written more than ~1 year ago is probably vaguely but not specifically endorsed.
