G Gordon Worley III

If you are going to read just one thing I wrote, read The Problem of the Criterion.

More AI-related stuff collected over at PAISRI


Formal Alignment


Alright, fair warning, this is an out there kind of comment. But I think there's some kind of there there, so I'll make it anyway.

Although I don't have much of anything new to say about it lately, I spent several years really diving into developmental psychology, and my take on most of it is that it's an attempt to map changes in the order of complexity of the structure thoughts can take on. I view the stages of human psychological development as building up the mental infrastructure to hold up to three levels of fully-formed structure in your mind simultaneously without effort, i.e. your System 1 can do this (yes, this is kind of handwavy about what a fully-formed structure is). My most recent post exploring this idea in detail is here.

This fact about how humans think and develop seems an important puzzle piece in understanding how, among other things, we address your questions around understanding what other minds understand.

For example, as people move through different phases of psychological development, one of the key skills they gain is better cognitive empathy. I think this comes from being able to hold more complex structures in their minds and thus being able to model other minds more richly. An interesting question I don't know the answer to is whether you get more cognitive empathy past the point where human psychological development seems to stop. Like, if an AI could hold 4 or 5 levels simultaneously instead of just 3, would it understand more than us, or just be faster? I might compare it to a stack-based computer: a 3-register stack is sufficient to run arbitrary computations, but if you've ever used an RPN calculator you know that having 4 or more registers sure makes life easier, even if you know you could always get by with just 3.
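The register analogy can be made concrete with a toy sketch. Below is a minimal RPN (postfix) evaluator of my own devising (the function name, depth limit, and expressions are all illustrative assumptions, not any real calculator's behavior): the same product of three sums overflows a 3-deep stack when pushed in the naive left-to-right order, fits in 3 once the work is reordered, and needs no rearranging at all given a 4th register.

```python
def eval_rpn(tokens, max_depth=3):
    """Evaluate a postfix expression, raising if the stack ever exceeds max_depth."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # operands come off in reverse order
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
        if len(stack) > max_depth:
            raise OverflowError(f"stack depth exceeded {max_depth}")
    return stack[0]

# (1+2) * ((3+4) * (5+6)) = 231, written two ways:
deep = "1 2 + 3 4 + 5 6 + * *".split()      # naive order: peaks at depth 4
shallow = "3 4 + 5 6 + * 1 2 + *".split()   # same value, reordered: peaks at depth 3

print(eval_rpn(shallow, max_depth=3))       # works with only 3 registers
try:
    eval_rpn(deep, max_depth=3)             # the convenient order overflows
except OverflowError as e:
    print("naive order fails:", e)
print(eval_rpn(deep, max_depth=4))          # a 4th register removes the need to reorder
```

So 3 levels suffice in principle, but the extra level spares you the bookkeeping of restructuring the computation, which is roughly the intuition in the paragraph above.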

I don't know that I really have a lot of answers here, but hopefully these are somewhat useful puzzle pieces you can work on fitting together with other things you're looking at.

Why does there need to be structure? We can just have a non-uniform distribution of energy around the universe in order for there to be information to extract. I guess you could call this "structure" but that seems like a stretch to me.

I don't know if I can convince you. You seem pretty convinced that there are natural abstractions or something like them. I'm pretty suspicious that there are natural abstractions. Instead, I think there are useful abstractions, but they are all contingent on how the minds creating them are organized, and no abstractions meaningfully exist independent of the minds that create them. Perhaps the structure of our universe limits how minds work in ways that de facto mean we all create ontology within certain constraints, but I don't think we know enough to prove this.

By my view, any sense in which abstractions seem natural is a kind of typical mind fallacy.

Sure, differences are as real as the minds making them. Once you have minds, those minds start perceiving differentiation, since they need to extract information from the environment to function. So I guess I'm saying I don't see your objection in this last comment: as far as I can tell, you haven't posited anything that actually disagrees with my point. I think it's a bit weird to call the differentiation you're referring to "objective", but you explained what you mean.

Isn't one special case of "aiming at any target we want" aiming at the goals we would want it to have? And wouldn't whatever goals we'd want it to have be informed by our ontology? So what I'm saying is that I think there's a case where the generality of your claim breaks down.

I think that the big claim the post relies on is that values are a natural abstraction, and the Natural Abstractions Hypothesis holds. Now this is admittedly very different from the thesis that value is complex and fragile.

It is not that AI would naturally learn human values, but that it's relatively easy for us to point at human values/Do What I Mean/Corrigibility, and that they are natural abstractions.

This is not a claim that is satisfied by default, but is a claim that would be relatively easy to satisfy if true.

If this is the case, my concern seems yet more warranted, since this amounts to hoping we won't suffer a false positive: an alignment scheme that looks like it could work but won't. Given the high cost of getting things wrong, we should minimize false positive risks, which means not pursuing some ideas because the risk if they are wrong is too high.

For what it's worth, I think you're running headlong into an instance of the problem of the criterion and enjoy seeing how you're grappling with it. I've tagged this post as such.

Reading this post I think it insufficiently addresses motivations, purpose, reward functions, etc. to make the bold claim that perfect world-model interpretability is sufficient for alignment. I think this because ontology is not the whole of action. Two agents with the same ontology and very different purposes would behave in very different ways.

Perhaps I'm being unfair, but I'm not convinced you're avoiding the same mistake people make when they claim that any sufficiently intelligent AI would be naturally good.

This seems straightforward to me: reification is a process by which our brain picks out patterns/features and encodes them so we can recognize them again and make sense of the world given our limited hardware. We can then think in terms of those patterns and gloss over the details because the details often aren't relevant for various things.

The reason we reify things one way versus another depends on what we care about, i.e. our purposes.

To me this seems obvious: noumena feel real to most people because they're captured by their ontology. It takes a lot of work for a human mind to learn not to jump straight from sensation to reification, and even with training there's only so much a person can do because the mind has lots of low-level reification "built in" that happens prior to conscious awareness. Cf. noticing

Oh, I thought I already explained that. There are at least two different ways "exist" can be meant here, and I think we're talking past each other.

For something to exist implies that it must exist ontologically, i.e. in the map; otherwise it is not yet a thing. So I'm saying there's a difference between what we might call existence and being. You exist, in the sense of being an ontological thing, only by virtue of reification, but you are by virtue of the whole world being.
