AI ALIGNMENT FORUM

Gordon Seidoh Worley

I'm writing a book about epistemology. It's about The Problem of the Criterion, why it's important, and what it has to tell us about how we approach knowing the truth.

I've also written a lot about AI safety. Some of the more interesting stuff can be found at the site of my currently-dormant AI safety org, PAISRI.

Sequences

Formal Alignment

Comments (sorted by newest)

G Gordon Worley III's Shortform
Gordon Seidoh Worley · 1mo

I don't disagree that those we called nobles frequently acted badly. But I do see idealized noble values as worth looking at. Think less real kings and lords and more valorized archetypes like Robin Hood, King Richard the Lionheart, and of course King Arthur and his Knights. I think this fiction captures the expectations we set for what good leaders who hold power over others should look like, and that's the version I'm suggesting is worth using as a starting point for what we want "good" AI to look like.

I'm also not very concerned with the economic realities that created the need for idealized nobility norms in feudal societies. I don't see that as a key part of what I'm pointing at. Nobility is a larger and longer tradition than its Medieval European form, though that's the expression of it that I and most folks on Less Wrong are probably familiar with.

G Gordon Worley III's Shortform
Gordon Seidoh Worley · 1mo

We've spent years talking about "aligned" AI, and "Friendly" AI before that, but maybe we should have spent that time talking about "noble" AI.

To be noble is, in part, to act in the best interests of one's domain of responsibility. Historically this meant the peasants living and working on an estate. Today this might mean being a responsible leader of a business who prioritizes people as well as profits and owns the fallout of hard decisions, or being a responsible political leader, though those seem few and far between these days.

We've lost touch with the idea of nobility, but a noble AI might exhibit these traits we think of as positive for alignment:

  • cares for organic life
  • doesn't take actions that would harm others except to defend itself
  • is patient with and understanding of the fear others may have of its power
  • finds solutions that are not deceptive and don't undermine the agency of others, while also not shying away from stopping others from doing seriously dangerous things
  • honors a commitment to helping life flourish
  • holds itself back when exercising more power would cause harm

and many more. I'm just starting to think about this idea as an alternative framing for what we've been calling alignment, so I'm curious to hear folks' thoughts.

Do Not Tile the Lightcone with Your Confused Ontology
Gordon Seidoh Worley · 1mo

We used to think a lot about the potential issues caused by an AI experiencing an ontological crisis. But your post seems to suggest we should perhaps be more concerned about the issues created by imposing an ontology on AIs and having them run away with that ontology. Is that how you're thinking about this?

Recreating the caring drive
Gordon Seidoh Worley · 7mo · Review for 2023 Review

I'd really like to see more follow-up on the ideas presented in this post. Our drive to care is arguably why we're willing to cooperate, and making AI that cares the same way we do is a potentially viable path to AI aligned with human values, but I've not seen anyone take it up. Regardless, I think this is an important idea and think folks should look at it more closely.

Teleosemantics!
Gordon Seidoh Worley · 7mo · Review for 2023 Review

I think this post is important because it brings old insights from cybernetics into a modern frame that relates to how folks are thinking about AI safety today. I strongly suspect that the big idea in this post, that ontology is shaped by usefulness, matters greatly to addressing fundamental problems in AI alignment.

Finding the Wisdom to Build Safe AI
Gordon Seidoh Worley · 1y

Seems reasonable. I do still worry quite a bit about Goodharting, but perhaps this could be reasonably addressed with careful oversight by some wise humans doing the wisdom equivalent of red teaming.

We might be dropping the ball on Autonomous Replication and Adaptation.
Gordon Seidoh Worley · 1y

"According to METR, the organization that audited OpenAI, a dozen tasks indicate ARA capabilities."

Small comment, but @Beth Barnes of METR posted on Less Wrong just yesterday to say "We should not be considered to have ‘audited’ GPT-4 or Claude".

This doesn't appear to be a load-bearing point in your post, but it would still be good to update the language to be more precise.

How to talk about reasons why AGI might not be near?
Gordon Seidoh Worley · 2y

Ah, I see. I have to admit, I write a lot of my comments between things, and I missed that the context of the post could cause my words to be interpreted this way. These days I'm often in executive mode rather than scholar mode and miss nuance if it's not clearly highlighted. Hence my misunderstanding, but it also reflects where I'm coming from with this answer!

How to talk about reasons why AGI might not be near?
Gordon Seidoh Worley · 2y

I left a comment over in the other thread, but I think Joachim misunderstands my position.

In the above comment I've taken for granted that there's a non-trivial possibility that AGI is near. I'm not arguing we should say "AGI is near" regardless of whether it is or not; we don't know whether it is, we only have our guesses. But so long as there's a non-trivial chance that AGI is near, I think that's the more important message to communicate.

Overall it would be better if we could communicate something like "AGI is probably near", but "probably" and similar qualifiers tend to get rounded off. Even if you literally say "AGI is probably near", that's not what people will hear. If you're going to say "probably", my argument is that it's better if people round the "probably" off to "near" rather than "not near".

How to talk about reasons why AGI might not be near?
Answer by Gordon Seidoh Worley · Sep 17, 2023

From a broad policy perspective, it can be tricky to know what to communicate. I think it helps if we think a bit more about the effects of our communication and a bit less about correctly conveying our level of credence in particular claims. Let me explain.

If we communicate the simple idea that AGI is near, it pushes people to work on safety projects that would be worth working on even if AGI is not near, at some cost in reputation, mental health, and personal wealth.

If we communicate the simple idea that AGI is not near, people will feel less need to work on safety soon. This would let them avoid missing out on opportunities that would be good to take before they actually need to focus on AI safety.

We can only really communicate one thing at a time to people. Also, we should worry more about the tail risks from false positives (thinking we can build AGI safely when we cannot) than from false negatives (thinking we can't build AGI safely when we can). Taking these two facts into consideration, I think the policy implication is clear: unless there is extremely strong evidence that AGI is not near, we must act and communicate as if AGI is near.

Posts (sorted by new)

0 · Teaching Claude to Meditate · 7mo · 0 comments
15 · Finding the Wisdom to Build Safe AI · 1y · 4 comments
18 · Dangers of Closed-Loop AI · 1y · 1 comment
5 · How much do personal biases in risk assessment affect assessment of AI risks? [Question] · 2y · 1 comment
6 · Yampolskiy on AI Risk Skepticism · 4y · 0 comments
6 · Bootstrapped Alignment · 4y · 7 comments
4 · [Preprint] The Computational Limits of Deep Learning · 5y · 3 comments
2 · Comparing AI Alignment Approaches to Minimize False Positive Risk · 5y · 0 comments
5 · What are the high-level approaches to AI alignment? [Question] · 5y · 13 comments
7 · The Mechanistic and Normative Structure of Agency · 5y · 0 comments
4 · G Gordon Worley III's Shortform · 6y · 14 comments
Wikitag Contributions

The Problem of the Criterion · 3y · (+1/-7)
Occam's Razor · 4y · (+58)
The Problem of the Criterion · 4y · (+80)
The Problem of the Criterion · 4y · (+570)
Dark Arts · 4y · (-11)
Transformative AI · 4y · (+15/-13)
Transformative AI · 4y · (+348)
Internal Family Systems · 4y · (+59)
Internal Family Systems · 4y · (+321)
Buddhism · 5y · (+321)