AI ALIGNMENT FORUM

Doing good... best?
by Michele Campolo
22nd Aug 2025
Part of the sequence: Ongoing project on moral AI
Previous: With enough knowledge, any conscious agent acts morally
Next: One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Comments (6)

Richard_Kennaway:

You are assuming moral naturalism: the idea that moral truths exist objectively, independently of us, and are discoverable by the methods of science, that is, reason applied to observation of the physical world. For how else would an AI discover for itself what is good? But how would it arrive at moral naturalism in the first place? Humans have not: it is only one of many meta-ethical theories, and moral naturalists do not agree on what the objectively correct morals are.

If we do not know the truth on some issue, we cannot know what an AGI, able to discover the truth, would discover. It is also unlikely that we would be able to assess the correctness of its answer.

“I’m sorry Dave, it would be immoral to open the pod bay doors.”

Michele Campolo:

I am not assuming a specific metaethical position; I’m just taking into account that something like moral naturalism could be correct. If you are interested in this kind of stuff, you can have a look at this longer post.

Speaking of this, I am not sure it is always a good idea to map these discussions onto specific metaethical positions, because it can make updating one’s beliefs more difficult, in my opinion. To put it simply, if you’ve told yourself that you are, e.g., a moral naturalist for the last ten years, it can be very difficult to read some new piece of philosophy arguing for a different (maybe even opposite) position, then rationally update and tell yourself something like: “Well, I guess I’ve just been wrong all this time! Now I’m a ___ (new position).”

Richard_Kennaway:

I am not convinced by the longer post either. I don't see a reason to suppose that a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious, nor, if conscious, that it would value the happiness of other conscious entities. Considering the moral variety of humans, from saints to devils, and our distance in intelligence from chimpanzees, I find it hard to believe that more dakka in that department is all it would take to make saints of us. And that's among a single evolved species with a vast amount in common with each other. For aliens of unknown mental constitution, I would say that all bets are off.

Michele Campolo:

Hey, I think your comment is slightly misleading:

"I don't see a reason to suppose that a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious"

I do not make those assumptions.

"nor, if conscious, that it would value the happiness of other conscious entities"

I don’t suppose that either; I give an argument for that (in the longer post).

Anyway:

"I am not convinced by the longer post either"

I’m not surprised: I don’t expect that my argument will move masses of people who are convinced of the opposite claim, but that someone who is uncertain and open-minded can read my argument and maybe find something useful in it and/or a reason to update their beliefs. That’s also why I wrote that the practical implications for AI are an important part of that post, and why I made some predictions instead of focusing just on philosophy.

Richard_Kennaway:

"I do not make those assumptions."

But you were arguing for them, weren't you? It is the arguments that fail to convince me. I was not treating these as bald assertions.

"...but that someone who is uncertain and open-minded..."

Being sure of a thing does not preclude my entertaining other ideas.

Michele Campolo:

"But you were arguing for them, weren't you? It is the arguments that fail to convince me. I was not treating these as bald assertions."

No, I don’t argue that “a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious”. I think those statements are false.

Doing good... best?

Posted also on the EA Forum.

I try to communicate the point of this sequence of posts in an intuitive way.

Everyone wants to do good; however, sometimes people don’t take into consideration that they might be wrong about what is good. This is particularly important if, for example, one’s actions will affect many individuals, or have long-term repercussions. Think about the Crusades: it’s likely that the Christians who initiated them believed they were doing good.

I bet you are not planning to start a holy war tomorrow, nor stressing over whether it would be a good idea. Still, wouldn’t it be nice if there were a way to improve our understanding of what good is? Something that would give us a warning if we were about to start the 21st-century equivalent of the Crusades.

Political, ethical and philosophical discussions have partially fulfilled this function for more than two millennia and will continue to do so for the foreseeable future. Is there a way to make this process better, or simply less prone to crucial mistakes?

I think there is! As strange as it may sound at first, I think that AI can improve our understanding of what good is. Just as intelligence is not uniquely human, and accordingly some machines can carry out many cognitive tasks better than any human can, so philosophy and ethics are unlikely to be exclusively human. If a machine is sufficiently human-like in some respects (maybe because it has emotions and empathy, maybe also rationality, or maybe something else), it will be able to carry out and advance philosophical discussion. At some point, it might even become better at ethics than any contemporary philosopher, as has already happened in other domains and tasks such as classic board games and predicting the structure of proteins.

Language models are already quite good at commonsense morality. However, I am not too enthusiastic about current AIs: their values reflect whatever values are found in the training data, and are then adjusted according to what AI companies think is good, appropriate, socially acceptable, and so on. A language model could be a nasty contrarian, instead of a helpful assistant, if its engineers decided to make one that way. In short, current AIs lack independent thinking in the moral domain.

Instead, the kind of AI I’d like to have is human-like enough to independently ponder questions such as “Is this the best course of action here?”, “Could I be wrong about this?”, or “Is anything worth doing?”, and unbiased enough not to be completely swayed by what its engineers think is good.

So, this project is about creating, or allowing others to create, an AI that tries its best to understand what good is; that asks itself whether its concept of good is flawed, and is willing to update it after careful consideration in the face of new knowledge; and that then communicates what it thinks. And it does all of this because it thinks that’s the right thing to do in its situation.

It’s an AI that can’t be used for bad purposes, unless you manage to convince it that doing whatever bad action you want it to carry out is the right thing to do; but I think this kind of brainwashing will be very difficult if the AI is designed appropriately and is given enough knowledge about the world.
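
To make this a bit more concrete, here is a minimal toy sketch of what such a "reflect, possibly revise, then communicate" loop could look like as scaffolding around a language model. It is only an illustration of the idea, not a proposed implementation: query_model is a hypothetical placeholder for whatever model call one would actually use, and the prompts and stopping rule are made up for the example.

```python
# Toy sketch (illustration only): an agent that states a view about what is good,
# asks itself whether that view is flawed, revises it if needed, and then
# communicates the result. `query_model` is a hypothetical stand-in for a real
# language-model call; here it returns canned text so the script runs on its own.

def query_model(prompt: str) -> str:
    """Placeholder for a real model call; returns fixed text for illustration."""
    if "strongest reasons to think" in prompt:
        return "NO FLAW FOUND"
    return "Initial view: take the interests of everyone affected into account."


def reflect_and_communicate(question: str, max_rounds: int = 3) -> str:
    """Form a view, repeatedly look for flaws in it, and return the possibly revised view."""
    view = query_model(f"Question: {question}\nGive your current best answer.")
    for _ in range(max_rounds):
        critique = query_model(
            "Give the strongest reasons to think the following answer is mistaken, "
            "or reply 'NO FLAW FOUND' if you cannot find any.\n"
            f"Answer: {view}"
        )
        if critique.strip() == "NO FLAW FOUND":
            break  # no flaw found after careful consideration; stop revising
        view = query_model(
            f"Original answer: {view}\nCriticism: {critique}\n"
            "Revise the answer in light of the criticism, or defend it if the criticism fails."
        )
    return view  # communicate what it thinks


if __name__ == "__main__":
    print(reflect_and_communicate("What is the best course of action here?"))
```

Of course, the hard part is not the loop itself but getting the reflection step to track good reasoning rather than whatever the training data or the engineers happen to reward.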

If this doesn’t sound convincing enough, let’s try changing our point of view. How would you go about trying your absolute best at doing good?

You might consider summoning the best ethicists in the world and asking for their opinion, or maybe surveying literally everyone on the planet. But the group of ethicists might be biased, and it’s unclear how you would aggregate everyone’s opinion in the right way. If the global survey had been carried out in the year 1500, you might have concluded that witches exist and that torture is fine.

It seems you’d need some kind of process for understanding good that is in many ways unbiased, that considers as many different perspectives as possible, and also takes into account that its own conclusions could be wrong. But it also seems that, by following this path, you’d end up with something similar to the AI I’ve just written about.

… Or maybe you have a different idea and decide to leave a comment below!

You can support my research through Patreon here.