AI ALIGNMENT FORUM

Richard Kennaway

Computer scientist, applied mathematician. Based in the eastern part of England.

Fan of control theory in general and Perceptual Control Theory in particular. Everyone should know about these, whatever attitude to them they might subsequently reach. These, plus consciousness of abstraction, dissolve a great many confusions.

I wrote the Insanity Wolf Sanity Test. There it is; work out for yourself what it means.

Change ringer since 2022. It teaches the learning and grasping of abstract patterns, memory, thinking with your body, thinking on your feet, fixing problems and moving on, always looking to the future, and letting go of both the errors and the successes of the past.

I first found an LLM useful (for something other than seeing how well the dog can walk on its hind legs) in September 2025. As yet, LLMs do not form a regular part of anything I do.

Posts

No posts to display.

Richard_Kennaway's Shortform (3y)

Wikitag Contributions

No wikitag contributions to display.

Comments
Buck's Shortform
Richard_Kennaway · 2mo

Link to the actual tweet.

Doing good... best?
Richard_Kennaway · 2mo

> I do not make those assumptions.

But you were arguing for them, weren't you? It is the arguments that fail to convince me. I was not treating these as bald assertions.

> ...but that someone who is uncertain and open-minded...

Being sure of a thing does not preclude my entertaining other ideas.

Doing good... best?
Richard_Kennaway · 2mo

I am not convinced by the longer post either. I don't see a reason to suppose that a sufficiently intelligent intelligence would experience valence, or for that matter that it would necessarily even be conscious, nor, if conscious, that it would value the happiness of other conscious entities. Considering the moral variety of humans, from saints to devils, and our distance in intelligence from chimpanzees, I find it hard to believe that more dakka in that department is all it would take to make saints of us. And that's among a single evolved species with a vast amount in common with each other. For aliens of unknown mental constitution, I would say that all bets are off.

Doing good... best?
Richard_Kennaway · 2mo

You are assuming moral naturalism: the idea that moral truths exist objectively, independently of us, and are discoverable by the methods of science, that is, reason applied to observation of the physical world. For how else would an AI discover for itself what is good? But how would it arrive at moral naturalism in the first place? Humans have not: it is only one of many meta-ethical theories, and moral naturalists do not agree on what the objectively correct morals are.

If we do not know the truth on some issue, we cannot know what an AGI, able to discover the truth, would discover. It is also unlikely that we would be able to assess the correctness of its answer.

“I’m sorry Dave, it would be immoral to open the pod bay doors.”

Sentience matters
Richard_Kennaway · 2y

Here are five conundrums about creating the thing with alignment built in.

  1. The House Elf whose fulfilment lies in servitude is aligned.

  2. The Pig That Wants To Be Eaten is aligned.

  3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.

  4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.

  5. J. B. Watson made a claim similar to (4) (I had at first attributed it to B. F. Skinner): "Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors."

It is notable, though, that the first three are fiction and the last two are speculation. (The fates of J.B. Watson's children do not speak well of his boast.) No-one seems to have ever succeeded in doing this.

ETA: Back in the days of GOFAI one might imagine, as the OP does, making the thing to be already aligned. But we know no more of how the current generation of LLMs work than we do of the human brain. We grow them, then train them with RLHF to cut off the things we don't like, like the Gammas and Deltas in their artificial wombs. From the point of view of demonstrating AI safety before deployment, this is clearly a wrong method. That aside, is it moral?

A Test for Language Model Consciousness
Richard_Kennaway · 3y

> One reason we believe other humans are conscious is that other humans are consistently accurate reporters of their own mental states.

I don't think anyone has ever told me they were conscious, or I them, except in the trivial sense of communicating that one has woken up, or is not yet asleep. The reason I attribute the faculty of consciousness to other people is that they are clearly the same sort of thing as myself. A language model is not. It is trained to imitate what people have said, and anything it says about itself is an imitation of what people say about themselves.

> So when another human tells us they are conscious, we update towards thinking that they are also conscious.

I would not update at all, any more than I update on observing the outcome of a tossed coin for the second time. I already know that, being human, they have that faculty. Only if they were in a coma, putting the faculty in doubt, would I update on hearing them speak, and then it would not much matter what they said.

Your posts should be on arXiv
Richard_Kennaway · 3y

Note that arXiv does have some gatekeeping: you must get an "endorsement" before submitting your first paper to any subject area. Details.

Confucianism in AI Alignment
Richard_Kennaway · 5y

You are proposing "make the right rules" as the solution. Surely this is like solving the problem of how to write correct software by saying "make correct software"? The same move could be made for the Confucian approach by saying "make the values right". And the argument made against the Confucian approach applies equally to the Legalist one: the rules are never the real thing that is wanted; people will vary in how assiduously they are willing to follow one or the other, or will hack the rules entirely for their own benefit; and selection effects then lever open an ever wider gap between the rules, what was wanted, and what actually happens.

It doesn't work for HGIs (Human General Intelligences). Why will it work for AGIs?

BTW, I'm not a scholar of Chinese history, but historically it seems to me that Confucianism flourished as state religion because it preached submission to the Legalist state. Daoism found favour by preaching resignation to one's lot. Do what you're told and keep your head down.

Radical Probabilism
Richard_Kennaway · 5y
> Virtual evidence requires probability functions to take arguments which aren't part of the event space

Not necessarily. Typically, the events would be all the Lebesgue measurable subsets of the state space. That's large enough to furnish a suitable event to play the role of the virtual evidence. In the example involving A, B, and the virtual event E, one would also have to somehow specify that the dependencies of A and B on E are in some sense independent of each other, but you already need that. That assumption is what gives sequence-independence.
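
A minimal numeric sketch of that assumption (the joint distribution and likelihood factors below are invented for illustration, not taken from the post): if the virtual event E bears on A and B only through likelihood factors that multiply, updating on E is a pointwise reweighting of the joint, and the order in which the two factors are applied cannot matter.

```python
# Sketch only: invented numbers, not from the post.

def reweight(p, factor):
    """Multiply each cell of the joint by its likelihood factor and renormalise."""
    q = {cell: v * factor(cell) for cell, v in p.items()}
    z = sum(q.values())
    return {cell: v / z for cell, v in q.items()}

prior = {(True, True): 0.3, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.5}

# P(E | a, b) factorises as f(a) * g(b): the "independent dependencies" assumption.
f = lambda cell: 3.0 if cell[0] else 1.0   # E is three times likelier given A
g = lambda cell: 2.0 if cell[1] else 1.0   # E is twice as likely given B

post_fg = reweight(reweight(prior, f), g)
post_gf = reweight(reweight(prior, g), f)
assert all(abs(post_fg[c] - post_gf[c]) < 1e-12 for c in prior)  # order-independent
```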

The sequential dependence of the Jeffrey update results from violating that assumption. Updating P(B) to 60% already increases P(A), so updating from that new value of P(A) to 60% is a different update from the one you would have made by updating on P(A)=60% first.
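
A toy calculation of that order-dependence (numbers again invented for illustration): on a correlated joint over A and B, Jeffrey-updating P(B) to 60% and then P(A) to 60% leaves a different distribution than doing the updates in the other order.

```python
# Sketch only: a correlated joint over (A, B) with invented numbers.

def jeffrey(p, event, new_prob):
    """Jeffrey-condition the joint so that the marginal of `event` becomes new_prob."""
    old = sum(v for cell, v in p.items() if event(cell))
    return {cell: v * (new_prob / old if event(cell) else (1 - new_prob) / (1 - old))
            for cell, v in p.items()}

A = lambda cell: cell[0]
B = lambda cell: cell[1]
marginal = lambda p, event: sum(v for cell, v in p.items() if event(cell))

prior = {(True, True): 0.3, (True, False): 0.1,
         (False, True): 0.1, (False, False): 0.5}   # P(A) = P(B) = 0.4, positively correlated

b_then_a = jeffrey(jeffrey(prior, B, 0.6), A, 0.6)
a_then_b = jeffrey(jeffrey(prior, A, 0.6), B, 0.6)

print(marginal(b_then_a, A), marginal(b_then_a, B))  # 0.600, ~0.647
print(marginal(a_then_b, A), marginal(a_then_b, B))  # ~0.647, 0.600
```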

> virtual evidence treats Bayes' Law (which is usually a derived theorem) as more fundamental than the ratio formula (which is usually taken as a definition).

That is the view taken by Jaynes, a dogmatic Bayesian if ever there was one. For Jaynes, all probabilities are conditional probabilities, and when one writes baldly P(A), this is really P(A|X), the probability of A given background information X. X may be unstated but is never absent: there is always background information. This also resolves Eliezer's Pascal's Muggle conundrum over how he should react to 10^10-to-1 evidence in favour of something for which he has a probability of 10^100-to-1 against. The background information X that went into the latter figure is called into question.
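
A toy version of that resolution (the specific numbers are mine, chosen only for illustration): treat the 10^100-to-1 figure as P(A|X), allow even a sliver of doubt about X itself, and the same 10^10-to-1 evidence lands very differently than it would against an unconditional prior of 10^-100.

```python
# Sketch only: illustrative numbers, not anything from Jaynes or the Pascal's Muggle post.

def posterior(prior, likelihood_ratio):
    """Posterior probability after evidence with the given likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

lr = 1e10  # 10^10-to-1 evidence in favour

# Treating 10^-100 as an unconditional prior: the evidence barely registers.
print(posterior(1e-100, lr))  # ~1e-90

# Treating it as P(A|X) and admitting a sliver of doubt about X itself
# (P(not X) = 1e-6 here, with A merely unlikely at 1e-3 if X fails):
mixed_prior = 1e-100 * (1 - 1e-6) + 1e-3 * 1e-6
print(posterior(mixed_prior, lr))  # ~0.91: the doubt about X does all the work
```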

I notice that this suggests an approach to allowing one to update away from probabilities of 0 or 1, conventionally thought impossible.
