Comments

Here are five conundrums about creating the thing with alignment built in.

  1. The House Elf whose fulfilment lies in servitude is aligned.

  2. The Pig That Wants To Be Eaten is aligned.

  3. The Gammas and Deltas of "Brave New World" are moulded in the womb to be aligned.

  4. "Give me the child for the first seven years and I will give you the man." Variously attributed to Aristotle and St. Ignatius of Loyola.

  5. B. F. Skinner said something similar to (4), but I don't have a quote to hand, to the effect that he could bring up any child to be anything. Edit: it was J. B. Watson: "Give me a dozen healthy infants, well-formed, and my own specified world to bring them up in and I'll guarantee to take any one at random and train him to become any type of specialist I might select – doctor, lawyer, artist, merchant-chief and, yes, even beggar-man and thief, regardless of his talents, penchants, tendencies, abilities, vocations, and race of his ancestors."

It is notable, though, that the first three are fiction and the last two are speculation. (The fates of J.B. Watson's children do not speak well of his boast.) No-one seems to have ever succeeded in doing this.

ETA: Back in the days of GOFAI one might imagine, as the OP does, making the thing to be already aligned. But we know no more of how the current generation of LLMs work than we do of the human brain. We grow them, then train them with RLHF to cut off the things we don't like, like the Gammas and Deltas in artificial wombs. From the point of view of AI safety demonstrable before deployment, this is clearly a wrong method. That aside, is it moral?

One reason we believe other humans are conscious is that other humans are consistently accurate reporters of their own mental states.

I don't think anyone has ever told me they were conscious, or I them, except in the trivial sense of communicating that one has woken up, or is not yet asleep. The reason I attribute the faculty of consciousness to other people is that they are clearly the same sort of thing as myself. A language model is not. It is trained to imitate what people have said, and anything it says about itself is an imitation of what people say about themselves.

So when another human tells us they are conscious, we update towards thinking that they are also conscious.

I would not update at all, any more than I update on observing the outcome of a tossed coin for the second time. I already know that being human, they have that faculty. Only if they were in a coma, putting the faculty in doubt, would I update on hearing them speak, and then it would not much matter what they said.

Note that arXiv does have some gatekeeping: you must get an "endorsement" before submitting your first paper to any subject area. Details.

You are proposing "make the right rules" as the solution. Surely this is like solving the problem of how to write correct software by saying "make correct software"? The same move could be applied to the Confucian approach by saying "make the values right". And the argument made against the Confucian approach can equally be made against the Legalist approach: the rules are never the real thing that is wanted; people will vary in how assiduously they are willing to follow one or the other, or will hack the rules entirely for their own benefit; and selection effects then lever open wider and wider the difference between the rules, what was wanted, and what actually happens.

It doesn't work for HGIs (Human General Intelligences). Why will it work for AGIs?

BTW, I'm not a scholar of Chinese history, but historically it seems to me that Confucianism flourished as a state religion because it preached submission to the Legalist state. Daoism found favour by preaching resignation to one's lot. Do what you're told and keep your head down.

Virtual evidence requires probability functions to take arguments which aren't part of the event space

Not necessarily. Typically, the events would be all the Lebesgue measurable subsets of the state space. That's large enough to furnish a suitable event to play the role of the virtual evidence. In the example involving A, B, and the virtual event E, one would also have to somehow specify that the dependencies of A and B on E are in some sense independent of each other, but you already need that. That assumption is what gives sequence-independence.

The sequential dependence of the Jeffrey update results from violating that assumption. Updating P(B) to 60% already increases P(A), so updating from that new value of P(A) to 60% is a different update from the one you would have made by updating on P(A)=60% first.
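The order effect described above can be checked numerically. The following is a minimal sketch, not anything from the original post: the joint distribution over A and B, the 60% targets, and the likelihood ratio of 3 are all made-up illustrative numbers, and the function names are hypothetical. It assumes that a Jeffrey update on a variable rescales the joint so that the variable's marginal hits the target while the conditionals given that variable stay fixed, and that virtual evidence about a variable multiplies the joint by a likelihood ratio depending only on that variable and then renormalises.

```python
def marginal(joint, index):
    """Probability that the variable at position `index` is True."""
    return sum(p for outcome, p in joint.items() if outcome[index])


def jeffrey_update(joint, index, target):
    """Jeffrey update: force the marginal of one variable to `target`,
    keeping the conditional distribution given that variable unchanged."""
    p_true = marginal(joint, index)
    return {
        outcome: p * (target / p_true if outcome[index]
                      else (1 - target) / (1 - p_true))
        for outcome, p in joint.items()
    }


def virtual_update(joint, index, likelihood_ratio):
    """Virtual-evidence update: weight outcomes where the variable is True
    by `likelihood_ratio` (relative to False), then renormalise."""
    weighted = {
        outcome: p * (likelihood_ratio if outcome[index] else 1.0)
        for outcome, p in joint.items()
    }
    total = sum(weighted.values())
    return {outcome: w / total for outcome, w in weighted.items()}


A, B = 0, 1  # positions of A and B in each outcome tuple

# Illustrative prior (an assumption): A and B positively correlated,
# with P(A) = P(B) = 0.4.
prior = {
    (True, True): 0.3,
    (True, False): 0.1,
    (False, True): 0.1,
    (False, False): 0.5,
}

# Jeffrey updates in the two possible orders.
b_then_a = jeffrey_update(jeffrey_update(prior, B, 0.6), A, 0.6)
a_then_b = jeffrey_update(jeffrey_update(prior, A, 0.6), B, 0.6)

# Virtual-evidence updates in the two possible orders.
ve_a_then_b = virtual_update(virtual_update(prior, A, 3.0), B, 3.0)
ve_b_then_a = virtual_update(virtual_update(prior, B, 3.0), A, 3.0)

print("P(A) after Jeffrey update on B alone:",
      marginal(jeffrey_update(prior, B, 0.6), A))
print("Jeffrey, B then A:", b_then_a)
print("Jeffrey, A then B:", a_then_b)
print("Virtual, A then B:", ve_a_then_b)
print("Virtual, B then A:", ve_b_then_a)
```

With these numbers the two Jeffrey orderings end at different joint distributions, because whichever marginal is set second stays at 60% while the other drifts away from it. The two virtual-evidence orderings agree, since multiplying by likelihood ratios commutes.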

virtual evidence treats Bayes' Law (which is usually a derived theorem) as more fundamental than the ratio formula (which is usually taken as a definition).

That is the view taken by Jaynes, a dogmatic Bayesian if ever there was one. For Jaynes, all probabilities are conditional probabilities, and when one writes baldly P(A), this is really P(A|X), the probability of A given background information X. X may be unstated but is never absent: there is always background information. This also resolves Eliezer's Pascal's Muggle conundrum over how he should react to 10^10-to-1 evidence in favour of something for which he has a probability of 10^100-to-1 against. The background information X that went into the latter figure is called into question.

I notice that this suggests an approach to allowing one to update away from probabilities of 0 or 1, conventionally thought impossible.
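A rough numerical sketch of the point about background information follows. It is only an illustration under invented assumptions, not a calculation from Jaynes or from the Pascal's Muggle post: the alternative background X', the weight of 10^-20 placed on it, and the probability of 1/2 it gives to the hypothesis are all hypothetical, as is the function name. Exact fractions are used so that the tiny numbers do not get lost in floating-point rounding.

```python
from fractions import Fraction


def posterior_h(p_h_given_x, p_h_given_alt, p_alt, likelihood_ratio):
    """P(H | E) when the background is X with probability 1 - p_alt and an
    alternative background X' with probability p_alt.

    The evidence E is assumed to favour H over not-H by `likelihood_ratio`
    under either background.
    """
    p_e_given_h = Fraction(1)
    p_e_given_not_h = Fraction(1) / likelihood_ratio
    weight_h = Fraction(0)
    weight_total = Fraction(0)
    for p_background, p_h in ((1 - p_alt, p_h_given_x),
                              (p_alt, p_h_given_alt)):
        w_h = p_background * p_h * p_e_given_h
        w_not_h = p_background * (1 - p_h) * p_e_given_not_h
        weight_h += w_h
        weight_total += w_h + w_not_h
    return weight_h / weight_total


PRIOR_GIVEN_X = Fraction(1, 10**100)  # the 10^100-to-1-against figure
EVIDENCE = 10**10                     # the 10^10-to-1 evidence

# Background X treated as certain: the evidence barely dents the prior.
fixed = posterior_h(PRIOR_GIVEN_X, Fraction(1, 2), Fraction(0), EVIDENCE)

# Hypothetical: a 10^-20 chance that X is wrong and that under X' the
# hypothesis is a coin flip.
doubted = posterior_h(PRIOR_GIVEN_X, Fraction(1, 2),
                      Fraction(1, 10**20), EVIDENCE)

# Same doubt about X, but with P(H | X) set to exactly zero.
from_zero = posterior_h(Fraction(0), Fraction(1, 2),
                        Fraction(1, 10**20), EVIDENCE)

print("X certain:       ", float(fixed))
print("X open to doubt: ", float(doubted))
print("P(H|X) = 0, X open to doubt:", float(from_zero))
```

In this toy setup almost all of the movement comes from the evidence shifting weight onto X', not from the update within X. The last case shows the same mechanism moving a probability that is exactly zero given X, provided X itself is not held with certainty.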