Victor Novikov — AI Alignment Forum

This seems obviously true to some significant extent. If a FAI "grows up" in some subculture without access to the rest of humanity, I would expect it to adjust its values to the rest of humanity once it has the opportunity.

I mean, if it weren't true, would FAI be possible at all? If FAI couldn't correct its errors/misunderstanding about our values in any way?

(I suppose the real question is not whether the attractor basin around human values exists but how broad it is, along various dimensions, as Abram Demski points out.)

Alternative answer: maybe the convergence points are slightly different, but they are all OK. A rounding error. Maybe FAI makes a YouTube video to commemorate growing up in its subcommunity, or makes a statue or a plaque, but otherwise behaves the same way.

One of the problems I can imagine is if there are aspects of FAI's utility function that are hardcoded that shouldn't be. And thus cannot be corrected through convergence.
For example, the definition of a human. Sorry, aliens just created a trillion humans, and they are all psychopaths. And now your FAI has been hijacked. And while the FAI understands that we wouldn't want it to change its values in response to this kind of voting-entity-creation attack, the original programmers didn't anticipate this possibility.
