Дмитрий Зеленский

Posts

Sorted by New

Wiki Contributions

Comments

The human side of interaction

This led me to think... why do we even believe that human values are good? Perhaps the typical human behaviour amplified by possibilities of a super-intelligence would actually destroy the universe. I don't personally find this very likely (that's why I never posted it before), but, given that almost all AI safety is built around "how to check that AI's values are convergent with human values" one way or another, perhaps something else should be approached - like remodeling history (actual, human history) from a given starting point (say, Roman Principatus or 1945) with actors assigned values different from human values (but in similar relationship to each other, if applicable) and finding what leads to better results (and, in particular, in us not being destroyed by 2020). All with the usual sandbox precautions, of course.

(Addendum: Of course, pace "fragility of value". We should have some inheritance from metamorals. But we don't actually know how well our morals (and systems in "reliable inheritance" from them) are compatible with our metamorals, especially in an extreme environment such as superintelligence.)