rhollerith_dot_com

Richard Hollerith, 15 miles north of San Francisco. hruvulum@gmail.com

Comments

G Gordon Worley III's Shortform

Can you explain where there is an error term in AlphaGo, or where an error term might appear in a hypothetical model similar to AlphaGo but trained much longer, with many more parameters and far more computational resources?

G Gordon Worley III's Shortform

At least one person here disagrees with you on Goodharting. (I do.)

If I recall correctly, you've written before on this site that Eliezer's 2004 CEV proposal is unworkable because of Goodharting. I am granting myself the luxury of not looking up your previous statement, since you can contradict me if my recollection is incorrect.

I believe that the CEV proposal would probably be achievable by humans if those humans had enough time and enough resources (money, talent, protection from meddling), and that if it is not achievable, it is for reasons other than Goodhart's law.

(Sadly, an unaligned superintelligence is much easier for humans living in 2022 to create than a CEV-aligned superintelligence is, so we are probably all going to die IMHO.)

Perhaps before discussing the CEV proposal we should discuss a simpler question, namely, whether you believe that Goodharting inevitably ruins the plans of any group setting out intentionally to create a superintelligent paperclip maximizer.

Other simple goals we might discuss are a superintelligence (SI) whose goal is to shove as much matter as possible into a black hole, or an SI that "shuts itself off" within 3 months of its launch, where "shuts itself off" means that it stops trying to survive or to affect reality in any way.