AI ALIGNMENT FORUM

Nathaniel Monson
Ω1000

Mathematician turned alignment researcher. Probably happy to chat about math, current ML, or long-term AI thoughts.

The basics - Nathaniel Monson (nmonson1.github.io)

Comments
How to solve deception and still fail.
Nathaniel Monson · 2y

Those don't necessarily seem correct to me. If, e.g., OpenAI develops a superintelligent, non-deceptive AI, then I'd expect some of the first questions they'd ask it to be of the form: "Are there questions we would regret asking you, according to our own current values? How can we avoid asking you those while still getting lots of use and insight from you? What are some standard prefaces we should attach to questions to make sure following through on your answers is good for us? What security measures can we take to make sure our users' lives are generally improved by interacting with you? What security measures can we take to minimize the chances of the world turning out very badly according to our own desires?" Etc.
