Viliam - AI Alignment Forum

"Publish or Perish" (a quick note on why you should try to make your work legible to existing academic communities)

ah, I see! It's an incentive problem! So I guess your funding needs to be conditional on you producing legible outputs.

This rubs me the wrong way. Of course, you can make anyone do X if you make their funding conditional on X. But whether you should do that depends on how sure you are that X is more valuable than the alternative.

There are already thousands of people out there whose funding is conditional on them producing legible outputs. Why is that not enough? What will change if we increase that number by a dozen?

AI Safety Camp, Virtual Edition 2023

Viliam · 2y

I wonder if it would make sense to make this half-open, in the sense that you would publish links to the study materials on LW, and maybe also some of the results, so that people who didn't participate have a better idea.

TurnTrout's shortform feed

Viliam · 3y

Makes sense, with the proviso that this is sometimes true only statistically. Like, the AI may choose to write an output which has a 70% chance to hurt you and a 30% chance to (equally) help you, if that is its best option.

If you assume that the AI is smarter than you, and has a good model of you, you should not read the output. But if you accidentally read it, and luckily you react in the right (for you) way, that is a possible result, too. You just cannot and should not rely on being so lucky.
