All of Viliam's Comments + Replies

I wonder if it would make sense to make this half-open, in the sense that you would publish on LW links to the study materials, and maybe also some of the results, so that people who didn't participate have a better idea of what it is like.

2 Linda Linsefors 23d
There is no study material since this is not a course. If you are accepted to one of the project teams, then you will work on that project. You can read about the previous research outputs here: Research Outputs – AI Safety Camp. The most famous research to come out of AISC is the coin-run experiment: We Were Right! Real Inner Misalignment - YouTube; [2105.14111] Goal Misgeneralization in Deep Reinforcement Learning. But the projects are different each year, so the best way to get an idea of what it's like is just to read the project descriptions.

Makes sense, with the proviso that this is sometimes true only statistically. Like, the AI may choose to write an output which has a 70% chance to hurt you and a 30% chance to (equally) help you, if that is its best option.

If you assume that the AI is smarter than you, and has a good model of you, you should not read the output. But if you accidentally read it, and luckily you react in the right (for you) way, that is a possible result, too. You just cannot and should not rely on being so lucky.