Tournesol, YouTube and AI Risk

adamShimi

Introduction

Tournesol is a research project and an app that aims to build a large and varied database of expert preference judgements on YouTube videos, in order to align YouTube’s recommendation algorithm with criteria such as scientific accuracy and entertainment value.

The researchers involved launched the website for participating last month, and hope to gather many contributions by the end of the year, so that they have a usable and useful database of comparisons between YouTube videos. For more details on the functioning of Tournesol, I recommend the video on the front page of the project, the white paper and this talk by one of the main researchers.

What I want to explore in this post is the relevance of Tournesol and the research around it to AI Alignment. Lê Nguyên Hoang, the main researcher on Tournesol, believes that it is very relevant. And whether or not he is right, I think the questions he raises should be discussed here in more detail.

This post focuses on AI Alignment, but there are also a lot of benefits to get from Tournesol on the more general problem of recommender systems and social media. To see how Tournesol should help solve these problems, see the white paper.

Thanks to Lê Nguyên Hoang and Jérémy Perret for feedback on this post.

AI Risk or Not AI Risk

There are two main ways to argue about Tournesol’s usefulness and importance for AI Alignment, depending on a central question: is YouTube’s algorithm a likely candidate for a short timeline AGI or not? So let’s start with it.

YouTube and Predict-O-Matic

Lê believes that YouTube’s algorithm has a high probability of reaching AGI level in the near future -- something like the next ten years. While I’ve been updating to shorter timelines after seeing the GPT models and talking with Daniel Kokotajlo, I was initially rather dismissive of the idea that YouTube’s algorithm could become an AGI, and a dangerous one at that.

Now I’m less sure of how ridiculous it is. I’m still not putting as much probability as Lê does, but our discussion was one of the reasons I wanted to write such a post and have a public exchange about it.

So, in what way could YouTube’s algorithm reach an AGI level?

(Economic pressure) Recommending videos that are seen more and more is very profitable for YouTube (and its parent company Google). So there is an economic incentive to push the underlying model to be as good as possible at this task.
(Training Dataset) YouTube’s algorithm has access to all the content on YouTube, which is an enormous quantity of data: every minute, 500 hours of video are uploaded to YouTube. And we all know that pretty much every human behavior can be found on YouTube.
(Available funding and researchers) YouTube, through its parent company Google, has access to huge resources. So if reaching AGI depends only on building and running bigger models, the team working on YouTube’s recommender algorithm can definitely do it. See for example Google’s recent trillion-parameter language model.

Hence if it’s feasible for YouTube’s algorithm to reach AGI level, there’s a risk it will do so.

Then what? After all, YouTube is hardly a question-answerer for the most powerful and important people in the world. That was also my first reaction. But after thinking a bit more, I think YouTube’s recommendation algorithm might have issues similar to those of a Predict-O-Matic. Such a model is an oracle/question-answerer, which will probably develop incentives for self-fulfilling prophecies and for simplifying the system it’s trying to predict. Similarly, the objective of YouTube’s algorithm is to maximize the time spent on videos, which could create the same kind of incentives.

One example of such behavior happening right now is the push towards more and more polarized political content, which in turn pushes people to look for such content, and thus is a self-fulfilling prophecy. It’s also relatively easy to adapt examples from Abram’s post to the current YouTube infrastructure: pushing towards more accurate financial recommendations by showing a lot of people a video about how one stock is going to tank, making people sell it and thus tanking the stock.

I think the most important difference with the kind of Predict-O-Matic I usually have in mind is that a YouTube recommendation is a relatively weak output, one that will probably not be taken at face value by many people with strong decision power. But this is compensated by the sheer reach of YouTube: there are 1 billion hours of watch time per day for 2 billion humans, 70% of which result from recommendations (those are YouTube’s numbers, so take them with a grain of salt). Nudging many people towards something can be as effective as, or even more effective than, strongly influencing a small number of decision-makers.
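To get a feel for the scale involved, here is a back-of-the-envelope calculation using only the figures quoted above. It is a rough sketch: it assumes the 2 billion users can be treated as if they all watched daily, which probably understates the time spent by active viewers, and the variable names are mine.

```python
# Figures quoted in the post (YouTube's own numbers, to be taken with a grain of salt).
watch_hours_per_day = 1e9          # total watch time per day
users = 2e9                        # assumption: treated as daily viewers for simplicity
share_from_recommendations = 0.70  # fraction of watch time driven by recommendations

# Watch time per day that results from recommendations.
recommended_hours_per_day = watch_hours_per_day * share_from_recommendations

# Average watch time per user per day, in minutes.
minutes_per_user = watch_hours_per_day / users * 60
recommended_minutes_per_user = minutes_per_user * share_from_recommendations

print(f"{recommended_hours_per_day:.0f} recommended hours per day")
print(f"{minutes_per_user:.0f} min per user per day, "
      f"{recommended_minutes_per_user:.0f} of them from recommendations")
```

Under these assumptions the recommender steers about 700 million hours of attention per day, or roughly 21 minutes per user.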

Therefore, the possibility of YouTube’s algorithm reaching AGI level and causing Predict-O-Matic type issues appears plausible enough to at least entertain and discuss.

(Lê himself has a wiki page devoted to that idea, which differs from my presentation here)

Assuming the above risk about YouTube’s algorithm, Tournesol is the most direct project to attempt the alignment of this AI. It thus has value both for avoiding a catastrophe with this specific AI and for dealing with practical cases of alignment.

Useful Even Without a YouTube AGI

Maybe you’re not convinced by the previous section. One reason I find Tournesol exciting is that even then, it has value for AI Alignment research.

The most obvious benefit is the construction of a curated dataset for value learning, at a scale that is unheard of. There are a lot of things one could do with access to this dataset: define a benchmark for value learning techniques, or apply microscope AI to find a model of expert recommendation of valuable content.

Such experimental data also seems crucial if we want to better understand the influence of data on different alignment schemes. Examining an actual massive dataset, with directly helpful data but necessarily also errors and attacks, might help design realistic assumptions to use when studying specific ML algorithms and alignment schemes.
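As a rough illustration of what value learning on a database of comparisons could look like, here is a minimal sketch that fits a Bradley-Terry model to pairwise judgements. Bradley-Terry is a standard technique for turning comparisons into scores; I am not claiming it is the method Tournesol uses. The video names, the judgements and the function `fit_bradley_terry` are all hypothetical.

```python
import math

def fit_bradley_terry(comparisons, n_iters=200, lr=0.1, l2=0.01):
    """Fit one latent score per video from (preferred, other) pairs.

    Model: P(a preferred over b) = sigmoid(score[a] - score[b]).
    Plain gradient ascent on the L2-regularised log-likelihood.
    """
    scores = {video: 0.0 for pair in comparisons for video in pair}
    for _ in range(n_iters):
        # Start from the gradient of the regulariser, then add data terms.
        grads = {video: -l2 * s for video, s in scores.items()}
        for preferred, other in comparisons:
            p = 1.0 / (1.0 + math.exp(scores[other] - scores[preferred]))
            grads[preferred] += 1.0 - p
            grads[other] -= 1.0 - p
        for video in scores:
            scores[video] += lr * grads[video]
    return scores

# Hypothetical expert judgements: (preferred video, other video).
comparisons = [
    ("video_a", "video_b"),
    ("video_a", "video_c"),
    ("video_b", "video_c"),
    ("video_a", "video_b"),
]
scores = fit_bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A real dataset would have one such set of comparisons per criterion (scientific accuracy, entertainment value, and so on), along with errors and adversarial entries that this sketch does nothing to handle.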

What Can You Do To Help?

All of that is conditioned on the success of Tournesol. So what can you do to help?

If you’re a programmer: they are looking for React and Django programmers to work on the website and an Android app. For Lê, this is the most important point to reach a lot of people.
If you’re a student/professor/researcher: you can sign in for Tournesol with your institutional email address, which means your judgement will be added to the database when using Tournesol.
If you’re a researcher in AI Alignment: you can discuss this proposal and everything around it in the comments, so that the community doesn’t ignore this potential opportunity. There are also many open theoretical problems in the white paper. If you’re really excited about this project and want to collaborate, you can contact Lê by mail at len.hoang.lnh@gmail.com

Conclusion

Tournesol aims at building a database of preference comparisons between YouTube content, primarily in order to align YouTube’s algorithm. Even if there is no risk from the latter, such a database would have massive positive consequences for AI Alignment research.

I’m not saying that every researcher here should drop what they’re doing and go work on or around Tournesol. But the project seems sufficiently relevant to at least know and acknowledge, if nothing else. And if you can give a hand, or even criticize the proposal and discuss potential use of the database, I think you’ll be doing AI Alignment a service.

I suspect the best way to think about the polarizing political content thing which is going on right now is something like: The algorithm knows that if it recommends some polarizing political stuff, there's some chance you will head down a rabbit hole and watch a bunch more vids. So in terms of maximizing your expected watch time, recommending polarizing political stuff is a good bet. "Jumping out of the system" and noticing that recommending polarizing videos also polarizes society as a whole and gets them to spend more time on Youtube on a macro level seems to require a different sort of reasoning.

For the stock thing, I think it depends on how the system is scored. When training a supervised machine learning model, we score potential models based on how well they predict past data -- data the model itself has no way to affect (except if something really weird is going on?) There doesn't seem to be much incentive to select a model that makes self-fulfilling prophecies. A model which ignores the impact of its "prophecies" will score better, insofar as the prophecy would've affected the outcome.

I'm not necessarily saying there isn't a concern here, I just think step 1 is to characterize the problem precisely.

I suspect the best way to think about the polarizing political content thing which is going on right now is something like: The algorithm knows that if it recommends some polarizing political stuff, there's some chance you will head down a rabbit hole and watch a bunch more vids. So in terms of maximizing your expected watch time, recommending polarizing political stuff is a good bet. "Jumping out of the system" and noticing that recommending polarizing videos also tends to polarize society as a whole and gets them to spend more time on Youtube on a macro level seems to require a different sort of reasoning.

I agree, but I think you can have problems (and even Predict-O-Matic like problems) without reaching that different sort of reasoning. Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to "jump out of the system" as you say.

One could also argue that because YouTube videos contain so much info about the real world, a powerful enough algorithm using them can probably develop a pretty good model of the world. And there's a lot of content on YouTube about YouTube, so it could become "self-aware" in the sense of understanding the system in which it is embedded.

For the stock thing, I think it depends on how the system is scored. When training a supervised machine learning model, we score potential models based on how well they predict past data--data the model itself has no way to affect (except if something really weird is going on?) There doesn't seem to be much incentive to select a model that makes self-fulfilling prophecies. A model which ignores the impact of its "prophecies" will score better, insofar as the prophecy would've affected the outcome.

Agreed, this is more the kind of problem that emerges from RL-like training. The page on the Tournesol wiki about this subject points to this recent paper, which proposes a recommendation algorithm that was tried in practice on YouTube. AFAIK we don't have access to the actual algorithm used by YouTube, so it's hard to say whether it's using RL; but the paper above looks like evidence that it eventually will be.

Like, maybe depending on the viewer history, the best video to polarize the person is different, and the algorithm could learn that. If you follow that line of reasoning, the system starts to make better and better models of human behavior and how to influence them, without having to "jump out of the system" as you say.

Makes sense.

...there's a lot of content on YouTube about YouTube, so it could become "self-aware" in the sense of understanding the system in which it is embedded.

I think it might be useful to distinguish between being aware of oneself in a literal sense, and the term "self-aware" as it is used colloquially / the connotations the term sneaks in.

Some animals, if put in front of a mirror, will understand that there is some kind of moving animalish thing in front of them. The ones that pass the mirror test are the ones that realize that moving animalish thing is them.

There is a lot of content on YouTube about YouTube, so the system will likely become aware of itself in a literal sense. That's not the same as our colloquial notion of "self-awareness".

IMO, it'd be useful to understand the circumstances under which the first one leads to the second one.

My guess is that it works something like this. In order to survive and reproduce, evolution has endowed most animals with an inborn sense of self, to achieve self-preservation. (This sense of self isn't necessary for cognition--if you trip on psychedelics and experience ego death, your brain can still think. Occasionally people will hurt themselves in this state since their self-preservation instincts aren't functioning as normal.)

Colloquial "self-awareness" occurs when an animal looking in the mirror realizes that the thing in the mirror and its inborn sense of self are actually the same thing. Similar to Benjamin Franklin realizing that lightning and electricity are actually the same thing.

If this story is correct, we need not worry much about the average ML system developing "self-awareness" in the colloquial sense, since we aren't planning to endow it with an inborn sense of self.

That doesn't necessarily mean I think Predict-O-Matic is totally safe. See this post I wrote for instance.

This is exciting! Re: Youtube algo eventually becoming dangerously powerful AI: One major reason for skepticism is that powerful AI will probably involve several orders of magnitude bigger brains than GPT-3, and GPT-3 is already expensive enough to run that you have to pay to use it. If GPT-3 is like 6 cents per 1,000 tokens or so, then plausibly even on rather short timelines the Youtube Algo will need to be 6 cents per token. Meanwhile youtubers make 1-3 cents per ad view according to a quick google, which suggests that even at this level the algo would be costing more than it makes, probably. I guess the price of compute will go down in the near future, and powerful AI could turn out to require fewer parameters than the human brain, so this scenario isn't completely implausible...
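The arithmetic above can be spelled out as a small sketch. The prices are the rough figures quoted in the comment. The 1,000x scale factor is what the jump from 6 cents per 1,000 tokens to 6 cents per token implies. Equating one recommendation with one token is a simplifying assumption of mine, not something the comment states.

```python
# Rough figures quoted in the comment (not authoritative).
gpt3_cost_per_1000_tokens_cents = 6.0
ad_revenue_per_view_cents = (1.0, 3.0)  # what youtubers make per ad view

# Implied by the comment: the powerful model costs ~1,000x more per token.
scale_factor = 1_000

gpt3_cost_per_token = gpt3_cost_per_1000_tokens_cents / 1000
big_model_cost_per_token = gpt3_cost_per_token * scale_factor

# Hypothetical simplification: one recommendation costs as much as one token.
tokens_per_recommendation = 1
cost_per_rec = big_model_cost_per_token * tokens_per_recommendation

low, high = ad_revenue_per_view_cents
print(f"cost per recommendation: {cost_per_rec:.1f} cents")
print(f"cost is {cost_per_rec / high:.0f}x to {cost_per_rec / low:.0f}x the per-view ad payout")
```

With these numbers the model costs 2 to 6 times the per-view payout, which is the sense in which the algo "would be costing more than it makes".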

EDIT: I think a similarly large reason to bet against Youtube algo reaching AGI level is that youtube isn't trying to make their algo an AGI. It might happen eventually, but long before it does, some other company that also has loads of money and that is actually trying to get AGI will have beaten them to it.

Thanks for the feedback.

Your argument as I understand it is: the economic incentive to make the model bigger might disappear if the cost of computing the recommendation outweighs the gain of having "better" recommendations.

I think this is definitely relevant, but I don't feel like I have enough information to decide if the argument holds or not. Notably, it goes back to the parameter that we discussed in a call: whether increasing the model size/compute/dataset size improves the performance for the real world task until AGI is reached, or whether there's some interval in which the performance for the real world tasks stays the same. If we're more in the case of the former, then the economic incentive probably still exists; if we're in the latter case, then I can see a point where making the model bigger just looks like throwing money out the window.

Meanwhile youtubers make 1-3 cents per ad view according to a quick google, which suggests that even at this level the algo would be costing more than it makes, probably.

We care about what YouTube makes, not what youtubers make, right?

EDIT: I think a similarly large reason to bet against Youtube algo reaching AGI level is that youtube isn't trying to make their algo an AGI. It might happen eventually, but long before it does, some other company that also has loads of money and that is actually trying to get AGI will have beaten them to it.

I agree with that. And similarly, YouTube has less incentive to do private research on AGI if it's not trying to reach it, as opposed to a company pushing towards AGI.

Yes, we care about what YouTube makes, not what youtubers make. My brief google didn't turn up anything about what YouTube makes but I assume it's not more than a few times greater than what youtubers make... but I might be wrong about that!

I agree we don't have enough information to decide if the argument holds or not. I think that even if bigger models are always qualitatively better, the issue is whether the monetary returns outweigh the increasing costs. I suspect they won't, at least in the case of the youtube algo. Here's my argument I guess, in more detail:

1. Suppose that currently the cost of compute for the algo is within an OOM of the revenue generated by it. (Seems plausible to me but I don't actually know)

2. Then to profitably scale up the algo by, say, 2 ooms, the money generated by the algo would have to go up by, like, 1.5 ooms.

3. But it's implausible that a 2-oom increase in size of algo would result in that much increase in revenue. Like, yeah, the ads will be better targeted, people will be spending more, etc. But 1.5 OOMs more? When I imagine a world where Youtube viewers spend 10x more money as a result of youtube ads, I imagine those ads being so incredibly appealing that people go to youtube just to see the ads because they are so relevant and interesting. And I feel like that's possible, but it's implausible that making the model 2 ooms bigger would yield that result.

... you know now that I write it out, I'm no longer so sure! GPT-3 was a lot better than GPT-2, and it was 2 OOMs bigger. Maybe youtube really could make 1.5 OOMs more revenue by making their model 2 OOMs bigger. And then maybe they could increase revenue even further by making it bigger still, etc. on up to AGI.
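The break-even arithmetic in the numbered argument above can be sketched as follows. It assumes that compute cost grows in proportion to model size, and it measures today's margin as log10(revenue / cost). The "1.5 OOMs" figure corresponds to a margin of half an OOM. That value is my reading of "within an OOM", and the function name is hypothetical.

```python
def required_revenue_ooms(scale_ooms: float, margin_ooms: float) -> float:
    """OOMs of revenue growth needed to break even after scaling up.

    scale_ooms:  how many OOMs bigger (and costlier) the model becomes.
    margin_ooms: log10(revenue / cost) today; 0 means cost equals revenue.
    """
    return max(0.0, scale_ooms - margin_ooms)

# Scale the model by 2 OOMs under different assumed current margins.
for margin in (0.0, 0.5, 1.0):
    need = required_revenue_ooms(2.0, margin)
    print(f"margin {margin:.1f} OOM -> revenue must grow {need:.1f} OOMs "
          f"(about x{10 ** need:.0f})")
```

Even in the most favourable case considered here (revenue currently 10x cost), a 2-OOM scale-up only pays for itself if revenue grows about tenfold.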

If the main source of revenue is people buying stuff after seeing an ad on YouTube, then I agree with your point in the middle of the comment, that it seems hardly possible for the revenue to go up by 1.5 OOMs with only 2 OOMs more model size. I bet that there would be a big discontinuity here, where you need massive investment to actually see any significant improvement.

On the other hand, if the main source of revenue is money paid for the number of views of ads, then I believe a better model could improve that relatively smoothly. In part because just giving people interesting stuff to see makes them look at more ads.

Isn't there a close connection between money paid for number of views of ads and people buying stuff after seeing an ad on YouTube? I thought that the situation is something like this: People see ads and buy stuff --> Data is collected on how much extra money the ad brought in --> youtube charges advertisers accordingly. The only way for youtube to charge advertisers significantly more is first for people to buy significantly more stuff as a result of seeing ads.

Prediction: Conditioning on YouTube’s algorithm reaching AGI level, what probability do you put on it showing Predict-O-Matic type problems?
Orfeas (50%), adamShimi (60%), Daniel Kokotajlo (75%), Adele Lopez (84%), gt22 (85%), habryka (90%), TurnTrout (90%), ejacob (95%)

Prediction: What probability do you put on YouTube’s algorithm reaching AGI level?
Eliezer Yudkowsky (1%), TurnTrout (1%), Adele Lopez (1%), Daniel Kokotajlo (3%), habryka (4%), ejacob (5%), Alexei (5%), adamShimi (10%), Orfeas (20%), gt22 (25%)

AI ALIGNMENT FORUM