When you exhaust all the language data from text, you can start extracting language from audio and video.
As far as I know, the largest public repository of audio and video is YouTube. We can do a rough back-of-the-envelope computation for how much data is in there:
- According to some 2019 article I found, 500 hours of video are uploaded to YouTube every minute. If we assume this was the average for the last 15 years, that gets us roughly 200 billion minutes of video.
- An average conversation runs at about 150 words per minute, according to a Google search. That gets us about 30T words.
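
Here is a minimal sketch of that arithmetic in Python, in case anyone wants to plug in their own numbers. The upload rate is set to 500 hours per minute, which is the value that is consistent with the roughly 200 billion minute total; treat it, the 15-year window, and the 150 words per minute as assumptions rather than measurements. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope estimate of spoken words on YouTube.
# All three inputs are rough assumptions, not measured values.
HOURS_UPLOADED_PER_MINUTE = 500  # assumed average upload rate
YEARS = 15                       # assumed period at that rate
WORDS_PER_MINUTE = 150           # assumed conversational speaking rate

MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def estimate_words(hours_per_minute, years, words_per_minute):
    """Return (total minutes of video, total spoken words)."""
    # Minutes of video uploaded during each wall-clock minute.
    video_minutes_per_minute = hours_per_minute * 60
    # Wall-clock minutes in the whole period.
    wall_clock_minutes = years * MINUTES_PER_YEAR
    total_video_minutes = video_minutes_per_minute * wall_clock_minutes
    return total_video_minutes, total_video_minutes * words_per_minute


video_minutes, words = estimate_words(
    HOURS_UPLOADED_PER_MINUTE, YEARS, WORDS_PER_MINUTE
)
print(f"{video_minutes:.2e} minutes of video")  # 2.37e+11
print(f"{words:.2e} words")                     # 3.55e+13
```

Unrounded, this comes out to about 237 billion minutes and about 35T words, so the 200 billion and 30T figures are the same estimate rounded down. It also assumes every minute of video contains speech at a conversational rate, which is almost certainly too generous.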