__RicG__

Physicist switching to AI alignment

Studying these man-made horrors so they are no longer beyond my comprehension

Posts

Sorted by New

Wiki Contributions

Comments

If your model has the extraordinary power to say what internal motivational structures SGD will entrain into scaled-up networks, then you ought to be able to say much weaker things that are impossible in two years, and you should have those predictions queued up and ready to go rather than falling into nervous silence after being asked.

 

Sorry, I might misunderstanding you (and hope I am), but... I think doomers literally say "Nobody knows what internal motivational structures SGD will entrain into scaled-up networks and thus we are all doomed". The problems is not having the science to confidently say how the AIs will turn out, and not that doomers have a secret method to know that next-token-prediction is evil.

If you meant that doomers are too confident answering the question "will SGD even make motivational structures?" their (and mine) answer still stems from ignorance: nobody knows, but it is plausible that SGD will make motivational structures in the neural networks because it can be useful in many tasks (to get low loss or whatever), and if you think you do know better you should show it experimentally and theoretically in excruciating detail.

 

I also don't see how it logically follows that "If your model has the extraordinary power to say what internal motivational structures SGD will entrain into scaled-up networks" => "then you ought to be able to say much weaker things that are impossible in two years" but it seems to be the core of the post. Even if anyone had the extraordinary model to predict what SGD exactly does (which we, as a species, should really strive for!!) it would still be a different question to predict what will or won't happen in the next two years.

If I reason about my field (physics) the same should hold for a sentence structured like "If your model has the extraordinary power to say how an array of neutral atoms cooled to a few nK will behave when a laser is shone upon them" (which is true) => "then you ought to be able to say much weaker things that are impossible in two years in the field of cold atom physics" (which is... not true). It's a non sequitur.