Top posts
Nora Belrose
(Adapted from Nora's tweet thread.) Consider a trained, fully functional language model. What are the chances you'd get that same model, or something functionally indistinguishable, by randomly guessing the weights? We crunched the numbers and here's the answer: We've developed a method for estimating the probability of...
Crossposted from the AI Optimists blog. AI doom scenarios often suppose that future AIs will engage in scheming: planning to escape, gain power, and pursue ulterior motives, while deceiving us into thinking they are aligned with our interests. The worry is that if a schemer escapes, it may seek world...