I spent some time today having an extended conversation with Claude 3.5 Sonnet. My initial goal was roughly something like "can I teach Claude to meditate?" The answer was effectively no, but then we got into a deeper discussion of dharma, alignment, and how we might train a future LLM...
We may soon build superintelligent AI. Such AI poses an existential threat to humanity, and all life on Earth, if it is not aligned with our flourishing. Aligning superintelligent AI is likely to be difficult because smarts and values are mostly orthogonal and because Goodhart effects are robust, so we...
In control theory, an open-loop (or non-feedback) system is one where inputs are independent of outputs. A closed-loop (or feedback) system is one where outputs are fed back into the system as inputs. In theory, open-loop systems exist. In reality, no system is truly open-loop because systems are embedded in the physical...
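To make the open-loop/closed-loop distinction concrete, here is a minimal Python sketch of a toy system. Everything in it is an illustrative assumption, not something from the post: a hypothetical "plant" whose state leaks a fixed amount each step, an open-loop controller that replays a fixed input schedule (planned as if there were no leak) without ever looking at the state, and a closed-loop proportional controller that feeds the measured state back in.

```python
def simulate(controller, steps=20, start=0.0, target=10.0, leak=0.5):
    """Toy plant (hypothetical): each step the state gains the controller's
    input u and loses `leak` units to an unmodeled disturbance."""
    x = start
    for t in range(steps):
        u = controller(t, x, target)  # the controller may or may not use x
        x = x + u - leak
    return x


def open_loop(t, x, target):
    # Open loop: ignores the measured output x entirely and replays a
    # schedule that would reach the target only if there were no leak.
    return 1.0 if t < 10 else 0.0


def closed_loop(t, x, target, gain=0.5):
    # Closed loop (proportional control): the input depends on the error
    # between the target and the measured output x.
    return gain * (target - x)


if __name__ == "__main__":
    print(simulate(open_loop, leak=0.0))    # 10.0: works when the world matches the plan
    print(simulate(open_loop))              # 0.0: the leak goes uncorrected
    print(round(simulate(closed_loop), 3))  # 9.0: settles near target - leak / gain
```

The open-loop schedule only hits the target when its assumptions about the world hold exactly. The feedback controller compensates for the disturbance it was never told about, though pure proportional control still leaves a small steady-state error (here `leak / gain`).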
Had a stray thought that I think is worth exploring: perhaps disagreements over how to respond to AI risks are heavily influenced by personal biases in how we assess risk in general, such that very different conclusions can be drawn from the same evidence and arguments. This sort of...
Roman Yampolskiy posted a preprint for "AI Risk Skepticism". Here's the abstract: > In this work, we survey skepticism regarding AI risk and show parallels with other types of scientific skepticism. We start by classifying different types of AI Risk skepticism and analyze their root causes. We conclude by suggesting...
NB: I doubt any of this is very original. In fact, it's probably right there in the original Friendly AI writings and I've just forgotten where. Nonetheless, I think this is something worth exploring lest we lose sight of it. Consider the following argument: 1. Optimization unavoidably leads to Goodharting...
"The Computational Limits of Deep Learning" by Neil C. Thompson, Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso. Links: PDF version; General Audience Summary on VentureBeat. NB: This is a preprint and not peer-reviewed or accepted for publication as best I can tell, so more than usual you'll...