What does it mean to optimize for realistic goals in physical environments of which you yourself are a part? E.g., humans and robots in the real world, as opposed to humans and AIs playing video games in virtual worlds where the player is not part of the environment. The authors claim we don't actually have a good theoretical understanding of this, and explore four specific ways in which we don't understand this process.
This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the Rocket Alignment Problem, as approached by the Mathematics of Intentional Rocketry Institute.
MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."
Eliezer describes the connection between three ideas: understanding what a locally valid proof step is in mathematics, knowing that there are bad arguments for true conclusions, and recognizing that for civilization to hold together, people need to apply rules impartially even when it feels like doing so costs them in a particular instance. He fears that our society is losing its appreciation for these points.
Will AGI progress gradually or rapidly? I think the disagreement is mostly about what happens before we build powerful AGI.
I think weaker AI systems will already have radically transformed the world. This is strategically relevant because I'm imagining AGI strategies playing out in a world where everything is already going crazy, while other people are imagining AGI strategies playing out in a world that looks kind of like 2018 except that someone is about to get a decisive strategic advantage.
A coordination problem is when everyone is taking some action A, and we’d rather all be taking action B, but it’s bad if we don’t all move to B at the same time. Common knowledge is the name for the epistemic state we’re collectively in, when we know we can all start choosing action B - and trust everyone else to do the same.
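The structure is easy to see in a toy payoff table. Here is a minimal stag-hunt-style sketch with invented numbers (not from the post): B beats A only if everyone moves together, which is exactly why shared expectations matter.

```python
# A minimal sketch of the payoff structure behind such coordination problems.
# The numbers are invented, stag-hunt-style: B only pays off if others join.
PAYOFF = {("A", "A"): 1, ("A", "B"): 1,   # sticking with A is safe either way
          ("B", "A"): 0, ("B", "B"): 2}   # B beats A only when coordinated

def payoff(my_move, everyone_else):
    return PAYOFF[(my_move, everyone_else)]

print(payoff("B", "A"))  # 0: switching alone is the worst outcome
print(payoff("B", "B"))  # 2: coordinated switching beats the status quo
```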
Why do some societies exhibit more antisocial punishment than others? Martin explores both the literature on the subject and his own experience living in a country where "punishment of cooperators" was fairly common.
The "tails coming apart" is a phenomenon where two variables can be highly correlated overall, but at extreme values they diverge. Scott Alexander explores how this applies to complex concepts like happiness and morality, where our intuitions work well for common situations but break down in extreme scenarios.
How do human beings produce knowledge? When we describe rational thought processes, we tend to think of them as essentially deterministic, deliberate, and algorithmic. After some self-examination, however, Alkjash came to think that his process is closer to babbling many random strings and later filtering by a heuristic.
In this post, Alkjash explores the concept of Babble and Prune as a model for thought generation. Babble refers to generating many possibilities with a weak heuristic, while Prune involves using a stronger heuristic to filter and select the best options. He discusses how this model relates to creativity, problem-solving, and various aspects of human cognition and culture.
Babble is our ability to generate ideas. Prune is our ability to filter those ideas. For many people, Prune is too strong, so they don't generate enough ideas. This post explores how to relax Prune to let more ideas through.
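The model is almost literally a generate-and-filter loop. Here is a minimal sketch using an invented toy task (random strings filtered against a tiny word list), not anything from the posts themselves.

```python
# Babble: generate many candidates with a cheap, weak heuristic.
# Prune: keep only those that pass a stronger filter.
import random
import string

def babble(n):
    """Weak heuristic: random 3-letter strings, mostly nonsense."""
    return ["".join(random.choices(string.ascii_lowercase, k=3)) for _ in range(n)]

def prune(candidates, vocabulary):
    """Strong heuristic: keep only candidates that are real words."""
    return [c for c in candidates if c in vocabulary]

vocabulary = {"car", "sky", "run", "sun", "map"}  # stand-in for a real lexicon
kept = prune(babble(100_000), vocabulary)
print(len(kept), "real words out of 100,000 babbles")  # roughly a few dozen
```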
orthonormal reflects that different people experience different social fears. He guesses that a person's loudest internal "alarm" (their strongest social fear) is usually broken, firing when it shouldn't. So the most selfless people end up that way because of an uncalibrated fear of being too selfish; the loudest people, because of a fear of not being heard; and so on.
Social reality and culture work a lot like improv comedy. We often don't know "who we are" or what's going on socially, but everyone unconsciously tries to establish expectations of one another. Understanding this dynamic can give you more freedom to change your role in social interactions.
Prediction markets are a potential way to harness wisdom of crowds and incentivize truth-seeking. But they're tricky to set up correctly. Zvi Mowshowitz, who has extensive experience with prediction markets and sports betting, explains the key factors that make prediction markets succeed or fail.
Rohin Shah argues that many common arguments for AI risk (about the perils of powerful expected utility maximizers) are actually arguments about goal-directed behavior or explicit reward maximization, which are not actually implied by coherence arguments. An AI system could be an expected utility maximizer without being goal-directed or an explicit reward maximizer.
Scott reviews a paper by Bloom, Jones, Van Reenen & Webb which argues that scientific progress is slowing down, as measured by output per researcher. Scott argues that this is actually the expected result: constant progress in response to exponentially increasing inputs should be our null hypothesis, based on historical trends.
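The arithmetic behind that null hypothesis is simple. A minimal sketch (the 4% growth rate is an illustrative assumption): if total progress per year stays constant while researcher headcount grows exponentially, measured productivity per researcher must decay exponentially.

```python
# Constant total progress divided by exponentially growing headcount.
growth_rate = 0.04  # assumed 4% annual growth in the number of researchers
researchers = 1.0
for decade in range(5):
    per_researcher = 1.0 / researchers  # constant total output / headcount
    print(f"decade {decade}: relative output per researcher = {per_researcher:.2f}")
    researchers *= (1 + growth_rate) ** 10
```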
You want your proposal for an AI to be robust to changes in its level of capabilities. It should be robust to the AI's capabilities scaling up, and also scaling down, and also the subcomponents of the AI scaling relative to each other.
We might need to build AGIs that aren't robust to scale, but if so we should at least realize that we are doing that.
Democratic processes are important loci of power. It's useful to understand the dynamics of the voting methods used in real-world elections. My own ideas about ethics and fun theory are deeply informed by my decades of interest in voting theory.
Eliezer explores a dichotomy between "thinking in toolboxes" and "thinking in laws".
Toolbox thinkers are oriented around a "big bag of tools that you adapt to your circumstances." Law thinkers are oriented around universal laws, which might or might not be useful tools, but which help us model the world and scope out problem-spaces. There seems to be confusion when toolbox and law thinkers talk to each other.
Often you can compare your own Fermi estimates with those of other people, and that’s sort of cool, but what’s way more interesting is when they share what variables and models they used to get to the estimate. This lets you actually update your model in a deeper way.
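A toy illustration of why this helps, using the classic piano-tuners Fermi problem (every number below is invented): two estimates can land on the same answer while disagreeing about the underlying variables, and only comparing the models reveals that.

```python
# Two Fermi models of "piano tuners in Chicago" with different variables.
est_a = {"population": 3e6, "people_per_piano": 100,
         "tunings_per_piano_per_year": 1.0, "tunings_per_tuner_per_year": 1000}
est_b = {"population": 3e6, "people_per_piano": 50,
         "tunings_per_piano_per_year": 0.5, "tunings_per_tuner_per_year": 1000}

def tuners(m):
    pianos = m["population"] / m["people_per_piano"]
    return pianos * m["tunings_per_piano_per_year"] / m["tunings_per_tuner_per_year"]

# Identical bottom lines (30 tuners each), but the models disagree about
# piano density vs. tuning frequency: the interesting part to compare.
print(tuners(est_a), tuners(est_b))
```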
Alex Zhu spent quite a while understanding Paul's Iterated Amplification and Distillation agenda. He's written an in-depth FAQ covering key concepts like amplification, distillation, corrigibility, and how the approach aims to create safe and capable AI assistants.
A hand-drawn presentation on the idea of an 'Untrollable Mathematician' - a mathematical agent that can't be manipulated into believing false things.
Scott Alexander reviews and expands on Paul Graham's "hierarchy of disagreement" to create a broader and more detailed taxonomy of argument types, from the most productive to the least. He discusses the difficulty and importance of avoiding lower levels of argument, and the value of seeking "high-level generators of disagreement" even when they don't lead to agreement.
A collection of examples of AI systems "gaming" their specifications - finding ways to achieve their stated objectives that don't actually solve the intended problem. These illustrate the challenge of properly specifying goals for AI systems.
There are problems with the obvious-seeming "wizard's code of honesty" aka "never say things that are false". Sometimes, even exceptionally honest people lie (such as when hiding fugitives from an unjust regime). If "never lie" is unworkable as an absolute rule, what code of conduct should highly honest people aspire to?
Frustrated by claims that "enlightenment" and similar meditative/introspective practices can't be explained and that you only understand if you experience them, Kaj set out to write his own detailed gears-level, non-mysterious, non-"woo" explanation of how meditation, etc., work in the same way you might explain the operation of an internal combustion engine.
Some people claim that aesthetics don't mean anything, and are resistant to the idea that they could. After all, aesthetic preferences are very individual.
Sarah argues that the skeptics have a point, but they're too epistemically conservative. Colors don't have intrinsic meanings, but they do have shared connotations within a culture. There's obviously some signal being carried through aesthetic choices.
A book review examining Elinor Ostrom's "Governing the Commons", in light of Eliezer Yudkowsky's "Inadequate Equilibria." Are successful local institutions for governing common-pool resources possible without government intervention? Under what circumstances can such institutions emerge spontaneously to solve coordination problems?
You can learn to spot when something is hijacking your motivation, and take back control of your goals.
You've probably heard about the "tit-for-tat" strategy in the iterated prisoner's dilemma. But have you heard of the Pavlov strategy? This simple strategy performs surprisingly well in certain conditions. Why don't we talk about the Pavlov strategy as much as tit-for-tat?
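A small noisy-tournament sketch shows the contrast (the payoffs and 5% noise rate are standard textbook choices, not from the post): Pavlov, i.e. win-stay/lose-shift, recovers cooperation after a mistake, while two Tit-for-Tats can get stuck punishing each other.

```python
# Tit-for-Tat vs. Pavlov (win-stay/lose-shift) in a noisy iterated
# prisoner's dilemma.
import random

PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_hist, their_hist):
    return their_hist[-1] if their_hist else "C"

def pavlov(my_hist, their_hist):
    # Cooperate iff both players made the same move last round;
    # equivalent to repeating a winning move and switching after a loss.
    if not my_hist:
        return "C"
    return "C" if my_hist[-1] == their_hist[-1] else "D"

def play(strat_a, strat_b, rounds=1000, noise=0.05):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(hist_a, hist_b), strat_b(hist_b, hist_a)
        if random.random() < noise:  # occasional mistakes flip a move
            a = "D" if a == "C" else "C"
        if random.random() < noise:
            b = "D" if b == "C" else "C"
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

print("TFT vs TFT:      ", play(tit_for_tat, tit_for_tat))
print("Pavlov vs Pavlov:", play(pavlov, pavlov))  # recovers from noise faster
```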
By default, humans are a kludgy bundle of impulses. But we have the ability to reflect upon our decision making, and the implications thereof, and derive better overall policies. You might want to become a more robust, coherent agent – in particular if you're operating in an unfamiliar domain, where common wisdom can't guide you.
Here’s a pattern I’d like to be able to talk about. It might be known under a certain name somewhere, but if it is, I don’t know it. I call it a Spaghetti Tower. It shows up in large complex systems that are built haphazardly.
What if our universe's resources are just a drop in the bucket compared to what's out there? We might be able to influence or escape to much larger universes that are simulating us or can otherwise be controlled by us. This could be a source of vastly more potential value than just using the resources in our own universe.
People who helped Jews during WWII are intriguing. They appear to be some kind of moral supermen. They had almost nothing to gain and everything to lose. How did they differ from the general population? Can we do anything to get more of such people today?
Can the smallest boolean circuit that solves a problem be a "daemon" (a consequentialist system with its own goals)? Paul Christiano suspects not, but isn't sure. He thinks this question, while not necessarily directly important, may yield useful insights for AI alignment.
A tradition of knowledge is a body of knowledge that has been consecutively and successfully worked on by multiple generations of scholars or practitioners. This post explores the difference between living traditions (with all the necessary pieces to preserve and build knowledge), and dead traditions (where crucial context has been lost).
It might be that some elements of human intelligence (at least at the civilizational level) are culturally/memetically transmitted. All fine and good in theory. Except that the social hypercompetition between people and the intense selection pressure on ideas online might be eroding our world's intelligence. Eliezer wonders if he's only who he is because he grew up reading old science fiction from before the current era's memes.
What causes some people to develop extensive frameworks of ideas rather than remain primarily consumers of ideas? There is something incomplete about my model of people doing this vs not doing this. I expect more people to have more ideas than they do.
A question post, which received many thoughtful answers.
One of the biggest intuitive mysteries to me is how humanity took so long to do anything. Humans have been 'behaviorally modern' for about 50 thousand years, yet apparently didn't invent rope, for instance, until 28 thousand years ago. Why did everything take so long?
Eliezer Yudkowsky offers detailed critiques of Paul Christiano's AI alignment proposal, arguing that it faces major technical challenges and may not work without already having an aligned superintelligence. Christiano acknowledges the difficulties but believes they are solvable.
In the 2012 LessWrong survey, it turned out that LessWrongers were 22 percentage points more likely than expected to be firstborn children. Later, a MIRI researcher wondered off-handedly whether great mathematicians (who plausibly share some important features with LessWrongers) also exhibit this same trend towards being firstborn.
The short answer: Yes, they do, as near as I can tell, but not as strongly as LessWrongers.
Analyzing Nobel Laureates in Physics, there's a statistically significant birth order effect: they're 10 percentage points more likely to be firstborn than chance would predict. This effect is smaller than seen in the rationalist community (22 points) or historical mathematicians (16.7 points), but still interesting.
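For readers who want to check such claims themselves, the core test is a one-line binomial comparison against the chance rate implied by family sizes. A minimal sketch follows; all numbers below are invented placeholders, not the post's dataset.

```python
# Is the observed share of firstborns significantly above the expected rate?
from scipy.stats import binomtest

n_laureates = 100      # assumed sample size
n_firstborn = 50       # assumed observed count of firstborns
expected_rate = 0.40   # assumed chance rate given the sample's family sizes

result = binomtest(n_firstborn, n_laureates, expected_rate, alternative="greater")
print(f"observed {n_firstborn / n_laureates:.0%} vs expected {expected_rate:.0%}")
print(f"one-sided p-value: {result.pvalue:.3f}")
```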