Assume you are in the set of possible worlds where AI takeover happens by default. If you do nothing, then at some point in the 21st century the AI lab Magma develops a transformative AI system. Magma employees perform a number of safety checks, conclude the system is safe enough,...
TL;DR: Gradient descent won't select for inner search processes because they're not compute & memory efficient. Slightly longer TL;DR: A key argument for mesa-optimization is that as we search over programs, we will select for "search processes with simple objectives", because they are simpler or more compact than alternative less...
Yann LeCun recently posted A Path Towards Autonomous Machine Intelligence, a high-level description of the architecture he considers most promising to advance AI capabilities. This post summarizes the architecture and describes some implications for AI safety work if we accept the hypothesis that the first transformative AI will have this...
Thesis: The problem of fully updated deference is not a strong argument against the viability of the assistance games / utility uncertainty approach to AI (outer) alignment. Background: A proposed high-level approach to AI alignment is to have the AI maintain a probability distribution over possible human utility functions instead...