Someone is well-calibrated if the things they predict with X% chance of happening in fact occur X% of the time. Importantly, calibration is not the same as accuracy: calibration is about accurately assessing how good your predictions are, not about making good predictions. Person A, whose predictions are marginally better than chance (60% of them come true when choosing from two options) and who is precisely 60% confident in their choices, is perfectly calibrated. In contrast, Person B, who is 99% confident in their predictions and right 90% of the time, is more accurate than Person A but less well-calibrated... (read more)
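A quick way to see the difference: bucket predictions by stated confidence and compare each bucket's observed hit rate to that confidence. Here is a minimal sketch in Python, with made-up data mirroring Person A and Person B (all numbers are illustrative):

```python
# Minimal calibration check (illustrative data): bucket predictions by
# stated confidence and compare each bucket's hit rate to it.
from collections import defaultdict

# (stated confidence, did the prediction come true?)
predictions = [(0.60, o) for o in (True, True, True, False, False)]  # Person A
predictions += [(0.99, o) for o in [True] * 9 + [False]]             # Person B

buckets = defaultdict(list)
for confidence, outcome in predictions:
    buckets[confidence].append(outcome)

for confidence, outcomes in sorted(buckets.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%}, observed {hit_rate:.0%}, "
          f"gap {abs(confidence - hit_rate):.0%}")
# stated 60%, observed 60%, gap 0%  -- well-calibrated
# stated 99%, observed 90%, gap 9%  -- accurate but overconfident
```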
AI Risk is the analysis of the risks associated with building powerful AI systems... (read more)
Rationality is the art of thinking in ways that result in accurate beliefs and good decisions. It is the primary topic of LessWrong.
Rationality is not only about avoiding the vices of self-deception and obfuscation (the failure to communicate clearly), but also about the virtues of curiosity, of seeing the world more clearly than before, and of achieving things previously unreachable to you. The study of rationality on LessWrong includes both a theoretical understanding of ideal cognitive algorithms and a practice that uses these idealized algorithms to inform the heuristics, habits, and techniques needed to reason and make decisions successfully in the real world... (read more)
Consent is a foundational concept in many practical systems of ethics (such as those found in medicine).
Self-immolation is a hypothetical act that could be carried out by a leading AI (capabilities) lab: deliberately destroying itself, including all of its resources relevant to furthering progress towards AGI (or, more broadly, towards extremely dangerous capabilities). It would also signal to the world that existential/catastrophic risk from AI is being taken seriously by one of the leading AI capabilities actors.
A weaker version would involve a credibly signaled and faithfully executed pivot away from AGI progress towards safer, narrower, bounded AI systems (see Tool AI).
The idea has been independently proposed at least twice:...
The agent-structure problem is the question of whether systems that behave like agents necessarily have an internal structure that makes them agentic. This structure usually involves some search procedure.
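One way to see why the answer isn't obvious: behavior alone underdetermines internal structure. Below is a toy sketch (all names and numbers are invented for illustration) of two policies with identical input-output behavior, only one of which contains an explicit search procedure inside:

```python
# Two policies with identical behavior; only one has "agentic" structure.

STATES = range(5)

def utility(state: int, action: int) -> int:
    """Utility peaks when the action matches the state."""
    return -(state - action) ** 2

def searching_agent(state: int) -> int:
    # Agentic internal structure: explicit search for the best action.
    return max(range(5), key=lambda a: utility(state, a))

# A lookup table with the same input-output behavior, but no search inside.
LOOKUP = {s: searching_agent(s) for s in STATES}

def lookup_agent(state: int) -> int:
    return LOOKUP[state]

# Behaviorally indistinguishable on this domain:
assert all(searching_agent(s) == lookup_agent(s) for s in STATES)
```

The agent-structure problem asks when (if ever) agent-like behavior forces something like the first implementation rather than the second.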
Kant's third formulation of the categorical imperative lets you build up most of the structure of the key moral ideas from a simple rule: "treat no person as purely a means to an end, but always also as an end in themselves". Many applications of the categorical imperative require baroque derivations to loop back and be justified from this premise (treated as a generative axiom), but "consent ethics" in general, and "slavery is forbidden" in particular, are both elementary proofs from this starting point. A slave is a person turned into a tool and piece of property of another person... a literal "means" to ANY end that the owning person (or "Master") deems desirable and feasible.
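As a toy illustration of how elementary the proof becomes once the rule is treated as an axiom, here is a hypothetical sketch in Lean 4 (the predicate names are invented for illustration, and this is not a serious formalization of ethics):

```lean
-- Hypothetical sketch: the categorical imperative as a generative axiom.
axiom Person : Type
axiom UsedPurelyAsMeans : Person → Prop

-- "Treat no person as purely a means to an end."
axiom categoricalImperative : ∀ p : Person, ¬ UsedPurelyAsMeans p

-- A slave is, by this definition, a person used purely as a means.
def Enslaved (p : Person) : Prop := UsedPurelyAsMeans p

-- "Slavery is forbidden" is then a one-step proof.
theorem slaveryForbidden : ∀ p : Person, ¬ Enslaved p :=
  fun p h => categoricalImperative p h
```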
When CDT makes its decisions, it only thinks it controls things causally downstream of its actions. UDT, by contrast, chooses as if it controls every part of reality that is logically downstream of its logical output. This allows it to determine a wide range of other facts across the universe that are logically correlated with itself, like what is or has been reliably predicted about its present decision, or what other agents sufficiently similar to itself will choose. Son of CDT is somewhere in the middle: it acts as if it controls only those things logically correlated with its actions that are causally downstream of its moment of original creation.
If a Son of CDT agent goes on to create further agents, all of those agents will have the same magic moment. They will all care about whether or not Omega's knowledge of them is causally downstream of the moment the CDT agent first wrote the Son-of-CDT code.
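To make the contrast concrete, consider a Newcomb-style predictor. The sketch below is only illustrative (the payoffs and the `prediction_tracks_choice` framing are assumptions, not a canonical formalization); it shows how the three decision theories diverge depending on whether the prediction counts as something the agent controls:

```python
# (one_box?, predictor_predicted_one_box?) -> payoff (standard Newcomb values)
PAYOFF = {
    (True, True): 1_000_000,
    (True, False): 0,
    (False, True): 1_001_000,
    (False, False): 1_000,
}

def best_action(prediction_tracks_choice: bool) -> str:
    if prediction_tracks_choice:
        # The prediction moves with the choice: compare the two
        # self-consistent worlds (one-box & predicted one-box vs
        # two-box & predicted two-box).
        return "one-box" if PAYOFF[(True, True)] > PAYOFF[(False, False)] else "two-box"
    # The prediction is fixed: pick the act that dominates either way.
    one_box_dominates = all(
        PAYOFF[(True, p)] > PAYOFF[(False, p)] for p in (True, False)
    )
    return "one-box" if one_box_dominates else "two-box"

# CDT: the prediction is never treated as controlled by the present act.
print("CDT:", best_action(False))    # two-box

# UDT: anything logically correlated with its output is treated as controlled.
print("UDT:", best_action(True))     # one-box

# Son of CDT: the prediction counts only if it was made causally downstream
# of the "magic moment" when the Son-of-CDT code was written.
prediction_made_after_creation = True  # illustrative assumption
print("Son of CDT:", best_action(prediction_made_after_creation))
```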
CDT agents don't consider the logical impacts of their decision algorithms' outputs when choosing actions, only the physical consequences of their physical act. Whenever a CDT agent is put in a situation where it has to make a decision, it considers multiple hypotheticals, one for each decision it could make. For a CDT agent, the only difference between these hypotheticals is the physical act in the moment of that act, and what happens physically/causally downstream from that. This means that when a CDT agent is faced with something trying to predict its actions, it imagines its decision having no effect on the prediction.
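A minimal sketch of that hypothetical-worlds computation against a Newcomb-style predictor (payoffs are illustrative assumptions): because the prediction is held fixed across the hypotheticals, the CDT agent two-boxes no matter what was predicted.

```python
def payoff(act: str, predicted: str) -> int:
    # The opaque box contains $1M only if one-boxing was predicted;
    # the transparent box always adds $1k for two-boxing.
    opaque = 1_000_000 if predicted == "one-box" else 0
    return opaque + (1_000 if act == "two-box" else 0)

def cdt_choose(fixed_prediction: str) -> str:
    # One hypothetical per act; only the act (and its causal downstream)
    # differs. The prediction does NOT change with the act.
    hypotheticals = {act: payoff(act, fixed_prediction)
                     for act in ("one-box", "two-box")}
    return max(hypotheticals, key=hypotheticals.get)

for prediction in ("one-box", "two-box"):
    print(prediction, "->", cdt_choose(prediction))  # always "two-box"
```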
The name was suggested by Ryan Greenblatt in “AI companies are unlikely to make high-assurance safety cases if timelines are short”.
ATOW (2026-04-03), Moore et al. (2026) is probably the best academic account of LLM-induced psychosis. They “analyze logs of conversations with LLM chatbots from 19 users who report having experienced psychological harms from chatbot use”, where the users mostly came from a “support group for such chatbot users.”
Although slavery is usually involuntary and involves coercion, there are also cases where people have voluntarily entered into slavery, for example to pay a debt or to earn money due to poverty.
The ML Alignment & Theory Scholars (MATS) Program is an independent research and educational seminar program that provides emerging researchers with mentorship, talks & workshops, research support, and connections with the SF Bay Area and London AI safety research communities.
Historically, slaves would be kept in bondage for life, or for a fixed period of time after which they would be granted freedom. Many historical cases of enslavement occurred as a result of breaking the law, becoming indebted, suffering a military defeat, or exploitation for cheap labor; other forms of slavery were instituted along demographic lines such as race or sex.
We used to have a feature for crossposting to the EA Forum. It caused a lot of bugs that were difficult to deal with and didn't feel like it was pulling its weight, so we removed it in the latest update.
Eliezer Yudkowsky is a research fellow of the Machine Intelligence Research Institute, which he co-founded in 2001. He is mainly concerned with the obstacles to, and the importance of, developing a Friendly AI, such as a reflective decision theory that would lay a foundation for describing fully recursive self-modifying agents that retain stable preferences while rewriting their source code. He also co-founded LessWrong, where he wrote the Sequences, long series of posts dealing with epistemology, AGI, metaethics, rationality, and so on... (read more)