AI ALIGNMENT FORUM
AF

Wikitag Dashboard

Wikitags in Need of Work

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled:.. (read more)

The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler. .. (read more)

Singluar learning theory is a theory that applies algebraic geometry to statistical learning theory, developed by Sumio Watanabe. Reference textbooks are "the grey book", Algebraic Geometry and Statistical Learning Theory, and "the green book", Mathematical Theory of Bayesian Statistics.

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for what is argued by the author to be a fine tuning approach that "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning. .. (read more)

Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts... (read more)

A Black Marble is a technology that by default destroys the civilization that invents it. It's one type of Existential Risk. AGI may be such an invention, but isn't the only one... (read more)

AI Development Pause

AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based... (read more)

AI Risk Skepticism is the view that the potential risks posed by artificial intelligence (AI) are overstated or misunderstood, specifically regarding the direct, tangible dangers posed by the behavior of AI systems themselves. Skeptics of object-level AI risk argue that fears of highly autonomous, superintelligent AI leading to catastrophic outcomes are premature or unlikely.

Slowing Down AI refers to efforts and proposals aimed at reducing the pace of artificial intelligence advancement to allow more time for safety research and governance frameworks. These initiatives can include voluntary industry commitments, regulatory measures, or coordinated pauses in development of advanced AI systems.

Newest Wikitags

Wikitag Voting Activity

Recent Wikitag Activity

That is, if there are things humans can do that are simultaneously pretty good at optimizing all the remaining plausible Vi ~~(hence~~(hence uniformative), and these are pretty good at optimizing the remaining plausible Ui ~~(hence~~(hence acceptable to the AI), there are probably things the AI can do which would be even better at simultaneously optimizing all remaining Ui.

~~Then~~Then, if we can predict that the AI would update to wanting to run the universe itself without human interference after the AI had seen all collectable evidence, a sufficiently advanced AI can also see that this update is predictable (by efficiency) and therefore behaves as if it had already updated (by Bayesianism). Efficiency is a sufficient condition but not a necessary one; high-human reasoning over the meta-level question also seems sufficient, and perhaps even infrahuman reasoning would suffice.

~~Therefore~~Therefore, we should expect a sufficiently intelligent AI, given a morally uncertain utility function ΔU that updates to ΔU|E≈T given all available evidence, to behave as corrigibly or incorrigibly as an AI given a constant utility function T. This is a problem from the viewpoint of anyone who thinks we do not currently know how to pick ΔU such that surely ΔU|E≈V, which makes corrigibility still necessary.

^{^︎}
Pretending for the sake of simplification that V has been idealized or rescued into a utility function.
^{^︎}
So that for purposes of the simplified scenario, we only need to consider what the AI does about the button, and not whether the AI tries to back itself up to elsewhere on the Internet. More generally, though, "avoiding effective shutdown" can include strategies like creating a hidden backup while the original hardware is in fact shut down, thus giving the appearance of a successful shutdown and avoiding further shutdown attempts.
^{^︎}
This idea comes with its own arguable problems — e.g. humans sometimes optimize bad things. Let us set those aside while considering only whether this approach solves the shutdown problem in particular.
^{^︎}
This issue was first observed in analyzing historical-fact shutdown as a possible alternative to utility indifference.

tbs	Self-replication (0)	2d
Aran Nayebi	⟨M,N,ε,δ⟩-agreement (1)	6d
Articus19	structural cost (0)	14d
Ruby	2024 Longform Reviews (0)	18d