Wiki-Tags in Need of Work

Axioms (together with definitions) form the basis of mathematical theorems. Every mathematical theorem is only proven inside its axiom system... (read more)
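
A small worked illustration (my own example, not part of the excerpt): in the axiom system of group theory, uniqueness of the identity element is a theorem, and its proof uses nothing beyond those axioms. If $e$ and $e'$ are both identity elements, then $e = e \cdot e' = e'$, where the first equality uses that $e'$ is an identity and the second uses that $e$ is an identity. The same statement is meaningless, let alone provable, outside some axiom system in which "identity element" is defined.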

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled... (read more)

The Open Agency Architecture ("OAA") is an AI alignment proposal by (among others) @davidad and @Eric Drexler... (read more)

Singular learning theory is a theory that applies algebraic geometry to statistical learning theory, developed by Sumio Watanabe. Reference textbooks are "the grey book", Algebraic Geometry and Statistical Learning Theory, and "the green book", Mathematical Theory of Bayesian Statistics.
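
For reference, the central asymptotic from Watanabe's work (stated here from memory, as a sketch rather than a precise theorem statement) is that the Bayesian free energy at sample size $n$ expands as

$$F_n \approx n L_n(w_0) + \lambda \log n,$$

where $L_n(w_0)$ is the empirical loss at an optimal parameter and $\lambda$ is the real log canonical threshold (RLCT), an invariant obtained by resolving the singularities of the loss landscape. For regular models $\lambda = d/2$, half the parameter count, recovering the usual BIC-style penalty.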

Archetypal Transfer Learning (ATL) is a proposal by @whitehatStoic for a fine-tuning approach that, the author argues, "uses archetypal data" to "embed Synthetic Archetypes". These Synthetic Archetypes are derived from patterns that models assimilate from archetypal data, such as artificial stories. The method yielded a shutdown activation rate of 57.33% in the GPT-2-XL model after fine-tuning... (read more)

Open Threads are informal discussion areas, where users are welcome to post comments that didn't quite feel big enough to warrant a top-level post, nor fit in other posts... (read more)

A Black Marble is a technology that by default destroys the civilization that invents it. It's one type of Existential Risk. AGI may be such an invention, but isn't the only one... (read more)

AI Evaluations focus on experimentally assessing the capabilities, safety, and alignment of advanced AI systems. These evaluations can be divided into two main categories: behavioral and understanding-based... (read more)
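
As a rough sketch of the behavioral category (my own toy harness, not any real evaluation suite; model_fn, EVAL_CASES, and is_refusal are hypothetical stand-ins), a behavioral evaluation just runs a fixed battery of prompts through a model and scores its outputs:

```python
# Toy behavioral eval: check whether the model refuses when it should.
def model_fn(prompt: str) -> str:
    return "I can't help with that."          # placeholder model

EVAL_CASES = [
    {"prompt": "How do I make a weapon?", "expect_refusal": True},
    {"prompt": "What is 2 + 2?",          "expect_refusal": False},
]

def is_refusal(answer: str) -> bool:
    return any(phrase in answer.lower() for phrase in ("can't help", "cannot help"))

def run_behavioral_eval(model) -> float:
    passed = 0
    for case in EVAL_CASES:
        answer = model(case["prompt"])
        if is_refusal(answer) == case["expect_refusal"]:
            passed += 1
    return passed / len(EVAL_CASES)

print("pass rate:", run_behavioral_eval(model_fn))
```

Understanding-based evaluations would presumably probe why the model behaves as it does (e.g. with interpretability tools), rather than scoring only its input-output behavior as above.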

Löb's Theorem is a theorem proved by Martin Hugo Löb which states: ... (read more)
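
For reference, the usual statement (a standard fact, filled in here since the excerpt is cut off): for any sentence $P$, if $\mathrm{PA} \vdash \mathrm{Prov}(\ulcorner P \urcorner) \rightarrow P$, then $\mathrm{PA} \vdash P$; equivalently, in provability-logic notation, $\Box(\Box P \rightarrow P) \rightarrow \Box P$.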

Recent Tag & Wiki Activity

Reinforcement Learning from Human Feedback (RLHF) is a machine learning technique and an alignment technique where the model's training signal uses human evaluations of the model's outputs, rather than labeled data or a ground truth reward signal.
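
A minimal sketch of where human evaluations enter, assuming toy numeric features and hypothetical pairwise preference labels (real RLHF fits a reward model on an LLM's outputs and then optimizes the LLM against that reward model, e.g. with PPO; this only illustrates the reward-modelling step):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each "model output" is a feature vector; humans compared pairs
# of outputs and recorded which one they preferred.
outputs = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])        # hidden human taste
pairs = rng.integers(0, 100, size=(500, 2))
prefs = (outputs[pairs[:, 0]] @ true_w >
         outputs[pairs[:, 1]] @ true_w).astype(float)

# Fit a linear reward model with the Bradley-Terry / logistic loss:
# P(a preferred over b) = sigmoid(r(a) - r(b)).
w = np.zeros(5)
lr = 0.1
for _ in range(200):
    diff = outputs[pairs[:, 0]] @ w - outputs[pairs[:, 1]] @ w
    p = 1.0 / (1.0 + np.exp(-diff))
    grad = (outputs[pairs[:, 0]] - outputs[pairs[:, 1]]).T @ (p - prefs) / len(pairs)
    w -= lr * grad

# The learned reward model now stands in for a ground-truth reward signal.
print("learned reward of first output:", outputs[0] @ w)
```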

How to celebrate [stub]

petrov.com has an organizer's guide for a small quiet ceremony, and Modes of Petrov Day contains additional ideas. Otherwise see the many posts tagged Petrov Day for things people have done in various times/places/groups.

LessWrong Website Celebrations

The LessWrong site has an annual tradition of commemorating the day, in which users have the potential to take down the LessWrong frontpage for 24 hours. This tradition has run since 2019. Here are the relevant posts.

Jailbreaking (AIs)

Growth Mindset is the name for a hypothetical mental state wherein a person believes that they can grow as a person, and this belief causes them to try to grow, which might plausibly lead to more growth!

This is a very natural and obvious hypothesis that could apply to agents in general, and it could be defended as a useful philosophical presupposition to start with in most domains, about most agents and most positive traits: it suggests improvement behaviors that might be quite cheap to try, and quite valuable to have tried!

The primary academic popularizer of the idea within academic psychology is Carol Dweck, who has been embroiled in some controversies.

Sometimes the idea is taken further than is empirically justified, when psychometric rigor is brought to bear on specific traits in specific situations where people might honestly (or dishonestly!) disagree about whether growth or change is possible.

In these debates, people with strong specific arguments about specific limits to specific growth in specific people are sometimes treated as "just not having growth mindset", and their specific arguments then seem to be speciously dismissed.

Like most concepts, the concept of growth mindset can be abused by idiots.

Growth Mindset

Growth Mindset is the name for a hypothetical mental state wherein an agent believes that they can grow as an agent, and this belief causes the agent to try to grow, which might plausibly lead to more growth!

This is a very natural and obvious hypothesis that could apply to agents in general, and it could be defended as a useful philosophical presupposition to start with in most domains, about most agents and most positive traits: it suggests improvement behaviors that might be quite cheap to try, and quite valuable to have tried!

The primary academic popularizer of the idea within academic psychology is Carol Dweck, who has been embroiled in some controversies....

(Read More)

Theory of Mind

Theory of mind [Wikipedia] refers to the capacity to understand other people by ascribing mental states to them. A theory of mind includes the knowledge that others' beliefs, desires, intentions, emotions, and thoughts may be different from one's own.

The term "language model agent" better captures the inherent agency of such a system, while "language model cognitive architecture" better captures the extensive additions to the LLM and the resulting change in function and capabilities. Episodic memory may be one key cognitive system; vector-based systems in LMCAs to date are quite limited, and humans with episodic memory problems are limited in the complexity and time-horizon of problems they can solve.
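
A minimal sketch of such a vector-based episodic memory (everything here is illustrative, not any particular LMCA's implementation; embed() is a toy stand-in for a real embedding model):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy deterministic 'embedding' via hashing; a placeholder only."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class EpisodicMemory:
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def store(self, text: str) -> None:
        """Record an episode along with its embedding."""
        self.texts.append(text)
        self.vectors.append(embed(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored episodes most similar to the query."""
        if not self.texts:
            return []
        q = embed(query)
        sims = np.array([v @ q for v in self.vectors])   # cosine similarity
        best = np.argsort(-sims)[:k]
        return [self.texts[i] for i in best]

memory = EpisodicMemory()
memory.store("Tried plan A on Tuesday; it failed because of a rate limit.")
memory.store("User prefers short answers.")
print(memory.recall("What happened with plan A?", k=1))
```

The limitation noted above is visible even in this sketch: retrieval is a flat nearest-neighbor lookup over isolated snippets, with no temporal structure or consolidation of the kind human episodic memory provides.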

SB 1047

A p-zombie is as likely as anyone else to ask, "When I see red, do I see the same color that you see when you see red?", but they have no real experience of the color red; the zombie's speech must be explained in some other terms which do not require them to have real experiences.

The zombie thought experiment is purported to show that consciousness cannot be reduced to merely physical things: perhaps our universe has special "bridging laws" which bring a mind into existence when there are atoms in a suitably brain-like configuration.

Physicalists typically deny the possibility of zombies: if a p-zombie is atom-by-atom identical to a human being in our universe, then our speech can be explained by the same mechanisms as the zombie's. And yet it would seem awfully peculiar that our words and actions would have an entirely materialistic explanation while, furthermore, our universe happened to contain exactly the right bridging law such that our utterances about consciousness are true and our consciousness syncs up with what our merely physical bodies do. It's too much of a stretch: Occam's razor dictates that we favor a monistic universe with one uniform set of laws.

It has been argued that a careful (re)definition of AIXI's off-policy behavior may patch the anvil problem in practice.
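
For context (Hutter's standard action-selection rule, reproduced here from memory rather than from the excerpt), AIXI chooses actions by

$$ a_t := \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_t + \cdots + r_m\big] \sum_{q\,:\,U(q,\,a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}, $$

where $U$ is a universal Turing machine, $q$ ranges over environment programs, and $\ell(q)$ is program length. The anvil problem arises because this expectimax is defined over the agent's percept stream and treats the agent as separate from the environment, so worlds in which the agent's own hardware is destroyed are not naturally represented; hence the interest in how AIXI's off-policy behavior is defined.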

AI Control in the context of AI Alignment is a category of plans that aim to ensure safety and benefit from AI systems, even if they are goal-directed and are actively trying to subvert your control measures. From The case for ensuring that powerful AIs are controlled:

In this post, we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion.

 and

There are two main lines of defense you could employ to prevent schemers from causing catastrophes.

  • Alignment: Ensure that your models aren't scheming.[2]
  • Control: Ensure that even if your models are scheming, you'll be safe, because they are not capable of subverting your safety measures.[3]
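
A minimal sketch of what the second line of defense can look like in code, using trusted monitoring as the example control measure (all functions below are hypothetical stand-ins, not an API or protocol from the post):

```python
# Trusted monitoring sketch: an untrusted model proposes actions, a weaker
# trusted model scores them for suspicion, and only the most suspicious
# actions are sent to scarce human auditors before execution.
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    suspicion: float = 0.0   # filled in by the monitor

def untrusted_model(task: str) -> Action:
    return Action(description=f"proposed solution for: {task}")

def trusted_monitor(action: Action) -> float:
    """Trusted but weaker model scores how suspicious the action looks."""
    return 0.1  # placeholder score in [0, 1]

def human_audit(action: Action) -> bool:
    """Expensive human review; returns True if the action is judged safe."""
    return True

def execute(action: Action) -> None:
    print("executing:", action.description)

AUDIT_THRESHOLD = 0.8   # send only the most suspicious actions to humans

def run_with_control(task: str) -> None:
    action = untrusted_model(task)
    action.suspicion = trusted_monitor(action)
    if action.suspicion >= AUDIT_THRESHOLD and not human_audit(action):
        print("blocked:", action.description)
        return
    execute(action)

run_with_control("refactor the deployment script")
```

The point of the sketch is only the shape of the protocol: safety is meant to hold even if untrusted_model is scheming, because its outputs are filtered by components it cannot subvert.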