PabloAMC — AI Alignment Forum

Recent progress on the science of evaluations

Summary: This post presents the methodological innovations introduced in the paper General Scales Unlock AI Evaluation with Explanatory and Predictive Power. The paper introduces a set of general (universal) cognitive abilities that allow us to predict and explain AI system behaviour out of distribution. I outline what I believe are...

Jun 23, 2025 · 14

PabloAMC's Shortform

Sep 2, 2023 · 2

Causal representation learning as a technique to prevent goal misgeneralization

Summary: This is a submission for the goal misgeneralization contest organized by AI Alignment Awards, as well as the third iteration of a slowly improving AI Safety research agenda that I aim to pursue at some point in the future. Thanks to Jaime Sevilla for the comments that helped sharpen...

Jan 4, 2023 · 21

How RL Agents Behave When Their Actions Are Modified? [Distillation post]

Summary: This is a distillation post intended to summarize the article How RL Agents Behave When Their Actions Are Modified? by Eric Langlois and Tom Everitt, published at AAAI-21. The article describes Modified Action MDPs, where the environment or another agent such as a human may override the action of...

May 20, 2022 · 22

Vulnerabilities in CDT and TI-unaware agents

The aim of this post is to illustrate the need to take decision-making and incentive considerations into account when designing agents. It also serves as evidence that these considerations are important for ensuring the safety of agents. In addition, we postulate that there exist some agents that are...

Mar 10, 2020 · 5

Implications of Quantum Computing for Artificial Intelligence Alignment Research

[Crossposted from arXiv] ABSTRACT: We introduce a heuristic model of Quantum Computing and apply it to argue that a deep understanding of quantum computing is unlikely to be helpful to address current bottlenecks in Artificial Intelligence Alignment. Our argument relies on the claims that Quantum Computing leads to compute overhang...

Aug 22, 2019 · 24

Pablo Antonio Moreno Casares
