Vanessa Kosoy

Vanessa Kosoy's Comments

Clarifying "AI Alignment"

It is possible that Alpha cannot predict it, because in Beta-simulation-world the user would confirm the irreversible action. It is also possible that the user would confirm the irreversible action in the real world because the user is being manipulated, and whatever defenses we put in place against manipulation are thrown off by the simulation hypothesis.

Why doesn't this also apply to subjective regret bounds?

In order to get a subjective regret bound you need to consider an appropriate prior. The way I expect it to work is, the prior guarantees that some actions are safe in the short-term: for example, doing nothing to the environment and asking only sufficiently quantilized queries from the user (see this for one toy model of how "safe in the short-term" can be formalized). Therefore, Beta cannot attack with a hypothesis that will force Alpha to act without consulting the user, since that hypothesis would fall outside the prior.
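As a toy illustration of what a quantilized query does (my own sketch; the linked toy model is more involved), a q-quantilizer samples from the top-q slice of a fixed base distribution over actions instead of taking an argmax, which bounds how much optimization pressure any single query can exert:

```python
import random

def quantilize(actions, base_weights, utility, q=0.1, rng=random):
    """Sample an action from the top-q fraction (by estimated utility)
    of the base distribution, instead of argmaxing over utility."""
    total = sum(base_weights)
    # Rank actions from best to worst according to the utility estimate.
    ranked = sorted(zip(actions, base_weights), key=lambda aw: -utility(aw[0]))
    # Keep the smallest top prefix holding at least q of the base measure.
    kept, mass = [], 0.0
    for a, w in ranked:
        kept.append((a, w))
        mass += w / total
        if mass >= q:
            break
    # Within the kept slice, sample in proportion to the base weights.
    return rng.choices([a for a, _ in kept], weights=[w for _, w in kept], k=1)[0]
```

With q = 1 this reduces to sampling from the base distribution, and as q shrinks it approaches argmax, so q directly trades optimization power against safety.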

Now, you can say "with the right prior intent-alignment also works". To which I answer, sure, but first it means that intent-alignment is insufficient in itself, and second the assumptions about the prior are doing all the work. Indeed, we can imagine that the ontology on which the prior is defined includes a "true reward" symbol s.t., by definition, the semantics is whatever the user truly wants. An agent that maximizes expected true reward then can be said to be intent-aligned. If it's doing something bad from the user's perspective, then it is just an "innocent" mistake. But, unless we bake some specific assumptions about the true reward into the prior, such an agent can be anything at all.

Most existing mathematical results do not seem to be competitive, as they get their guarantees by doing something that involves a search over the entire hypothesis space.

This is related to what I call the distinction between "weak" and "strong feasibility". Weak feasibility means algorithms that are polynomial time in the number of states and actions, or the number of hypotheses. Strong feasibility is supposed to be something like, polynomial time in the description length of the hypothesis.

It is true that currently we only have strong feasibility results for relatively simple hypothesis spaces (such as support vector machines). But this seems to me just a symptom of advances in heuristics outpacing the theory. I don't see any reason in principle that significantly limits the strong feasibility results we can expect. Indeed, we already have some advances in providing a theoretical basis for deep learning.

However, I specifically don't want to work on strong feasibility results, since there is a significant chance they would lead to breakthroughs in capability. Instead, I prefer studying safety on the weak feasibility level until we understand everything important on that level, and only then trying to extend it to strong feasibility. This creates somewhat of a conundrum where apparently the one thing that can convince you (and other people?) is the thing I don't think should be done soon.

I could also imagine being pretty interested in a mathematical definition of safety that I thought actually captured "safety" without "passing the buck". I think subjective regret bounds and CIRL both make some progress on this, but somewhat "pass the buck" by requiring a well-specified hypothesis space for rewards / beliefs / observation models.

Can you explain what you mean here? I agree that just saying "subjective regret bound" is not enough, we need to understand all the assumptions the prior should satisfy, reflecting considerations such as, what kind of queries can or cannot manipulate the user. Hence the use of quantilization and debate in Dialogic RL, for example.

Vanessa Kosoy's Shortform

There is some similarity, but there are also major differences. They don't even have the same type signature. The dangerousness bound is a desideratum that any given algorithm can either satisfy or not. On the other hand, AUP is a specific heuristic for tweaking Q-learning. I guess you can consider some kind of regret bound w.r.t. the AUP reward function, but the two will still be very different conditions.
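For readers comparing the two, this is roughly the kind of heuristic being referred to (a simplified sketch of the AUP idea, not TurnTrout's exact formulation): the primary reward is penalized by how much an action shifts a set of auxiliary Q-values relative to doing nothing.

```python
def aup_reward(primary, aux_qs, state, action, noop, scale=1.0):
    """AUP-style reward: primary reward minus a penalty for shifting
    auxiliary attainable-utility estimates relative to the no-op action."""
    penalty = sum(abs(q(state, action) - q(state, noop)) for q in aux_qs)
    return primary(state, action) - penalty / scale

# Hypothetical toy usage: one auxiliary Q-function that spikes for "press".
primary = lambda s, a: 1.0
q_aux = lambda s, a: 2.0 if a == "press" else 0.0
r_press = aup_reward(primary, [q_aux], "s0", "press", "noop")  # 1.0 - 2.0
r_wait = aup_reward(primary, [q_aux], "s0", "noop", "noop")    # 1.0 - 0.0
```

A dangerousness bound, by contrast, is a property one would prove about whatever policy results, not a modification of the reward signal; that is the type-signature mismatch described above.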

The reason I pointed out the relation to corrigibility is not because I think that's the main justification for the dangerousness bound. The motivation for the dangerousness bound is quite straightforward and self-contained: it is a formalization of the condition that "if you run this AI, this won't make things worse than not running the AI", no more and no less. Rather, I pointed the relation out to help readers compare it with other ways of thinking they might be familiar with.

From my perspective, the main question is whether satisfying this desideratum is feasible. I gave some arguments why it might be, but there are also opposite arguments. Specifically, if you believe that debate is a necessary component of Dialogic RL then it seems like the dangerousness bound is infeasible. The AI can become certain that the user would respond in a particular way to a query, but it cannot become (worst-case) certain that the user would not change eir response when faced with some rebuttal. You can't (empirically and in the worst-case) prove a negative.

Clarifying "AI Alignment"

This opens the possibility of agents that make "well intentioned" mistakes that take the form of sophisticated plans that are catastrophic for the user.

Agreed that this is in theory possible, but it would be quite surprising, especially if we are specifically aiming to train systems that behave corrigibly.

The acausal attack is an example of how it can happen for systematic reasons. As for the other part, that seems like conceding that intent-alignment is insufficient and you need "corrigibility" as another condition (also it is not so clear to me what this condition means).

If Alpha can predict that the user would say not to do the irreversible action, then at the very least it isn't corrigible, and it would be rather hard to argue that it is intent aligned.

It is possible that Alpha cannot predict it, because in Beta-simulation-world the user would confirm the irreversible action. It is also possible that the user would confirm the irreversible action in the real world because the user is being manipulated, and whatever defenses we put in place against manipulation are thrown off by the simulation hypothesis.

Now, I do believe that if you set up the prior correctly then it won't happen, thanks to a mechanism like: Alpha knows that in case of dangerous uncertainty it is safe to fall back on some "neutral" course of action plus query the user (in specific, safe, ways). But this exactly shows that intent-alignment is not enough and you need further assumptions.

Moreover, the latter already produced viable directions for mathematical formalization, and the former has not (AFAIK).

I guess you wouldn't count universality. Overall I agree.

Besides the fact ascription universality is not formalized, why is it equivalent to intent-alignment? Maybe I'm missing something.

I'm relatively pessimistic about mathematical formalization.

I am curious whether you can specify, as concretely as possible, what type of mathematical result would you have to see in order to significantly update away from this opinion.

I do want to note that all of these require you to make assumptions of the form, "if there are traps, either the user or the agent already knows about them" and so on, in order to avoid no-free-lunch theorems.

No, I make no such assumption. A bound on subjective regret ensures that running the AI is a nearly-optimal strategy from the user's subjective perspective. It is neither needed nor possible to prove that the AI can never enter a trap. For example, the AI is immune to acausal attacks to the extent that the user believes that the AI is not inside Beta's simulation. On the other hand, if the user believes that the simulation hypothesis needs to be taken into account, then the scenario amounts to legitimate acausal bargaining (which has its own complications to do with decision/game theory, but that's mostly a separate concern).
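To make the claim concrete, here is a schematic rendering of what a subjective regret bound asserts (my own notation, not a formal definition from any particular paper). Writing $\zeta$ for the user's subjective prior over environments and $V^{\pi}_{\zeta}$ for the $\zeta$-expected value of policy $\pi$, the requirement is that

$$\mathrm{SR}(\pi) := \sup_{\pi'} V^{\pi'}_{\zeta} - V^{\pi}_{\zeta}$$

is small. Nothing is asserted about environments to which $\zeta$ assigns negligible probability, which is why no assumption of the form "the user or the agent already knows about the traps" is needed.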

Clarifying "AI Alignment"

In this essay Paul Christiano proposes a definition of "AI alignment" which is more narrow than other definitions that are often employed. Specifically, Paul suggests defining alignment in terms of the motivation of the agent (which should be, helping the user), rather than what the agent actually does. That is, as long as the agent "means well", it is aligned, even if errors in its assumptions about the user's preferences or about the world at large lead it to actions that are bad for the user.

Rohin Shah's comment on the essay (which I believe is endorsed by Paul) reframes it as a particular way to decompose the AI safety problem. An often used decomposition is "definition-optimization": first we define what it means for an AI to be safe, then we understand how to implement a safe AI. In contrast, Paul's definition of alignment decomposes the AI safety problem as "motivation-competence": first we learn how to design AIs with good motivations, then we learn how to make them competent. Both Paul and Rohin argue that the "motivation" is the urgent part of the problem, the part on which technical AI safety research should focus.

In contrast, I will argue that the "motivation-competence" decomposition is not as useful as Paul and Rohin believe, and the "definition-optimization" decomposition is more useful.

The thesis behind the "motivation-competence" decomposition implicitly assumes a linear, one-dimensional scale of competence. Agents with good motivations and subhuman competence might make silly mistakes but are not catastrophically dangerous (since they are subhuman). Agents with good motivations and superhuman competence will only make mistakes that are "forgivable" in the sense that our own mistakes would be as bad or worse. Ergo (the thesis concludes), good motivations are sufficient to solve AI safety.

However, in reality competence is multi-dimensional. AI systems can have subhuman skills in some domains and superhuman skills in other domains, as AI history has shown time and time again. This opens the possibility of agents that make "well intentioned" mistakes that take the form of sophisticated plans that are catastrophic for the user. Moreover, there might be limits to the agent's knowledge about certain questions (such as the user's preferences) that are inherent in the agent's epistemology (more on this below). Given such limits, the agent's competence becomes systematically lopsided. Furthermore, the elimination of such limits is a large part of the "definition" part in the "definition-optimization" framing that the thesis rejects.

As a consequence of the multi-dimensional nature of competence, the difference between "well intentioned mistake" and "malicious sabotage" is much less clear than naively assumed, and I'm not convinced there is a natural way to remove the ambiguity. For example, consider a superhuman AI Alpha subject to an acausal attack. In this scenario, some agent Beta in the "multiverse" (= prior) convinces Alpha that Alpha exists in a simulation controlled by Beta. The simulation is set up to look like the real Earth for a while, making it a plausible hypothesis. Then a "treacherous turn" moment arrives, in which the simulation diverges from Earth in a way calculated to make Alpha take irreversible actions that are beneficial for Beta and disastrous for the user.

In the above scenario, is Alpha "motivation-aligned"? We could argue it is not, because it is running the malicious agent Beta. But we could also argue it is motivation-aligned, and it just makes the innocent mistake of falling for Beta's trick. Perhaps it is possible to clarify the concept of "motivation" such that in this case Alpha's motivations are considered bad. But such a concept would depend in complicated ways on the agent's internals. I think that this is a difficult and unnatural approach, compared to "definition-optimization", where the focus is not on the internals but on what the agent actually does (more on this later).

The possibility of acausal attacks is a symptom of the fact that, environments with irreversible transitions are usually not learnable (this is the problem of traps in reinforcement learning, that I discussed for example here and here), i.e. it is impossible to guarantee convergence to optimal expected utility without further assumptions. When we add preference learning to the mix, the problem gets worse because now even if there are no irreversible transitions, it is not clear the agent will converge to optimal utility. Indeed, depending on the value learning protocol, there might be uncertainties about the user's preferences that the agent can never resolve (this is an example of what I meant by "inherent limits" before). For example, this happens in CIRL (even if the user is perfectly rational, this happens because the user and the AI have different action sets).

These difficulties with the "motivation-competence" framing are much more natural to handle in the "definition-optimization" framing. Moreover, the latter already produced viable directions for mathematical formalization, and the former has not (AFAIK). Specifically, the mathematical criteria of alignment I proposed are the "dynamic subjective regret bound" and the "dangerousness bound". The former is a criterion which simultaneously guarantees motivation-alignment and competence (as evidence that this criterion can be satisfied, I have the Dialogic Reinforcement Learning proposal). The latter is a criterion that doesn't guarantee competence in general, but specifically guarantees avoiding catastrophic mistakes. This makes it closer to motivation-alignment compared to subjective regret, but different in important ways: it refers to the actual things the agent does, and the ways in which those things might have catastrophic consequences.

In summary, I am skeptical that "motivation" and "competence" can be cleanly separated in a way that is useful for AI safety, whereas "definition" and "optimization" can be so separated: for example, the dynamic subjective regret bound is a "definition", whereas dialogic RL and putative more concrete implementations thereof are "optimizations". My specific proposals might have fatal flaws that weren't discovered yet, but I believe that the general principle of "definition-optimization" is sound, while "motivation-competence" is not.

Realism about rationality

It seems almost tautologically true that you can't accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent.

What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that's closer to "momentum" or "fitness", I'm not sure the question is even meaningful.

I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithm design. Knowing the complexity class of a problem doesn't tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem has exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you whether these cases need to be very special or not. If the problem is in NC then you know that it's possible to gain a lot from parallelization. If the problem is in NP then at least you can test candidate solutions, et cetera.)

And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)

Realism about rationality

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me. Specifically, a concept doesn't have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the "crisper" concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn't). Much less so are concepts such as "atom", "fitness" or "demand". Nevertheless, physicists, biologists and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although, in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

Realism about rationality

In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined concepts and elegant, parsimonious theories. We can then "move up the ladder" from fundamental to emergent phenomena, going through high energy physics, molecular physics, condensed matter physics, biology, geophysics / astrophysics, psychology, sociology, economics... On each level more "mess" appears. Why? Occam's razor tells us that we should prioritize simple theories over complex theories. But, we shouldn't expect a theory to be more simple than the specification of the domain. The general theory of planets should be simpler than a detailed description of planet Earth, the general theory of atomic matter should be simpler than the theory of planets, the general theory of everything should be simpler than the theory of atomic matter. That's because when we're "moving up the ladder", we are actually zooming in on particular phenomena, and the information we need to specify "where to zoom in" translates into the description complexity of the theory.

What does it mean in practice about understanding messy domains? The way science solves this problem is by building a tower of knowledge. In this tower, each floor benefits from the interactions both with the floor above it and the floor beneath it. Without understanding macroscopic physics we wouldn't figure out atomic physics, and without figuring out atomic physics we wouldn't figure out high energy physics. This is knowledge "flowing down". But knowledge also "flows up": knowledge of high energy physics allows understanding particular phenomena in atomic physics, knowledge of atomic physics allows predicting the properties of materials and chemical reactions. (Admittedly, some floors in the tower we have now are rather ramshackle, but I think that ultimately the "tower method" succeeds everywhere, as much as success is possible at all).

How does mathematics come in here? Importantly, mathematics is not used only on the lower floors of the tower, but on all floors. The way "messiness" manifests is, the mathematical models for the higher floors are either less quantitatively accurate (but still contain qualitative inputs) or have a lot of parameters that need to be determined either empirically, or using the models of the lower floors (which is one way how knowledge flows up), or some combination of both. Nevertheless, scientists continue to successfully build and apply mathematical models even in "messy" fields like biology and economics.

So, what does it all mean for rationality and intelligence? On what floor does it sit? In fact, the subject of rationality and intelligence is not a single floor, but its own tower (maybe we should imagine science as a castle with many towers connected by bridges).

The foundation of this tower should be the general abstract theory of rationality. This theory is even more fundamental than fundamental physics, since it describes the principles from which all other knowledge is derived, including fundamental physics. We can regard it as a "theory of everything": it predicts everything by making those predictions that a rational agent should make. Solomonoff's theory and AIXI are part of this foundation, but not all of it. Considerations like computational resource constraints should also enter the picture: complexity theory teaches us that they are also fundamental, they don't require "zooming in" a lot.

But, computational resource constraints are only entirely natural when they are not tied to a particular model of computation. This only covers constraints such as "polynomial time" but not constraints such as quadratic time, and even less so concrete wall-clock time. Therefore, once we introduce a particular model of computation (such as a RAM machine), we need to build another floor in the tower, one that will necessarily be "messier". Considering even more detailed properties of the hardware we have, the input/output channels we have, the goal system, the physical environment and the software tools we employ will correspond to adding more and more floors.

Once we agree that it should be possible to create a clean mathematical theory of rationality and intelligence, we can still debate whether it's useful. If we consider the problem of creating aligned AGI from an engineering perspective, it might seem for a moment that we don't really need the bottom layers. After all, when designing an airplane you don't need high energy physics. Well, high energy physics might help indirectly: perhaps it allowed predicting some exotic condensed matter phenomenon which we used to make a better power source, or better materials from which to build the aircraft. But often we can make do without those.

Such an approach might be fine, except that we also need to remember the risks. Now, safety is part of most engineering, and is definitely a part of airplane design. What level of the tower does it require? It depends on the kind of risks you face. If you're afraid the aircraft will not handle the stress and break apart, then you need mechanics and aerodynamics. If you're afraid the fuel will combust and explode, you better know chemistry. If you're afraid a lightning will strike the aircraft, you need knowledge of meteorology and electromagnetism, possibly plasma physics as well. The relevant domain of knowledge, and the relevant floor in the tower is a function of the nature of the risk.

What level of the tower do we need to understand AI risk? What is the source of AI risk? It is not in any detailed peculiarities of the world we inhabit. It is not in the details of the hardware used by the AI. It is not even related to a particular model of computation. AI risk is the result of Goodhart's curse, an extremely general property of optimization systems and intelligent agents. Therefore, addressing AI risk requires understanding the general abstract theory of rationality and intelligence. The upper floors will be needed as well, since the technology itself requires the upper floors (and since we're aligning with humans, who are messy). But, without the lower floors the aircraft will crash.

Vanessa Kosoy's Shortform

Some thoughts about embedded agency.

From a learning-theoretic perspective, we can reformulate the problem of embedded agency as follows: What kind of agent, and in what conditions, can effectively plan for events after its own death? For example, Alice bequeaths eir fortune to eir children, since ey want them to be happy even when Alice emself is no longer alive. Here, "death" can be understood to include modification, since modification effectively destroys an agent and replaces it by a different agent[1]. For example, Clippy 1.0 is an AI that values paperclips. Alice disabled Clippy 1.0 and reprogrammed it to value staples before running it again. Then Clippy 2.0 can be considered a new, different agent.

First, in order to meaningfully plan for death, the agent's reward function has to be defined in terms of something different than its direct perceptions. Indeed, by definition the agent no longer perceives anything after death. Instrumental reward functions are somewhat relevant but still don't give the right object, since the reward is still tied to the agent's actions and observations. Therefore, we will consider reward functions defined in terms of some fixed ontology of the external world. Formally, such an ontology can be an incomplete[2] Markov chain, the reward function being a function of the state. Examples:

  • The Markov chain is a representation of known physics (or some sector of known physics). The reward corresponds to the total mass of diamond in the world. To make this example work, we only need enough physics to be able to define diamonds. For example, we can make do with quantum electrodynamics + classical gravity and have the Knightian uncertainty account for all nuclear and high-energy phenomena.

  • The Markov chain is a representation of people and social interactions. The reward corresponds to concepts like "happiness" or "friendship" et cetera. Everything that falls outside the domain of human interactions is accounted for by Knightian uncertainty.

  • The Markov chain is Botworld with some of the rules left unspecified. The reward is the total number of a particular type of item.
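The "incomplete Markov chain" in the examples above can be given a minimal computational rendering (types and names are my own choices): Knightian uncertainty is modeled by keeping a *set* of candidate transition distributions per state, as in a credal set, and planning against the worst case over that set.

```python
from typing import Callable, Dict, List

State = str
Dist = Dict[State, float]  # distribution over successor states

class IncompleteChain:
    """A Markov chain with Knightian uncertainty: each state carries a set
    of candidate transition distributions rather than a single one."""

    def __init__(self, credal: Dict[State, List[Dist]],
                 reward: Callable[[State], float]):
        self.credal = credal
        self.reward = reward

    def worst_case_value(self, state: State) -> float:
        """One-step worst case: the infimum over the credal set of the
        expected reward of the successor state."""
        return min(
            sum(p * self.reward(s2) for s2, p in dist.items())
            for dist in self.credal[state]
        )

# Toy instance: from "s" either surely reach "a", or reach "a"/"b" 50/50.
chain = IncompleteChain(
    credal={"s": [{"a": 1.0}, {"a": 0.5, "b": 0.5}]},
    reward=lambda st: 1.0 if st == "a" else 0.0,
)
```

A "refinement" in the sense used below simply shrinks each credal set, removing part of the uncertainty.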

Now we need to somehow connect the agent to the ontology. Essentially we need a way of drawing Cartesian boundaries inside the (a priori non-Cartesian) world. We can accomplish this by specifying a function that assigns an observation and projected action to every state out of some subset of states. Entering this subset corresponds to agent creation, and leaving it corresponds to agent destruction. For example, we can take the ontology to be Botworld + marked robot and the observations and actions to be the observations and actions of that robot. If we don't want the marking of a particular robot to be part of the ontology, we can use a more complicated definition of Cartesian boundary that specifies a set of agents at each state plus the data needed to track these agents across time (in this case, the observation and action depend to some extent on the history and not only the current state). I will leave out the details for now.

Finally, we need to define the prior. To do this, we start by choosing some prior over refinements of the ontology. By "refinement", I mean removing part of the Knightian uncertainty, i.e. considering incomplete hypotheses which are subsets of the "ontological belief". For example, if the ontology is underspecified Botworld, the hypotheses will specify some of what was left underspecified. Given such an "objective" prior and a Cartesian boundary, we can construct a "subjective" prior for the corresponding agent. We transform each hypothesis by postulating that taking an action that differs from the projected action leads to a "Nirvana" state. Alternatively, we can allow for stochastic action selection and use the gambler construction.
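The transformation in the second-to-last sentence can be sketched as follows (a toy version with deterministic dynamics; the names are mine):

```python
NIRVANA = "nirvana"  # absorbing state, conventionally assigned maximal reward

def nirvana_transform(step, projected_action):
    """Turn an 'objective' hypothesis (step: state, action -> next state)
    into the agent's 'subjective' hypothesis: any action differing from the
    action the Cartesian boundary projects for the agent leads to Nirvana,
    so such branches never count against the agent in expectation."""
    def subjective_step(state, action):
        if state == NIRVANA:
            return NIRVANA  # Nirvana is absorbing
        if action != projected_action(state):
            return NIRVANA  # deviation from the projected action
        return step(state, action)
    return subjective_step
```

Because Nirvana carries maximal reward, hypotheses in which the agent's "body" fails to enact the agent's chosen action impose no cost, which is the point of the construction.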

Does this framework guarantee effective planning for death? A positive answer would correspond to some kind of learnability result (regret bound). To get learnability, we will first need the reward to be either directly or indirectly observable. By "indirectly observable" I mean something like with semi-instrumental reward functions, but accounting for agent mortality. I am not ready to formulate the precise condition atm. Second, we need to consider an asymptotic in which the agent is long lived (in addition to time discount being long-term), otherwise it won't have enough time to learn. Third (this is the trickiest part), we need the Cartesian boundary to flow with the asymptotic as well, making the agent "unspecial". For example, consider Botworld with some kind of simplicity prior. If I am a robot born at cell zero and time zero, then my death is an event of low description complexity. It is impossible to be confident about what happens after such a simple event, since there will always be competing hypotheses with different predictions and a probability that is only lower by a bounded factor. On the other hand, if I am a robot born at cell 2439495 at time 9653302, then it would be surprising if the outcome of my death were qualitatively different from the outcome of the death of any other robot I observed. Finding a natural, rigorous and general way to formalize this condition is a very interesting problem. Of course, even without learnability we can strive for Bayes-optimality or some approximation thereof. But it is still important to prove learnability under certain conditions, to test that this framework truly models rational reasoning about death.

Additionally, there is an intriguing connection between some of these ideas and UDT, if we consider TRL agents. Specifically, a TRL agent can have a reward function that is defined in terms of computations, exactly like UDT is often conceived. For example, we can consider an agent whose reward is defined in terms of a simulation of Botworld, or in terms of taking expected value over a simplicity prior over many versions of Botworld. Such an agent would be searching for copies of itself inside the computations it cares about, which may also be regarded as a form of "embeddedness". It seems like this can be naturally considered a special case of the previous construction, if we allow the "ontological belief" to include beliefs pertaining to computations.


  1. Unless it's some kind of modification that we treat explicitly in our model of the agent, for example a TRL agent reprogramming its own envelope. ↩︎

  2. "Incomplete" in the sense of Knightian uncertainty, like in quasi-Bayesian RL. ↩︎

2019 AI Alignment Literature Review and Charity Comparison

Thank you for writing this impressive review!

Some comments on MIRI's non-disclosure policy.

First, some disclosure :) My research is funded by MIRI. On the other hand, all of my opinions are my own and do not represent MIRI or anyone else associated with MIRI.

The non-disclosure policy has no direct effect on me, but naturally, both before and after it was promulgated, I used my own judgement to decide what should or should not be made public. The vast majority of my work I do make public (subject only to the cost of time and effort to write and explain it), because if I think something would increase risk rather than reduce it[1], then I don't pursue this line of inquiry in the first place. Things I don't make public are mostly early stage ideas that I don't develop.

I think it is fair enough to judge AI alignment orgs only by the public output they produce. However, it doesn't at all follow that a non-disclosure policy leads to immediate disqualification, as you seem to imply. You can judge an org by its public output whether or not all of its output is public. This is somewhat similar to the observation that management overhead is a bad metric. Yes, some of your money goes into something that doesn't immediately and directly translate into benefit. All else equal, you want that not to happen. But all else is not equal, and can never be equal.


  1. This is completely tangential, but I think we need more public discussion on how do we decide whether making something public is beneficial vs. detrimental. ↩︎

Clarifying Power-Seeking and Instrumental Convergence

One idea for how this formalism might be improved: Consider a random directed graph, sampled from some "reasonable" distribution (in some sense that needs to be defined). We can then define "powerful" vertices as vertices from which there are paths to most other vertices. Claim: with high probability over graphs, powerful vertices are connected "robustly" to most vertices. By "robustly" I mean that small changes in the graph don't disrupt the connection. This is because, if your vertex is connected to everything, then disconnecting some edges should still leave plenty of room for rerouting through other vertices. We can then interpret this as saying: gaining power is more robust to inaccuracies of the model, or changes in circumstances, than pursuing more "direct" paths to objectives.
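The claim is easy to probe numerically. A toy experiment (my sketch; "powerful" is operationalized as maximal reachability in a sparse random digraph, and "small change" as deleting a few random edges):

```python
import random
from collections import deque

def reachable(n, edges, src):
    """Vertices reachable from src via BFS over the directed edges."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def robustness_experiment(n=50, p=0.08, drops=5, seed=0):
    """Reachable-set size of the most 'powerful' vertex, before and after
    deleting a few random edges from an Erdos-Renyi digraph G(n, p)."""
    rng = random.Random(seed)
    edges = [(u, v) for u in range(n) for v in range(n)
             if u != v and rng.random() < p]
    # The "powerful" vertex reaches the most other vertices.
    power = max(range(n), key=lambda v: len(reachable(n, edges, v)))
    before = len(reachable(n, edges, power))
    perturbed = list(edges)
    for _ in range(drops):
        perturbed.pop(rng.randrange(len(perturbed)))
    after = len(reachable(n, perturbed, power))
    return before, after
```

In the supercritical regime (expected out-degree np ≈ 4 here) one expects `after` to stay close to `before`, matching the rerouting intuition; a formal version of the claim would need a precise class of "reasonable" graph distributions.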
