AI ALIGNMENT FORUM

Jan — AI Alignment Forum

Jan Hendrik Kirchner

phd student in comp neuroscience @ mpi brain research frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/

1105

Ω

156

35

80

5y

Top posts

A descriptive, not prescriptive, overview of current AI Alignment Research

TL;DR: In this project, we collected and cataloged AI alignment research literature and analyzed the resulting dataset in an unbiased way to identify major research directions. We found that the field is growing quickly, with several subfields emerging in parallel. We looked at the subfields and identified the prominent researchers, recurring topics, and different modes of communication in each. Furthermore, we found that a classifier trained on AI alignment research articles can detect relevant articles that we did not originally include in the dataset. (video presentation here)

Dataset Announcement

In the context of the 6th AISC, we collected a dataset of alignment research articles from a variety of different sources. This dataset is now available for download here and the code for reproducing the scrape is on GitHub here[1]. When using the dataset, please cite our manuscript as described in the footnote[2].

Table 1: Different sources of text included in the dataset alongside the number of articles per source. Color of row indicates that data was analyzed as AI alignment research articles (green) or baseline (gray), or that the articles were added to the dataset as a result of the analysis in Fig. 4 (purple). Definition of level-0 and level-1 articles in Fig. 4c.

For details about our collection procedure see the Methods section. Here follows an abbreviated version of the full manuscript, which contains additional analysis and discussion.

Rapid growth of AI Alignment research from 2012 to 2022 across two platforms

After collecting the dataset, we analyzed the two largest non-redundant sources of articles, Alignment Forum (AF) and arXiv. We found rapid growth in publications on the AF (Fig. 1a) and a long-tailed distribution of articles per researcher (Fig. 1b) and researchers per article (Fig. 1c). We were surprised to find a decrease in publications on the arXiv in recent years, but identified the cause for the decrease as spurious and fixed the issue in t

Results from a survey on tool use and workflows in alignment research

[Simulators seminar sequence] #1 Background & shared assumptions

A survey of tool use and workflows in alignment research

Jan's Shortform

Feb 27, 2023•5

[Simulators seminar sequence] #2 Semiotic physics - revamped

Update February 21st: After the initial publication of this article (January 3rd) we received a lot of feedback and several people pointed out that propositions 1 and 2 were incorrect as stated. That was unfortunate as it distracted from the broader arguments in the article and I (Jan K) take...

Feb 27, 2023•22

[Simulators seminar sequence] #1 Background & shared assumptions

Meta: Over the past few months, we've held a seminar series on the Simulators theory by janus. As the theory is actively under development, the purpose of the series is to discover central structures and open problems. Our aim with this sequence is to share some of our discussions with...

Jan 2, 2023•50

Results from a survey on tool use and workflows in alignment research

by jacquesthibs, Jan, janus, and Logan Riggs

On March 22nd, 2022, we released a survey with an accompanying post for the purpose of getting more insight into what tools we could build to augment alignment researchers and accelerate alignment research. Since then, we’ve also released a dataset, a manuscript (LW post), and the (relevant) Simulators post was...

Dec 19, 2022•79

A descriptive, not prescriptive, overview of current AI Alignment Research

TL;DR: In this project, we collected and cataloged AI alignment research literature and analyzed the resulting dataset in an unbiased way to identify major research directions. We found that the field is growing quickly, with several subfields emerging in parallel. We looked at the subfields and identified the prominent researchers,...

Jun 6, 2022•139

Adversarial attacks and optimal control

Meta: After a fun little motivating section, this post goes deep into the mathematical weeds. This thing aims to explore the mathematical properties of adversarial attacks from first principles. Perhaps other people are not as confused about this point as I was, but hopefully, the arguments are still useful and/or...

May 22, 2022•17

Elementary Infra-Bayesianism

TL;DR: I got nerd-sniped into working through some rather technical work in AI Safety. Here's my best guess of what is going on. Imprecise probabilities for handling catastrophic downside risk. Short summary: I apply the updating equation from Infra-Bayesianism to a concrete example of an Infradistribution and illustrate the process....

May 8, 2022•41
