AI ALIGNMENT FORUM


Academic Papers

Edited by Kaj_Sotala last updated 9th Jul 2020

Posts either linking to, or summarizing, formal papers published elsewhere.

Posts tagged Academic Papers
65 Some AI research areas and their relevance to existential safety
Andrew_Critch
5y
34
19 How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley
2y
0
51 2021 AI Alignment Literature Review and Charity Comparison
Larks
4y
15
58 Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
1y
0
29 A technical note on bilinear layers for interpretability
Lee Sharkey
2y
0
19 Formal Solution to the Inner Alignment Problem
michaelcohen
5y
123
21 Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, evhub
5y
7
73 Shallow review of technical AI safety, 2024
technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
10mo
1
28 NeurIPS ML Safety Workshop 2022
Dan H
3y
1
19 Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, sofmonk
1y
1
25 How truthful is GPT-3? A benchmark for language models
Owain_Evans
4y
18
14 New paper: Corrigibility with Utility Preservation
Koen.Holtman
6y
9
18 Learning preferences by looking at the world
Rohin Shah
7y
4
26 Human-AI Collaboration
Rohin Shah
6y
4
19 Learning biases and rewards simultaneously
Rohin Shah
6y
3