AI ALIGNMENT FORUM
AXRP
Edited by Multicore, DanielFilan, et al. Last updated 30th Dec 2024.
AI X-Risk Research Podcast is a podcast hosted by Daniel Filan.
See also: Audio, Interviews
Posts tagged AXRP (sorted by: Most Relevant)
Karma · Title · Author · Age · Comments
37 · AXRP Episode 31 - Singular Learning Theory with Daniel Murfet · DanielFilan · 1y · 0
38 · AXRP Episode 27 - AI Control with Buck Shlegeris and Ryan Greenblatt · DanielFilan · 1y · 6
19 · AXRP Episode 24 - Superalignment with Jan Leike · DanielFilan · 2y · 3
30 · AXRP Episode 22 - Shard Theory with Quintin Pope · DanielFilan · 2y · 4
25 · AXRP Episode 19 - Mechanistic Interpretability with Neel Nanda · DanielFilan · 3y · 0
17 · AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · 2y · 0
25 · AXRP Episode 39 - Evan Hubinger on Model Organisms of Misalignment · DanielFilan · 9mo · 0
20 · AXRP Episode 33 - RLHF Problems with Scott Emmons · DanielFilan · 1y · 0
18 · AXRP Episode 15 - Natural Abstractions with John Wentworth · DanielFilan · 3y · 0
20 · AXRP Episode 38.2 - Jesse Hoogland on Singular Learning Theory · DanielFilan · 9mo · 0
15 · AXRP Episode 45 - Samuel Albanie on DeepMind’s AGI Safety Approach · DanielFilan · 2mo · 0
17 · AXRP Episode 41 - Lee Sharkey on Attribution-based Parameter Decomposition · DanielFilan · 3mo · 0
13 · AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability · DanielFilan · 5mo · 0
15 · AXRP Episode 14 - Infra-Bayesian Physicalism with Vanessa Kosoy · DanielFilan · 3y · 10
15 · AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo · DanielFilan · 3y · 1