AI ALIGNMENT FORUM

Has Diagram

Edited by Gunnar_Zarncke last updated 29th Apr 2023

This tag indicates that a post contains diagrams. It can help you find such posts quickly, or exclude them if you are visually impaired.

Posts tagged Has Diagram
59 Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg, Trent Hodgeson
1y
7
56 [Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steven Byrnes
4y
14
31 Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana, Trent Hodgeson
6mo
9
32 [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering
Steven Byrnes
4y
2
46 Towards a Less Bullshit Model of Semantics
johnswentworth, David Lorell
1y
22
33 Testing The Natural Abstraction Hypothesis: Project Update
johnswentworth
4y
5
27 [Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example
Steven Byrnes
3y
2
33 Residual stream norms grow exponentially over the forward pass
StefanHex, TurnTrout
2y
6
24 [Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts
Steven Byrnes
3y
6
25 Shard Theory - is it true for humans?
Rishika
1y
0
26 [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL
Steven Byrnes
4y
12
27 [Intro to brain-like-AGI safety] 4. The “short-term predictor”
Steven Byrnes
4y
5
22 [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain
Steven Byrnes
4y
8
32 Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide
Andrew_Critch
3y
21
23 [Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development
Steven Byrnes
3y
0
Showing 15 of 24 tagged posts.