AI ALIGNMENT FORUM

Has Diagram

Edited by Gunnar Zarncke, last updated 29th Apr 2023

This tag indicates that a post contains diagrams. It can help you find such posts quickly, or exclude them if you are visually impaired.

Posts tagged Has Diagram
58 Self-Other Overlap: A Neglected Approach to AI Alignment
Marc Carauleanu, Mike Vaiana, Judd Rosenblatt, Diogo de Lucena, Cameron Berg, AE Studio
1y
7
57 [Intro to brain-like-AGI safety] 1. What's the problem & Why work on it now?
Steve Byrnes
3y
14
31 Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana, AE Studio
4mo
9
32 [Intro to brain-like-AGI safety] 3. Two subsystems: Learning & Steering
Steve Byrnes
3y
2
46 Towards a Less Bullshit Model of Semantics
johnswentworth, David Lorell
1y
22
33 Testing The Natural Abstraction Hypothesis: Project Update
johnswentworth
4y
5
25 [Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example
Steve Byrnes
3y
0
33 Residual stream norms grow exponentially over the forward pass
Stefan Heimersheim, Alex Turner
2y
6
24 [Intro to brain-like-AGI safety] 13. Symbol grounding & human social instincts
Steve Byrnes
3y
6
25 Shard Theory - is it true for humans?
Rishika Bose
1y
0
25 [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL
Steve Byrnes
3y
12
27 [Intro to brain-like-AGI safety] 4. The “short-term predictor”
Steve Byrnes
3y
5
22 [Intro to brain-like-AGI safety] 2. “Learning from scratch” in the brain
Steve Byrnes
3y
8
32 Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide
Andrew Critch
3y
21
23 [Intro to brain-like-AGI safety] 8. Takeaways from neuro 1/2: On AGI development
Steve Byrnes
3y
0