Has Diagram

Edited by Gunnar_Zarncke, last updated 29th Apr 2023

This tag indicates that a post contains diagrams. Use it to find such posts quickly, or to exclude them if you are visually impaired.

Posts tagged Has Diagram

1

61 Self-Other Overlap: A Neglected Approach to AI Alignment

Marc Carauleanu, Mike Vaiana, Kvee, Diogo de Lucena, Cameron Berg, Trent Hodgeson

2y

9

1

32 Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Kvee, Cameron Berg, Mike Vaiana, Trent Hodgeson

1y

9

1

46 Towards a Less Bullshit Model of Semantics

johnswentworth, David Lorell

2y

22

1

33 Testing The Natural Abstraction Hypothesis: Project Update

johnswentworth

5y

5

1

33 Residual stream norms grow exponentially over the forward pass

StefanHex, TurnTrout

3y

6

1

25 Shard Theory - is it true for humans?

r_b

2y

0

1

32 Open technical problem: A Quinean proof of Löb's theorem, for an easier cartoon guide

Andrew_Critch

3y

21

1

24 The Natural Abstraction Hypothesis: Implications and Evidence

CallumMcDougall

4y

3

1

12 A newcomer’s guide to the technical AI safety field

zeshen

3y

1

13 Levels of goals and alignment

zeshen

4y

2

1

14 Embedding safety in ML development

zeshen

3y

0