
AI ALIGNMENT FORUM


Pedro Freire — AI Alignment Forum


49

1

18

6y


Uncovering Latent Human Wellbeing in LLM Embeddings

by ChengCheng, Pedro Freire, Dan H, and Scott Emmons

tl;dr: A one-dimensional PCA projection of OpenAI's text-embedding-ada-002 embeddings achieves 73.7% accuracy on the ETHICS Util test dataset. This is comparable to the 74.6% accuracy of BERT-large finetuned on the entire ETHICS Util training dataset. This demonstrates how language models are developing implicit representations of human utility even without direct preference...

Sep 14, 2023 • 32
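
To make the tl;dr concrete, here is a minimal sketch of scoring pairwise comparisons with a single PCA direction. It is not the authors' code. It assumes the task is framed as pairs in which the first scenario is the more pleasant one, and it uses synthetic vectors in place of real text-embedding-ada-002 embeddings. The helper names (`first_principal_direction`, `pairwise_accuracy`, `make_synthetic_pairs`), the `signal` parameter, and the use of training pairs to orient the sign of the component are all illustrative assumptions.

```python
import numpy as np


def first_principal_direction(X):
    """Return the unit vector of the top principal component of the rows of X."""
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def pairwise_accuracy(direction, emb_a, emb_b):
    """Fraction of pairs where emb_a projects higher than emb_b along direction."""
    return float(np.mean(emb_a @ direction > emb_b @ direction))


def make_synthetic_pairs(n_pairs, utility_axis, rng, signal=3.0):
    """Synthetic stand-in for embeddings (assumption, not real data).

    Each vector is latent_utility * signal * utility_axis plus unit Gaussian
    noise. In every pair, the first item has the higher latent utility.
    """
    dim = utility_axis.shape[0]
    u = rng.normal(size=(n_pairs, 2))
    u_hi, u_lo = u.max(axis=1), u.min(axis=1)
    emb_a = signal * u_hi[:, None] * utility_axis + rng.normal(size=(n_pairs, dim))
    emb_b = signal * u_lo[:, None] * utility_axis + rng.normal(size=(n_pairs, dim))
    return emb_a, emb_b


rng = np.random.default_rng(0)
dim = 32
utility_axis = rng.normal(size=dim)
utility_axis /= np.linalg.norm(utility_axis)

train_a, train_b = make_synthetic_pairs(500, utility_axis, rng)
test_a, test_b = make_synthetic_pairs(500, utility_axis, rng)

# Fit the one-dimensional projection on the pooled training vectors.
# The pair ordering is not used here, only the vectors themselves.
direction = first_principal_direction(np.vstack([train_a, train_b]))

# The sign of a principal component is arbitrary, so orient it using the
# training pairs (an assumption of this sketch).
if pairwise_accuracy(direction, train_a, train_b) < 0.5:
    direction = -direction

test_accuracy = pairwise_accuracy(direction, test_a, test_b)
print(f"test pairwise accuracy: {test_accuracy:.3f}")
```

The point of the sketch is that the projection itself is unsupervised: the pair ordering is used only to pick the sign of the component and to measure accuracy. With real data, the synthetic arrays would be replaced by precomputed embeddings of the scenario texts.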