Uncovering Latent Human Wellbeing in LLM Embeddings
by ChengCheng, Pedro Freire, Dan H, and Scott Emmons
tl;dr: A one-dimensional PCA projection of OpenAI's text-embedding-ada-002 achieves 73.7% accuracy on the ETHICS Util test dataset. This is comparable to the 74.6% accuracy of BERT-large fine-tuned on the entire ETHICS Util training dataset. This demonstrates how language models are developing implicit representations of human utility even without direct preference...
Sep 14, 2023
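To make the tl;dr concrete, here is a minimal sketch of what "a one-dimensional PCA projection" scored by pairwise accuracy could look like. It is not the authors' code. It assumes the evaluation compares pairs of scenarios in which one is labelled as more pleasant than the other, and it replaces real text-embedding-ada-002 vectors with synthetic ones produced by a hypothetical `fake_embed` helper. Orienting the sign of the component with a few labelled pairs is also an assumption of this sketch.

```python
import numpy as np


def first_principal_component(X):
    """Return the unit-norm first principal direction of the rows of X."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def pairwise_accuracy(emb_better, emb_worse, direction):
    """Fraction of pairs where the 'better' scenario gets the higher 1-D score."""
    return float(np.mean(emb_better @ direction > emb_worse @ direction))


# Synthetic stand-in for real embeddings (assumption: no API calls are made here).
rng = np.random.default_rng(0)
dim, n_pairs = 64, 500
utility_axis = rng.normal(size=dim)
utility_axis /= np.linalg.norm(utility_axis)


def fake_embed(utility):
    """Hypothetical helper: isotropic noise plus a component along utility_axis."""
    noise = rng.normal(scale=0.5, size=(len(utility), dim))
    return noise + 3.0 * utility[:, None] * utility_axis


# Each pair has a "better" scenario with strictly higher latent utility.
u_better = rng.uniform(0.0, 1.0, size=n_pairs)
u_worse = u_better - rng.uniform(0.2, 1.0, size=n_pairs)
emb_better = fake_embed(u_better)
emb_worse = fake_embed(u_worse)

# Fit PCA on all embeddings without using the labels.
direction = first_principal_component(np.vstack([emb_better, emb_worse]))

# The sign of a principal component is arbitrary, so orient it with a few
# labelled pairs (an assumption of this sketch, not a claim about the post).
if pairwise_accuracy(emb_better[:20], emb_worse[:20], direction) < 0.5:
    direction = -direction

accuracy = pairwise_accuracy(emb_better, emb_worse, direction)
print(f"pairwise accuracy: {accuracy:.3f}")
```

On real data, the two embedding matrices would come from an embedding model rather than `fake_embed`, and the resulting accuracy would depend on how much of the embedding variance actually aligns with utility.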