SoerenMind

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

This post is a copy of the introduction of this paper on lie detection in LLMs. The Twitter thread is here. Authors: Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner [EDIT: Many people said they found the results very surprising....

Sep 28, 2023 · 187

SoerenMind

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

The Alignment Problem from a Deep Learning Perspective (major rewrite)

Wikipedia as an introduction to the alignment problem

SoerenMind's Shortform