Towards Evaluating AI Systems for Moral Status Using Self-Reports
TL;DR: In a new paper, we explore whether future LLMs could be trained to answer questions about themselves accurately. If this works, LLM self-reports may help us test them for morally relevant states like consciousness. We think it's possible to start preliminary experiments testing for moral status in language models...