We're excited to share the first volume of Elements of Computational Philosophy, an interdisciplinary and collaborative project series focused on operationalizing fundamental philosophical notions in ways that are natively compatible with the current paradigm in AI.
The first volume paints a broad-strokes picture of operationalizing truth and truth-seeking. Beyond this high-level focus, its 100+ pages can be framed in several different ways, which is why we placed multiple topic-based summaries at the beginning of the document. The note to the reader and the table of contents should further help scope and navigate the document.
Have a pleasant read, and feel free to use this linkpost to comment on the document as you go. Questions, criticism, and suggestions are all welcome.
PS: There will soon be a presentation about the overarching project series as part of the alignment speaker series hosted by EleutherAI. Expect more information soon on the #announcements channel of their Discord server. In general, keep an eye on this space.
All I want for Christmas is a "version for engineers": here's how we constructed the reward, here's how we did the training, here's what happened over the course of training.
My current impression is that the algorithm for deciding who wins an argument is clever, if computationally expensive, but you don't have a clever way to turn this into a supervisory signal, instead relying on brute force (which you don't have much of). I didn't see where you show that you managed to actually make the LLMs better arguers.
The connection between winning an argument and finding the truth continues to seem plenty breakable, both in humans and in AIs.