I've been trying to gather my thoughts for my next tiling theorem (agenda write-up here; first paper; second paper; recent project update). I have a lot of ideas for how to improve upon my work so far, and trying to narrow them down to an achievable next step has been...
Condensation: a theory of concepts is a model of concept-formation by Sam Eisenstat. Its goals and methods resemble John Wentworth's natural abstractions/natural latents research.[1] Both theories seek to provide a clear picture of how to posit latent variables, such that once someone has understood the theory, they'll say "yep, I...
It's been a while since I wrote about myopia! My previous posts about myopia were "a little crazy", because it's not this solid well-defined thing; it's a cluster of things which we're trying to form into a research program. This post will be "more crazy". The Good/Evil/Good Spectrum "Good" means...
Löb's Theorem: * If ⊢□x→x, then ⊢x. * Or, as one formula: □(□x→x)→□x Payor's Lemma: * If ⊢□(□x→x)→x, then ⊢x. * Or, as one formula: □(□(□x→x)→x)→□x. In the following discussion, I'll say "reality" to mean x, "belief" to mean □x, "reliability" to mean □x→x (ie, belief is reliable when belief...
This post owes credit to discussions with Caspar Oesterheld, Scott Garrabrant, Sahil, Daniel Kokotajlo, Martín Soto, Chi Nguyen, Lukas Finnveden, Vivek Hebbar, Mikhail Samin, and Diffractor. Inspired in particular by a discussion with cousin_it. Short version: The idea here is to combine the Nash bargaining solution with Wei Dai's UDT...
I will be discussing weak-to-strong generalization with Sahil on Monday, November 3rd, 2025, 11am Pacific Daylight Time. You can join the discussion with this link. Weak-to-strong generalization is an approach to alignment (and capabilities) which seeks to address the scarcity of human feedback by using a weak model to teach...
Sahil has been up to things. Unfortunately, I've seen people put effort into trying to understand and still bounce off. I recently talked to someone who tried to understand Sahil's project(s) several times and still failed. They asked me for my take, and they thought my explanation was far easier...