## AI ALIGNMENT FORUM

Eigil Fjeldgren Rischel

I maintain a reading list on Goodreads. I have a personal website with some blog posts, mostly technical stuff about math research. I am also on GitHub, Twitter, and Mastodon.



I mean, "is a large part of the state space" is basically what "high entropy" means!

For case 3, I think the right way to rule out this counterexample is the probabilistic criterion discussed by John: the vast majority of initial states for your computer don't include a zero-day exploit and a script to automatically deploy it. The only way to make this likely is to include you programming your computer in the picture, and of course you do have a world model (without which you could not have programmed your computer).

Ha, I was just about to write this post. To add something, I think you can justify the uniform measure on bounded intervals of reals (for illustration purposes, say $[0,1]$) by the following argument: "measuring a real number" is obviously impossible if interpreted literally, since a real number contains an infinite amount of data. Instead, this is supposed to be some sort of idealization of a situation where you can observe "as many bits as you want" of the binary expansion of the number (choosing another base gives the same measure). If you now apply the principle of indifference to each measured bit, you're left with Lebesgue measure.
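As a rough illustration of the bit-by-bit argument, here is a minimal sketch that treats each of the first `n_bits` binary digits as a fair coin and computes, exactly, the probability assigned to intervals with dyadic endpoints. The function names and the restriction to the unit interval $[0, 1)$ are assumptions made for the example, not part of the original argument.

```python
from fractions import Fraction
from itertools import product


def dyadic_cell_probabilities(n_bits):
    """Exact probability of each dyadic cell when every observed bit is a fair coin.

    Assumption (for illustration): the number lies in [0, 1) and we observe
    the first n_bits digits of its binary expansion.
    """
    probs = {}
    for bits in product((0, 1), repeat=n_bits):
        # Left endpoint of the cell of numbers whose expansion starts with `bits`.
        left = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
        # Indifference over each bit: every bit pattern has probability 2**-n_bits.
        probs[left] = Fraction(1, 2 ** n_bits)
    return probs


def interval_probability(a, b, n_bits):
    """Probability of [a, b), where a and b are multiples of 2**-n_bits."""
    cells = dyadic_cell_probabilities(n_bits)
    return sum(p for left, p in cells.items() if a <= left < b)
```

For instance, `interval_probability(Fraction(1, 4), Fraction(3, 4), 4)` evaluates to `Fraction(1, 2)`, the length of the interval. Since this holds for every dyadic interval at every resolution, the limiting measure agrees with Lebesgue measure on those intervals.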

It's not clear that there's a "right" way to apply this type of thinking to produce "the correct" prior on (or or any other non-compact space).

But then shouldn't there be a natural biextensional equivalence $C \to \hat{C}$? Suppose $C = (A, E, \cdot)$, and denote $\hat{C} = (\hat{A}, \hat{E}, \hat{\cdot})$. Then the map $A \to \hat{A}$ is clear enough: it's simply the quotient map. But there's not a unique map $\hat{E} \to E$: any section of the quotient map will do, and it doesn't seem we can make this choice naturally.
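To make the non-uniqueness concrete, here is a small sketch for a finite frame. The encoding (a lookup table from agent/environment pairs to worlds) and all names are hypothetical, chosen only for illustration. It groups agent options and environments that behave identically, builds the forced quotient map on agents, and enumerates every section of the environment quotient.

```python
from itertools import product


def equivalence_classes(items, signature):
    """Group items that share a signature, preserving first-seen order."""
    classes = {}
    for item in items:
        classes.setdefault(signature(item), []).append(item)
    return list(classes.values())


def collapse_maps(agents, envs, evaluate):
    """Return the agent quotient map and every section of the environment quotient.

    Hypothetical encoding: evaluate(a, e) is the world produced by agent
    option `a` in environment `e`.
    """
    agent_classes = equivalence_classes(
        agents, lambda a: tuple(evaluate(a, e) for e in envs))
    env_classes = equivalence_classes(
        envs, lambda e: tuple(evaluate(a, e) for a in agents))
    # The quotient map on agents is forced: each option goes to its own class.
    quotient = {a: i for i, cls in enumerate(agent_classes) for a in cls}
    # A section picks one representative per environment class; any pick works.
    sections = [dict(enumerate(choice)) for choice in product(*env_classes)]
    return quotient, sections


# Toy frame: environments e0 and e1 are indistinguishable to every agent option.
table = {
    ("a0", "e0"): "w0", ("a0", "e1"): "w0", ("a0", "e2"): "w1",
    ("a1", "e0"): "w2", ("a1", "e1"): "w2", ("a1", "e2"): "w3",
}
quotient, sections = collapse_maps(
    ["a0", "a1"], ["e0", "e1", "e2"], lambda a, e: table[(a, e)])
```

In this toy frame the agent quotient is unique, while the environment side has two sections (one choosing `e0`, one choosing `e1`), and nothing in the frame singles out either of them.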

I think maybe the subcategory of just "agent-extensional" frames is reflective, and then the subcategory of "environment-extensional" frames is coreflective. And there's a canonical (i.e. natural) zig-zag

Does the biextensional collapse satisfy a universal property? There doesn't seem to be an obvious map either $C \to \hat{C}$ or $\hat{C} \to C$ (in each case one of the arrows is going the wrong way), but maybe there's some other way to make it universal?