Diffractor

Less Basic Inframeasure Theory

So, we've also got an analogue of KL-divergence for crisp infradistributions.

We'll be using and for crisp infradistributions, and and for probability distributions associated with them. will be used for the KL-divergence of infradistributions, and will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as

I'm not entirely sure why it's like this, but it has the basic properties you would expect of the KL-divergence, like concavity in both arguments and interacting well with continuous pushforwards and semidirect product.

Straight off the bat, we have:

**Proposition 1:**

Proof: KL-divergence between probability distributions is always nonnegative, by Gibb's inequality.

**Proposition 2:**

And now, because KL-divergence between probability distributions is 0 only when they're equal, we have:

**Proposition 3:** *If ** is the uniform distribution on **, then *

And the cross-entropy of any distribution with the uniform distribution is always , so:

**Proposition 4:** *is a concave function over* .

Proof: Let's use as our number in in order to talk about mixtures. Then,

Then we apply concavity of the KL-divergence for probability distributions to get:

**Proposition 5: **

At this point we can abbreviate the KL-divergence, and observe that we have a multiplication by 1, to get:

And then pack up the expectation

Then, with the choice of and fixed, we can move the choice of the all the way inside, to get:

Now, there's something else we can notice. When choosing , it doesn't matter what is selected, you want to take every and maximize the quantity inside the expectation, that consideration selects your . So, then we can get:

And pack up the KL-divergence to get:

And distribute the min to get:

And then, we can pull out that fixed quantity and get:

And pack up the KL-divergence to get:

**Proposition 6:**

To do this, we'll go through the proof of proposition 5 to the first place where we have an inequality. The last step before inequality was:

Now, for a direct product, it's like semidirect product but all the and are the same infradistribution, so we have:

Now, this is a constant, so we can pull it out of the expectation to get:

**Proposition 7:**

For this, we'll need to use the Disintegration Theorem (the classical version for probability distributions), and adapt some results from Proposition 5. Let's show as much as we can before showing this.

Now, hypothetically, if we had

then we could use that result to get

and we'd be done. So, our task is to show

for any pair of probability distributions and . Now, here's what we'll do. The and gives us probability distributions over , and the and are probability distributions over . So, let's take the joint distribution over given by selecting a point from according to the relevant distribution and applying . By the classical version of the disintegration theorem, we can write it either way as starting with the marginal distribution over and a semidirect product to , or by starting with the marginal distribution over and you take a semidirect product with some markov kernel to to get the joint distribution. So, we have:

for some Markov kernels . Why? Well, the joint distribution over is given by or respectively (you have a starting distribution, and lets you take an input in and get an output in ). But, breaking it down the other way, we start with the marginal distribution of those joint distributions on (the pushforward w.r.t. ), and can write the joint distribution as semidirect product going the other way. Basically, it's just two different ways of writing the same distributions, so that's why KL-divergence doesn't vary at all.

Now, it is also a fact that, for semidirect products (sorry, we're gonna let be arbitrary here and unconnected to the fixed ones we were looking at earlier, this is just a general property of semidirect products), we have:

To see this, run through the proof of Proposition 5, because probability distributions are special cases of infradistributions. Running up to right up before the inequality, we had

But when we're dealing with probability distributions, there's only one possible choice of probability distribution to select, so we just have

Applying this, we have:

The first equality is our expansion of semidirect product for probability distributions, second equality is the probability distributions being equal, and third equality is, again, expansion of semidirect product for probability distributions. Contracting the two sides of this, we have:

Now, the KL-divergence between a distribution and itself is 0, so the expectation on the left-hand side is 0, and we have

And bam, we have which is what we needed to carry the proof through.

John_Maxwell's Shortform

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AI's would be able to colonize a bunch of little asteroids and KBO's and comets in the oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.

Introduction to Cartesian Frames

I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.

Needed: AI infohazard policy

So, here's some considerations (not an actual policy)

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia

Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article by U.S. anti-weapons activist Howard Morland in 1979 on the "secret of the hydrogen bomb". In 1978, Morland had decided that discovering and exposing this "last remaining secret" would focus attention onto the arms race and allow citizens to feel empowered to question official statements on the importance of nuclear weapons and nuclear secrecy. Most of Morland's ideas about how the weapon worked were compiled from highly accessible sources—the drawings which most inspired his approach came from the

Encyclopedia Americana. Morland also interviewed (often informally) many former Los Alamos scientists (including Teller and Ulam, though neither gave him any useful information), and used a variety of interpersonal strategies to encourage informational responses from them (i.e., asking questions such as "Do they still use sparkplugs?" even if he wasn't aware what the latter term specifically referred to)....

When an early draft of the article, to be published inThe Progressivemagazine, was sent to the DOE after falling into the hands of a professor who was opposed to Morland's goal, the DOE requested that the article not be published, and pressed for a temporary injunction. After a short court hearing in which the DOE argued that Morland's information was (1). likely derived from classified sources, (2). if not derived from classified sources, itself counted as "secret" information under the "born secret" clause of the 1954 Atomic Energy Act, and (3). dangerous and would encourage nuclear proliferation...Through a variety of more complicated circumstances, the DOE case began to wane, as it became clear that some of the data they were attempting to claim as "secret" had been published in a students' encyclopedia a few years earlier....

Because the DOE sought to censor Morland's work—one of the few times they violated their usual approach of not acknowledging "secret" material which had been released—it is interpreted as being at least partially correct, though to what degree it lacks information or has incorrect information is not known with any great confidence.

So, broad takeaways from this: The Streisand effect is real. A huge part of keeping something secret is just having nobody suspect that there *is* a secret there to find. This is much trickier for nuclear weapons, which are of high interest to the state, while it's more doable for AI stuff (and I don't know how biosecurity has managed to stay so low-profile). This doesn't mean you can just wander around giving the rough sketch of the insight, in math, it's not too hard to reinvent things once you know what you're looking for. But, AI math does have a huge advantage in this it's a really broad field and hard to search through (I think my roommate said that so many papers get submitted to NeurIPS that you couldn't read through them all in time for the next NeurIPS conference), and, in order to reinvent something from scratch without having the fundamental insight, you need to be pointed in the *exact* right direction and even then you've got a good shot at missing it (see: the time-lag between the earliest neural net papers and the development of backpropagation, or, in the process of making the Infra-Bayes post, stumbling across concepts that could have been found months earlier if some time-traveler had said the right three sentences at the time.)

Also, secrets can get out through *really* dumb channels. Putting important parts of the H-bomb structure in a student's encyclopedia? Why would you do that? Well, probably because there's a lot of people in the government and people in different parts have different memories of which stuff is secret and which stuff isn't.

So, due to AI work being insight/math-based, security would be based a lot more on just... not telling people things. Or alluding to them. Although, there is an interesting possibility raised by the presence of so much other work in the field. For nuclear weapons work, things seem to be either secret or well-known among those interested in nuclear weapons. But AI has a big intermediate range between "secret" and "well-known". See all those Arxiv papers with like, 5 citations. So, for something that's kinda iffy (not serious enough (given the costs of the slowdown in research with full secrecy) to apply full secrecy, not benign enough to be comfortable giving a big presentation at NeurIPS about it), it might be possible to intentionally target that range. I don't think it's a binary between "full secret" and "full publish", there's probably intermediate options available.

Of course, if it's *known* that an organization is trying to fly under the radar with a result, you get the Streisand effect in full force. But, just as well-known authors may have pseudonyms, it's probably possible to just publish a paper on Arxiv (or something similar) under a pseudonym and not have it referenced anywhere by the organization as an official piece of research they funded. And it would be available for viewing and discussion and collaborative work in that form, while also (with high probability) remaining pretty low-profile.

Anyways, I'm gonna set a 10-minute timer to have thoughts about the guidelines:

Ok, the first thought I'm having is that this is probably a case where Inside View is just strictly better than Outside View. Making a policy ahead of time that can just be followed requires whoever came up with the policy to have a good classification in advance all the relevant categories of result and what to do with them, and that seems pretty dang hard to do especially because novel insights, almost by definition, are not something you expected to see ahead of time.

The next thought is that working something out for a while and then going "oh, this is roughly adjacent to something I wouldn't want to publish, when developed further" isn't *quite* as strong of an argument for secrecy as it looks like, because, as previously mentioned, even fairly basic additional insights (in retrospect) are pretty dang tricky to find ahead of time if you don't know what you're looking for. Roughly, the odds of someone finding the thing you want to hide scale with the number of people actively working on it, so that case seems to weigh in favor of publishing the result, but not actively publicizing it to the point where you can't befriend everyone else working on it. If one of the papers published by an organization could be built on to develop a serious result... well, you'd still have the problem of not knowing which paper it is, or what unremarked-on direction to go in to develop the result, if it was published as normal and not flagged as anything special. But if the paper got a whole bunch of publicity, the odds go up that someone puts the pieces together spontaneously. And, if you know everyone working on the paper, you've got a saving throw if someone runs across the thing.

There *is* a *very* strong argument for talking to several other people if you're unsure whether it'd be good to publish/publicize, because it reduces the problem of "person with laxest safety standards publicizes" to "organization with the laxest safety standards publicizes". This isn't a full solution, because there's still a coordination problem at the organization level, and it gives incentives for organizations to be really defensive about sharing their stuff, including safety-relevant stuff. Further work on the inter-organization level of "secrecy standards" is very much needed. But within an organization, "have personal conversation with senior personnel" sounds like the obvious thing to do.

So, current thoughts: There's some intermediate options available instead of just "full secret" or "full publish" (publish under pseudonym and don't list it as research, publish as normal but don't make efforts to advertise it broadly) and I haven't seen anyone mention that, and they seem preferable for results that would benefit from more eyes on them, that could also be developed in bad directions. I'd be skeptical of attempts to make a comprehensive policy ahead of time, this seems like a case where inside view on the details of the result would outperform an ahead-of-time policy. But, one essential aspect that *would* be critical on a policy level is "talk it out with a few senior people first to make the decision, instead of going straight for personal judgement", as that tamps down on the coordination problem considerably.

Introduction To The Infra-Bayesianism Sequence

Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.

Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)

Basic Inframeasure Theory

Yeah, looking back, I should probably fix the m- part and have the signs being consistent with the usual usage where it's a measure minus another one, instead of the addition of two signed measures, one a measure and one a negative measure. May be a bit of a pain to fix, though, the proof pages are extremely laggy to edit.

Wikipedia's definition can be matched up with our definition by fixing a partial order where iff there's a that's a sa-measure s.t. , and this generalizes to any closed convex cone. I lifted the definition of "positive functional" from Vanessa, though, you'd have to chat with her about it.

We're talking about linear functions, not affine ones. is linear, not affine (regardless of f and c, as long as they're in and , respectively). Observe that it maps the zero of to 0.

Basic Inframeasure Theory

We go to the trouble of sa-measures because it's possible to add a sa-measure to an a-measure, and get another a-measure where the expectation values of all the functions went up, while the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.

Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.

Admittedly, you *could* intersect with the cone of a-measures again at the end (as we do in the next post) but then you wouldn't get the nice LF-duality tie-in.

Adding the cone of a-measures instead would correspond to being able to take expectation values of continuous functions in , instead of in [0,1], so I guess you could reformulate things this way, but IIRC the 0-1 normalization doesn't work as well (ie, there's no motive for why you're picking 1 as the thing to renormalize to 1 instead of, say, renormalizing 10 to 10). We've got a candidate other normalization for that case, but I remember being convinced that it doesn't work for belief functions, but for the Nirvana-as-1-reward-forever case, I remember getting really confused about the relative advantages of the two normalizations. And apparently, when working on the internal logic of infradistributions, this version of things works better.

So, basically, if you drop sa-measures from consideration you don't get the nice LF-duality tie in and you don't have a nice way to express how upper completion works. And maybe you could work with a-measures and use upper completion w.r.t. a different cone and get a slightly different LF-duality, but then that would make normalization have to work differently and we haven't really cleaned up the picture of normalization in that case yet and how it interacts with everything else. I remember me and Vanessa switched our opinions like 3 times regarding which upper completion to use as we kept running across stuff and going "wait no, I take back my old opinion, the other one works better with this".

(A -> B) -> A

I found a paper about this exact sort of thing. Escardo and Olivia call that type signature a "selection functional", and the type signature is called a "quantification functional", and there's several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (ie, if has type signature , and has type signature , then has type signature .

Counterfactual Induction

Oh, I see what the issue is. Propositional tautology given means , not . So yeah, when A is a boolean that is equivalent to via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to via boolean logic alone (although it may be possible to infer by other means), then the denominator isn't necessarily small.

So, first off, I should probably say that a lot of the formalism overhead involved in

this post in particularfeels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.Yes, your current understanding is correct, it's rebuilding probability theory in more generality to be suitable for RL in nonrealizable environments, and capturing a much broader range of decision-theoretic problems, as well as whatever spin-off applications may come from having the basic theory worked out, like our infradistribution logic stuff.

It copes with unrealizability because its hypotheses are not probability distributions, but sets of probability distributions (actually more general than that, but it's a good mental starting point), corresponding to properties that reality may have, without fully specifying everything. In particular, if an agent learns a class of belief functions (read: properties the environment may fulfill) is learned, this implies that for all properties within that class that the true environment fulfills (you don't know the true environment exactly), the infrabayes agent will match or exceed the expected utility lower bound that can be guaranteed if you know reality has that property (in the low-time-discount limit)

There's another key consideration which Vanessa was telling me to put in which I'll post in another comment once I fully work it out again.

Also, thank you for noticing that it took a lot of work to write all this up, the proofs took a while. n_n