Diffractor

Comments

Introduction to Cartesian Frames

I will be hosting a readthrough of this sequence on MIRIxDiscord again; PM me for a link.

Needed: AI infohazard policy

So, here are some considerations (not an actual policy).

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia

Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article by U.S. anti-weapons activist Howard Morland in 1979 on the "secret of the hydrogen bomb". In 1978, Morland had decided that discovering and exposing this "last remaining secret" would focus attention onto the arms race and allow citizens to feel empowered to question official statements on the importance of nuclear weapons and nuclear secrecy. Most of Morland's ideas about how the weapon worked were compiled from highly accessible sources—the drawings which most inspired his approach came from the Encyclopedia Americana. Morland also interviewed (often informally) many former Los Alamos scientists (including Teller and Ulam, though neither gave him any useful information), and used a variety of interpersonal strategies to encourage informational responses from them (i.e., asking questions such as "Do they still use sparkplugs?" even if he wasn't aware what the latter term specifically referred to)....

When an early draft of the article, to be published in The Progressive magazine, was sent to the DOE after falling into the hands of a professor who was opposed to Morland's goal, the DOE requested that the article not be published, and pressed for a temporary injunction. After a short court hearing in which the DOE argued that Morland's information was (1). likely derived from classified sources, (2). if not derived from classified sources, itself counted as "secret" information under the "born secret" clause of the 1954 Atomic Energy Act, and (3). dangerous and would encourage nuclear proliferation...

Through a variety of more complicated circumstances, the DOE case began to wane, as it became clear that some of the data they were attempting to claim as "secret" had been published in a students' encyclopedia a few years earlier....

Because the DOE sought to censor Morland's work—one of the few times they violated their usual approach of not acknowledging "secret" material which had been released—it is interpreted as being at least partially correct, though to what degree it lacks information or has incorrect information is not known with any great confidence.

So, broad takeaways from this: The Streisand effect is real. A huge part of keeping something secret is just having nobody suspect that there is a secret there to find. This is much trickier for nuclear weapons, which are of high interest to the state, while it's more doable for AI stuff (and I don't know how biosecurity has managed to stay so low-profile). This doesn't mean you can just wander around giving the rough sketch of the insight; in math, it's not too hard to reinvent things once you know what you're looking for. But AI math does have a huge advantage here: it's a really broad field and hard to search through (I think my roommate said that so many papers get submitted to NeurIPS that you couldn't read through them all in time for the next NeurIPS conference), and, in order to reinvent something from scratch without having the fundamental insight, you need to be pointed in the exact right direction, and even then you've got a good shot at missing it (see the time-lag between the earliest neural net papers and the development of backpropagation, or, in the process of making the Infra-Bayes post, stumbling across concepts that could have been found months earlier if some time-traveler had said the right three sentences at the time).

Also, secrets can get out through really dumb channels. Putting important parts of the H-bomb structure in a students' encyclopedia? Why would you do that? Well, probably because there are a lot of people in the government, and people in different parts have different memories of which stuff is secret and which stuff isn't.

So, due to AI work being insight/math-based, security would be based a lot more on just... not telling people things, or even alluding to them. Although, there is an interesting possibility raised by the presence of so much other work in the field. For nuclear weapons work, things seem to be either secret or well-known among those interested in nuclear weapons. But AI has a big intermediate range between "secret" and "well-known": see all those Arxiv papers with, like, 5 citations. So, for something that's kinda iffy (not serious enough, given the costs of the slowdown in research under full secrecy, to apply full secrecy, but not benign enough to be comfortable giving a big presentation at NeurIPS about it), it might be possible to intentionally target that range. I don't think it's a binary between "full secret" and "full publish"; there are probably intermediate options available.

Of course, if it's known that an organization is trying to fly under the radar with a result, you get the Streisand effect in full force. But, just as well-known authors may have pseudonyms, it's probably possible to just publish a paper on Arxiv (or something similar) under a pseudonym and not have it referenced anywhere by the organization as an official piece of research they funded. And it would be available for viewing and discussion and collaborative work in that form, while also (with high probability) remaining pretty low-profile.

Anyways, I'm gonna set a 10-minute timer to have thoughts about the guidelines:

Ok, the first thought I'm having is that this is probably a case where Inside View is just strictly better than Outside View. Making a policy ahead of time that can just be followed requires whoever came up with the policy to have a good classification, in advance, of all the relevant categories of result and what to do with them, and that seems pretty dang hard to do, especially because novel insights, almost by definition, are not something you expected to see ahead of time.

The next thought is that working something out for a while and then going "oh, this is roughly adjacent to something I wouldn't want to publish, when developed further" isn't quite as strong of an argument for secrecy as it looks like, because, as previously mentioned, even fairly basic (in retrospect) additional insights are pretty dang tricky to find ahead of time if you don't know what you're looking for. Roughly, the odds of someone finding the thing you want to hide scale with the number of people actively working on it, so that case seems to weigh in favor of publishing the result, but not publicizing it so actively that you can no longer befriend everyone else working on it. If one of the papers published by an organization could be built on to develop a serious result... well, you'd still have the problem of not knowing which paper it is, or what unremarked-on direction to go in to develop the result, if it was published as normal and not flagged as anything special. But if the paper got a whole bunch of publicity, the odds go up that someone puts the pieces together spontaneously. And, if you know everyone working on the paper, you've got a saving throw if someone runs across the thing.

There is a very strong argument for talking to several other people if you're unsure whether it'd be good to publish/publicize, because it reduces the problem of "person with laxest safety standards publicizes" to "organization with the laxest safety standards publicizes". This isn't a full solution, because there's still a coordination problem at the organization level, and it gives incentives for organizations to be really defensive about sharing their stuff, including safety-relevant stuff. Further work on the inter-organization level of "secrecy standards" is very much needed. But within an organization, "have personal conversation with senior personnel" sounds like the obvious thing to do.

So, current thoughts: There are some intermediate options available instead of just "full secret" or "full publish" (publish under a pseudonym and don't list it as research, or publish as normal but don't make efforts to advertise it broadly), and I haven't seen anyone mention that; they seem preferable for results that would benefit from more eyes on them but that could also be developed in bad directions. I'd be skeptical of attempts to make a comprehensive policy ahead of time; this seems like a case where inside view on the details of the result would outperform an ahead-of-time policy. But one essential aspect that would be critical on a policy level is "talk it out with a few senior people first to make the decision, instead of going straight for personal judgement", as that tamps down on the coordination problem considerably.

Introduction To The Infra-Bayesianism Sequence

Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.
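A toy numerical illustration (numbers made up for this example): suppose there are two environments $e_1, e_2$ and two policies.

$\pi_1$: $\mathbb{E}[U \mid e_1] = 10$, $\mathbb{E}[U \mid e_2] = 6$, but with a 1% chance of utility 0 in either environment.
$\pi_2$: $U = 1$ with certainty in both environments.
Maximin over outcomes: worst single outcome is $0$ for $\pi_1$ vs $1$ for $\pi_2$, so pick $\pi_2$, sacrificing almost all the utility.
Maximin over expected value: $\min_e \mathbb{E}[U \mid \pi_1, e] = 6 > 1 = \min_e \mathbb{E}[U \mid \pi_2, e]$, so pick $\pi_1$, which does acceptably on average in every environment.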

Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)

Basic Inframeasure Theory

Yeah, looking back, I should probably fix the $m_-$ part and have the signs be consistent with the usual usage, where it's a measure minus another one, instead of the addition of two signed measures, one a measure and one a negative measure. May be a bit of a pain to fix, though; the proof pages are extremely laggy to edit.

Wikipedia's definition can be matched up with our definition by fixing a partial order where $(m_1, b_1) \preceq (m_2, b_2)$ iff there's a $(m^*, b^*)$ that's a sa-measure s.t. $(m_2, b_2) = (m_1, b_1) + (m^*, b^*)$, and this generalizes to any closed convex cone. I lifted the definition of "positive functional" from Vanessa, though; you'd have to chat with her about it.
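To spell out the generalization: given a closed convex cone $K$ in a vector space, define $x \preceq_K y$ iff $y - x \in K$. This is reflexive (since $0 \in K$) and transitive (since $K$ is closed under addition), and it's antisymmetric exactly when $K \cap (-K) = \{0\}$; the ordering above is the special case where $K$ is the cone of sa-measures.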

We're talking about linear functions, not affine ones. $(m, b) \mapsto \int f \, dm + cb$ is linear, not affine (regardless of $f$ and $c$, as long as they're in $C(X, [0,1])$ and $[0,1]$, respectively). Observe that it maps the zero of $M^{\pm}(X) \oplus \mathbb{R}$ to 0.
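To spell out the check, for a functional of the form $(m, b) \mapsto \int f \, dm + cb$: it sends $(m_1 + m_2, b_1 + b_2)$ to $(\int f \, dm_1 + cb_1) + (\int f \, dm_2 + cb_2)$, and $(\lambda m, \lambda b)$ to $\lambda(\int f \, dm + cb)$, so in particular $(0, 0) \mapsto 0$. An affine-but-not-linear map would have a nonzero constant offset there.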

Basic Inframeasure Theory

We go to the trouble of sa-measures because it's possible to add a sa-measure to an a-measure and get another a-measure where the expectation values of all the functions went up, even though the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.
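A minimal toy example, taking $X$ to be a single point $x$ and writing the value of $(m, b)$ on a function $f$ as $\int f \, dm + b$: start with the a-measure $(\delta_x, 0)$ and add the sa-measure $(-\tfrac{1}{2}\delta_x, \tfrac{1}{2})$, landing on the a-measure $(\tfrac{1}{2}\delta_x, \tfrac{1}{2})$. Every $f$ with values in $[0,1]$ has its value go up, from $f(x)$ to $\tfrac{1}{2}f(x) + \tfrac{1}{2}$, but the only thing you could add to $(\delta_x, 0)$ to land there is $(-\tfrac{1}{2}\delta_x, \tfrac{1}{2})$ itself, which isn't an a-measure because of the negative measure component.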

Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.

Admittedly, you could intersect with the cone of a-measures again at the end (as we do in the next post) but then you wouldn't get the nice LF-duality tie-in.

Adding the cone of a-measures instead would correspond to being able to take expectation values of continuous functions in $[0, \infty)$, instead of in $[0,1]$, so I guess you could reformulate things this way, but IIRC the 0-1 normalization doesn't work as well (ie, there's no reason for picking 1 as the thing to renormalize to 1 instead of, say, renormalizing 10 to 10). We've got a candidate other normalization for that case, but I remember being convinced that it doesn't work for belief functions, and for the Nirvana-as-1-reward-forever case I remember getting really confused about the relative advantages of the two normalizations. And apparently, when working on the internal logic of infradistributions, this version of things works better.

So, basically, if you drop sa-measures from consideration, you don't get the nice LF-duality tie-in and you don't have a nice way to express how upper completion works. And maybe you could work with a-measures and use upper completion w.r.t. a different cone and get a slightly different LF-duality, but then that would make normalization have to work differently, and we haven't really cleaned up the picture of normalization in that case yet and how it interacts with everything else. I remember Vanessa and I switched our opinions like 3 times regarding which upper completion to use as we kept running across stuff and going "wait no, I take back my old opinion, the other one works better with this".

(A -> B) -> A

I found a paper about this exact sort of thing. Escardó and Oliva call that type signature a "selection functional", and the type signature (A -> B) -> B is called a "quantification functional", and there are several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory (ie, if e has type signature (A -> C) -> A, and d has type signature (B -> C) -> B, then their product has type signature ((A × B) -> C) -> (A × B)).
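Here's a minimal runnable sketch in Haskell of that product construction (my own toy code; the names Sel, prodSel, and argsup are made up here, not taken from the paper):

import Data.List (maximumBy)
import Data.Ord (comparing)

-- A selection functional in Escardó and Oliva's sense: given an outcome
-- function on a, pick an element of a.
type Sel r a = (a -> r) -> a

-- Product of two selection functionals: the first component is chosen
-- anticipating the second component's best response, which is where the
-- game-theoretic flavor comes from.
prodSel :: Sel r a -> Sel r b -> Sel r (a, b)
prodSel e d p = (a, b)
  where
    a = e (\x -> p (x, d (\y -> p (x, y))))
    b = d (\y -> p (a, y))

-- A basic selection functional over a finite list: pick the argument that
-- maximises the outcome.
argsup :: Ord r => [a] -> Sel r a
argsup xs p = maximumBy (comparing p) xs

main :: IO ()
main = do
  -- Two choices from 0..3 with shared outcome x * y - x.
  let outcome (x, y) = x * y - x :: Int
  print (prodSel (argsup [0 .. 3]) (argsup [0 .. 3]) outcome)

Running it prints (3, 3): each coordinate is selected as if it were a move in a sequential game against the shared outcome function.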

Counterfactual Induction

Oh, I see what the issue is. Propositional tautology given $A$ means $A \vdash \phi$ using boolean logic alone, not $\vdash A \to \phi$ in the full theory. So yeah, when A is a boolean combination that is equivalent to $\bot$ via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to $\bot$ via boolean logic alone (although it may be possible to infer $\bot$ by other means), then the denominator isn't necessarily small.

Counterfactual Induction

Yup, a monoid, because $A \wedge \top = A$ and $\top \wedge A = A$, so $\top$ acts as an identity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

Dutch-Booking CDT

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has a choice between two paths. It slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utility for taking the left path, -10 utility for taking the right path, and -1000 utility for a cosmic ray hit. The CDT counterfactual says 0 utility for taking the left path and -10 utility for taking the right path, while the conditional says 0 utility for the left path and -1010 utility for the right path (because conditional on taking the right path, you were hit by a cosmic ray).
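Spelling out that conditional with these numbers, writing $p$ for $P(\text{cosmic ray} \mid \text{right path})$: $\mathbb{E}[U \mid \text{right}] = -10 - 1000p$, which is about $-1010$ when $p \approx 1$, versus $\mathbb{E}[U \mid \text{left}] \approx 0$.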

In order to get the dutch book to go through, we need to get the agent to take the right path, to exploit P(cosmic ray) changing between the decision time and afterwards. So the initial bet could be something like -1 utility now, +12 utility upon taking the right path and not being hit by a cosmic ray. But now since the optimal action is "take the right path along with the bet", the problem setup has been changed, and we can't conclude that the agent's conditional on taking the right path places high probability on getting hit by a cosmic ray (because now the right path is the optimal action), so we can't money-pump with the "+0.5 utility, -12 utility upon taking a cosmic ray hit" bet.

So this seems to dutch-book Death-in-Damascus, not CDT≠EDT cases in general.
