Adam Shimi

Half-researcher, half-distiller, both in AI Safety. Funded, and also a PhD in theoretical computer science (distributed computing).

If you're interested in some research ideas that you see in my posts, know that I probably have many completed private docs in the process of getting feedback (because for my own work, the AF has proved mostly useless in terms of feedback). I can give you access if you PM me!


Reviews for the Alignment Forum
AI Alignment Unwrapped
Understanding Goal-Directedness
Toying With Goal-Directedness


Identifiability Problem for Superrational Decision Theories

As outlined in the last paragraph of the post. I want to convince people that TDT-like decision theories won't give a "neat" game theory, by giving an example where they're even less neat than classical game theory.

Hum, then I'm not sure I understand in what way classical game theory is neater here?

I think you're thinking about a realistic case (same algorithm, similar environment) rather than the perfect symmetry used in the argument. A communication channel is of no use there because you could just ask yourself what you would send, if you had one, and then you know you would have just gotten that message from the copy as well.

As long as the probabilistic coin flips are independent on both sides (you also mention the case where they're symmetric, but let's put that aside for the example), then you can apply the basic probabilistic algorithm for leader election: both copies flip a coin n times to get an n-bit number, which they exchange. If the numbers are different, then the copy with the smallest one says 0 and the other says 1; otherwise they flip a coin and return the answer. With this algorithm, you have probability at least 1 - 1/2^n of deciding different values, and so you can get as close as you want to 1 (by paying the price in more random bits).
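To make the scheme concrete, here is a minimal Python sketch of it. The function names (`decide_pair`, `exact_success`, `estimate_success`) and the seeds are mine, purely for illustration. It assumes the tie-break coin flips are also independent, in which case the chance of deciding different values works out to 1 - 1/2^(n+1): the numbers differ with probability 1 - 1/2^n, and a tie is still broken half of the time.

```python
import random

def decide_pair(rng_a, rng_b, n):
    """One run for two copies; returns their outputs (out_a, out_b)."""
    a = rng_a.getrandbits(n)  # n coin flips packed into an n-bit number (n >= 1)
    b = rng_b.getrandbits(n)
    if a != b:
        # After exchanging numbers, the copy with the smaller one says 0.
        return (0, 1) if a < b else (1, 0)
    # Tie: each copy flips one more coin and returns it.
    return rng_a.getrandbits(1), rng_b.getrandbits(1)

def exact_success(n):
    """Probability of deciding different values, assuming independent flips."""
    return 1 - 2 ** -(n + 1)

def estimate_success(n, seed_a, seed_b, trials=20_000):
    """Monte Carlo estimate; the seeds stand in for each copy's randomness."""
    rng_a, rng_b = random.Random(seed_a), random.Random(seed_b)
    differ = 0
    for _ in range(trials):
        out_a, out_b = decide_pair(rng_a, rng_b, n)
        differ += out_a != out_b
    return differ / trials
```

Calling `estimate_success(3, seed_a=0, seed_b=1)` should land close to `exact_success(3)`. Giving both copies the same seed models the perfectly symmetric case: they always draw the same number and the same tie-break flip, so they never decide different values.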

I'd be interested. I think even just more solved examples of the reasoning we want are useful currently.

Do you have examples of problems with copies that I could look at and that you think would be useful to study?

Identifiability Problem for Superrational Decision Theories

Well, if I understand the post correctly, you're saying that these two problems are fundamentally the same problem, and so rationality should be able to solve them both if it can solve one. I disagree with that, because from the perspective of distributed computing (which I'm used to), these two problems are exactly the two kinds of problems that are fundamentally distinct in a distributed setting: agreement and symmetry-breaking.

Communication won't make a difference if you're playing with a copy.

Actually it could. Basically all of distributed computing assumes that every process is running the same algorithm, and you can solve symmetry-breaking in this case with communication and additional constraints on the scheduling of processes. The difficulty here is that the underlying graph is symmetric, whereas if you had some form of asymmetry (like three processes in a line, such that the one in the middle has two neighbors but the others only have one), then you could use that asymmetry directly to solve symmetry-breaking.
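As a toy illustration of the line-of-three case, here is a sketch in Python where every process runs the same rule and only looks at its own number of neighbors. The graph encoding and the names `local_rule` and `run` are assumptions made for the example, not a standard API.

```python
def local_rule(degree):
    """Same code on every process: say 1 iff I have exactly two neighbors."""
    return 1 if degree == 2 else 0

def run(graph):
    """Apply the shared rule at each process, given its adjacency list."""
    return {proc: local_rule(len(nbrs)) for proc, nbrs in graph.items()}

# Three processes in a line: the middle one is structurally distinguishable.
line = {"left": ["mid"], "mid": ["left", "right"], "right": ["mid"]}

# Two copies facing each other: the graph is symmetric, so any deterministic
# rule that only sees the local structure gives both the same answer.
pair = {"a": ["b"], "b": ["a"]}
```

On `line`, only `mid` outputs 1, so the symmetry is broken without any randomness. On `pair`, both processes output the same value, which is exactly the symmetric situation where you need something extra (random bits, scheduling constraints) to get different answers.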

(By the way, you just gave me the idea that maybe I can use my knowledge of distributed computing to look at the sort of decision problems where you play with copies? Don't know if it would be useful, but that's interesting at least)

Identifiability Problem for Superrational Decision Theories

I don't see how the two problems are the same. They are basically the agreement and symmetry breaking problems of distributed computing, and those two are not equivalent in all models. What you're saying is simply that in the no-communication model (where the same algorithm is used on two processes that can't communicate), these two problems are not equivalent. But they are asking for fundamentally different properties, and are not equivalent in many models that actually allow communication. 

Phylactery Decision Theory

I feel like doing a better job of motivating why we should care about this specific problem might help get you more feedback.

If we want to alter a decision theory to learn its set of inputs and outputs, your proposal makes sense to me at first glance. But I'm not sure why I should particularly care, or why there is even a problem to begin with. The link you provide doesn't help me much after skimming it, and I (and I assume many people) almost never read something that requires me to read other posts without even a summary of the references. I made an exception today because I'm trying to give more feedback, and I feel that this specific piece of feedback might be useful for you.

Basically, I'm not sure of what problem you're trying to solve with having this ability to learn your cartesian boundary, and so I'm unable to judge how well you are solving it.

Testing The Natural Abstraction Hypothesis: Project Intro

This project looks great! I especially like the focus on a more experimental kind of research, while still focused and informed on the specific concepts you want to investigate.

If you need some feedback on this work, don't hesitate to send me a message. ;)

Vanessa Kosoy's Shortform

Oh, right, that makes a lot of sense.

So is the general idea that we quantilize such that we're choosing in expectation an action that doesn't have corrupted utility (by intuitively having something like more than twice as many actions in the quantilization than we expect to be corrupted), so that we guarantee the probability of following the manipulation of the learned user report is small?

I also wonder if using the user policy to sample actions isn't limiting, because then we can only take actions that the user would take. Or do you assume by default that the support of the user policy is the full action space, so every action is possible for the AI?

Review of "Fun with +12 OOMs of Compute"

About the update

You're right, that's what would happen with an update.

I think that the model I have in mind (although I hadn't explicitly thought about it until now) is something like a distribution over ways to reach TAI (capturing how probable it is that they're the first way to reach AGI), and each option comes with its own distribution (let's say over years). Obviously you can compress that into a single distribution over years, but then you lose the ability to do fine-grained updating.

For example, I imagine that someone with relatively low probability that prosaic AGI will be the first to reach AGI, upon reading your post, would have reasons to update the distribution for prosaic AGI in the way you discuss, but not to update the probability that prosaic AGI will be the first to reach TAI. On the other hand, if there was an argument centered more around an amount of compute we could plausibly get in a short timeframe (the kind of thing we discuss as potential follow-up work), then I'd expect that this same person, if convinced, would put more probability that prosaic AGI will be the first to reach TAI.
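Here is a small Python sketch of the two-level model and the two kinds of update. All the numbers are made-up placeholders, not forecasts, and the coarse "early"/"late" buckets, the path names, and the `marginal` helper are assumptions made for the illustration.

```python
def marginal(weights, timelines):
    """Compress the two-level model into a single distribution over buckets."""
    buckets = sorted({b for dist in timelines.values() for b in dist})
    return {
        b: sum(weights[path] * timelines[path].get(b, 0.0) for path in weights)
        for b in buckets
    }

# Placeholder numbers: probability that each path is the first to reach TAI...
weights = {"prosaic": 0.2, "other": 0.8}
# ...and, for each path, a timeline distribution over two coarse buckets.
timelines = {
    "prosaic": {"early": 0.2, "late": 0.8},
    "other": {"early": 0.1, "late": 0.9},
}

# Update (a): only the prosaic timeline shifts earlier; path weights untouched.
timelines_a = dict(timelines, prosaic={"early": 0.6, "late": 0.4})
after_a = marginal(weights, timelines_a)

# Update (b): only the chance that prosaic comes first goes up.
weights_b = {"prosaic": 0.5, "other": 0.5}
after_b = marginal(weights_b, timelines)
```

Both updates move the compressed distribution towards "early", but they are different changes to the underlying model. If you only keep the output of `marginal`, you can no longer tell which one happened.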

Graph-based argument

I must admit that I have trouble reading your graph because there's no scale (although I expect the spiky part is centered at +12 OOMs?). As for the textual argument, I actually think it makes sense to put quite low probability on +13 OOMs if one agrees with your scenario.

Maybe my argument is a bit weird, but it goes something like this: based on your scenarios, it should be almost certain that we can reach TAI with +12 OOMs. If that's not the case, then there's something fundamentally difficult about reaching TAI with prosaic AGI (because you're basically throwing all the compute we want at it), and so I expect very little probability of a gain from 1 more OOM.

The part about this reasoning that feels weird is that I reason about 13 OOMs based on what happens at 12 OOMs, and the idea that we care about 13 OOMs iff 12 OOMs is not enough. It might be completely wrong.

Reasons for 12 OOMs

To the first suspicion I'll say: I had good reasons for writing about 12 rather than 6 which I am happy to tell you about if you like.

I'm both interested, and (without knowing them), I expect that I will want you to have put them in the post, to deal with the implicit conclusion that you couldn't argue 6 OOMs.

Also interested in your arguments for 6 OOMs, or pointers.

Review of "Fun with +12 OOMs of Compute"

Let me try to make an analogy with your argument.

Say we want to make X. What you're saying is "with 10^12 dollars, we could do it that way". Why on earth would I update at all whether it can be done with 10^6 dollars? If your scenario works with that amount, then you should have described it using only that much money. If it doesn't, then you're not providing evidence for the cheaper case.

Similarly here, if someone starts with a low credence on prosaic AGI, I can see how your arguments would make them put a bunch of probability mass close to +10^12 compute. But they have no reason to put probability mass anywhere far from that point, since the scenarios you give are tailored to that. And lacking an argument for why you can get that much compute in a short timeline, they probably end up thinking that if prosaic AGI ever happens, it's probably after every other option. Which seems like the opposite of the point you're trying to make.

Review of "Fun with +12 OOMs of Compute"

You're welcome!

To put it another way: I don't actually believe we will get to +12 OOMs of compute, or anywhere close, anytime soon. Instead, I think that if we had +12 OOMs, we would very likely get TAI very quickly, and then I infer from that fact that the probability of getting TAI in the next 6 OOMs is higher than it would otherwise be (if I thought that +12 OOMs probably wasn't enough, then my credence in the next 6 OOMs would be correspondingly lower).

To some extent this reply also partly addresses the concerns you raised about memory and bandwidth--I'm not actually saying that we actually will scale that much; I'm using what would happen if we magically did as an argument for what we should expect if we (non-magically) scale a smaller amount.

(Talking only for myself here)

Rereading your post after seeing this comment:

What I’ve done in this post is present an intuition pump, a thought experiment that might elicit in the reader (as it does in me) the sense that the probability distribution should have the bulk of its mass by the 10^35 mark.

I personally misread this, and understood "the bulk of its mass at the 10^35 mark". The correct reading is more in line with what you're saying here. That's probably a reason why I personally focused on the +12 OOMs mark (I mean, that's also in the title).

So I agree we misunderstood some parts of your post, but I still think our issue remains. Except that instead of being about justifying +12 OOMs in the short term, it becomes about justifying why the +12 OOMs examples should have any impact on, let's say, +6 OOMs.

I personally don't feel like your examples give me an argument for anywhere but the +12 OOMs mark. That's where they live, and those examples seem to require that much compute, or still a pretty big amount of it. So reading your post makes me feel like I should have more probability mass at this mark or very close to it, but I don't see any reason to update the probability at, say, the +6 OOMs mark.

And if +12 OOMs looks really far, as it does from my point of view, then that definitely doesn't make me update towards shorter timelines.

Vanessa Kosoy's Shortform

However, it can do much better than that, by short-term quantilizing w.r.t. the user's reported success probability (with the user's policy serving as baseline). When quantilizing the short-term policy, we can upper bound the probability of corruption via the user's reported probability of short-term failure (which we assume to be low, i.e. we assume the malign AI is not imminent). This allows the AI to find parameters under which quantilization is guaranteed to improve things in expectation.

I don't understand what you mean here by quantilizing. The meaning I know is to take a random action from the top α fraction of actions, on a given base distribution. But I don't see a distribution here, or even a clear ordering over actions (given that we don't have access to the utility function).
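For reference, this is the meaning of quantilizing I have in mind, as a minimal Python sketch. It is not the proposal from the shortform: the `score` function stands in for whatever ordering over actions is available, and the name `quantilize` and its signature are assumptions made for the example.

```python
import random

def quantilize(actions, base_probs, score, alpha, rng):
    """Sample from the base distribution restricted to its top-alpha mass by score."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Rank actions from best to worst according to the (assumed) score.
    ranked = sorted(zip(actions, base_probs), key=lambda ap: score(ap[0]), reverse=True)
    kept, mass = [], 0.0
    for action, p in ranked:
        if mass >= alpha:
            break
        take = min(p, alpha - mass)  # the last action may be only partly inside
        kept.append((action, take))
        mass += take
    choices, weights = zip(*kept)
    # Sampling proportionally to the kept mass renormalizes the base distribution.
    return rng.choices(choices, weights=weights, k=1)[0]
```

With four equally likely actions scored by their index and `alpha=0.5`, this only ever returns one of the two best actions, each with probability 1/2. The sketch needs both a base distribution and a score to rank by, which are exactly the two ingredients I don't see in the setting above.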

I'm probably missing something obvious, but more details would really help.
