## AI ALIGNMENT FORUM

Linda Linsefors

Hi, I am a Physicist, an Effective Altruist and AI Safety student/researcher.

# Posts

The "Commitment Races" problem

Imagine your life as a tree (as in data structure). Every observation which (from your point of view of prior knowledge) could have been different, and every decision which (from your point of view) could have been different, is a node in this tree.

Ideally you would want to pre-analyse the entire tree, and decide the optimal pre-commitment for each situation. This is too much work.

So instead you wait and see which branch you find yourself in, and only then make the calculations needed to figure out what you would do in that situation, given a complete analysis of the tree (including logical constraints, e.g. people predicting what you would have done, etc). This is UDT. In theory, I see no drawbacks with UDT. Except that in practice UDT is also too much work.

What you actually do, as you say, is to rely on experience-based heuristics. Experience-based heuristics are far superior for computational efficiency, and will give you a leg up in raw power. But you will slide away from optimal DT, which will give you a negotiating disadvantage. Given that I think raw power is more important than negotiating advantage, I think this is a good trade-off.

The only situation where you want to rely more on DT principles, is in super important one-off situations, and you basically only get those in weird acausal trade situations. Like, you could frame us building a friendly AI as acausal trade, like Critch said, but that framing does not add anything useful.

And then there are things like this and this and this, which I don't know how to think about. I suspect it breaks somehow, but I'm not sure how. And if I'm wrong, getting DT right might be the most important thing.

But in any normal situation, you will either have repeated games among several equals, where some coordination mechanism is just uncomplicatedly in everyone's interest, or you're in a situation where one person just has much more power than the other.

The "Commitment Races" problem

(This is some of what I tried to say yesterday, but I was very tired and not sure I said it well)

Hm, the way I understand UDT is that you give yourself the power to travel back in logical time. This means that you don't need to actually make commitments early in your life when you are less smart.

If you are faced with blackmail or the transparent Newcomb's problem, or something like that, where you realise that if you had thought of the possibility of this sort of situation before it happened (but with your current intelligence), you would have pre-committed to something, then you should now do as you would have pre-committed to.

This means that a UDT agent doesn't have to do tons of pre-commitments. It can figure things out as it goes, and still get the benefit of early pre-committing. Though as I said when we talked, it does lose some transparency, which might be very costly in some situations. Though I do think that you lose transparency in general by being smart, and that it is generally worth it.
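To make "do as you would have pre-committed to" concrete, here is a minimal sketch of a toy blackmail game. Everything in it is an illustrative assumption and not part of the original discussion: the payoff numbers, the function names, and in particular the blackmailer who perfectly predicts your policy and only bothers to threaten agents who would pay.

```python
# Hypothetical payoffs, chosen only for illustration.
PAY_COST = 10.0   # what giving in to the blackmailer costs you
HARM = 50.0       # what you lose if a threat is carried out

def ex_ante_utility(policy):
    """Utility of following `policy`, evaluated before any threat is made.

    Assumes a blackmailer who perfectly predicts the policy and only
    threatens agents whose policy is to pay.
    """
    threatened = (policy == "pay")
    return -PAY_COST if threatened else 0.0

def updateless_choice():
    """Pick the policy you would have pre-committed to, then act on it."""
    return max(["pay", "refuse"], key=ex_ante_utility)

def myopic_choice():
    """Decide only after the threat has arrived, ignoring the predictor."""
    return "pay" if -PAY_COST > -HARM else "refuse"

print(updateless_choice(), myopic_choice())
```

Under these assumptions the agent that reasons about policies refuses and is never threatened, while the agent that only compares outcomes after the threat arrives pays. Neither agent had to bind itself in advance; the difference is only in which question it asks at decision time.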

(Now something I did not say)

However, there is one commitment that you (maybe?[1]) have to make to get the benefit of UDT if you are not already UDT, which is to commit to become UDT. And I get that you are wary of commitments.

Though more concretely, I don't see how UDT can lead to worse behaviours. Can you give an example? Or do you just mean that UDT gets into commitment races at all, which is bad? But I don't know any DT that avoids this, other than always giving in to blackmail and bullies, which I already know you don't do, given one of the stories in the blogpost.

[1] Or maybe not. Is there a principled difference between never giving in to blackmail because you pre-committed to something, and just never giving in to blackmail without any binding pre-commitment? I suspect not really, which means you are UDT as long as you act UDT, and no pre-commitment is needed, other than for your own sake.

The "Commitment Races" problem

I mostly agree with this post, except I'm not convinced it is very important. (I wrote some similar thoughts here.)

Raw power (including intelligence) will always be more important than having the upper hand in negotiation, because I can only shift your behaviour by as much as I can threaten you.

Let's say I can cause you up to X utility of harm, according to your utility function. If I'm maximally skilled at blackmail negotiation, then I can decide your action within the set of actions such that your utility is within (max-X, max].

If X utility is a lot, then I can influence you a lot. If X is not so much then I don't have much power over you. If I'm strong then X will be large, and influencing your action will probably be of little importance to me.
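As a rough illustration of the (max-X, max] bound, the sketch below lists which actions a maximally skilled blackmailer could force for a small and a large X. The action names, the utilities and the function name are made up for the example.

```python
def enforceable_actions(utilities, max_harm):
    """Return the actions whose utility lies in (max - X, max].

    `utilities` maps each action to the victim's utility for it, and
    `max_harm` is X, the most harm the blackmailer can cause. Any action
    outside this set is worse for the victim than taking their best
    action and suffering the full harm, so no threat can force it.
    """
    best = max(utilities.values())
    return {a for a, u in utilities.items() if best - max_harm < u <= best}

# Hypothetical victim utilities.
victim = {"ignore": 10.0, "pay_small": 8.0, "pay_large": 3.0, "surrender": -5.0}

weak = enforceable_actions(victim, max_harm=1.0)     # small X
strong = enforceable_actions(victim, max_harm=10.0)  # large X
print(sorted(weak), sorted(strong))
```

With a small X only the victim's own best action is in the set, so the threat changes nothing. With a large X most actions become enforceable, but that is exactly the case where the blackmailer is already strong.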

Blackmail is only important when players are of similar strength, which is probably unlikely, or if the power to destroy is much greater than the power to create, which I also find unlikely.

The main scenario where I expect blackmail to seriously matter (among superintelligences) is in acausal trade between different universes. I'm sceptical of this being a real thing, but admit I don't have strong arguments on this point.

The "Commitment Races" problem

Meanwhile, a few years ago when I first learned about the concept of updatelessness, I resolved to be updateless from that point onwards. I am now glad that I couldn't actually commit to anything then.

Why is that?

AISU 2021

AISU needs a logo. If you are interested in making one for us, let me know.

AISU 2021

A note about previous events, and name changes

This is indeed the third AI Safety Unconference I'm involved in organising. The previous two were TAISU (short for Technical AI Safety Unconference), and Web-TAISU.

The first event was an in-person event which took place at EA Hotel (CEEALAR). I chose to give that event a narrower focus due to lack of space, and Web-TAISU was mainly just a quick adaptation to there suddenly being a plague about.

Having a bit more time to reflect this time, Aaron Roth and I have decided that there is less reason to put restrictions on an online event, so this time we are inviting everything AI Safety related.

By the way, another thing that is new for this year is that I have a co-organiser. Thanks to Aaron for joining and also reminding me that it was time for another AI Safety Unconference.

Reflections on Larks’ 2020 AI alignment literature review

Ok, that makes sense. Seems like we are mostly on the same page then.

I don't have strong opinions on whether drawing in people via prestige is good or bad. I expect it is probably complicated. For example, there might be people who want to work on AI Safety for the right reason, but are too agreeable to do it unless it reaches some level of acceptability. So I don't know what the effects will be on net. But I think it is an effect we will have to handle, since prestige will be important for other reasons.

On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do. I expect there are ways funders can help out here too.

I would not update much on the fact that currently most research is produced by existing institutions. It is hard to do good research, and even harder without colleagues, salary and other support that comes with being part of an org. So I think there is a lot of room for growth, by just helping the people who are already involved and trying.

Reflections on Larks’ 2020 AI alignment literature review

There are two basic ways to increase the number of AI Safety researchers.
1) Take mission-aligned people (usually EA undergraduates) and help them gain the skills.
2) Take a skilled AI researcher and convince them to join the mission.

I think these two types of growth may have very different effects.

A type 1 new person might take some time to get any good, but will be mission aligned. If that person loses sight of the real problem, I am very optimistic that just reminding them what AI Safety is really about will get them back on track. Furthermore, these people already exist, and are already trying to become AI Safety researchers. We can help them, ignore them, or tell them to stop. Ignoring them will produce more noise compared to helping them, since the normal pressure of building academic prestige is currently not very aligned with the mission. So do we support them or tell them to stop? Actively telling people not to try to help with AI Safety seems very bad. It is something I would expect to have bad cultural effects beyond just regulating how many people are doing AI Safety research.

A type 2 new person who is converted to AI Safety research because they actually care about the mission is not too dissimilar from a type 1 new person, so I will not write more about that.

However, there is another type of type 2 person who will be attracted to AI Safety as a side effect of AI Safety being cool and interesting. I think there is a risk that these people take over the field and divert the focus completely. I'm not sure how to stop this though, since this is a direct side effect of gaining respectability, and AI Safety will need respectability. And we can't just work in the shadows until it is the right time, because we don't know the timelines. The best plan I have for keeping global AI Safety research on course is to put as many of "our" people into the field as we can. We have a founder effect advantage, and I expect this to get stronger the more truly mission-aligned people we can put into academia.

I agree with alexflint that there are bad growth trajectories and good growth trajectories. But I don't think the good one is as hard to hit as they do. I think partly what is wrong is the model of AI Safety as a single company. I don't think this is a good intuition pump. Noise is a thing, but it is much less intrusive than this metaphor suggests. Someone at MIRI told me that to a first approximation he doesn't read other people's work, so at least for this person, it doesn't matter how much noise is published, and I think this is a normal situation, especially for people interested in deep work.

What mostly keeps people in academia from doing deep work is the pressure to constantly publish.

I think focusing on growth vs. no growth is the wrong question. But I think focusing on deep work is the right question. So let's help people do deep work. Or, at least, that's what I aim to do. And I'm also happy to discuss with anyone.

AI Safety Discussion Days

The next discussion day is on July 18th.

Announcing Web-TAISU, May 13-17

So... apparently I underestimated the need to send out event reminders, but better late than never. Today is the 2nd day (out of 4) of Web-TAISU, and it is not too late to join.