All of Linda Linsefors's Comments + Replies

The "Commitment Races" problem

Imagine your life as a tree (as in data structure). Every observation which (from your point of view of prior knowledge) could have been different, and every decision which (from your point of view) could have been different, is a node in this tree. 

Ideally you would would want to pre-analyse the entire tree, and decide the optimal pre-commitment for each situation. This is too much work. 

So instead you wait and see which branch you find yourself in, only then make the calculations needed to figure out what you would do in that situation, given a... (read more)

The "Commitment Races" problem

(This is some of what I tried to say yesterday, but I was very tried and not sure I said it well)

Hm, the way I understand UDT, is that you give yourself the power to travel back in logical time. This means that you don't need to actually make commitment early in your life when you are less smart.

If you are faced with blackmail or transparent Newcomb's problem, or something like that, where you realise that if you had though of the possibility of this sort of situation before it happened (but with your current intelligence), you would have pre-committed to ... (read more)

1Daniel Kokotajlo3moThanks for the detailed reply! The difficulty is in how you spell out that hypothetical. What does it mean to think about this sort of situation before it happened but with your current intelligence? Your current intelligence includes lots of wisdom you've accumulated, and in particular, includes the wisdom that this sort of situation has happened, and more generally that this sort of situation is likely, etc. Or maybe it doesn't -- but then how do we define current intelligence then? What parts of your mind do we cut out, to construct the hypothetical? I've heard of various ways of doing this and IIRC none of them solved the problem, they just failed in different ways. But it's been a while since I thought about this. One way they can fail is by letting you have too much of your current wisdom in the hypothetical, such that it becomes toothless -- if your current wisdom is that people threatening you is likely, you'll commit to giving in instead of resisting, so you'll be a coward and people will bully you. Another way they can fail is by taking away too much of your current wisdom in the hypothetical, so that you commit to stupid-in-retrospect things too often.
The "Commitment Races" problem

I mostly agree with this post, except I'm not convinced it is very important. (I wrote some similar thought here.)

Raw power (including intelligence) will always be more important than having the upper hand in negotiation. Because I can only shift you up to the amount I can threaten you.

Let's say I can cause you up to X utility of harm, according to your utility function. If I'm maximally skilled at blackmail negotiation then I can decide your action with in the set of action such that your utility is with in (max-X, max] utility.

If X utility is a lot, then... (read more)

2Daniel Kokotajlo3moI agree raw power (including intelligence) is very useful and perhaps generally more desireable than bargaining power etc. But that doesn't undermine the commitment races problem; agents with the ability to make commitments might still choose to do so in various ways and for various reasons, and there's general pressure (collective action problem style) for them to do it earlier while they are stupider, so there's a socially-suboptimal amount of risk being taken. I agree that on Earth there might be a sort of unipolar takeoff where power is sufficiently imbalanced and credibility sufficiently difficult to obtain and "direct methods" easier to employ, that this sort of game theory and bargaining stuff doesn't matter much. But even in that case there's acausal stuff to worry about, as you point out.
The "Commitment Races" problem

Meanwhile, a few years ago when I first learned about the concept of updatelessness, I resolved to be updateless from that point onwards. I am now glad that I couldn't actually commit to anything then.


Why is that?

3Daniel Kokotajlo3moAll the versions of updatelessness that I know of would have led to some pretty disastrous, not-adding-up-to-normality behaviors, I think. I'm not sure. More abstractly, the commitment races problem has convinced me to be more skeptical of commitments, even ones that seem probably good. If I was a consequentialist I might take the gamble, but I'm not a consequentialist -- I have commitments built into me that have served my ancestors well for generations, and I suspect for now at least I'm better off sticking with that than trying to self-modify to something else.
AISU 2021

AISU needs a logo. If you are interested in making one for us, let me know.

AISU 2021

A note about previous events, and name changes

This is indeed the third AI Safety Unconference I'm involved in organising. The previous too where TAISU (short for Technical AI Safety Unconference), and Web-TAISU.

The first event was an in person event which took place at EA Hotel (CEEALAR). I choose to give that event a more narrow focus due to lack of space, and Web-TAISU where mainly just a quick adaptation to there suddenly being a plague about.

Having a bit more time to reflect this time, me and Aaron Roth have decided that there is less reason to put res... (read more)

Reflections on Larks’ 2020 AI alignment literature review

Ok, that makes sense. Seems like we are mostly on the same page then. 

I don't have strong opinions weather drawing in people via prestige is good or bad. I expect it is probably complicated. For example, there might be people who want to work on AI Safety for the right reason, but are too agreeable to do it unless it reach some level of acceptability. So I don't know what the effects will be on net. But I think it is an effect we will have to handle, since prestige will be important for other reasons. 

On the other hand, there are lots of people w... (read more)

1Alex Flint4moI very much agree with these two:
Reflections on Larks’ 2020 AI alignment literature review

There are two basic ways to increase the number of AI Safety refreshers.
1) Take mission aligned people (usually EA undergraduates) and help then gain the skills.
2) Take a skilled AI researcher and convince them to join the mission.

I think these two types of growth may have very different effects. 

A type 1 new person might take some time to get any good, but will be mission aligned. If that person looses sight of the real problem, I am very optimistic about just reminding them what AI Safety is really about, and they will get back on track. Further mor... (read more)

2Alex Flint4moThank you for this thoughtful comment Linda -- writing this replying has helped me to clarify my own thinking on growth and depth. My basic sense is this: If I meet someone who really wants to help out with AI safety, I want to help them to do that, basically without reservation, regardless of their skill, experience, etc. My sense is that we have a huge and growing challenge in navigating the development of advanced AI, and there is just no shortage of work to do, though it can at first be quite difficult to find. So when I meet individuals, I will try to help them find out how to really help out. There is no need for me to judge whether a particular person really wants to help out or not; I simply help them see how they can help out, and those who want to help out will proceed. Those who do not want to help out will not proceed, and that's fine too -- there are plenty of good reasons for a person to not want to dive head-first into AI safety. But it's different when I consider setting up incentives, which is what @Larks was writing about: I'm quite concerned about "drawing people into the field through credibility and prestige" and even about "drawing people into the field through altruism, nerd-sniping, and apparent tractability". The issue is not the people who genuinely want to help out, whom I consider to be a boon to the field regardless of their skill or experience. The issue is twofold: 1. Drawing people who are not particularly interested in helping out into the field via incentives (credibility, prestige, etc). 2. Tempting those who do really want to help out and are already actually helping out to instead pursue incentives (credibility, prestige, etc). So I'm not skeptical of growth via helping individuals, I'm skeptical of growth via incentives.
Announcing Web-TAISU, May 13-17

So... apparently I underestimate the need to send out event reminders, but better late than never. Today is the 2:nd day (out of 4) of Web-TAISU, and it is not too late to join.

General information about the event:

Collaborative Schedule:

Let me know if you have any questions.

Using vector fields to visualise preferences and make them consistent

As mentioned, I did think of this of this model before, and I also disagree with Justin/Convergence on how to use it.

Lets say that the underlying space for the vector field is the state of the world. Should we really remove curl? I'd say no. It is completely valid to want to move along some particular path, even a circle, or more likely, a spiral.

Alternatively, lets say that the underlying space for the vector field is world histories. Now we should remove curl, becasue any circular preference in this space is inconsistent. But what even is the vector... (read more)

[Meta] Do you want AIS Webinars?

Let's do it!

If you pick a time and date and write up an abstract, then I will sort out the logistic. Worst case it's just you and me having a conversation, but most likely some more people will show up.

Linda Linsefors's Shortform

I'm basically ready to announce the next Technical AI Safety Unconference (TAISU). But I have hit a bit of decision paralysis as to what dates it should be.

If you are reasonably interested in attending, please help me by filling in this doodle

If you don't know what this is about, have a look at the information for the last one.

The venue will be EA Hotel in Blackpool UK again.

“embedded self-justification,” or something like that

The way I understand your division of floors and sealing, the sealing is simply the highest level meta there is, and the agent has *typically* no way of questioning it. The ceiling is just "what the algorithm is programed to do". Alpha Go is had programed to update the network weights in a certain way in response to the training data.

What you call floor for Alpha Go, i.e. the move evaluations, are not even boundaries (in the sense nostalgebraist define it), that would just be the object level (no meta at all) policy.

I think this structure will b... (read more)

What you call floor for Alpha Go, i.e. the move evaluations, are not even boundaries (in the sense nostalgebraist define it), that would just be the object level (no meta at all) policy.

I think in general the idea of the object level policy with no meta isn't well-defined, if the agent at least does a little meta all the time. In AlphaGo, it works fine to shut off the meta; but you could imagine a system where shutting off the meta would put it in such an abnormal state (like it's on drugs) that the observed behavior wouldn't mean very much ... (read more)

Vanessa Kosoy's Shortform

I agree that you can assign what ever belief you want (e.g. what ever is useful for the agents decision making proses) for for what happens in the counterfactual when omega is wrong, in decision problems where Omega is assumed to be a perfect predictor. However if you want to generalise to cases where Omega is an imperfect predictor (as you do mention), then I think you will (in general) have to put in the correct reward for Omega being wrong, becasue this is something that might actually be observed.

1Vanessa Kosoy1yThe method should work for imperfect predictors as well. In the simplest case, the agent can model the imperfect predictor as perfect predictor + random noise. So, it definitely knows the correct reward for Omega being wrong. It still believes in Nirvana if "idealized Omega" is wrong.
All I know is Goodhart

Weather this works or not is going to depend heavily on what looks like.

Given , i.e. , what does this say about ?

The answer depends on the amount of mutual information between , and . Unfortunately the the more generic is, (i.e. any function is possible) the less mutual information there will be. Therefore, unless we know some structure about , the restriction to is not going to do much. The agent will just find a very differen... (read more)

4Stuart Armstrong2yYou can't get too much work from a single bit of information ^_^
Conceptual Problems with UDT and Policy Selection

I think UDT1.1 have two fundamentally wrong assumptions built in.

1) Complete prior: UDT1.1 follows the policy that is optimal according to it's prior. This is incommutable in general settings and will have to be approximated some how. But even an approximation of UDT1.1 assumes that UDT1.1 is at least well defined. However in some multi agent settings or when the agent is being fully simulated by the environment, or any other setting where the environment is necessary bigger than the agent, then UDT1.1 is ill defined.

2) Free will: In the problem Agent... (read more)

TAISU - Technical AI Safety Unconference

The TAISU is now full. I might still accept exceptional applications. But don't expect to be accepted just becasue you meet the basic requirements.

TAISU - Technical AI Safety Unconference

There is still room for more participants at TAISU, but sleeping space is starting to fill up. The EA Hotel dorm rooms are almost fully booked. Fore those who don't fit in the dorm or want some more privet space, there are lots of near by hotel. However since TAISU happens to be on a UK bank holiday, these might fill up too.

TAISU - Technical AI Safety Unconference

Accepted applicants so far (July 5)

Gavin Leech, University of Bristol (soon)

Michaël Trazzi, FHI

David Lindner, ETH Zürich

Gordon Worley, PAISRI


Josh Jacobson, BERI


Andrea Luppi, Harvard University / FHI

Dragan Mlakić

Noah Topper

Andrew Schreiber, Ought

Jan Brauner, University of Edinburgh - weekend only

Søren Elverlin,

Victoria Krakovna, DeepMind - weekend only

Janos Kramar, DeepMind - weekend only

TAISU - Technical AI Safety Unconference

Are you worried about the unconference not having enough participants (in total), or it not having enough senior participants?

TAISU - Technical AI Safety Unconference

There i no specific deadline for signing up.

However, i might close the application at some point due to the unconference being full. We have more or less unlimited sleeping space since the EA Hotel is literally surrounded by other hotels. So the limitation is spaces for talks, discussions and workshops and such.

If all activities are in the EA Hotel, we should not be much more than 20 people. If it looks like I will get more applications than that I will see if it is possibly to rent some more commons spaces at other hotels I have not looked in to this yet, but I will soon.

We currently have 4 accepted applicants.

1Linda Linsefors2ycommend removed by me
TAISU - Technical AI Safety Unconference

Good initiative. I will add a question to the application form, asking if the applicant allows me to share that they are coming. I then will share the participant list here (with the names of those how agreed) and update every few days.

For pledges, just write here as Ryan said.

The Game Theory of Blackmail

I would decompose that in to a value trade + a blackmail.

The default for me would be to take the action that gives me 1 utility. But you can offer me a trade where you give me something better in return for me not taking that action. This would be a value trade.

Lets now take me agreeing to your proposition as the default. If I then choose to threaten to call the deal off, unless you pay me a even higher amount, than this is blackmail.

I don't think that these parts (the value trade and the blackmail) should be viewed as sequential. I wrote it that way ... (read more)

The Game Theory of Blackmail

I did not mean to imply that the choices had to be made simultaneous, or in any other particular order, just that this is the type of payoff matrix. But I also think that "simultaneous choice" v.s. "sequential game" is a false dichotomy. If both players are UDT, every game is a game simultaneous choice game (where the choices are over complete policies).

I know that according to what I describe, the blackmailers threat is not credible in the game theory sense of the word. Sow what? It is still possible to make credible threats in the common-use meaning of the word, which is what matters.

1Vladimir Slepnev2yThreatening to crash your car unless the passenger gives you a dollar is also not credible in the common meaning of the word...
Probability is fake, frequency is real

I agree that "want" is not the correct word exactly. What I mean by prior is an agents actual a priori beliefs, so by definition there will be no mis-match there. I am not trying to say that you choose your prior exactly.

What I am gesturing at is that no prior is wrong, as long as it does not assign zero probability to the true outcome. And I think that much of the confusion in atrophic situation comes from trying to solve an under-constrained system.

Two agents can have the same source code and optimise different utility functions

I agree.

An even simpler example: If the agents are reward learners, both of them will optimize for their own reward signal, which are two different things in the physical world.