cousin_it — AI Alignment Forum

AI ALIGNMENT FORUM
AF

I think AI offers a chance of getting huge power over others, so it would create competitive pressure in any case. In case of a market economy it's market pressure, but in case of countries it would be a military arms race instead. And even if the labs didn't get any investors and raced secretly, I think they'd still feel under a lot of pressure. The chance of getting huge power is what creates the problem, that's why I think spreading out power is a good idea. There would still be competition of course, but it would be normal economic levels of competition, and people would have some room to do the right things.

Problems I've Tried to Legibilize

cousin_it22d*710

I think on the level of individual people, there's a mix of moral and self-interested actions. People sometimes choose to do the right thing (even if the right thing is as complicated as taking metaethics and metaphilosophy into account), or can be convinced to do so. But with corporations it's another matter: they choose the profit motive pretty much every time.

Making an AI lab do the right thing is much harder than making its leader concerned. A lab leader who's concerned enough to slow down will be pressured by investors to speed back up, or get replaced, or get outcompeted. Really you need to convince the whole lab and its investors. And you need to be more convincing than the magic of the market! Recall that in many of these labs, the leaders / investors / early employees started out very concerned about AI safety and were reading LW. Then the magic of the market happened and now the labs are racing at full speed, do you think our convincing abilities can be stronger than the thing that did that? The profit motive, again. In my first comment there was a phrase about things being not profitable to understand.

What it adds up to is, even with our uncertainty about ethics and metaethics, it seems to me that concentration of power is itself a force against morality. The incentives around concentrated power are all wrong. Spreading out power is a good thing that enables other good things, enables individuals to sometimes choose what's right. I'm not absolutely certain but that's my current best guess.

Problems I've Tried to Legibilize

cousin_it23d*35

I'm pretty slow to realize these things, and I think other people are also slow, so the window is already almost closed. But in any case, my current thinking is that we need to start pushing on the big actors from outside, try to reduce their power. Trying to make them see the light is no longer enough.

What it means in practical terms: - Make it clear that we frown on people who choose to work for AI labs, even on alignment. This social pressure (on LW and related forums maybe) might already do some good. - Make it clear that we're allied with the relatively poor majority of people outside the labs, and in particular those who are already harmed by present harms. Make amends with folks on the left who have been saying such things for years. - Support protests against labs, support court cases against them having to do with e.g. web scraping, copyright infringement, misinformation, suicides. Some altruist money in this might go a long way. - Think more seriously about building organizations that will make AI power more spread out. Open source, open research, open training. Maybe some GPL-like scheme to guarantee that things don't get captured. We need to reduce concentration of power in the near term, enable more people to pose a challenge to the big actors. I understand it increases other risks, but in my opinion it's worth it.

Problems I've Tried to Legibilize

cousin_it23d1013

I'm worried about the approach of "making decisionmakers realize stuff". In the past couple years I've switched to a more conflict-theoretic view: the main problem to me is that the people building AI don't want to build aligned AI. Even if we solved metaethics and metaphilosophy tomorrow, and gave them the solution on a plate, they wouldn't take it.

This is maybe easiest to see by looking at present harms. An actually aligned AI would politely decline to do such things as putting lots of people out of jobs or filling the internet with slop. So companies making AI for the market have to make it misaligned in at least these ways, otherwise it'll fail in the market. Extrapolating into the future, even if we do lots of good alignment research, markets and governments will pick out only those bits that contribute to market-aligned or government-aligned AI. Which (as I've been saying over and over) will be really bad for most people, because markets and governments don't necessarily need most people.

So this isn't really a comment on the list of problems (which I think is great), but more about the "theory of change" behind it. I no longer have any faith in making decisionmakers understand something it's not profitable for them to understand. I think we need a different plan.

Geometric UDT

cousin_it1mo*10

Sure, but if we put a third "if" on top (namely, "it's a representation of our credences, but also both hypotheses are nosy neighbors that care about either world equally"), doesn't that undo the second "if" and bring us back to the first?

Geometric UDT

cousin_it1mo30

I still don't completely understand what your assumptions are supposed to model, but if we take them on face value, then it seems to me that always making rainbows is the right answer. After all, if both hypotheses are "nosy neighbors" that don't care which universe we end up in, there's no point figuring out which universe we end up in: we should just make rainbows because it's cheaper. No?

Plans A, B, C, and D for misalignment risk

cousin_it2mo10

Yeah, that partly makes sense to me. I guess my intuition is like, if 95% of the company is focused on racing as hard as possible (and using AI leverage for that too, AI coming up with new unsafe tricks and all that), then the 5% who care about safety probably won't have that much impact.

Plans A, B, C, and D for misalignment risk

cousin_it2mo417

The OP says takeover risk is 45% under plan D and 75% under plan E. We're supposed to gain an extra 30% of safety from this feeble "build something by next week with 1% of compute"? Not happening.

My point is that if the "ten people on the inside" obey their managers, plan D will have a tiny effect at best. And if we instead postulate that they won't obey their managers, then there are no such "ten people on the inside" in the first place. So we should already behave as if we're in world E.

Plans A, B, C, and D for misalignment risk

cousin_it2mo13

Can you maybe describe in more detail how you imagine it? What specifically do the "ten people on the inside" do, if company leadership disagrees with them about safety?

Plans A, B, C, and D for misalignment risk

cousin_it2mo20

Do you know any people working at frontier labs who would be willing to do the kind of thing you describe in plan D, some kind of covert alignment against the wishes of the larger company? Who would physically press keys on their terminal to do it, as opposed to quitting or trying to sway the company? Not asking to name names, just my hunch is that there are very few such people now, maybe none at all. And if that's the case, we're in E world already.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments