All of Alexis Carlier's Comments + Replies

Literature Review on Goal-Directedness

Glad to see this published—nice work!

1 Adam Shimi 5mo: Thanks!

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.

3 Rohin Shah 1y: I agree that regulation could be helpful by reducing races to the bottom; I think what I was getting at here (which I might be wrong about, as it was several months ago) was that it is hard to build regulations that directly attack the technical problem. Consider for example the case of car manufacturing. You could have two types of regulations:

1. Regulations that provide direct evidence of safety: For example, you could require that all car designs be put through a battery of safety tests, e.g. crashing them into a wall and ensuring that the airbags deploy.
2. Regulations that provide evidence of thinking about safety: For example, you could require that all car designs have at least 5 person-years of safety analysis done by people with a degree in Automotive Safety (which is probably not an actual field but in theory could be one).

Iirc, the regulatory markets paper seemed to place most of its optimism in the first kind of regulation, or at least that's how I interpreted it. That kind of regulation seems particularly hard in the one-shot alignment case. The second kind of regulation seems much more feasible in all scenarios, and preventing races to the bottom is an example of that kind of regulation.

I'm not sure what I meant by legible regulation -- probably I was just emphasizing the fact that for regulations to be good, they need to be sufficiently clear and understood by companies so that they can actually be in compliance with them. Again, for regulations of the first kind this seems pretty hard to do.

Cortés, Pizarro, and Afonso as Precedents for Takeover

Is this a fair description of your disagreement re the 90% argument?

Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.

Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in t...

4 Matthew Barnett 1y: For my part, I think you summarized my position fairly well. However, after thinking about this argument for another few days, I have more points to add.

* Disease seems especially likely to cause coordination failures since it's an internal threat rather than an external threat (which, unlike internal threats, tends to unite empires). We can compare the effects of the smallpox epidemic in the Aztec and Inca empires alongside other historical diseases during wartime, such as the Plague of Athens, which arguably is what caused Athens to lose the Peloponnesian War.
* Along these same lines, the Aztec/Inca didn't have any germ theory of disease, and therefore didn't understand what was going on. They may have thought that the gods were punishing them for some reason, and therefore they probably spent a lot of time blaming random groups for the catastrophe. We can contrast these circumstances with e.g. the Paraguayan War, which killed up to 90% of the male population, but people probably had a much better idea what was going on and who was to blame, so I expect that the surviving population had an easier time coordinating.
* A large chunk of the remaining population likely had some sort of disability. Think of what would happen if you got measles and smallpox in the same two-year window: even if you survived, it probably wouldn't look good. This means that the pure death rate is an underestimate of the impact of a disease. The Aztecs, for whom "only" 40 percent died of disease, were still greatly affected

3 Daniel Kokotajlo 1y: Thanks Alexis, this seems like an accurate description to me. Strong-upvoted, partly because I want to reward people for doing this sort of summary-and-distillation stuff.

As for your question, hmm, I'm not sure. I tentatively say yes, but my hesitations are (1) cases where 90% of the population dies are probably very rare, and (2) how would we measure power anyway? Presumably most civilizations that lose 90% of their population do end up conquered by someone else pretty quickly, since most civilizations aren't 10x more powerful than all their neighbors.

I think the crux is this business about the chain of command. Cortés and Pizarro succeeded by getting Americans to ally with them and/or obey them. The crux is, would they have been able to do this as well or mostly as well without the disease? I think that reading a bunch of books on what happened might more or less answer this question. For example, maybe the books will say that the general disarray caused by the disease created a sense of desperation and confusion in the people which led them to be open to the conquistadors' proposals when otherwise they would have dismissed them. In which case, I concede defeat in this disagreement. Or maybe the books will say that if only the conquistadors had been outnumbered even more, they would have lost.

But what I predict is that the books will say that, for the most part, the reasons why people allied with Cortés and Pizarro had more to do with non-disease considerations: "Here is this obviously powerful representative of an obviously powerful faraway empire, wielding intriguing technology that we could benefit from. There is our hated enemy, Tenochtitlan, who has been oppressing us for decades. Now is our chance to turn the tables on our oppressors!"

Similarly, I predict that the reason why the emperors allowed the conquistadors to get close enough to ambush them has little to do with disease and more to do with, well, just not predicting that the conquistadors

What can the principal-agent literature tell us about AI risk?

I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potentially beneficial outcome would be to understand which kinds of systems earn rents and which don't; I wouldn't be surprised if the distinction between rent-earning agents and others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won't.

In any case, the alternative perspective offered by the agency rents framing comp...

What can the principal-agent literature tell us about AI risk?

“The claim that this couldn't work because such models are limited seems just arbitrary and wrong to me.”

The economists I spoke to seemed to think that in agency unawareness models, the conclusions follow pretty immediately from the assumptions and so don't teach you much. It's not that they can't model real agency problems, just that you don't learn much from the model. Perhaps if we'd spoken to more economists there would have been more disagreement on this point.

2 RobinHanson 1y: We have lots of models that are useful even when the conclusions follow pretty directly, such as supply and demand. The question is whether such models are useful, not whether they are simple.

What can the principal-agent literature tell us about AI risk?

Thanks for catching this! You’re correct that that sentence is inaccurate. Our views changed while iterating the piece and that sentence should have been changed to: “PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents.”

This sentence too: “Overall, PAL tells us that agents will inevitably extract some agency rents…” would be better as “Overall, PAL is consistent with AI agents extracting some agency rents…”

I’ll make these edits, with a f...

2 Wei Dai 1y: Thanks for making the changes, but even with "PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents." I'd still like to understand why imperfect monitoring could lead to rents, because I don't currently know a model that clearly shows this (i.e., where the rent isn't due to the agent having some other kind of advantage, like not having many competitors). Also, I get that the PAL in its current form may not be directly relevant to AI, so I'm just trying to understand it on its own terms for now. Possibly I should just dig into the literature myself...