apc - AI Alignment Forum

Glad to see this published—nice work!

[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI

Re Regulatory markets for AI safety: You say that the proposal doesn’t seem likely to work if “alignment is really hard and we only get one shot at it” (i.e. unbounded maximiser with discontinuous takeoff). Do you expect that status-quo government regulation would do any better, or just that any regulation wouldn’t be helpful in such a scenario? My intuition is that even if alignment is really hard, regulation could be helpful e.g. by reducing races to the bottom, and I’d rather have a more informed group (like people from a policy and technical safety team at a top lab) implementing it instead of a less-informed government agency. I’m also not sure what you mean by legible regulation.

Cortés, Pizarro, and Afonso as Precedents for Takeover

apc4y80

Is this a fair description of your disagreement re the 90% argument?

Daniel thinks that a 90% reduction in the population of a civilization corresponds to a ~90% reduction in their power/influentialness. Because the Americans so greatly outnumbered the Spanish, this ten-fold reduction in power/influentialness doesn’t much alter the conclusion.

Matthew thinks that a 90% reduction in the population of a civilization means that “you don’t really have a civilization”, which I interpret to mean something like a ~99.9%+ reduction in the power/influentialness of a civilization, which occurs mainly through a reduction in their ability to coordinate (e.g. “chain of command in ruins”). This is significant enough to undermine the main conclusion.

If this is accurate, would a historical survey of the power/influentialness of civilisations after they lose 90% of the population (inasmuch as these cases exist) resolve the disagreement?

What can the principal-agent literature tell us about AI risk?

apc4y20

I agree that this seems like a promising research direction! I think this would be done best while also thinking about concrete traits of AI systems, as discussed in this footnote. One potential beneficial outcome would be to understand which kind of systems earn rents and which don't; I wouldn't be surprised if the distinction between rent earning agents vs others mapped pretty cleanly onto a Bostromian utility maximiser vs CAIS distinction, but maybe it won't.

In any case, the alternative perspective offered by the agency rents framing compared to typical AI alignment discussion could help generate interesting new insights.

What can the principal-agent literature tell us about AI risk?

apc4y50

The claim that this couldn't work because such models are limited seems just arbitrary and wrong to me.

The economists I spoke to seemed to think that in agency unawareness models conclusions follow pretty immediately from the assumptions and so don't teach you much. It's not that they can't model real agency problems, just that you don't learn much from the model. Perhaps if we'd spoken to more economists there would have been more disagreement on this point.

What can the principal-agent literature tell us about AI risk?

apc4y30

Thanks for catching this! You’re correct that that sentence is inaccurate. Our views changed while iterating the piece and that sentence should have been changed to: “PAL confirms that due to diverging interests and imperfect monitoring, AI agents could get some rents.”

This sentence too: “Overall, PAL tells us that agents will inevitably extract some agency rents…” would be better as “Overall, PAL is consistent with AI agents extracting some agency rents…”

I’ll make these edits, with a footnote pointing to your comment.

The main aim of that section was to point out that Paul’s scenario isn’t in conflict with PAL. Without further research, I wouldn’t want to make strong claims about what PAL implies for AI agency rents because the models are so brittle and AIs will likely be very different to humans; it’s an open question.

For there to be no agency rents at all, I think you’d need something close to perfect competition between agents. In practice the necessary conditions are basically never satisfied because they are very strong, so it seems very plausible to me that AI agents extract rents.

Re monopoly rents vs agency rents: Monopoly rents refer to the opposite extreme with very little competition, and in the economics literature is used when talking about firms, while agency rents are present whenever competition and monitoring are imperfect. Also, agency rents refer specifically to the costs inherent to delegating to an agent (e.g. an agent making investment decisions optimising for commission over firm profit) vs the rents from monopoly power (e.g. being the only firm able to use a technology due to a patent). But as you say, it's true that lack of competition is a cause of both of these.

AI ALIGNMENT FORUM
AF

Posts

Wiki Contributions

Comments