Matthew Barnett

Just someone who wants to learn about the world. I think about AI risk sometimes, but I still have a lot to learn.

I also change my views often. Anything I wrote that's more than 10 days old should be treated as potentially outdated.

Comments

What specific dangers arise when asking GPT-N to write an Alignment Forum post?
To me the most obvious risk (which I don't ATM think of as very likely for the next few iterations, or possibly ever, since the training is myopic/SL) would be that GPT-N in fact is computing (e.g. among other things) a superintelligent mesa-optimization process that understands the situation it is in and is agent-y.

Do you have any idea of what the mesa objective might be? I agree that this is a worrisome risk, but I was more interested in the type of answer that specifies, "Here's a plausible mesa objective given the incentives." Mesa optimization is a more general risk that isn't specific to the narrow training scheme used by GPT-N.

Modelling Continuous Progress
Second, the major disagreement is between those who think progress will be discontinuous and sudden (such as Eliezer Yudkowsky, MIRI) and those who think progress will be very fast by normal historical standards but continuous (Paul Christiano, Robin Hanson).

I'm not actually convinced this is a fair summary of the disagreement. As I explained in my post about different AI takeoffs, I had the impression that the primary disagreement between the two groups was over locality rather than the amount of time takeoff lasts. Though of course, I may be misinterpreting people.

Possible takeaways from the coronavirus pandemic for slow AI takeoff

I tend to think that the pandemic shares more properties with fast takeoff than it does with slow takeoff. Under fast takeoff, a very powerful system will spring into existence after a long period of AI being otherwise irrelevant, in a similar way to how the virus was dormant until early this year. The defining feature of slow takeoff, by contrast, is a gradual increase in abilities from AI systems all across the world.

In particular, I object to this portion of your post,

The "moving goalposts" effect, where new advances in AI are dismissed as not real AI, could continue indefinitely as increasingly advanced AI systems are deployed. I would expect the "no fire alarm" hypothesis to hold in the slow takeoff scenario - there may not be a consensus on the importance of general AI until it arrives, so risks from advanced AI would continue to be seen as "overblown" until it is too late to address them.

I'm not convinced that these parallels to COVID-19 are very informative. Compared to this pandemic, I expect the direct effects of AI to be very obvious to observers, in a similar way that the direct effects of cars are obvious to people who go outside. Under a slow takeoff, AI will already be performing a lot of important economic labor before the world "goes crazy" in the important senses. Compare to the pandemic, in which

  • It is not empirically obvious that it's worse than a seasonal flu (we only know that it is thanks to careful analysis of data collected over months).
  • It's not clearly affecting everyone around you in the way that cars, computers, software, and other forms of engineering are.
  • It is considered natural, and primarily affects old people, who are conventionally considered to be less worthy of concern (though people pay lip service to denying this).
An Analytic Perspective on AI Alignment
weaker claim?

Oops yes. That's the weaker claim, which I agree with. The stronger claim is that because we can't understand something "all at once," mechanistic transparency is too hard and so we shouldn't take Daniel's approach. But the way we understand laptops is also mechanistic. No one argues that because laptops are too hard to understand all at once, we shouldn't try to understand them mechanistically.

This seems to be assuming that we have to be able to take any complex trained AGI-as-a-neural-net and determine whether or not it is dangerous. Under that assumption, I agree that the problem is itself very hard, and mechanistic transparency is not uniquely bad relative to other possibilities.

I didn't assume that. I objected to the specific example of a laptop as an instance of mechanistic transparency being too hard. Laptops are normally understood well because our understanding can be broken into components and built up from abstractions. But our understanding of each component and abstraction is pretty mechanistic -- and this understanding is useful.

Furthermore, because laptops did not fall out of the sky one day, but were instead built up slowly over successive years of research and development, they seem like a great example of how Daniel's mechanistic transparency approach does not rely on us having to understand arbitrary systems. Just as we built up an understanding of laptops, presumably we could do the same with neural networks. This was my interpretation of why he is using Zoom In as an example.

All of the other stories for preventing catastrophe that I mentioned in the grandparent are tackling a hopefully easier problem than "detect whether an arbitrary neural net is dangerous".

Indeed, but I don't think this was the crux of my objection.

An Analytic Perspective on AI Alignment
I'd be shocked if there was anyone to whom it was mechanistically transparent how a laptop loads a website, down to the gates in the laptop.

Could you clarify why this is an important counterpoint? It seems obviously useful to understand mechanistic details of a laptop in order to debug it. You seem to be arguing the [ETA: weaker] claim that nobody understands an entire laptop "all at once," as in, they can hold all the details in their head simultaneously. But such an understanding is almost never possible for any complex system, and yet we still try to approach it. So this objection could show that mechanistic transparency is hard in the limit, but it doesn't show that mechanistic transparency is uniquely bad in any sense. Perhaps you disagree?

Cortés, Pizarro, and Afonso as Precedents for Takeover

For my part, I think you summarized my position fairly well. However, after thinking about this argument for another few days, I have more points to add.

  • Disease seems especially likely to cause coordination failures since it's an internal threat rather than an external threat (which, unlike internal threats, tends to unite empires). We can compare the effects of the smallpox epidemic in the Aztec and Inca empires alongside other historical diseases during wartime, such as the Plague of Athens, which arguably is what caused Athens to lose the Peloponnesian War.
  • Along these same lines, the Aztecs/Incas didn't have any germ theory of disease, and therefore didn't understand what was going on. They may have thought that the gods were punishing them for some reason, and they probably spent a lot of time blaming random groups for the catastrophe. We can contrast these circumstances with e.g. the Paraguayan War, which killed up to 90% of the male population, but in which people probably had a much better idea of what was going on and who was to blame, so I expect that the surviving population had an easier time coordinating.
  • A large chunk of the remaining population likely had some sort of disability. Think of what would happen if you got measles and smallpox in the same two-year window: even if you survived, it probably wouldn't look good. This means that the pure death rate is an underestimate of the impact of a disease. The Aztecs, of whom "only" 40 percent died of disease, were still greatly affected:
It killed many of its victims outright, particularly infants and young children. Many other adults were incapacitated by the disease – because they were either sick themselves, caring for sick relatives and neighbors, or simply lost the will to resist the Spaniards as they saw disease ravage those around them. Finally, people could no longer tend to their crops, leading to widespread famine, further weakening the immune systems of survivors of the epidemic. [...] a third of those afflicted with the disease typically develop blindness.
Cortés, Pizarro, and Afonso as Precedents for Takeover
Later, other Europeans would come along with other advantages, and they would conquer India, Persia, Vietnam, etc., evidence that while disease was a contributing factor (I certainly am not denying it helped!) it wasn't so important a factor as to render my conclusion invalid (my conclusion, again, is that a moderate technological and strategic advantage can enable a small group to take over a large region.)

Europeans conquered places such as India, but that was centuries later, after they had a large technological advantage, and they didn't come with just a few warships either: they came with vast armadas. I don't see how that supports the point that a small group can take over a large region.

Cortés, Pizarro, and Afonso as Precedents for Takeover
I really don't think the disease thing is important enough to undermine my conclusion. For the two reasons I gave: One, Afonso didn't benefit from disease

This makes sense, but I think the case of Afonso is sufficiently different from the others that it's a bit of a stretch to use it to infer much about AI takeovers. If you want to make a more general point about how AI can be militarily successful, then a better source of evidence is a broad survey of historical military campaigns. Of course, it's still a historically interesting case to consider!

two, the 90% argument: Suppose there was no disease but instead the Aztecs and Incas were 90% smaller in population and also in the middle of civil war. Same result would have happened, and it still would have proved my point.

Yeah, but why are we assuming that they are still in a civil war? Call me out if I'm wrong here, but your thesis now seems to be: if some civilization is in complete disarray, then a well-coordinated group of slightly more advanced people/AIs can take control of that civilization.

This would be a reasonable thesis, but it doesn't shed much light on AI takeovers. The important part lies in the "if some civilization is in complete disarray" conditional, and I think it's far from obvious that AI will emerge in such a world, unless some other, more important causal factor had already given rise to the massive disarray in the first place. But even in that case, don't you think we should focus on the thing that caused the disarray instead?

Cortés, Pizarro, and Afonso as Precedents for Takeover
I agree that it would be good to think about how AI might create devastating pandemics. I suspect it wouldn't be that hard to do, for an AI that is generally smarter than us. However, I think my original point still stands.

It's worth clarifying exactly which "original point" still stands, because I'm currently unsure.

I don't get why you think a small technologically primitive tribe could take over the world if they were immune to disease. Seems very implausible to me.

Sorry, I meant to say, "were immune to diseases that were currently killing everyone else." If everyone is dying around you, then your level of technology doesn't really matter that much. You just wait for your enemy to die and then settle the land after they are gone. This is arguably what Europeans did in the Americas. My point is that by focusing on technology, you are missing the main reason for the successful conquest.

But I don't want to do this yet, because it seems to me that even with disease factored in, "most" of the "credit" for Cortes and Pizarro's success goes to the factors I mentioned.
After all, suppose the disease reduced the on-paper strength of the Americans by 90%. They were still several orders of magnitude stronger than Cortes and Pizarro. So it's still surprising that Cortes/Pizarro won... until we factor in the technological and strategic advantages I mentioned.

I feel like you don't actually have a civilization if 90% of your people have died. I think it's more fair to say that when 90% of your people die, your civilization basically stops existing rather than merely being weakened. For example, I can totally imagine an Incan voyage to Spain conquering Madrid if 90% of the Spanish had died: their chain of command would be in complete shambles. It wouldn't just be a clean 90% reduction in GDP with everything else held constant.

But the civilizations wouldn't have been destroyed without the Spaniards. (I might be wrong about this, but... hadn't the disease mostly swept through Inca territory by the time Pizarro arrived? So clearly their civilization had survived.)
I think I am somewhat close to being convinced by your criticism, at least when phrased in the way you just did: "your thesis is trivial!" But I'm not yet convinced, because of my argument about the 90% reduction. (I keep making the same argument basically in response to all your points; it is the crux for me I think.)

Look, if 90% of a country dies of a disease, and then the surviving 10% become engulfed in a civil war, and then some military group who is immune to the disease comes in and takes the capital city during all this, don't you think it's very misleading to conclude "a small group of people with a slight military advantage can take over a large civilization" without heavily emphasizing the part where 90% of the people died of a disease? This is the heart of my critique.
