I go to Amazon, search for “air conditioner”, and sort by average customer rating. There are a couple of pages of evaporative coolers (not what I’m looking for), one used window unit (?), and then this:
Average rating: 4.7 out of 5 stars.
However, this air conditioner has a major problem. Take a look at this picture:
Key thing to notice: there is one hose going to the window. Only one.
Why is that significant?
Here’s how this air conditioner works. It sucks in some air from the room. It splits that air into two streams, and pumps heat from one stream to the other - making some air hotter, and some air cooler. The cool air, it blows back into the room. The hot air, it blows out the window.
See the problem yet?
Air is blowing out the window. In order for the room to not end up a vacuum, air has to come back into the room from outside. In practice, houses are far from airtight (we don’t want to suffocate), so air from outside will be pulled in through lots of openings throughout the house. And presumably the air being pulled in from outside is hot; one typically does not run an air conditioner on cool days.
The actual effect of this air conditioner is to make the space right in front of the air conditioner nice and cool, but fill the rest of the house with hot outdoor air. Probably not what one wants from an air conditioner!
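The inefficiency can be sketched with a toy steady-state energy balance. All the numbers below are made up for illustration (not from any real unit’s spec sheet), but they show the shape of the problem: every cubic meter of indoor air blown out the window gets replaced by hot outdoor air leaking in, and that infiltration eats a chunk of the nominal cooling output.

```python
# Toy steady-state energy balance for a portable air conditioner.
# All numbers are illustrative, not from any real unit's spec sheet.

RHO = 1.2   # air density, kg/m^3
CP = 1005   # specific heat of air, J/(kg*K)

def net_cooling_watts(cooling_output_w, exhaust_flow_m3s, indoor_c, outdoor_c):
    """Cooling actually delivered to the house, after accounting for the
    hot outdoor air pulled in to replace the air blown out the window."""
    # Each m^3/s exhausted outside is replaced by outdoor air leaking in,
    # which carries a heat load proportional to the temperature difference.
    infiltration_load = RHO * CP * exhaust_flow_m3s * (outdoor_c - indoor_c)
    return cooling_output_w - infiltration_load

# Single-hose: exhausts indoor air out the window (say ~0.05 m^3/s).
single = net_cooling_watts(2000, 0.05, indoor_c=24, outdoor_c=35)
# Dual-hose: the exhaust stream is drawn from outside, so no indoor air is lost.
dual = net_cooling_watts(2000, 0.0, indoor_c=24, outdoor_c=35)

print(single)  # ~1337 W: roughly a third of the cooling is wasted
print(dual)    # 2000 W
```

Note this also matches the later caveat: the single-hose unit still cools the house on net (the number stays positive), it’s just much less efficient than adding a second hose would be.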
Ok, that’s amusing, but the point of this post is not physics-101 level case studies in how not to build an air conditioner. The real fact of interest is that this is apparently the top rated new air conditioner on Amazon. How does such a bad design end up so popular?
One aspect of the story, presumably, is fake reviews. That phenomenon is itself a rich source of insight, but not the point of this post, and definitely not enough to account for the popularity of this air conditioner. The reviews shown on the product page are all “verified purchase”, and mostly 5-stars. There are only 4 one-star reviews (out of 104). If most customers noticed how bad this air conditioner is, I do not think a 4.7 rating would be sustainable. Customers actually do like this air conditioner.
And hey, this air conditioner has a lot going for it! There are wheels on the bottom, so it’s very portable. Setup is super easy - only one hose to the window, much less fiddly than those two-hose designs where you attach one hose and the other pops off.
Sure, the air conditioner has a major problem, but it’s not a major problem which most people will notice. They may notice that most of the house is still hot, but the space right in front of the air conditioner will be cool, so obviously the air conditioner is doing its job. Very few people will realize that the air conditioner is drawing hot air into the rest of the house. (Indeed, I saw zero reviews which mentioned that the air conditioner pulls hot air into the house - even the 1-star reviewers apparently did not realize why the air conditioner was so bad.)
[EDIT: several commenters seem to think that I'm claiming this air conditioner does not work at all, so I want to clarify that it will still cool down a room on net. If the air inside is all perfectly mixed together, it will still end up cooler with the air conditioner than without. The point is not that it doesn't work at all. The point is that it's stupidly inefficient in a way which I do not think consumers would plausibly choose over the relatively-low cost of a second hose if they recognized the problems.]
Major problems are only fixed when those problems are obvious. Problems which most people won’t notice (or won’t attribute correctly) tend to stick around. There’s no economic incentive to fix them.
And in practice, there are plenty of problems which most people won’t notice. A few more examples:
- Most charities have pretty mediocre impact. But the actual impact is very-not-visible to the person making donations, so people keep donating. (Also people care about things besides impact, but nonetheless I doubt low-impact charities would survive if their ineffectiveness were generally obvious.)
- Medical research has a replication rate below 50%. But when the effect sizes are expected to be small anyways, it’s hard to tell whether it’s working, so doctors (and patients) keep using crap treatments.
- Based on my firsthand experience with the B2B software industry, success is mostly determined by how good the product looks to managers making the decision to purchase. Successful B2B software (think “enterprise software”) is usually crap, but has great salespeople and great dashboards for the managers.
… and presumably this extends to lots of other industries which I’m less familiar with.
Two points to highlight here:
- Regulation does not fix the problem; it just moves it from the consumer to the regulator. A regulator will only regulate a problem which is obvious to the regulator. A regulator may sometimes have more expertise than a layperson, but even that requires that the politicians who ultimately appoint regulators can distinguish real from fake expertise, which is hard in general.
- Waiting longer does not fix the problem. All those people who did not notice their air conditioner pulling hot air into the house will not start noticing if we just wait a few years. Problems do not automatically become obvious over time.
How Does This Relate To Takeoff Speeds?
There’s a common view that, as long as AI does not take off too quickly, we’ll have time to see what goes wrong and iterate on it. It's a view with a lot of intuitive outside-view appeal: AI will work just like other industries. We try stuff, see what goes wrong, fix it. It worked like that in all the other industries, so presumably it will work like that in AI too.
The point of the air conditioner is that other industries do not, in fact, work like that. Other industries are absolutely packed with major problems which are not fixed because they’re not obvious. Even assuming that AI does not take off quickly (itself a dubious assumption at best), we should expect the same to be true of AI.
… But Won’t Big Problems Be Obvious?
Most industries have major problems which aren’t fixed because they’re not obvious. But these problems can only be so bad. If they were really disastrous, the disasters would be obvious. Why not expect the same from AI?
Because AI will eventually be far more capable than human industries. It will, by default, optimize way harder than human industries are capable of optimizing.
What does it look like, when the optimization power is turned up to 11 on something like the air conditioner problem? Well, it looks really good. But all the resources are spent on looking good, not on actually being good. It’s “Potemkin village world”: a world designed to look amazing, but with nothing behind the facade. Maybe not even any living humans behind the facade - after all, even generally-happy real humans will inevitably sometimes appear less-than-maximally “good”.
… But Isn’t Solving The Obvious Problems Still Valuable?
The nonobvious problems are the whole reason why AI alignment is hard in the first place.
Think about the “game tree” of alignment - the basic starting points, how they fail, what strategies address the failures, how those fail, etc. The most basic starting points are generally of the form “collect data from humans on which things are good/bad, then train something to do good stuff and avoid bad stuff”. Assuming such a strategy could be implemented efficiently, why would it fail? Well:
- In cases where humans label bad things as “good”, the trained system will also be selected to label bad things as “good”. In other words, the trained AI will optimize for things which look “good” to humans, even when those things are not very good.
- The trained system will likely end up implementing strategies which do “good”-labeled things in the training environment, but those strategies will not necessarily continue to do the things humans would consider “good” in other environments.
(Somewhat more detail on these failure modes here.) Optimizing for things which look “good” to humans is exactly the sort of failure which the air conditioner points to. Failure of systems to generalize in “good” ways is less centrally about obviousness, but note that if it were obvious that a system were going to generalize badly, this would also be a pretty easy issue to solve: just don’t deploy that system. The problem is that we can’t tell whether a system will do what we want in deployment just by looking at what it does in training; we can’t tell, from the system's behavior alone, whether there are problems in there.
Point is: problems which are highly visible to humans are already easy, from an alignment perspective. They will probably be solved by default. There’s not much marginal value in dealing with them. The value is in dealing with the problems which are hard to recognize.
Corollary: alignment is not importantly easier in slow-takeoff worlds, at least not due to the ability to iterate. The hard parts of the alignment problem are the parts where it’s nonobvious that something is wrong. That’s true regardless of how fast takeoff speeds are. And the ability to iterate does not make that hard part easier. Iteration mainly helps on the parts of the problem which were already easy anyway.
So I don't really care about takeoff speeds. The technical problems are basically similar either way.
... though admittedly I did not actually learn everything I need to know about takeoff speeds just from air conditioner ratings on Amazon. It took a lot of examples in different industries. Fortunately, there was no shortage of examples to hammer the idea into my head.