Oliver Habryka
2413 Martin Luther King Jr Way, Berkeley, CA 94704, USA

Coding day in and out on LessWrong 2.0. You can reach me at habryka@lesswrong.com

High-stakes alignment via adversarial training [Redwood Research report]

One of the primary questions that comes to mind for me is "well, did this whole thing actually work?". If I understand the paper correctly, we definitely substantially decreased the fraction of random samples that got misclassified. That always seemed very likely to happen, and I am indeed a bit surprised that it only moved by ~3 OOMs, which my guess is mostly capability-related, since you used small models. But we only doubled the amount of effort necessary to generate an adversarial counterexample.

A doubling is still pretty substantial, and stacking a few doublings on top of each other could potentially result in safety guarantees, but my guess is the first doubling is a lot cheaper to get than later ones. So I feel tempted to say that at least the set of techniques explored here did not substantially increase adversarial robustness, unless I am missing something.
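For a sense of scale, here is a minimal sketch of how many doublings it takes to reach a given number of orders of magnitude. It assumes each doubling multiplies the required effort cleanly, which is an idealization and not something the paper claims; the function name is made up for illustration.

```python
import math


def doublings_for_ooms(ooms: float) -> float:
    """Number of successive doublings needed to gain `ooms` orders of magnitude.

    Assumes effort compounds multiplicatively: 2**n == 10**ooms.
    """
    return ooms * math.log2(10)


# One order of magnitude takes a bit more than three doublings,
# three orders of magnitude take roughly ten.
for ooms in (1, 2, 3):
    print(ooms, round(doublings_for_ooms(ooms), 2))
```

Under that assumption, a single doubling covers only about 0.3 of an order of magnitude.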

This doesn't mean that there isn't more potential work in this space that maybe does increase adversarial robustness, and I might be misreading the paper, but when I first heard about this project, I was definitely hoping for some approach that might increase robustness by many orders of magnitude, making it impossible for an unassisted person to generate a counterexample, and very very difficult to generate a counterexample even with transparency tools. But it seems like the difficulty of a human generating a counterexample stayed mostly the same, which was (at least to me) the primary outcome variable I was interested in.

I am curious whether any of the authors disagree with this assessment. My guess is most of you must, since neither the post nor the paper seems to talk about this very much. 

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

This article explains the difference: https://www.consumeranalysis.com/guides/portable-ac/best-portable-air-conditioner/

For example, a 14,000 BTU model that draws 1,400 watts of power on maximum settings would have an EER of 10.0 as 14,000/1,400 = 10.0.

A 14,000 BTU unit that draws 1200 watts of power would have an EER of 11.67 as 14,000/1,200 = 11.67.

Taken at face value, this looks like a good and proper metric to use for energy efficiency. The lower the power draw (watts) compared to the cooling capacity (BTUs/hr), the higher the EER. And the higher the EER, the better the energy efficiency.

Thus, if we were to look at the EER of the two example units above we could easily say that the second has better energy efficiency because it has a higher EER – 11.67 compared to 10.0.

However, taking into account what you’ve learned so far about the old method used to determine cooling capacity (standard BTUs) vs the new method used to do so (SACC), you should be able to spot one major problem with EER. That’s right. It uses standard BTU’s – yes, the old BTU metric – in its equation.

The Department of Energy also recognized this issue with EER and acted accordingly by instituting a new metric by which to determine a portable AC unit’s energy efficiency.

That metric is called CEER – Combined Energy Efficiency Ratio.

Unfortunately, CEER is a lot more complicated than EER. The new energy efficiency ratio could have simply involved taking SACC and dividing it by maximum power draw on cooling mode in watts. But the DOE decided that the equation needed a little bit more nuance than that. Let’s take a look at the end result:

EER measures performance in BTUs, which are simply measuring how much work the AC performs, without taking into account any backflow of cold air back into the AC, or infiltration issues.
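The EER arithmetic from the quoted article can be sketched as follows. This only reproduces the simple division described in the quote (standard BTU/hr over watts); the function name is hypothetical, and it does not attempt CEER or SACC, whose formulas are not given here.

```python
def eer(cooling_btu_per_hr: float, power_watts: float) -> float:
    """Energy Efficiency Ratio: rated cooling capacity (BTU/hr) per watt drawn."""
    if power_watts <= 0:
        raise ValueError("power_watts must be positive")
    return cooling_btu_per_hr / power_watts


# The two example units from the quoted article.
unit_a = eer(14_000, 1_400)
unit_b = eer(14_000, 1_200)
print(round(unit_a, 2), round(unit_b, 2))  # 10.0 11.67
```

Because the numerator is the old standard BTU rating, nothing in this calculation can reflect infiltration losses.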

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

EER does not account for heat infiltration issues, so this seems confused. CEER does, and that does suggest something in the 20% range, but I am pretty sure you can't use EER to compare a single-hose and a dual-hose system.

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

I think that paragraph is discussing a second reason that infiltration is bad.

Yeah, sorry, I didn't mean to imply the section is saying something totally wrong. The section just makes it sound like that is the only concern with infiltration, which seems wrong, and my current model of the author of the post is that they weren't actually thinking through heat-related infiltration issues (though it's hard to say from just this one paragraph, of course). 

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

My overall take on this post and comment (after spending like 1.5 hours reading about AC design and statistics): 

Overall I feel like both the OP and this reply say some wrong things. The top Wirecutter recommendation is a dual-hose design. The testing procedure of Wirecutter does not seem to address infiltration in any way, and indeed the whole article does not discuss infiltration as it relates to cooling-efficiency. 

Overall efficiency loss from going from dual to single is something like 20-30%, which I think is much lower than the OP implied, though it also is quite substantial, and indeed most of the top-ranked Amazon listings do not use any of the updated measurements that Paul is talking about, and so consumers do likely end up deceived about that.

The top-rated AC Wentworth links to is really very weak if you take into account those losses, and I would be surprised if it adequately cooled people's homes. 

My current model: Wirecutter is doing OK but really not great here (with an actively confused testing procedure), Amazon ratings are indeed performing quite badly, and basically display most of the problems that Wentworth talks about. It's unclear whether AC companies are optimizing for the Amazon or the Wirecutter case here; figuring that out would require me seeing a trend in SACC over time. I expect many, possibly most consumers, are indeed being fooled into thinking their AC is 20-30% more effective than it actually is, because of infiltration issues.

Separately, it seems possible to me that consumers actually prefer a world where an AC doesn't cool their whole home, but creates a temperature gradient across the home. In the case of heating, I strongly prefer stoves and fireplaces that produce local heating (and substantially radiation heating), despite similar infiltration issues, because I like being able to move closer and farther away from the source of the heat to locally adjust temperature preferences, and I often live/work with people who have substantially different temperature preferences to me.

Overall, if I were to buy a portable AC today, I would choose Wirecutter's top pick, which happens to be a two-hose design. If that wasn't the case, I would probably still just go with the single-hose design, because I actually like temperature gradients. If I cared about cooling a whole building (which historically hasn't been the case), I would probably go with a dual-hose design. Reading Amazon reviews and ratings would have probably made my decision-making here marginally worse. Following the Wirecutter recommendation would have been OK but not great. It mostly would have pointed me towards the SACC as a better rating, and I probably would have figured out to buy an AC with a high SACC rating, and not historical BTU rating.

My current guess is that most consumers are probably still deceived by non-SACC BTU ratings, but the situation seems to be getting better due to regulations.
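As a rough sketch of what a 20-30% loss means in practice, the snippet below derates a nominal rating by a fractional infiltration loss. Treating the loss as a flat fraction of the rated figure is a simplification, not how SACC is actually computed, and the 10,000 BTU/hr unit is a made-up example rather than any real listing.

```python
def effective_cooling(rated_btu_per_hr: float, infiltration_loss: float) -> float:
    """Cooling actually delivered after losing a fraction of rated capacity."""
    if not 0 <= infiltration_loss < 1:
        raise ValueError("infiltration_loss must be in [0, 1)")
    return rated_btu_per_hr * (1 - infiltration_loss)


# Hypothetical single-hose unit advertised at 10,000 BTU/hr.
rated = 10_000
for loss in (0.20, 0.30):
    delivered = effective_cooling(rated, loss)
    print(f"{loss:.0%} loss -> {delivered:.0f} BTU/hr delivered")
```

A buyer comparing units on the advertised number alone would not see this gap.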

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a real difference

After having looked into this quite a bit, it does really seem like the Wirecutter testing process had no ability to notice infiltration issues, so it seems like the Wirecutter crew itself is kind of confused here?

The... Wirecutter article also does not seem to discuss the issue of infiltration of hot air in any reasonable way. Instead it just says that:

This produces a slight vacuum effect, which pulls in “infiltration air” from anywhere it can in order to equalize the pressure. In the presence of a gas-powered device such as a furnace, that negative pressure creates a backdraft or downdraft, which can cause the machine to malfunction—or worse, fill the room with gas fumes and carbon monoxide. We don’t think that most people plan to use their portable AC in such a room, but if your home is set up in such a way that you’re concerned about ventilation, skip the rest of our recommendations

Which... seems to misunderstand the actual problem of infiltration as it relates to cooling efficiency? This is the only mention of the word infiltration in the whole article, and I can't find any other section that discusses infiltration problems in other words.

Indeed, the article says directly: 

Although our testing has shown that dual-hose models tend to outperform some single-hose units in extremely hot or muggy weather, the difference is usually minimal, and we don’t think it outweighs the convenience of a single hose.

That conclusion is based on a testing methodology that is very unlikely to be able to measure the primary issue with single-hose vs. double-hose designs. This seems like Wirecutter is directly falling prey to the exact problem the OP is describing.

I feel a bit confused how the official SACC measures account for infiltration, but assuming they are doing it properly, the overall difference between single-hose and dual-hose designs does only seem to be something like 20%. I do expect the current Wirecutter tests to fail to measure that difference, but am also not sure that 20% is really worth the loss of convenience from having a single hose.

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

A 2-hose unit will definitely cool more efficiently, but I think for many people who are using portable units it's the right tradeoff with convenience. The wirecutter reviews both types of units together and usually end up preferring 1-hose units.

It is important to note that the current top Wirecutter pick is a 2-hose unit, though one that combines the two hoses into one big hose. I guess maybe that is recent, but it does seem important to acknowledge here (and it wouldn't surprise me that much if Wirecutter went through reasoning pretty similar to the one in this post, and then updated towards the two-hose unit because of concerns about infiltration and looking at more comprehensive metrics like SACC).

A broad basin of attraction around human values?

Mod note: I reposted this post to the frontpage, because it wasn't actually shown on the frontpage due to an interaction with the GreaterWrong post-submission interface. It seemed like a post many people are interested in, and it seemed like it didn't really get the visibility it deserved.

Late 2021 MIRI Conversations: AMA / Discussion

Relevant Feynman quote: 

I had a scheme, which I still use today when somebody is explaining something that I’m trying to understand: I keep making up examples.

For instance, the mathematicians would come in with a terrific theorem, and they’re all excited. As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball)-- disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on.

Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say “False!” [and] point out my counterexample.

Instrumental Convergence For Realistic Agent Objectives


No real power-seeking tendencies if we only plausibly will specify a negative vector.

Seems like two sentences got merged together.
