All of habryka's Comments + Replies

Oh, I do think a bunch of my problems with WebGPT is that we are training the system on direct internet access.

I agree that "train a system with internet access, but then remove it, then hope that it's safe", doesn't really make much sense. In-general, I expect bad things to happen during training, and separately, a lot of the problems that I have with training things on the internet is that it's an environment that seems like it would incentivize a lot of agency and make supervision really hard because you have a ton of permanent side effects.

2Rohin Shah23d
Oh you're making a claim directly about other people's approaches, not about what other people think about their own approaches. Okay, that makes sense (though I disagree). I was suggesting that the plan was "train a system without Internet access, then add it at deployment time" (aka "box the AI system during training"). I wasn't at any point talking about WebGPT.

Here is an example quote from the latest OpenAI blogpost on AI Alignment:

Language models are particularly well-suited for automating alignment research because they come “preloaded” with a lot of knowledge and information about human values from reading the internet. Out of the box, they aren’t independent agents and thus don’t pursue their own goals in the world. To do alignment research they don’t need unrestricted access to the internet. Yet a lot of alignment research tasks can be phrased as natural language or coding tasks.

This sounds super straig... (read more)

3Rohin Shah24d
The immediately preceding paragraph is: I would have guessed the claim is "boxing the AI system during training will be helpful for ensuring that the resulting AI system is aligned", rather than "after training, the AI system might be trying to pursue its own goals, but we'll ensure it can't accomplish them via boxing". But I can see your interpretation as well.

I think the smiling example is much more analogous than you are making it out here. I think the basic argument for "this just encourages taking control of the reward" or "this just encourages deception" goes through the same way.

Like, RLHF is not some magical "we have definitely figured out whether a behavior is really good or bad" signal, it's historically been just some contractors thinking for like a minute about whether a thing is fine. I don't think there is less bayesian evidence conveyed by people smiling (like, the variance in smiling is greater th... (read more)

and in particular the abstraction which it seems John is using, where making progress on outer alignment makes almost no difference to inner alignment

I am confused. How does RLHF help with outer alignment? Isn't optimizing fur human approval the classical outer-alignment problem? (e.g. tiling the universe with smiling faces)

I don't think the argument for RLHF runs through outer alignment. I think it has to run through using it as a lens to study how models generalize, and eliciting misalignment (i.e. the points about empirical data that you mentioned, I just don't understand where the inner/outer alignment distinction comes from in this context)

2Richard Ngo1mo
RLHF helps with outer alignment because it leads to rewards which more accurately reflect human preferences than the hard-coded reward functions (including the classic specification gaming examples [] , but also intrinsic motivation functions like curiosity and empowerment []) which are used to train agents in the absence of RLHF. The smiley faces example feels confusing as a "classic" outer alignment problem because AGIs won't be trained on a reward function anywhere near as limited as smiley faces. An alternative like "AGIs are trained on a reward function in which all behavior on a wide range of tasks is classified by humans as good or bad" feels more realistic, but also lacks the intuitive force of the smiley face example - it's much less clear in this example why generalization will go badly, given the breadth of the data collected.

I agree that having many shots is helpful, but lacking them is not the core difficulty (just as having many shots to launch a rocket doesn't help you very much if you have no idea how rockets work).

I do really feel like it would have been really extremely hard to build rockets if we had to get it right on the very first try.

I think for rockets the fact that it is so costly to experiment with stuff, explains the majority of the difficulty of rocket engineering. I agree you also have very little chance to build a successful space rocket without having a g... (read more)

1David Scott Krueger1mo
Well you could probably build a rocket that looks like it works, anyways. Could you build one you would want to try to travel to the moon in? (Are you imagining you get to fly in these rockets? Or just launch and watch from ground? I was imagining the 2nd...)

At a sufficiently high level of abstraction, I agree that "cost of experimenting" could be seen as the core difficulty. But at a very high level of abstraction, many other things could also be seen as the core difficulty, like "our inability to coordinate as a civilization" or "the power of intelligence" or "a lack of interpretability", etc. Given this, John's comment seemed like mainly rhetorical flourishing rather than a contentful claim about the structure of the difficult parts of the alignment problem.

Also, I think that "on our first try" thing isn't ... (read more)

I think the story would be way different if the actual risk posed by WebGPT was meaningful (say if it were driving >0.1% of the risk of OpenAI's activities).

Huh, I definitely expect it to drive >0.1% of OpenAI's activities. Seems like the WebGPT stuff is pretty close to commercial application, and is consuming much more than 0.1% of OpenAI's research staff, while probably substantially increasing OpenAI's ability to generally solve reinforcement learning problems. I am confused why you would estimate it at below 0.1%. 1% seems more reasonable to m... (read more)

I think the direct risk of OpenAI's activities is overwhelmingly dominated by training new smarter models and by deploying the public AI that could potentially be used in unanticipated ways.

I agree that if we consider indirect risks broadly (including e.g. "this helps OpenAI succeed or raise money and OpenAI's success is dangerous") then I'd probably move back towards "what % of OpenAI's activities is it."

I believe the most important drivers of catastrophic misalignment risk are models that optimize in ways humans don't understand or are deceptively aligned. So the great majority of risk comes from actions that accelerate those events, and especially making models smarter. I think your threat model here is quantitatively wrong, and that it's an important disagreement.

I agree with this! But I feel like this kind of reinforcement learning on a basically unsupervisable action-space while interfacing with humans and getting direct reinforcement on approval i... (read more)

But people attempting to box smart unaligned AIs, or believing that boxed AIs are significantly safer because they can't access the internet, seems to me like a bad situation. An AI smart enough to cause risk with internet access is very likely to be able to cause risk anyway, and at best you are creating a super unstable situation where a lab leak is catastrophic.

I do think we are likely to be in a bad spot, and talking to people at OpenAI, Deepmind and Anthropic (e.g. the places where most of the heavily-applied prosaic alignment work is happening), I... (read more)

6Rohin Shah1mo
... Who are you talking to? I'm having trouble naming a single person at either of OpenAI or Anthropic who seems to me to be interested in extensive boxing (though admittedly I don't know them that well). At DeepMind there's a small minority who think about boxing, but I think even they wouldn't think of this as a major aspect of their plan. I agree that they aren't aiming for a "much more comprehensive AI alignment solution" in the sense you probably mean it but saying "they rely on boxing" seems wildly off. My best-but-still-probably-incorrect guess is that you hear people proposing schemes that seem to you like they will obviously not work in producing intent aligned systems and so you assume that the people proposing them also believe that and are putting their trust in boxing, rather than noticing that they have different empirical predictions about how likely those schemes are to produce intent aligned systems.

If you thought that researchers working on WebGPT were shortening timelines significantly more efficiently than the average AI researcher, then the direct harm starts to become relevant compared to opportunity costs.

Yeah, my current model is that WebGPT feels like some of the most timelines-reducing work that I've seen (as has most of OpenAIs work). In-general, OpenAI seems to have been the organization that has most shortened timelines in the last 5 years, with the average researcher seeming ~10x more efficient at shortening timelines than even researc... (read more)

I think almost all of the acceleration comes from either products that generate $ and hype and further investment, or more directly from scaleup to more powerful models. I think "We have powerful AI systems but haven't deployed them to do stuff they are capable of" is a very short-term kind of situation and not particularly desirable besides.

I'm not sure what you are comparing RLHF or WebGPT to when you say "paradigm of AIs that are much harder to align." I think I probably just think this is wrong, in that (i) you are comparing to pure generative modeling... (read more)

I moved that thread over the AIAF as well!

Yeah, I agree that I am doing reasoning on people's motivations here, which is iffy and given the pushback I will be a bit more hesitant to do, but also like, in this case reasoning about people's motivations is really important, because what I care about is what the people working at OpenAI will actually do when they have extremely powerful AI in their hands, and that will depend a bunch on their motivations.

I am honestly a bit surprised to see that WebGPT was as much driven by people who I do know reasonably well and who seem to be driven primarily by sa... (read more)

I don't think "your AI wants to kill you but it can't get out of the box so it helps you with alignment instead" is the mainline scenario. You should be building an AI that wouldn't stab you if your back was turned and it was holding a knife, and if you can't do that then you should not build the AI.

That's interesting. I do think this is true about your current research direction (which I really like about your research and I do really hope we can get there), but when I e.g. talk to Carl Shulman he (if I recall correctly) said things like "we'll just h... (read more)

6Paul Christiano1mo
Even in those schemes, I think the AI systems in question will have much better levers for causing trouble than access to the internet, including all sorts of internal access and their involvement in the process of improving your AI (and that trying to constrain them so severely would mean increasing their intelligence far enough that you come out behind). The mechanisms making AI uprising difficult are not mostly things like "you are in a secure box and can't get out," they are mostly facts about all the other AI systems you are dealing with. That said, I think you are overestimating how representative these are of the "mainline" hope most places, I think the goal is primarily that AI systems powerful enough to beat all of us combined come after AI systems powerful enough to greatly improve the situation. I also think there are a lot of subtle distinctions about how AI systems are trained that are very relevant to a lot of these stories (e.g. WebGPT is not doing RL over inscrutable long-term consequences on the internet---just over human evaluations of the quality of answers or browsing behavior).

WebGPT is approximately "reinforcement learning on the internet".

There are some very minimal safeguards implemented (search via Bing API, but the AI can click on arbitrary links), but I do indeed think "reinforcement learning on the internet" is approximately the worst direction for modern AI to go in terms of immediate risks.

I don't think connecting GPT-3 to the internet is risky at current capability levels, but pushing AI in the direction of just hooking up language models with reinforcement learning to a browser seems like one of the worst directions f... (read more)

The primary job of OpenAI is to be a clear leader here and do the obvious good things to keep an AI safe, which will hopefully include boxing it. Saying "well, seems like the cost is kinda high so we won't do it" seems like exactly the kind of attitude that I am worried will cause humanity to go extinct. 

  • When you say "good things to keep an AI safe" I think you are referring to a goal like "maximize capability while minimizing catastrophic alignment risk." But in my opinion "don't give your models access to the internet or anything equally risky" is a
... (read more)

The main group of people working on alignment (other than interpretability) at OpenAI at the time of the Anthropic split at the end of 2020 was the Reflection team, which has since been renamed to the Alignment team. Of the 7 members of the team at that time (who are listed on the summarization paper), 4 are still working at OpenAI, and none are working at Anthropic.

I think this is literally true, but at least as far as I know is not really conveying the underlying dynamics and so I expect readers to walk away with the wrong impression.

Again, I might be... (read more)

1Jacob Hilton1mo
Without commenting on the specifics, I have edited to the post to mitigate potential confusion: "this fact alone is not intended to provide a complete picture of the Anthropic split, which is more complicated than I am able to explain here".

Huh, I thought you agreed with statements like "if we had many shots at AI Alignment and could get reliable empirical feedback on whether an AI Alignment solution is working, AI Alignment would be much easier".

My model is that John is talking about "evidence on whether an AI alignment solution is sufficient", and you understood him to say "evidence on whether the AI Alignment problem is real/difficult". My guess is you both agree on the former, but I am not confident.

6Richard Ngo1mo
I agree that having many shots is helpful, but lacking them is not the core difficulty (just as having many shots to launch a rocket doesn't help you very much if you have no idea how rockets work). I don't really know what "reliable empirical feedback" means in this context - if you have sufficiently reliable feedback mechanisms, then you've solved most of the alignment problem. But, out of the things John listed: I expect that we'll observe a bunch of empirical examples of each of these things happening (except for the hard takeoff phase change), and not know how to fix them.

WebGPT seemed like one of the most in-expectation harmful projects that OpenAI has worked on, with no (to me) obvious safety relevance, so my guess is I would still mostly categorize the things you list under the first misconception as capabilities research. InstructGPT also seems to be almost fully capabilities research (like, I agree that there are some safety lessons to be learned here, but it seems somewhat clear to me that people are working on WebGPT and InstructGPT primarily for capabilities reasons, not for existential-risk-from-AI reasons)

(Edit: M... (read more)

like, I agree that there are some safety lessons to be learned here, but it seems somewhat clear to me that people are working on WebGPT and InstructGPT primarily for capabilities reasons, not for existential-risk-from-AI reasons

This also seems like an odd statement - it seems reasonable to say "I think the net effect of InstructGPT is to boost capabilities" or even "If someone was motivated by x-risk it would be poor prioritisation/a mistake to work on InstructGPT". But it feels like you're assuming some deep insight into the intention behind the people w... (read more)

2Neel Nanda1mo
That seems weirdly strong. Why do you think that?

I was the project lead on WebGPT and my motivation was to explore ideas for scalable oversight and truthfulness (some further explanation is given here).

A while ago I got most of the way to set up a feature on LW/AIAF that would export LW/AIAF posts to a nicely formatted academic-looking PDF that is linkable. I ended up running into a hurdle somewhat close to the end and shelved the feature, but if there is a lot of demand here, I could probably finish up the work, which would make this process even easier.

A while ago I made a very quick Python script to pull Markdown from LW, then use pandoc to export to a PDF (because I prefer reading physical papers and Latex formatting). I used it somewhat regularly for ~6 months and found that it was good enough for my purposes. I assume the LW developers could write something much better, but I've thrown it into this Github [repo]( in case it's of help or interest.

I would especially especially love it if it popped out a .tex file that I could edit, since I'm very likely to be using different language on LW than I would in a fancy academic paper.
2Neel Nanda1mo
I would love this! I'm currently paying someone ~$200 to port my grokking post to LaTeX, getting a PDF automatically would be great

Yeah, I think Open AI tried to do some empirical work, but approximately just produced capability progress, in my current model of the world (though I also think the incentive environment there was particularly bad). I feel confused about the "learning to summarize from human feedback" work, and currently think it was overall bad for the world, but am not super confident (in general I feel very confused about the sign of RLHF research).

I think Rohin Shah doesn't think of himself as having produced empirical work that helps with AI Alignment, but only to ha... (read more)

I'm pretty confused about how to think about the value of various ML alignment papers. But I think even if some piece of empirical ML work on alignment is really valuable for reducing x-risk, I wouldn't expect its value to take the form of providing insight to readers like you or me. So you as a reader not getting much out of it is compatible with the work being super valuable, and we probably need to assess it on different terms.

The main channel of value that I see for doing work like "learning to summarize" and the critiques project and various interpret... (read more)

Hmm, there might be some mismatch of words here. Like, most of the work so far on the problem has been theoretical. I am confused how you could not be excited about the theoretical work that established the whole problem, the arguments for why it's hard, and that helped us figure out at least some of the basic parameters of the problem. Given that (I think) you currently think AI Alignment is among the global priorities, you presumably think the work that allowed you to come to believe that (and that allowed others to do the same) was very valuable and imp... (read more)

I was mainly talking about the current margin when I talked about how excited I am about the theoretical vs empirical work I see "going on" right now and how excited I tend to be about currently-active researchers who are doing theory vs empirical research. And I was talking about the future when I said that I expect empirical work to end up with the lion's share of credit for AI risk reduction.

Eliezer, Bostrom, and co certainly made a big impact in raising the problem to people's awareness and articulating some of its contours. It's kind of a matter of s... (read more)

I mostly disagreed with bullet point two. The primary result of "empirical AI Alignment research" that I've seen in the last 5 years has been a lot of capabilities gain, with approximately zero in terms of progress on any AI Alignment problems. I agree more with the "in the long run there will be a lot of empirical work to be done", but right now on the margin, we have approximately zero traction on useful empirical work, as far as I can tell (outside of transparency research).

4Ajeya Cotra2mo
I agree that in an absolute sense there is very little empirical work that I'm excited about going on, but I think there's even less theoretical work going on that I'm excited about, and when people who share my views on the nature of the problem work on empirical work I feel that it works better than when they do theoretical work.
1Thomas Kwa2mo
Were any cautious people trying empirical alignment research before Redwood/Conjecture?

FWIW, I had a mildly negative reaction to this title. I agree with you, but I feel like the term "PSA" should be reserved for things that are really very straightforward and non-controversial, and I feel like it's a bit of a bad rhetorical technique to frame your arguments as a PSA. I like the overall content of the post, but feel like a self-summarizing post title like "Most AI Alignment research is not parallelizable" would be better.

I think this is a great comment that feels to me like it communicated a better intuition for why corrigibility might be natural than anything else I've read so far.

If Australia was pursuing a strategy of "lock down irrespective of cost", then I don't think it makes sense to describe the initial response as competent. It just happened to be right in this case, but in order for the overall response to helpful, it has to be adaptive to the actual costs. I agree that the early response on its own would have indicated a potentially competent decision-making algorithm, but the later followup showed that the algorithm seems to have mostly been correct on accident, and not on-purpose.

I do appreciate the link to the GDP cost article. I would have to look into the methodology more to comment on that, but it certainly seems like an interest analysis and suggestive result.

Australia seems to have suffered a lot more from the pandemic than the U.S., paying much more in the cost of lockdown than even a relatively conservative worst-case estimate would have been for the costs of an uncontrolled COVID pandemic. I don't know about the others, but given that you put Australia on this list, I don't currently trust the others to have acted sensibly.

I'm not sure if you actually read carefully what you are commenting on. I emphasized early response, or initial governmental-level response in both comments in this thread. Sure, multiple countries on the list made mistakes later, some countries sort of become insane, and so on. Later, almost everyone made mistakes with vaccines, rapid tests, investments in contact tracing, etc. Arguing that the early lockdown was more costly than "an uncontrolled pandemic" would be pretty insane position (cf GDP costs [] , Italy had the closest thing to an uncontrolled pandemic). (Btw the whole notion of "an uncontrolled pandemic" is deeply confused - unless you are a totalitarian dictatorship, you cannot just order people "live as normally" during a pandemic when enough other people are dying; you get spontaneous "anarchic lockdowns" anyway, just later and in a more costly way)

(Mod note: I turned on two-factor voting. @Evan: Feel free to ask me to turn it off.)

Ok, but why isn't it better to have Godzilla fighting Mega-Godzilla instead of leaving Mega-Godzilla unchallenged?

Because Tokyo still gets destroyed.

Important thing to bear in mind here: the relevant point for comparison is not the fantasy-world where the Godzilla-vs-Mega-Godzilla fight happens exactly the way the clever elaborate scheme imagined. The relevant point for comparison is the realistic-world where something went wrong, and the elaborate clever scheme fell apart, and now there's monsters rampaging around anyway.

Yeah, OK, I think this distinction makes sense, and I do feel like this distinction is important. 

Having settled this, my primary response is: 

Sure, I guess it's the most prototypical catastrophic action until we have solved it, but like, even if we solve it, we haven't solved the problem where the AI does actually get a lot smarter than humans and takes a substantially more "positive-sum" action and kills approximately everyone with the use of a bioweapon, or launches all the nukes, or develops nanotechnology. We do have to solve this problem fi... (read more)

I'm using "catastrophic" in the technical sense of "unacceptably bad even if it happens very rarely, and even if the AI does what you wanted the rest of the time", rather than "very bad thing that happens because of AI", apologies if this was confusing.

My guess is that you will wildly disagree with the frame I'm going to use here, but I'll just spell it out anyway: I'm interested in "catastrophes" as a remaining problem after you have solved the scalable oversight problem. If your action is able to do one of these "positive-sum" pivotal acts in a single ac... (read more)

Mod note: I activated two-axis voting on this post, since it seemed like it would make the conversation go better.

it’s the AI doing something relatively easy that is entirely a zero-sum action that removes control of the situation from humans.

I also don't really understand the emphasis on zero-sum action. It seems pretty plausible that the AI will trade for compute with some other person around the world. My guess is, in-general, it will be easier to get access to the internet or have some kind of side-channel attack than to get root access to the datacenter the AI runs on. I do think most-likely the AI will then be capable of just taking resources instead of payin... (read more)

1Buck Shlegeris4mo
Whether this is what I'm trying to call a zero-sum action depends on whose resources it's trading. If the plan is "spend a bunch of the capital that its creators have given it on compute somewhere else", then I think this is importantly zero-sum--the resources are being taken from the creators of AI, which is why the AI was able to spend so many resources. If the plan was instead "produce some ten trillion dollar invention, then sell it, then use the proceeds to buy compute elsewhere", this would seem less zero-sum, and I'm saying that I expect the first kind of thing to happen before the second.

I feel like the focus on getting access to its own datacenter is too strong in this story. Seems like it could also just involve hacking some random remote server, or convincing some random person on the internet to buy some compute for them, or to execute some other plan for them (like producing a custom chip), or convincing a researcher that it should get more resources on the existing datacenter, or threatening some other stakeholder somewhere in order to give them power or compute of some kind. Also, all of course selected for plans that are least like... (read more)

2Buck Shlegeris4mo
Except for maybe "producing a custom chip", I agree with these as other possibilities, and I think they're in line with the point I wanted to make, which is that the catastrophic action involves taking someone else's resource such that it can prevent humans from observing it or interfering with it, rather than doing something which is directly a pivotal act. Does this distinction make sense? Maybe this would have been clearer if I'd titled it "AI catastrophic actions are mostly not pivotal acts"?

I like this. Would this have to be publicly available models? Seems kind of hard to do for private models.

2Ramana Kumar4mo
What kind of access might be needed to private models? Could there be a secure multi-party computation approach that is sufficient?

One of the primary questions that comes to mind for me is "well, did this whole thing actually work?". If I understand the paper correctly, while we definitely substantially decreased the fraction of random samples that got misclassified (which always seemed very likely to happen, and I am indeed a bit surprised at only getting it to move ~3 OOMs, which my guess is mostly capability related, since you used small models), we only doubled the amount of effort necessary to generate an adversarial counterexample. 

A doubling is still pretty substantial, an... (read more)

Excellent question -- I wish we had included more of an answer to this in the post.

I think we made some real progress on the defense side -- but I 100% was hoping for more and agree we have a long way to go.

I think the classifier is quite robust in an absolute sense, at least compared to normal ML models. We haven't actually tried it on the final classifier, but my guess is it takes at least several hours to find a crisp failure unassisted (whereas almost all ML models you can find are trivially breakable). We're interested in people giving it a shot! :)

Pa... (read more)

I do not understand the "we only doubled the amount of effort necessary to generate an adversarial counterexample.". Aren't we talking about 3oom?

This article explains the difference:

For example, a 14,000 BTU model that draws 1,400 watts of power on maximum settings would have an EER of 10.0 as 14,000/1,400 = 10.0.

A 14,000 BTU unit that draws 1200 watts of power would have an EER of 11.67 as 14,000/1,200 = 11.67.

Taken at face value, this looks like a good and proper metric to use for energy efficiency. The lower the power draw (watts) compared to the cooling capacity (BTUs/hr), the higher the EER. And the higher the E

... (read more)

EER does not account for heat infiltration issues, so this seems confused. CEER does, and that does suggest something in the 20% range, but I am pretty sure you can't use EER to compare a single-hose and a dual-hose system.

2Jessica Taylor5mo
I assumed EER did account for that based on:

I think that paragraph is discussing a second reason that infiltration is bad.

Yeah, sorry, I didn't mean to imply the section is saying something totally wrong. The section just makes it sound like that is the only concern with infiltration, which seems wrong, and my current model of the author of the post is that they weren't actually thinking through heat-related infiltration issues (though it's hard to say from just this one paragraph, of course). 

My overall take on this post and comment (after spending like 1.5 hours reading about AC design and statistics): 

Overall I feel like both the OP and this reply say some wrong things. The top Wirecutter recommendation is a dual-hose design. The testing procedure of Wirecutter does not seem to address infiltration in any way, and indeed the whole article does not discuss infiltration as it relates to cooling-efficiency. 

Overall efficiency loss from going to dual to single is something like 20-30%, which I do think is much lower than I think the OP ... (read more)

Update: I too have now spent like 1.5 hours reading about AC design and statistics, and I can now give a reasonable guess at exactly where the I-claim-obviously-ridiculous 20-30% number came from. Summary: the SACC/CEER standards use a weighted mix of two test conditions, with 80% of the weight on conditions in which outdoor air is only 3°F/1.6°C hotter than indoor air.

The whole backstory of the DOE's SACC/CEER rating rules is here. Single-hose air conditioners take center stage. The comments on the DOE's rule proposals can basically be summarized as:

  • Singl
... (read more)
2[comment deleted]5mo
  • The top wirecutter recommendation is roughly 3x as expensive as the Amazon AC being reviewed. The top budget pick is a single-hose model.
  • People usually want to cool the room they are spending their time in. Those ACs are marketed to cool a 300 sq ft room, not a whole home. That's what reviewers are clearly doing with the unit. 
  • I'd guess that in extreme cases (where you care about the room with AC no more than other rooms in the house + rest of house is cool) consumers are overestimating efficiency by ~30%. On average in reality I'd guess they are over
... (read more)

The best thing we took away from our tests was the chance at a direct comparison between a single-hose design and a dual-hose design that were otherwise identical, and our experience confirmed our suspicions that dual-hose portable ACs are slightly more effective than single-hose models but not effective enough to make a real difference

After having looked into this quite a bit, it does really seem like the Wirecutter testing process had no ability to notice infiltration issues, so it seems like the Wirecutter crew themselves is kind of confused here? ... (read more)

4Paul Christiano5mo
They measure the temperature in the room, which captures the effect of negative pressure pulling in hot air from the rest of the building. It underestimates the costs if the rest of the building is significantly cooler than the outside (I'd guess by the ballpark of 20-30% in the extreme case where you care equally about all spaces in the building, the rest of your building is kept at the same temp as the room you are cooling, and a negligible fraction of air exchange with the outside is via the room you are cooling). I think that paragraph is discussing a second reason that infiltration is bad.

A 2-hose unit will definitely cool more efficiently, but I think for many people who are using portable units it's the right tradeoff with convenience. The wirecutter reviews both types of units together and usually end up preferring 1-hose units.

It is important to note that the current top wirecutter pick is a 2-hose unit, though one that combined the two hoses into one big hose. I guess maybe that is recent, but it does seem important to acknowledge here (and it wouldn't surprise me that much if Wirecutter went through reasoning pretty similar to the one... (read more)

Here is the wirecutter discussion of the distinction for reference:

Starting in 2019, we began comparing dual- and single-hose models according to the same criteria, and we didn’t dismiss any models based on their hose count. Our research, however, ultimately steered us toward single-hose portable models—in part because so many newer models use this design. In fact, we found no compelling new double-hose models from major manufacturers in 2019 or 2020 (although a few new ones cropped up in 2021, including our new top pick). Owner reviews indicate that most

... (read more)

Mod note: I reposted this post to the frontpage, because it wasn't actually shown on a frontpage due to an interaction with the GreaterWrong post-submission interface. It seemed like a post many people are interested in, and it seemed like it didn't really get the visibility it deserved.

Relevant Feynman quote: 

I had a scheme, which I still use today when somebody is explaining something that I’m trying to understand: I keep making up examples.

For instance, the mathematicians would come in with a terrific theorem, and they’re all excited. As they’re telling me the conditions of the theorem, I construct something which fits all the conditions. You know, you have a set (one ball)-- disjoint (two balls). Then the balls turn colors, grow hairs, or whatever, in my head as they put more conditions on.

Finally they state the theorem, which is some dumb thing about the ball which isn’t true for my hairy green ball thing, so I say “False!” [and] point out my counterexample.


No real power-seeking tendencies if we only plausibly will specify a negative vector.

Seems like two sentences got merged together.

2Alex Turner8mo
Fixed, thanks!

The post feels like it's trying pretty hard to point towards an alternative forecasting method, though I also agree it's not fully succeeding at getting there. 

I feel like de-facto the forecasting methodology of people who are actually good at forecasting don't usually strike me as low-inferential distance, such that it is obvious how to communicate the full methodology. My sense from talking to a number of superforecasters over the years is that they do pretty complicated things, and I don't feel like the critique of "A critique is only really valid ... (read more)

I think it's fine to say that you think something else is better without being able to precisely say what it is. I just think "the trick that never works" is an overstatement if you aren't providing evidence about whether it has  worked, and that it's hard to provide such evidence without saying something about what you are comparing to.

(Like I said though, I just skimmed the post and it's possible it contains evidence or argument that I didn't catch.)

It's possible the action is in disagreements about Moravec's view rather than the lack of an alternat... (read more)

This is a very good point. IIRC Paul is working on some new blog posts that summarize his more up-to-date approach, though I don't know when they'll be done. I will ask Paul when I next run into him about what he thinks might be the best way to update the sequence.

Thank you! I am glad you are doing this!

Promoted to curated: I found this conversation useful from a number of different perspectives, and found the transcript surprisingly easy to read (though it is still very long). The key question the conversation tried to tackle, about whether we should put resources into increasing the safety of AI systems by reducing the degree to which they try to model humans, is one that I've been interested in for a while. But I also felt like this conversation, more so than most other transcripts, gave me a better insight into how both Scott and Rohin think about these topics in general, and what kind of heuristics they use to evaluate various AI alignment proposals.

I also found these very valuable! I wonder whether a better title might help more people see how great these are, but not sure.

Replaced the image in the post with this image.

Minor meta feedback: I think it's better to put the "Comprehensive Information Gathering" part of the title at the end, if you want to have many of these. That makes it much easier to see differences in the title and skim a list of them.

1Adam Shimi1y
Sure, I hadn't thought about that.

The newsletter is back! I missed these! Glad to have these back.

Promoted to curated: I've had a number of disagreements with a perspective on AI that generates arguments like the above, which takes something like "ownership of material resources" as a really fundamental unit of analysis, and I feel like this post has both helped me get a better grasp on that paradigm of thinking, and also helped me get a bit of a better sense of what feels off to me, and I have a feeling this post will be useful in bridging that gap eventually. 

Load More