Aryeh Englander

I work on applied mathematics and AI at the Johns Hopkins University Applied Physics Laboratory. I also do AI safety related work for the Johns Hopkins Institute for Assured Autonomy. I am currently doing a CS PhD focused on AI safety at the University of Maryland, Baltimore County.

I think part of what I was reacting to is a kind of half-formed argument that goes something like:

  • My prior credence is very low that all these really smart, carefully thought-through people are making the kinds of stupid or biased mistakes they are being accused of.
  • In fact, my prior for the above is sufficiently low that I suspect it's more likely that the author is the one making the mistake(s) here, at least in the sense of straw-manning his opponents.
  • But if that's the case then I shouldn't trust the other things he says as much, because it looks like he's making reasoning mistakes himself or else he's biased.
  • Therefore I shouldn't take his arguments so seriously.

Again, this isn't actually an argument I would make. It's just me trying to articulate my initial negative reactions to the post.


I noticed that I found it very difficult to read through this post, even though I felt the content was important, because of the (deliberately) condescending style. I also noticed that I'm finding it difficult to take the ideas as seriously as I think I should, again due to the style. I did manage to read through it in the end, because I do think it's important, and I think I am mostly able to avoid letting the style influence my judgments. But I find it fascinating to watch my own reaction to the post, and I'm wondering if others have any (constructive) insights on this.

In general I've noticed that I have a very hard time reading things that are written in a polemical, condescending, insulting, or ridiculing manner. This is particularly true of course if the target is a group / person / idea that I happen to like. But even if it's written by someone on "my side" I find I have a hard time getting myself to read it. There have been several times when I've been told I should really go read a certain book, blog, article, etc., and that it has important content I should know about, but I couldn't get myself to read the whole thing due to the polemical or insulting way in which it was written.

Similarly, as I noted above, I've noticed that I often have a hard time taking ideas as seriously as I probably should if they're written in a polemical / condescending / insulting / ridiculing style. I think maybe I tend to down-weight the credibility of anybody who writes like that, and by extension maybe I subconsciously down-weight the content? Maybe I'm subconsciously associating condescension (at least towards ideas / people I think of as worth taking seriously) with bias? Not sure.

I've heard from other people that they especially like polemical / condescending articles, and I imagine that this style is effective / persuasive for a lot of readers. For all I know this is far and away the most effective way of writing this kind of thing. And even if not, Eliezer is perfectly within his rights to use whatever style he wants. Eliezer explicitly acknowledged the condescending-sounding tone of the article but felt it was worth writing it that way anyway, and that's fine.

So to be clear: This is not at all a criticism of the way this post was written. I am simply curious about my own reaction to it, and I'm interested to hear what others think about that.

A few questions:

  1. Am I unusual in this? Do other people here find it difficult to read polemical or condescending writing, and/or do you find that the style makes it difficult for you to take the content as seriously as you perhaps should?
  2. Are there any studies you're aware of on how people react to polemical writing?
  3. Are there some situations in which it actually does make sense to use the kind of intuitive heuristic I was using, i.e., if it's written in a polemical / insulting style then it's probably less credible? Or is this just a generally bad heuristic that I should try to get rid of entirely?
  4. This is a topic I'm very interested in so I'd appreciate any other related comments or thoughts you might have.

Thanks Daniel for that strong vote of confidence!

The full graph is in fact expandable / collapsible, and it does have the ability to display the relevant paragraphs when you hover over a node (although the descriptions are not all filled in yet). It also allows people to enter their own numbers and get updated calculations, exactly as you described. We actually built a nice dashboard for that. We haven't shown it in this sequence yet because the sequence is mostly focused on phase 1, and the dashboard is part of phase 2.

Analytica does have a web version, but it's a bit clunky and buggy so we haven't used it so far. However, I was just informed that they are coming out with a major update soon that will include a significantly better web version, so hopefully we can do all this then.

I certainly don't think we'd say no to additional funding or interns! We could definitely use them: there are quite a few areas that we have not looked into sufficiently because all of our team members were focused on other parts of the model. And we haven't yet gotten to much of the quantitative part (phase 2, as you called it) or the formal elicitation part.

I'd like to hear more thoughts, from Rohin or anybody else, about how the scaling hypothesis might affect safety work.

Thanks Adam for setting this up! I have no idea if my experience is representative, but that was definitely one of the highest-quality discussion sessions I've had at events of this type.

I don't think this is quite an example of a treacherous turn, but this still looks relevant:

Lewis et al., Deal or No Deal? End-to-End Learning for Negotiation Dialogues (2017):

Analysing the performance of our agents, we find evidence of sophisticated negotiation strategies. For example, we find instances of the model feigning interest in a valueless issue, so that it can later ‘compromise’ by conceding it. Deceit is a complex skill that requires hypothesising the other agent’s beliefs, and is learnt relatively late in child development (Talwar and Lee, 2002). Our agents have learnt to deceive without any explicit human design, simply by trying to achieve their goals.

(I found this reference cited in Kenton et al., Alignment of Language Agents (2021).)

That's later in the linked wiki page:

