Raymond Arnold
Berkeley, CA, USA

I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.


The case for becoming a black-box investigator of language models

Curated. I've heard a few offhand comments about this type of research work in the past few months, but wasn't quite sure how seriously to take it. 

I like this writeup for spelling out the details of why black-box investigators might be useful, what skills the work requires, and how you might go about it.

I expect this sort of skillset to have major limitations, but I think I agree with the stated claims that it's a useful skillset to have in conjunction with other techniques.

Why Copilot Accelerates Timelines

GPT-N that you can prompt with "I am stuck with this transformer architecture trying to solve problem X". GPT-N would be AIHHAI if it answers along the lines of "In this arXiv article, they used trick Z to solve problems similar to X. Have you considered implementing it?", and using an implementation of Z would solve X >50% of the time.

I haven't finished reading the post, but I found it worthwhile for this quote alone. This is the first description I've read of how GPT-N could be transformative. (Upon reflection this was super obvious and I'm embarrassed I didn't think of it myself – I got so distracted with arguments about whether GPT could do anything 'original' that I missed this strategy)

I recently chatted with someone who described an older researcher with encyclopedic knowledge of AI papers (ML and otherwise), who had seen a ton of papers exploring various tools and tricks for solving ML problems. The older researcher had the experience of watching many younger researchers reinvent the same tricks over and over, and being like "that's cool, but, like, there's a better version of this idea published 20 years ago, which not only does what your algorithm does but also solves SubProblem X better than you".

So it's interesting that the world is a bit bottlenecked by having only so many 'living libraries' who are able to keep track of the deluge of research – and that this is something I'd expect GPT to be good enough at to meaningfully help with.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

Well yeah, that's my point. It seems to me that any pivotal act worthy of the name would essentially require the AI team to become an AGI-powered world government, which seems pretty darn difficult to pull off safely. The superpowered-AI-propaganda plan falls under this category.

Yeah. I think this sort of thing is why Eliezer thinks we're doomed – getting humanity to coordinate collectively seems doomed (e.g. see Gain of Function research), and there are no weak pivotal acts that aren't basically impossible to execute safely.

The nanomachine GPU-melting pivotal act is meant as a gesture at the difficulty / power level, not an actual working example. The other gestured-at example I've heard is "upload aligned people who think hard for 1000 subjective years and hopefully figure something out." I've heard someone from MIRI argue that one is also unworkable, but I wasn't sure of the exact reasons.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

Followup point on the Gain-of-Function-Ban as practice-run for AI:

My sense is that the biorisk people who were thinking about Gain-of-Function-Ban were not primarily modeling it as a practice run for regulating AGI. This may result in them not really prioritizing it.

I think biorisk is significantly lower than AGI risk, so if it's tractable and useful to regulate Gain of Function research as a practice run for regulating AGI, it's plausible this is actually much more important than business-as-usual biorisk. 

BUT smart people I know seem to disagree about how any of this works, so the "if tractable and useful" conditional is pretty non-obvious to me.

If bio-and-AI-people haven't had a serious conversation about this where they mapped out the considerations in more detail, I do think that should happen.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

Various thoughts that this inspires:

Gain of Function Ban as Practice-Run/Learning for relevant AI Bans

I have heard vague-musings-of-plans in the direction of "get the world to successfully ban Gain of Function research, as a practice-case for getting the world to successfully ban dangerous AI." 

I have vague memories of the actual top bio people around not being too focused on this, because they thought there were easier ways to make progress on biosecurity. (I may be conflating a few different statements – they might have just been critiquing a particular strategy I mentioned for banning Gain of Function research.)

A few considerations for banning Gain of Function research for AI-related reasons:

  • You will gain skills / capacities that transfer into banning relevant AI systems (i.e. "you'll learn what works").
  • You won't learn what works, but you'll hit a bunch of brick walls that teach you what doesn't work.
  • A Gain of Function ban is "lower stakes" (biorisk is way less likely to kill everyone than AI), and (hopefully?) won't have many side effects that specifically make it harder to ban AI-stuff later. (By contrast, if you try ineffectually to regulate AI in some way, you will cause the AI industry to raise its hackles, and maybe cause one political party to get a reputation as "the anti-AI party", causing the other party to become "anti-anti-AI" in response, or maybe you will get everyone's epistemics about what's worth banning all clouded.)

Distinction between "Regulating AI is possible/impossible" vs "pivotal act framing is harmful/unharmful".

I currently believe John's point of "man, sure seems real hard to actually usefully regulate things even when it's comparatively easy, I don't feel that hopeful about regulation processes working."

But, that doesn't necessarily contradict the point that it's really hard to build an organization capable of unilaterally implementing a pivotal act, and that the process of doing so is likely to create enemies, erode coordination-fabric, make people fearful, etc.

It seems obvious to me that the arguments in this post are true to some degree. There's some actual math / hashing-out, of how all the arguments actually balance against each other, that I haven't yet seen to my satisfaction.

Something feels off about the way people relate to "everyone else's ideas seeming more impossible than my own, even if we all agree it's pretty impossible."

Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

Also, like, Berkeley heat waves may just be significantly different than, like, Reno heat waves. My current read is that part of the issue here is that a lot of places don't actually get that hot, so having less robustly good air conditioners is fine.

[Link] A minimal viable product for alignment

Not sure I buy this – I have a model of how hard it is to be deceptive, and how competent our current ML systems are, and it looks like it's more like "as competent as a deceptive four-year old" (my parents totally caught me when I told my first lie), than "as competent as a silver-tongued sociopath playing a long game."

I do expect there to be signs of deceptive alignment, in a noticeable fashion before we get so-deceptive-we-don't-notice deception.

A Longlist of Theories of Impact for Interpretability

I've long had a vague sense that interpretability should be helpful somehow, but recently when I tried to spell out exactly how it helped I had a surprisingly hard time. I appreciated this post's exploration of the concept.

Job Offering: Help Communicate Infrabayesianism

I've had on my TODO to try reading the LW post transcript of that and seeing if it could be distilled further.

Job Offering: Help Communicate Infrabayesianism

As someone who is kinda confused about InfraBayesianism but has some sense that it's important, I am glad to see this initiative. :)
