How it started I used to think that anything that LLMs said about having something like subjective experience or what it felt like on the inside was necessarily just a confabulated story. And there were several good reasons for this. First, something that Peter Watts mentioned in an early blog...
There was conversation on Facebook over an argument that any sufficiently complex system, whether a human, a society, or an AGI, will be unable to pursue a unified goal due to internal conflict among its parts, and that this should make us less worried about "one paperclipper"-style AI FOOM scenarios....
Alternative title: "Evolution suggests robust rather than fragile generalization of alignment properties." A frequently repeated argument goes something like this: 1. Evolution has optimized humans for inclusive genetic fitness (IGF) 2. However, humans didn’t end up explicitly optimizing for genetic fitness (e.g. they use contraception to avoid having children) 3....
I occasionally have some thoughts about why AGI might not be as near as a lot of people seem to think, but I'm confused about how/whether to talk about them in public. The biggest reason for not talking about them is that one person's "here is a list of capabilities...
Short version Humans have an innate motivation ("preference fulfillment", PF) to fulfill the preferences of those they care about. It corresponds to at least some of the senses of the word "love", as well as related words such as "kindness" and "compassion". I hypothesize that it works by simulating the...
The predominant view on LW seems to be "pure AI capabilities research is bad, because capabilities progress alone doesn't contribute to alignment progress, and capabilities progress without alignment progress means that we're doomed". I understand the arguments for this position, but I have what might be called the opposite position....
A paper investigating how individual neurons in a CLIP model (an image/text neural net combining a ResNet vision model with a Transformer language model) respond to various abstract concepts. This shouldn't be very surprising after GPT-3 and DALL-E but still, identifying multimodal neurons feels scarily close to "neural net that...