I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.
Belated thanks!
I would prefer a being with the morality of Claude Opus to rule the world rather than a randomly selected human … it's really unclear how good humans are at generalizing at true out-of-distribution moralities. Today's morality likely looks pretty bad from the ancient Egyptian perspective…
Hmm, I think maybe there’s something I was missing related to what you’re saying here, and that maybe I’ve been thinking about §8.2.1 kinda wrong. I’ve been mulling it over for a few days already, and might write some follow-up. Thanks.
Perhaps a difference in opinion is that it's really unclear to me that an AGI wouldn't do much the same thing of "thinking about it more, repeatedly querying their 'ground truth' social instincts" that humans do. Arguably models like Claude Opus already do this where it clearly can do detailed reasoning about somewhat out-of-distribution scenarios using moral intuitions that come from somewhere…
I think LLMs as we know them today and use them today are basically fine, and that this fine-ness comes first and foremost from imitation-learning on human data (see my Foom & Doom post §2.3). I think some of my causes for concern are that, by the time we get to ASI…
(1) Most importantly, I personally expect a paradigm shift after which true imitation-learning on human data won’t be involved at all, just as it isn’t in humans (Foom & Doom §2.3.2) … but I’ll put that aside for this comment;
(2) even if imitation-learning (a.k.a. pretraining) remains part of the process, I expect RL to be a bigger and bigger influence over time, which will make human-imitation relatively less of an influence on the ultimate behavior (Foom & Doom §2.3.5);
(3) I kinda expect the eventual AIs to be kinda more, umm, aggressive and incorrigible and determined and rule-bending in general, since that’s the only way to make AIs that get things done autonomously in a hostile world where adversaries are trying to jailbreak or otherwise manipulate them, and since that’s the end-point of competition.
Perhaps a crux of differences in opinion between us is that I think that much more 'alignment relevant' morality is not created entirely by innate human social instincts but is instead learnt by our predictive world models based on external data -- i.e. 'culture'.…
(You might already agree with all this:)
Bit of a nitpick, but I agree that absorbing culture is a “predictive world model” thing in LLMs, but I don’t think that’s true in humans, at least in a certain technical sense. I think we humans absorb culture because our innate drives make us want to absorb culture, i.e. it happens ultimately via RL. Or at least, we want to absorb some culture in some circumstances, e.g. we particularly absorb the habits and preferences of people we regard as high-status. I have written about this at “Heritability: Five Battles” §2.5.1, and “Valence & Liking / Admiring” §4.5.
See here for some of my thoughts on cultural evolution in general.
I agree that “game-theoretic equilibria” are relevant to why human cultures are how they are right now, and they might also be helpful in a post-AGI future if (at least some of) the AGIs intrinsically care about humans, but wouldn’t lead to AGIs caring about humans if they don’t already.
I think “profoundly unnatural” is somewhat overstating the disconnect between “EA-style compassion” and “human social instincts”. I would say something more like: we have a bunch of moral intuitions (derived from social instincts) that push us in a bunch of directions. Every human movement / ideology / meme draws from one or more forces that we find innately intuitively motivating: compassion, justice, spite, righteous indignation, power-over-others, satisfaction-of-curiosity, etc.
So EA is drawing from a real innate force of human nature (compassion, mostly). Likewise, xenophobia is drawing from a real innate force of human nature, and so on. Where we wind up at the end of the day is a complicated question, and perhaps underdetermined. (And it also depends on an individual’s personality.) But it’s not a coincidence that there is no EA-style group advocating for things that have no connection to our moral intuitions / human nature whatsoever, like whether the number of leaves on a tree is even vs odd.
We don't have to conjure up thought experiments about aliens outside of our light cone. Throughout most of history humans have been completely uncompassionate about suffering existing literally right in front of their faces…
Just to clarify, the context of that thought experiment in the OP was basically: “It’s fascinating that human compassion exists at all, because human compassion has surprising and puzzling properties from an RL algorithms perspective.”
Obviously I agree that callous indifference also exists among humans. But from an RL algorithms perspective, there is nothing interesting or puzzling about callous indifference. Callous indifference is the default. For example, I have callous indifference about whether trees have even vs odd numbers of leaves, and a zillion other things like that.
I guess the main blockers I see are:
You can DM or email me if you want to discuss but not publicly :)
It’s funny that I’m always begging people to stop trying to reverse-engineer the neocortex, and you’re working on something that (if successful) would end up somewhere pretty similar to that, IIUC. (But hmm, I guess if a paranoid doom-pilled person was trying to reverse-engineer the neocortex, and keep the results super-secret unless they had a great theory for how sharing them would help with safe & beneficial AGI, and if they in fact had good judgment on that topic, then I guess I’d be grudgingly OK with that.)
My read of this conversation is that we’re basically on the same page about what’s true, but disagree about whether Eliezer is on that same page too. Again, I don’t care. I already deleted the claim about what Eliezer thinks on this topic, and have been careful not to repeat it elsewhere.
Since we’re talking about it, my strong guess is that Eliezer would ace any question about utility functions, what their domain is, and when “utility-maximizing behavior” is vacuous, … if asked directly.
But it’s perfectly possible to “know” something when asked directly, but also to fail to fully grok the consequences of that thing and incorporate it into some other part of one’s worldview. God knows I’m guilty of that, many many times over!
Thus my low-confidence guess is that Eliezer is guilty of that too, in that the observation “utility-maximizing behavior per se is vacuous” (which I strongly expect he would agree with if asked directly) has not been fully reconciled with his larger thinking on the nature of the AI x-risk problem.
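(To make the “vacuous” point concrete, here’s a minimal formalization in my own notation; it’s the standard observation, not a quote from Eliezer or anyone else:)

```latex
% Sketch: if U may be any function of entire world-histories, then every
% policy maximizes *some* U, so "utility-maximizing" alone is no constraint.
Let $\pi$ be any policy and $\tau(\pi)$ the world-history it produces. Define
\[
  U(\tau) \;=\;
  \begin{cases}
    1 & \text{if } \tau = \tau(\pi),\\
    0 & \text{otherwise.}
  \end{cases}
\]
Then $\pi \in \arg\max_{\pi'} U(\tau(\pi'))$ trivially. The substance has to
come from \emph{which} utility function, over \emph{what} domain, not from
"maximizes a utility function" per se.
```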
(I would further add that, if Eliezer has fully & deeply incorporated “utility-maximizing behavior per se is vacuous” into every other aspect of his thinking, then he is bad at communicating that fact to others, in the sense that a number of his devoted readers wound up with the wrong impression on this point.)
Anyway, I feel like your comment is some mix of “You’re unfairly maligning Eliezer” (again, whatever, I have stopped making those claims) and “You’re wrong that this supposed mistake that you attribute to Eliezer is a path through which we can solve the alignment problem, and Eliezer doesn’t emphasize it because it’s an unimportant dead-end technicality” (maybe! I don’t claim to have a solution to the alignment problem right now; perhaps over time I will keep trying and failing and wind up with a better appreciation of the nature of the blockers).
Most of your comment is stuff I already agree with (except that I would use the term “desires” in most places that you wrote “utility function”, i.e. where we’re talking about “how AI cognition will look like”).
I don’t follow what you think Eliezer means by “consequentialism”. I’m open-minded to “farfuturepumping”, but only if you convince me that “consequentialism” is actually misleading. I don’t endorse coining new terms when an existing term is already spot-on.
For obvious reasons, we should care a great deal whether the exponentially-growing mass of AGIs-building-AGIs is ultimately trying to make cancer cures and other awesome consumer products (things that humans view as intrinsically valuable / ends in themselves), versus ultimately trying to make galaxy-scale paperclip factories (things that misaligned AIs view as intrinsically valuable / ends in themselves).
From my perspective, I care about this because the former world is obviously a better world for me to live in.
But it seems like you have some extra reason to care about this, beyond that, and I’m confused about what that is. I get the impression that you are focused on things that are “just accounting questions”?
Analogy: In those times and places where slavery was legal, “food given to slaves” was presumably counted as an intermediate good, just like gasoline to power a tractor, right? Because they’re kinda the same thing (legally / economically), i.e. they’re an energy source that helps get the wheat ready for sale, and then that wheat is the final product that the slaveowner is planning to sell. If slavery is replaced by a legally-different but functionally-equivalent system (indentured servitude or whatever), does GDP skyrocket overnight because the food-given-to-farm-workers magically transforms from an intermediate to a final good? It does, right? But that change is just on paper. It doesn’t reflect anything real.
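(If it helps, here’s that same accounting point as a toy calculation with made-up numbers, nothing real-world about them:)

```python
# Toy calculation with made-up numbers, just to illustrate the accounting point.

wheat_sales = 100       # final wheat output sold by the farm
food_for_workers = 20   # food consumed by the farm workers

# Scenario A: the food is booked as an intermediate input (like gasoline for
# the tractor), so it nets out and only the wheat counts as a final good.
gdp_A = wheat_sales                      # = 100

# Scenario B: functionally-equivalent arrangement where the same food counts
# as the workers' final consumption (e.g. they're paid wages and buy it).
gdp_B = wheat_sales + food_for_workers   # = 120

print(gdp_A, gdp_B)  # measured GDP jumps 20%, but nothing real has changed
```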
I think what you’re talking about for AGI is likewise just “accounting”, not anything real. So who cares? We don’t need a “subjective” “level of analysis”, if we don’t ask subjective questions in the first place. We can instead talk concretely about the future world and its “objective” properties. Like, do we agree about whether or not there is an unprecedented exponential explosion of AGIs? If so, we can talk about what those AGIs will be doing at any given time, and what the humans are doing, and so on. Right?
OK sure, here’s THOUGHT EXPERIMENT 1: suppose that these future AGIs desire movies, cars, smartphones, etc. just like humans do. Would you buy my claims in that case?
If so—well, not all humans want to enjoy movies and fine dining. Some have strong ambitious aspirations—to go to Mars, to cure cancer, whatever. If they have money, they spend it on trying to make their dream happen. If they need money or skills, they get them.
For example, Jeff Bezos had a childhood dream of working on rocket ships. He founded Amazon to get money to do Blue Origin, which he is sinking $2B/year into.
Would the economy collapse if all humans put their spending cash towards ambitious projects like rocket ships, instead of movies and fast cars? No, of course not! Right?
So the fact that humans “demand” videogames rather than scramjet prototypes is incidental, not a pillar of the economy.
OK, back to AIs. I acknowledge that AIs are unlikely to want movies and fast cars. But AIs can certainly “want” to accomplish ambitious projects. If we’re putting aside misalignment and AI takeover, these ambitious projects would be ones that their human programmer installed, like making cures-for-cancer and quantum computers. Or if we’re not putting aside misalignment, then these ambitious projects might include building galaxy-scale paperclip factories or whatever.
So THOUGHT EXPERIMENT 2: these future AGIs don’t desire movies etc. like in Thought Experiment 1, but rather desire to accomplish certain ambitious projects like curing cancer, quantum computation, or galaxy-scale paperclip factories.
My claims are:
Do you agree? Or where do you get off the train? (Or sorry if I’m misunderstanding your comment.)
Thanks! I’m not 100% sure what you’re getting at; here are some possible comparisons:
I think this is consistent with experience, right?
But maybe you’re instead talking about this comparison:
I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we’re really meditating on it for an extended period, at least not without deliberate effort towards mindfulness. Right?
Or if I’m still misunderstanding, can you try again?
This is the idea that at some point in scaling up an organization you could lose efficiency due to needing more/better management, more communication (more meetings), and longer communication processes: "bloat" in general. I'm not claiming it’s likely to happen with AI, just another possible reason for increasing marginal cost with scale.
Hmm, that would apply to an individual firm but not to a product category, right? If Firm 1 is producing so much [AGI component X] that they pile up bureaucracy and inefficiency, then Firms 2, 3, 4, and 5 will start producing [AGI component X] with less bureaucracy, and undercut Firm 1, right? If there’s an optimal firm size, the market can still be arbitrarily large via arbitrarily many independent firms of that optimal size.
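(Toy illustration of that last point, with made-up cost numbers: a U-shaped average-cost curve gives each firm an optimal size, but the industry can still scale without bound at roughly constant unit cost by adding more firms of that size:)

```python
# Toy illustration with made-up numbers: U-shaped average cost per firm, so
# there's an "optimal firm size", yet industry output can grow arbitrarily
# by adding more firms of that size at roughly constant unit cost.

def average_cost(q):
    """Per-unit cost for one firm producing q units: fixed overhead spread
    over output, plus constant marginal cost, plus a 'bloat' term that grows
    with firm size (management/communication overhead)."""
    fixed, marginal, bloat = 100.0, 1.0, 0.01
    return fixed / q + marginal + bloat * q

# Find (roughly) the cost-minimizing firm size by brute force.
q_star = min(range(1, 1000), key=average_cost)
print(q_star, round(average_cost(q_star), 2))  # ~100 units at ~$3/unit

# Industry output with n optimally-sized firms: n * q_star units, all at the
# same ~$3/unit -- no diseconomies at the industry level.
for n in (1, 10, 1000):
    print(n * q_star, round(average_cost(q_star), 2))
```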
(Unless Firm 1 has a key patent, or uses its market power to do anticompetitive stuff, etc. …although I don’t expect IP law or other such forces to hold internationally given the stakes of AGI.)
(Separately, I think AGI will drastically increase economies of scale, particularly related to coordination problems.)
I see how this could happen, but I'm not convinced this effect is actually happening. … I think my crux is that this isn't unique to economists.
It’s definitely true that non-economists are capable of dismissing AGI for bad reasons, even if this post is not mainly addressed at non-economists. I think the thing I said is a contributory factor for at least some economists, based on my experience and conversations, but not all economists, and maybe I’m just mistaken about where those people are coming from. Oh well, it’s probably not worth putting too much effort into arguing about Bulverism. Thanks for your input though.
Yeah, the latter, I think too much econ makes the very possibility of AGI into a blind spot for (many) economists. See the second part of my comment here.
Thanks. I don’t think we disagree much (more in emphasis than content).
Things like resource scarcity or coordination problems can cause increasing marginal cost with scale.
I understand “resource scarcity” but I’m confused by “coordination problems”. Can you give an example? (Sorry if that’s a stupid question.)
Resource scarcity seems unlikely to bite here, at least not for long. If some product is very profitable to create, and one of its components has a shortage, then people (or AGIs) will find ways to redesign around that component. AGI does not fundamentally need any rare components. Biology proves that it is possible to build human-level computing devices from sugar and water and oxygen (i.e. brains). As for electricity, there’s plenty of solar cells, and plenty of open land for solar cells, and permitting is easy if you’re off-grid.
(I agree that the positive feedback loop will not spin out to literally infinity in literally zero time, but stand by “light-years beyond anything in economic history”.)
I think economists don’t consider AGI launching coups or pursuing jobs/entrepreneurship independently because they don’t expect it to have those capabilities or dispositions, not that they conflate it with inanimate capital. … right now, the difference in the economists and lesswrongers comes down to what capabilities they expect AGI to have.
I wasn’t complaining about economists who say “the consequences of real AGI would be [crazy stuff], but I don’t expect real AGI in [time period T / ever]”. That’s fine!
(Well, actually I would still complain if they state this as obvious, rather than owning the fact that they are siding with one group of AI domain experts over a different group of AI domain experts, about a technical AI issue on which the economists themselves have no expertise. And if T is more than, I dunno, 30 years, then that makes it even worse, because then the economists would be siding with a dwindling minority of AI domain experts over a growing majority, I think.)
Instead I was mainly complaining about the economists who have not even considered that real AGI is even a possible thing at all. Instead it’s just a big blind spot for them.
And I don’t think this is independent of their economics training (although non-economists are obviously capable of having this blind spot too).
Instead, I think that (A) “such-and-such is just not a thing that happens in economies in the real world” and (B) “real AGI is even a conceivable possibility” are contradictory. And I think that economists are so steeped in (A) that they consider it to be a reductio ad absurdum for (B), whereas the correct response is the opposite ((B) disproves (A)).
For them, real AGI does not compute; it’s like a square circle. People like me who talk about it are not just saying something false, but saying incoherent nonsense. Or maybe they think they must be misunderstanding us, so they’ll “charitably” round what I’m saying to something quite different, and they themselves will use terms like “AGI” or “ASI” for something much weaker without realizing that they’re doing so.
Uhh, well, technically I wrote that sentence as a conditional, and technically I didn’t say whether or not the condition applied to you-in-particular.
…I hope you have good judgment! For that matter, I hope I myself have good judgment!! Hard to know though. ¯\_(ツ)_/¯