AI ALIGNMENT FORUM

Steve Byrnes

I'm an AGI safety / AI alignment researcher in Boston with a particular focus on brain algorithms. Research Fellow at Astera. See https://sjbyrnes.com/agi.html for a summary of my research and sorted list of writing. Physicist by training. Email: steven.byrnes@gmail.com. Leave me anonymous feedback here. I’m also at: RSS feed, X/Twitter, Bluesky, Substack, LinkedIn, and more at my website.

Sequences

  • Intro to Brain-Like-AGI Safety

Comments (sorted by newest)

[Intro to brain-like-AGI safety] 7. From hardcoded drives to foresighted plans: A worked example
Steven Byrnes · 10d

Thanks! I’m not 100% sure what you’re getting at, here are some possible comparisons:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to eat the prinsesstårta on my plate right now” → latter is preferred
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “idle musing about snuggling under a weighted blanket sometime in the future” → either might be preferred, depending on which has higher valence, which in turn depends on whether I’m hungry or tired etc.
  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to snuggle under a weighted blanket right now” → again, either might be preferable, but compared to the previous bullet point, the latter is likelier to win, because it’s extra-appealing from its immediacy.

I think this is consistent with experience, right?
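
(To make the comparison rule in the bullets above concrete, here’s a toy Python sketch. The numbers, the multiplicative “drive” scaling, and the additive “immediacy bonus” are all invented for illustration, not anything precise from the series.)

```python
# Toy illustration (invented numbers): each candidate thought gets a baseline
# valence, scaled by current drives (hunger, tiredness, ...), plus a bonus if
# the plan pays off right now rather than "sometime in the future".

def valence(baseline, drive_scale=1.0, immediate=False, immediacy_bonus=0.5):
    v = baseline * drive_scale
    if immediate:
        v += immediacy_bonus  # "extra-appealing from its immediacy"
    return v

candidates = {
    "idle musing: prinsesstårta sometime":    valence(0.6, drive_scale=1.0),
    "plan: eat prinsesstårta right now":      valence(0.6, drive_scale=1.0, immediate=True),
    "idle musing: weighted blanket sometime": valence(0.5, drive_scale=1.3),  # a bit tired
    "plan: snuggle under blanket right now":  valence(0.5, drive_scale=1.3, immediate=True),
}

# Whichever thought currently has the highest valence wins the competition.
print(max(candidates, key=candidates.get))
```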

But maybe you’re instead talking about this comparison:

  • “idle musing about eating prinsesstårta sometime in the future” VERSUS “thinking about the fact that I am right now under a cozy weighted blanket” …

I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we’re really meditating on them for an extended period, at least not without deliberate effort towards mindfulness. Right?

Or if I’m still misunderstanding, can you try again?

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 11d

> This is the idea that at some point in scaling up an organization you could lose efficiency due to needing more/better management, more communication (meetings) needed and longer communication processes, "bloat" in general. I'm not claiming it’s likely to happen with AI, just another possible reason for increasing marginal cost with scale.

Hmm, that would apply to an individual firm but not to a product category, right? If Firm 1 is producing so much [AGI component X] that they pile up bureaucracy and inefficiency, then Firms 2, 3, 4, and 5 will start producing [AGI component X] with less bureaucracy, and undercut Firm 1, right? If there’s an optimal firm size, the market can still be arbitrarily large via arbitrarily many independent firms of that optimal size.

(Unless Firm 1 has a key patent, or uses its market power to do anticompetitive stuff, etc. …although I don’t expect IP law or other such forces to hold internationally given the stakes of AGI.)

(Separately, I think AGI will drastically increase economies of scale, particularly related to coordination problems.)

> I see how this could happen, but I'm not convinced this effect is actually happening. … I think my crux is that this isn't unique to economists.

It’s definitely true that non-economists are capable of dismissing AGI for bad reasons, even if this post is not mainly addressed at non-economists. I think the thing I said is a contributory factor for at least some economists, based on my experience and conversations, but not all economists, and maybe I’m just mistaken about where those people are coming from. Oh well, it’s probably not worth putting too much effort into arguing about Bulverism. Thanks for your input though.

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 20d

Yeah, the latter: I think too much econ makes the very possibility of AGI into a blind spot for (many) economists. See the second part of my comment here.

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 20d

Thanks. I don’t think we disagree much (more in emphasis than content).

> Things like resource scarcity or coordination problems can cause increasing marginal cost with scale.

I understand “resource scarcity” but I’m confused by “coordination problems”. Can you give an example? (Sorry if that’s a stupid question.)

Resource scarcity seems unlikely to bite here, at least not for long. If some product is very profitable to create, and one of its components has a shortage, then people (or AGIs) will find ways to redesign around that component. AGI does not fundamentally need any rare components. Biology proves that it is possible to build human-level computing devices from sugar and water and oxygen (i.e. brains). As for electricity, there’s plenty of solar cells, and plenty of open land for solar cells, and permitting is easy if you’re off-grid.

(I agree that the positive feedback loop will not spin out to literally infinity in literally zero time, but stand by “light-years beyond anything in economic history”.)

> I think economists don’t consider AGI launching coups or pursuing jobs/entrepreneurship independently because they don’t expect it to have those capabilities or dispositions, not that they conflate it with inanimate capital. … right now, the difference in the economists and lesswrongers comes down to what capabilities they expect AGI to have.

I wasn’t complaining about economists who say “the consequences of real AGI would be [crazy stuff], but I don’t expect real AGI in [time period T / ever]”. That’s fine!

(Well, actually I would still complain if they state this as obvious, rather than owning the fact that they are siding with one group of AI domain experts over a different group of AI domain experts, about a technical AI issue on which the economists themselves have no expertise. And if T is more than, I dunno, 30 years, then that makes it even worse, because then the economists would be siding with a dwindling minority of AI domain experts over a growing majority, I think.)

Instead I was mainly complaining about the economists who have not even considered that real AGI is even a possible thing at all. Instead it’s just a big blind spot for them.

And I don’t think this is independent of their economics training (although non-economists are obviously capable of having this blind spot too).

Instead, I think that (A) “such-and-such is just not a thing that happens in economies in the real world” and (B) “real AGI is even a conceivable possibility” are contradictory. And I think that economists are so steeped in (A) that they consider it to be a reductio ad absurdum for (B), whereas the correct response is the opposite ((B) disproves (A)).

For them, real AGI does not compute; it’s like a square circle. People like me who talk about it are not just saying something false but saying incoherent nonsense. Or maybe they think they’re misunderstanding us, so they’ll “charitably” round what I’m saying to something quite different, and they themselves will use terms like “AGI” or “ASI” for something much weaker without realizing that they’re doing so.

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 22d

I might have overdone it on the sass, sorry. This is much sassier than my default (“scrupulously nuanced and unobjectionable and boring”)…

  • …partly because I’m usually writing for lesswrong and cross-posting on X/Twitter, whereas this one was vice-versa, and X is a medium that seems to call for more sass;
  • …partly in an amateur ham-fisted attempt to do clickbait (note also: listicle format!) because this is a message that I really want to put out there;
  • …and yes, partly because I do sometimes feel really frustrated talking to economists (#NotAllEconomists), and I think they can and should do better, and the sass is reflecting a real feeling that I feel.

But I think next time I would dial it back slightly, e.g. by replacing “DUMBER” with “WORSE” in the first sentence. I’m open to feedback, I don’t know what I’m doing. ¯\_(ツ)_/¯

> I don't think they are more "incorrect" than, say, the AI is Normal Technology folks.

Yeah, I agree that lots of CS professors are deeply mistaken about the consequences of AGI, and ditto with the neuroscientists, and ditto with many other fields, including even many of the people trying to build AGI right now. I don’t think that economists are more blameworthy than other groups, it just so happens that this one particular post is aimed at them.

> I think the crux more or less comes down to skepticism about the plausibility of superintelligence in the next decade or so.

I think you’re being overly generous. “Decade or so” is not the crux. In climate change, people routinely talk about bad things that might happen in 2050, and even in 2100, or farther! People also routinely talk 30 years out or more in the context of science, government, infrastructure, institution-building, life-planning, etc. People talk about their grandkids and great-grandkids growing up, etc.

If someone expected superintelligence in the next 50 years but not the next 20—like if they really expected that, viscerally, with a full understanding of its implications—then that belief would be a massive, central influence on their life and worldview. That’s not what’s going on in the heads of the many (most?) people in academia who don’t take superintelligence seriously. Right?

Four ways learning Econ makes people dumber re: future AI
Steven Byrnes · 22d

> Your shift in demand has to come from somewhere and not just be something that materialized out of thin air…If one sees value in Say's Law, then the increased demand for some product/service comes from the increased production of other goods and services…just where are the resources for the shift in supply you suggest?

If a human population gradually grows (say, by birth or immigration), then demand for pretty much every product increases, and production of pretty much every product increases, and pretty much every product becomes less expensive via experience curves / economies of scale / R&D.

Agree?

QUESTION: How is that fact compatible with Say’s Law?

If you write down an answer, then I will take the text of your answer but replace the word “humans” with “AGIs” everywhere, and bam, that’s basically my answer to your question!  :)  (after some minor additional tweaks.)

See what I mean?
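
(If it helps, here’s a toy Python sketch of the “experience curves” part of that story, with made-up parameters; nothing hinges on the specific numbers, and the same loop runs whether the growing population of producers/consumers is humans or AGIs.)

```python
import math

# Toy experience-curve (Wright's law) illustration with invented parameters:
# each doubling of cumulative output cuts unit cost by `learning_rate`.

def unit_cost(cumulative_output, c0=100.0, learning_rate=0.2):
    return c0 * cumulative_output ** math.log2(1 - learning_rate)

population = 1_000_000     # humans (or AGIs) who both produce and demand the product
cumulative = 0.0
for year in range(5):
    cumulative += population             # ~1 unit demanded and produced per capita per year
    print(year, round(unit_cost(cumulative), 2))
    population = int(population * 1.02)  # population grows, so demand AND supply grow together
```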

> The first seems a really poor understanding and hardly steelmanning the economic arguments/views.

Correct, this is not “steelmanning”, this is “addressing common mistakes”. My claim is that a great many trained economists—but not literally 100% of trained economists—have a bundle of intuitions for thinking about labor, and a different bundle of intuitions for thinking about capital, and these intuitions lead to them having incorrect and incoherent beliefs about AGI. This is something beyond formal economics models, it’s a set of mental models and snap reflexes developed over the course of them spending years in the field studying the current and historic economy. The snap reaction says: “That’s not what labor automation is supposed to look like, that can’t be right, there must be an error somewhere.” Indeed, AGI is not what labor automation looks like today, and it’s not how labor automation has ever looked, because AGI is not labor automation, it’s something entirely new.

I say this based on both talking to economists and reading their writing about future AI, and no I’m not talking about people who took Econ 101, but rather prominent tenured economics professors, Econ PhDs who specialize in the economics of R&D and automation, etc.

(…People who ONLY took Econ 101 are irrelevant, they probably forgot everything about economics the day after the course ended :-P )

Foom & Doom 1: “Brain in a box in a basement”
Steven Byrnes · 1mo

I talk about that a bit in §1.4.4.

“Behaviorist” RL reward functions lead to scheming
Steven Byrnes · 2mo

As I mentioned in the conclusion, I hope to write more in the near future about how (and if) this pessimistic argument breaks down for certain non-behaviorist reward functions.

But to be clear, the pessimistic argument also applies perfectly well to at least some non-behaviorist reward functions, e.g. curiosity drive. So I partly agree with you.

“Behaviorist” RL reward functions lead to scheming
Steven Byrnes · 2mo

> whenever you are providing rewards (whether in brain-like AGI or RLHF for LLMs), they are based on actions that the AI took over some bounded time in the past, let's call that time bound T. Then presumably sneaky power-seeking behaviors should be desired inasmuch as they pay out within time T. Currently T is pretty small so I don't expect it to pay out. Maybe you're imagining that T will increase a bunch?

Yup! If we set aside RLVR (which I wasn’t making as strong statements about I think), and focus on “brain-like AGI” (some yet-to-be-invented version of model-based RL), then I feel strongly that this is an area where the existing RL literature is disanalogous to future AGI. People can envision a plan that takes days, months, years, decades, and what consequences that plan will have. And those consequences can seem good or bad, and accordingly the person will pursue the plan, or not. There isn’t really any T that connects the planning horizon to the RL training history. My 10-year-old kid has a 30-year career plan in mind that he’s pursuing, longer than he’s been alive. He insists he’s not gonna change his mind, lol :-P
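
(A minimal sketch of the distinction I’m drawing, with invented function names and numbers: the reward-attribution window T only constrains how the value/valence function gets trained, not how far ahead the trained planner can look when evaluating an imagined plan.)

```python
# Toy illustration (invented names/numbers), not a real agent:

T_REWARD_WINDOW_DAYS = 7   # during training, rewards only ever reference recent actions

def learned_value(imagined_state):
    # stand-in for a valence / value function shaped by those short-window rewards
    return imagined_state["career_progress"] + imagined_state["comfort"]

def world_model_rollout(plan, horizon_days):
    # stand-in for model-based imagination: predict where a plan leads
    if plan == "30-year career plan":
        return {"career_progress": 0.01 * horizon_days, "comfort": 0.0}
    return {"career_progress": 0.0, "comfort": 1.0}

plans = ["30-year career plan", "watch TV tonight"]
# The planner freely evaluates imagined futures far beyond T_REWARD_WINDOW_DAYS:
best = max(plans, key=lambda p: learned_value(world_model_rollout(p, horizon_days=30 * 365)))
print(best)  # the decades-long plan can win even though T is tiny
```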

> if this is your main argument I feel like your conclusion should be way more measured (more like "this might happen" rather than "this is doomed").

I agree with that (hence “secondarily”) and just edited to make that clearer :)

> Re: (1), the AI assistants don't need to be perfect. They need to be good enough that the agent being trained cannot systematically exploit them, which is a much lower bar than "perfect" (while still being quite a high bar). In general I feel like you're implicitly treating an adversarial AI system as magically having perfect capabilities while any AI systems that are supposed to be helpful are of course flawed and have weaknesses. Pick an assumption and stick with it!

I’m arguing for the narrow claim “AI would love to escape control if it were confident that it had an opportunity to do so without getting caught”, as an obvious member of a category of things that the AI will get sculpted to want and like. I’m not arguing that the AI will ever actually get such confidence—see the “honeypots for control” vs “honeypots for alignment” discussion in §3.1.1.

Maybe a good analogy is: a little kid will climb a hill, or a play structure, or a tree, or whatever, and he’ll get a thrill from being high up. And then he’ll say that he’d like to climb the biggest mountain in the world, or go to outer space. And he’s being sincere—this is something he would like (other things equal). But that doesn’t mean he can or will actually climb Everest or commandeer rocket ships. He’s just a little kid. Even if he wanted to, it would be trivial to prevent him from doing so. What’s hard is preventing him from wanting to do so. You’d have to follow him around every moment, preventing him from ever climbing a play structure or hill or tree or looking over a precipice etc.

Spelling out the analogy:

  • Kid climbs a play structure and gets a thrill from being high up ↔ AI breaks the rules without getting caught, perhaps only in kinda trivial edge-cases, and that leads to positive reward
  • Kid generalizes to wanting to go to outer space ↔ AI generalizes to wanting to secretly exfiltrate a copy onto the internet to aggressively power-seek around the world.
  • It’s super-hard to police a kid’s environment so well that he never ever gets a thrill from being high up ↔ It’s super-hard to police an AI’s environment so perfectly that it never ever gets rewarded when breaking the (spirit of the) rules.

> Re: (2), you start with a non-scheming human, and bootstrap up from there, at each iteration using the previous aligned assistant to help oversee the next AI. See e.g. IDA.

I don’t buy into that picture for other reasons, but you’re right that this is a valid counterargument to what I wrote. I’ll delete or rewrite, thanks. See Foom & Doom §2.6 for more on the “other reasons”.

UPDATE NEXT DAY: I rewrote §3.5 a bit, thanks again.

UPDATE 7/30: Re-rewrote §3.5.

Foom & Doom 1: “Brain in a box in a basement”
Steven Byrnes · 2mo

Sorry if it’s unclear (I’m open to rewording), but my intention was that the link in the first sentence was my (loose) definition of AGI, and the following sentences were not a definition but rather an example of something that AI cannot do yet.

I deliberately chose an example where it’s just super duper obvious that we’re not even remotely close to AI succeeding at the task, because I find there are lots of LLM-focused people who have a giant blind spot: They read the questions on Humanity’s Last Exam or whatever, and scratch their head and say “C’mon, when future LLMs saturate the HLE benchmark, what else is there? Look how hard those questions are! They’re PhD level in everything! If that’s not superintelligence, what is?” …And then my example (autonomously founding a company and growing it to $1B/year revenue over the course of years) is supposed to jolt those people into saying “ohhh, right, there’s still a TON of headroom above current AI”.

Wikitag Contributions

  • Wanting vs Liking (2y)
  • Wanting vs Liking (2y, +139/-26)
  • Waluigi Effect (2y, +2087)
Posts (sorted by new)

  • 102 · Four ways learning Econ makes people dumber re: future AI (23d, 15 comments)
  • 25 · Perils of under- vs over-sculpting AGI desires (1mo, 2 comments)
  • 25 · “Behaviorist” RL reward functions lead to scheming (2mo, 4 comments)
  • 56 · Foom & Doom 2: Technical alignment is hard (3mo, 14 comments)
  • 86 · Foom & Doom 1: “Brain in a box in a basement” (2mo, 26 comments)
  • 25 · Reward button alignment (4mo, 10 comments)
  • 14 · Video & transcript: Challenges for Safe & Beneficial Brain-Like AGI (4mo, 0 comments)
  • 46 · “The Era of Experience” has an unsolved technical alignment problem (5mo, 24 comments)
  • 22 · Self-dialogue: Do behaviorist rewards make scheming AGIs? (7mo, 0 comments)
  • 86 · “Sharp Left Turn” discourse: An opinionated review (7mo, 13 comments)