G Gordon Worley III

Director of Research at PAISRI


Formal Alignment


To what extent are the scaling properties of Transformer networks exceptional?

Most systems eventually face scaling bottlenecks. In fact, unless your system is completely free of coordination, it has bottlenecks even if you haven't yet scaled far enough to hit them. And Transformers do require some coordination: no matter how large the model is or how much parallelism its hardware supports, it still has to produce a single reduced output. So we should expect there to be scaling limits on Transformers that, at some size, will prevent them from effectively taking advantage of having a larger network.

Further, you point at this a bit, but most systems also experience diminishing returns on performance for additional resources because of these constraints.

Transformers may simply be special in that they have yet to start hitting diminishing returns because we haven't yet run up against their coordination bottlenecks. That doesn't make them too special, though: we should expect those bottlenecks to be lying in wait somewhere, just as they are in every other system that is not coordination-free.
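One way to make the coordination-bottleneck intuition concrete is Amdahl's law, which bounds the speedup from parallelism when some fixed fraction of the work is serial. This is a generic sketch of that bound, not a claim about Transformer internals specifically:

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Amdahl's law: the speedup from n workers when a fixed fraction
    of the work is serial (the coordination bottleneck)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

# Even a 1% serial (coordination) cost caps speedup at 100x,
# no matter how many workers you add.
for n in (10, 100, 10_000):
    print(n, round(amdahl_speedup(0.99, n), 2))
```

The point is that as long as the serial fraction is nonzero, throwing in more workers yields strictly diminishing returns, and the system hits a hard ceiling of 1/serial.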

Developmental Stages of GPTs

You're careful here to talk about transformative AI rather than AGI, and I think that's right. GPT-N does seem like it stands to have transformative effects without necessarily being AGI, and that is quite worrisome. I think many of us expected to find ourselves in a world where AGI was primarily what we had to worry about, and instead we're in a world where "lesser" AI is on track to be powerful enough to dramatically change society. Or at least, so it seems from where we stand, extrapolating the trends.

What are the high-level approaches to AI alignment?

Based on comments/links so far it seems I should revise the names and add a fourth:

  • IDA = IDA
  • IRL -> Ambitious Value Learning (AVL)
  • DTA -> Embedded Agency (EA)
  • + Brain Emulation (BE)
    • Build AI that either emulates how human brains work or is bootstrapped from human brain emulations.
What are the high-level approaches to AI alignment?

Oh, I forgot about emulation approaches, i.e. bootstrap AI by "copying" human brains, which you mention. Thanks!

What are the high-level approaches to AI alignment?

That's true, but there's a natural and historical relationship here with what was once termed "seed AI". Even if no one is actively pursuing that approach, it's the kind of thing I was hoping to point at without using that outmoded term.

What are the high-level approaches to AI alignment?

Thanks. Your post specifically is pretty helpful because it addresses one of the things that was tripping me up: which standard names people use for the different methods. Your names do a better job of capturing them than mine did.

What are the high-level approaches to AI alignment?

Actually, this post was not especially helpful for my purpose, and I should have explained why in advance because I anticipated someone would link it. Although it helpfully lays out a number of proposals people have made, it does more to work out what's going on with those proposals than to find ways they can be grouped together (except incidentally). I even reread the post before asking this question, and it didn't help me improve on the taxonomy I proposed, which I already had in mind as of a few months ago.

What are the high-level approaches to AI alignment?

My initial thought is that there are at least 3, which I'll give the following names (with short explanations):

  • Iterated Distillation and Amplification (IDA)
    • Build an AI, have it interact with a human, create a new AI based on the interaction of the human and the AI, and repeat until the AI is good enough or it reaches a fixed point and additional iterations don't change it.
  • Inverse Reinforcement Learning (IRL)
    • Build an AI that tries to infer human values from observations and then acts based on those inferred values.
  • Decision Theorized Agent (DTA)
    • Build an AI that uses a decision theory that causes it to make choices that will be aligned with human interests.

All of these are woefully underspecified, so improved summaries that you think accurately explain these approaches are also appreciated.
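Since the IDA description above is basically a loop, here is a hypothetical sketch of its control flow. All of the names (`amplify`, `distill`, `human`) are placeholders for illustration, not any real implementation:

```python
def ida_training_loop(initial_model, human, amplify, distill, max_iters=10):
    """Hypothetical sketch of the IDA loop: amplify the current model
    with human interaction, distill the amplified system into a new
    model, and stop at a fixed point or when the iteration cap is hit."""
    model = initial_model
    for _ in range(max_iters):
        amplified = amplify(human, model)  # human + model working together
        new_model = distill(amplified)     # cheaper model imitating the team
        if new_model == model:             # fixed point: iteration changed nothing
            break
        model = new_model
    return model
```

For instance, with a toy `amplify` that improves the model by one step up to a cap and a `distill` that copies the amplified system directly, the loop stops once it reaches the cap, the fixed point of the iteration.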

Cartesian Boundary as Abstraction Boundary

Lately I've been thinking a bit about why programmers have preferences for different programming languages. Like, why is it that I want a language that is (speaking informally)

  • flexible (lets me do things many ways)
  • dynamic (decides what to do at run time rather than compile time)
  • reflexive (lets me change the program source while it's running, or change how a computation is interpreted)

and other people want the opposite:

  • rigid (there's one right way to do something)
  • static (decides what to do at compile time, often so much so that static analysis is possible)
  • fixed (the same code is guaranteed to always behave the same way no matter the execution context)

And I think a reasonable explanation might be a difference in how much different programmers value the creation of a Cartesian boundary in the language, i.e. how much they want to be able to reason about the program as if it existed outside its execution environment. My preference is to sit toward the "embedded" end of a less-to-more Cartesian dimension of program design, preferring abstractions that hide less of the underlying structure, or at least make more of it accessible.
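A tiny example of what I mean by the reflexive end of this spectrum, using Python (which sits well toward the dynamic side) as a stand-in:

```python
class Greeter:
    def greet(self):
        return "hello"

g = Greeter()
print(g.greet())  # hello

# Reflexivity: rebind the method while the program is running.
# An analysis of Greeter's source alone (a Cartesian view of the
# program) would conclude greet() always returns "hello"; what it
# actually does depends on the execution environment it's embedded in.
Greeter.greet = lambda self: "bonjour"
print(g.greet())  # bonjour
```

A statically-oriented language forbids exactly this kind of rebinding, which is what makes reasoning about the program "from outside" tractable.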

This might not be perfect, as I'm babbling a bit to try to build out my model of where the difference in preference comes from, but it seems kind of interesting and a possible additional example of where this kind of Cartesian-like boundary gets created.

Goal-directedness is behavioral, not structural

Attempting to approach goal-directedness behaviorally is, I expect, going to run into the same problems as trying to infer a policy from behavior alone: you can't do it unless you make some normative assumption. This is exactly analogous to Armstrong's No Free Lunch theorem for value learning, and, turning it around the other way, we can similarly assign any goal whatsoever to a system based solely on its behavior unless we make some sufficiently strong normative assumption about it.
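The "any goal whatsoever" direction can be made concrete with a toy construction: for any policy at all, you can build a reward function under which that policy is exactly optimal, so behavior alone never pins down the goal. A minimal sketch:

```python
def rationalizing_reward(policy):
    """Construct a reward function under which the given policy is
    optimal: give reward 1 for exactly the actions the policy takes.
    This works for *any* policy, which is why behavior alone, absent
    a normative assumption, cannot identify a system's goal."""
    return lambda state, action: 1.0 if action == policy(state) else 0.0

# Two opposite policies over the same states...
go_left = lambda state: "left"
go_right = lambda state: "right"

# ...are each perfectly "goal-directed" under their own constructed reward.
r_left = rationalizing_reward(go_left)
r_right = rationalizing_reward(go_right)
print(r_left("s0", "left"), r_right("s0", "left"))  # 1.0 0.0
```

Ruling out such degenerate rationalizations is precisely the job of the normative assumption: without one, "goal-directed" is vacuously true of everything.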
