Adele Lopez


Strong encouragement to write about (1)!

Alright, to check if I understand, would these be the sorts of things that your model is surprised by?

  1. An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.
  2. An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.
  3. An image diffusion model which is shown to have a detailed understanding of anatomy and 3D space, such that you can use it to transform a photo of a person into an image of the same person in a novel pose (not in its training data) and angle with correct proportions and realistic joint angles for the person in the input photo.

Is there a specific thing you think LLMs won't be able to do soon, such that you would make a substantial update toward shorter timelines if there was an LLM able to do it within 3 years from now?

That... seems like a big part of what having "solved alignment" would mean, given that you have AGI-level optimization aimed at (indirectly via a counter-factual) evaluating this (IIUC).

Nice graphic!

What stops e.g. "QACI(expensive_computation())" from being an optimization process which ends up trying to "hack its way out" into the real QACI?


For the poset example, I'm using Chu spaces with only 2 colors. I'm also not thinking of the rows or columns of a Chu space as having an ordering (they're sets); you can rearrange them as you please and still have a Chu space representing the same structure.

I would suggest reading through to the "There and Back Again" section, in particular trying to understand how the other poset examples work, and seeing if that helps the idea click. And/or you can suggest another coloring you think should be possible, and I can tell you what it represents.
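
To make the "rows and columns are sets" point concrete, here is a minimal sketch in Python. It assumes a 2-color Chu space can be stored as a mapping from (row, column) pairs to a color in {0, 1}; the helper name `chu_from_matrix` and the labels `x`, `y`, `z`, `s`, `t` are hypothetical and chosen only for illustration, and the matrix is not one of the poset examples from the post.

```python
from itertools import product

def chu_from_matrix(row_labels, col_labels, matrix):
    """Store a Chu space as a mapping (row, column) -> color.

    Hypothetical helper: the matrix is only a way of writing the
    mapping down, so the order in which rows and columns are listed
    does not survive into the result.
    """
    return {
        (r, c): matrix[i][j]
        for (i, r), (j, c) in product(enumerate(row_labels), enumerate(col_labels))
    }

# An arbitrary 2-color (0/1) Chu space with three rows and two columns.
a = chu_from_matrix(
    ["x", "y", "z"],
    ["s", "t"],
    [[0, 1],
     [1, 1],
     [0, 0]],
)

# The same rows and columns, listed in a different order.
b = chu_from_matrix(
    ["z", "x", "y"],
    ["t", "s"],
    [[0, 0],
     [1, 0],
     [1, 1]],
)

# Both matrices describe the same mapping, hence the same structure.
same = a == b
```

Here `same` compares the two mappings directly, so rearranging the written-out matrix changes nothing as long as each row and column keeps its own entries.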

I'm not sure if I can find it easily, but I recall Eliezer pointing out (several years ago) that he thought that Value Identification was the "easy part" of the alignment problem, with the getting-it-to-care part being something like an order of magnitude more difficult. He seemed to think (IIRC) this itself could still be somewhat difficult, as you point out. Additionally, the difficulty was always considered in the context of having an alignable AGI (i.e. something you can point in a specific direction), which GPT-N is not under this paradigm.

A human can write a rap battle in an hour. A GPT loss function would like the GPT to be intelligent enough to predict it on the fly.

Very minor point, but humans can rap battle on the fly:

This market by Eliezer about the possible reasons why AI may yet have a positive outcome seems to refute your first sentence.

Also, I haven't seen any AI notkilleveryoneism people advocating terrorism or giving up.

This does not seem like it counts as "publicly humiliating" in any way? Rude, sure, but that's quite different.
