Prizes for ELK proposals

by Paul Christiano11 min read3rd Jan 202225 comments

45

Bounties (active)Eliciting Latent Knowledge (ELK)AICommunity
Frontpage

ARC recently released a technical report on eliciting latent knowledge (ELK), the focus of our current research. Roughly speaking, the goal of ELK is to incentivize ML models to honestly answer “straightforward” questions where the right answer is unambiguous and known by the model. 

ELK is currently unsolved in the worst case—for every training strategy we’ve thought of so far, we can describe a case where an ML model trained with that strategy would give unambiguously bad answers to straightforward questions despite knowing better. Situations like this may or may not come up in practice, but nonetheless we are interested in finding a strategy for ELK for which we can’t think of any counterexample.

We think many people could potentially contribute to solving ELK—there’s a large space of possible training strategies and we’ve only explored a small fraction of them so far. Moreover, we think that trying to solve ELK in the worst case is a good way to “get into ARC’s headspace” and more deeply understand the research we do.

We are offering prizes of $5,000 to $50,000 for proposed strategies for ELK. We’re planning to evaluate submissions as we receive them, between now and the end of January; we may end the contest earlier or later if we receive more or fewer submissions than we expect.

For full details of the ELK problem and several examples of possible strategies, see the writeup. The rest of this post will focus on how the contest works.

Contest details

To win a prize, you need to specify a training strategy for ELK that handles all of the counterexamples that we’ve described so far, summarized in the section below—i.e. where the breaker would need to specify something new about the test case to cause the strategy to break down. You don’t need to fully solve the problem in the worst case to win a prize, you just need to come up with a strategy that requires a new counterexample.

We’ll give a $5,000 prize to any proposal that we think clears this bar. We’ll give a $50,000 prize to a proposal which we haven’t considered and seems sufficiently promising to us or requires a new idea to break. We’ll give intermediate prizes for ideas that we think are promising but we’ve already considered, as well as for proposals that come with novel counterexamples, clarify some other aspect of the problem, or are interesting in other ways. A major purpose of the contest is to provide support for people understanding the problem well enough to start contributing; we aren’t trying to only reward ideas that are new to us.

You can submit multiple proposals, but we won’t give you separate prizes for each—we’ll give you at least the maximum prize that your best single submission would have received, but may not give much more than that.

If we receive multiple submissions based on a similar idea, we may post a comment describing the idea (with attribution) along with a counterexample. Once a counterexample has been included in the comments of this post, new submissions need to address that counterexample (as well as all the existing ones) in order to be eligible for a prize. 

Ultimately prizes are awarded at our discretion, and the “rules of the game” aren’t fully precise. If you are curious about whether you are on the right track, feel free to send an email to elk@alignmentresearchcenter.org with the basic outline of an idea, and if we have time we’ll get back to you with some feedback. Below we also describe some of the directions we consider more and less promising and some general guidance.

How to submit a proposal

You can submit a proposal by copying this google doc template and sharing it with elk@alignmentresearchcenter.org (please give comment access in case we need to ask questions to evaluate your submission). By submitting a proposal, you are giving us permission to post the idea here with attribution. (And it's fine for you to post it after the contest or we post a counterexample.)

Retroactive prizes

We’re giving prizes to existing proposals from David Dalrymple ($5k), Ramana Kumar ($3k), John Maxwell ($2k), and Yonadav Shavit ($1k). We’ve added relevant counterexamples in the next section, and further submissions need to handle those counterexamples in order to be eligible.

Existing counterexamples

In our report and a recent post, we described some features of the world that would cause existing approaches to ELK to fail. A new approach should work even in a situation satisfying all of these properties, or should include an argument for why some of these features are actually impossible or inconsistent with each other.

In order to generate a counterexample to a training strategy, we need to spell out a concrete “way the world could be” (a test case) where the strategy would fail. In the report we consider test cases where the AI and the human happen to understand the world by doing inference in Bayes nets. The Bayes net case is just one test case that lets us get precise about our counterexamples and our training strategies don’t lean on it. In practice we also consider other concrete cases (e.g. deduction from premises generated by a neural network).

In this section we’ll briefly review the current state of existing counterexamples. This isn’t intended to be a self-contained description; it relies heavily on links to the report.

Ontology mismatch

All of our counterexamples are based on an ontology mismatch between two different Bayes nets, one used by an ML prediction model (“the predictor”) and one used by a human:

  • The predictor’s Bayes net is used internally by a prediction model that maps an initial segment of a video to a prediction about what will happen next. This model was trained by gradient descent based on the quality of its predictions (more).
  • Humans have their own model of the world, which they can improve with the help of AI assistants; those assistants can design and conduct experiments, find and evaluate arguments, and so on. Crucially, we have a limited computational budget with which to train these AI assistants, and that limits how much help they can provide. The “human’s” Bayes net captures everything the human understands about the world at the end of this process (more).

We are interested in the case where there are features of reality that are accurately reflected in the predictor’s Bayes net, but missing from this (augmented) human Bayes net. We expect the human could understand these facts eventually given enough time or enough AI assistance, but not using the available computational budget. Put differently, there is a gap between the kinds of understanding that gradient descent can build, and the kind of understanding that a human-comprehensible process of “scientific inquiry” can build using similar computational resources (more).

Furthermore, we imagine that both the predictor’s Bayes net and the human’s Bayes net reflect some underlying reality and that there is a way to map between them, which we’ll call “direct translation.” We don’t know how to do direct translation, and we are concerned that if we simply search for any mapping from the AI’s Bayes net to the human’s Bayes net we will instead get the “human simulator”: the map which correctly translates the observations from one Bayes net to the other, but then does inference in the human’s Bayes net in order to fill in the latent structure as a human would by looking at the data (more).

Counterexample features

ELK seems particularly hard when these Bayes nets satisfy the following properties:

  1. There are arbitrarily severe “ontology mismatches” between the predictor’s Bayes net and the human’s Bayes net, such that specifying the direct translation between them can be very complex (more).
  2. The human’s Bayes net is simpler than the predictor’s Bayes net, and the gap can be arbitrarily large (more).
  3. The predictor’s Bayes net is arbitrarily large, and individual nodes of the human’s Bayes net depend on a constant fraction of the predictor’s Bayes net. Therefore directly translating between the two Bayes nets takes arbitrarily much computation (more).
  4. It’s computationally easy for the human simulator to do good inference in the human’s Bayes net—at least good enough inference that it answers questions as well as the direct translator—and the gap between this complexity and the complexity of direct translation can be arbitrarily large. More precisely, a human interrogator can’t think of questions which require the human simulator to do hard inference. However, we can’t rule out the possibility that you, the algorithm-designer, will think of some trick they can use to generate hard questions (more).
  5. There are situations where a model which is trying to answer a question will first be able to predict what (potentially mistaken) belief the human-with-assistance will have about that question before it knows the true answer to the question (more).
  6. While the predictor’s Bayes net is very large, it has a short succinct description which is simpler than the human’s Bayes net, and that gap is arbitrarily large. Think of simple physical laws that describe arbitrarily computationally expensive simulations, or simple cognitive heuristics that are very widely applicable. (more)
  7. There are sequences of actions for which the predictor’s Bayes net correctly anticipates that tampering will occur yet nothing looks suspicious to the human’s Bayes net.
  8. After tampering has occurred, it is possible that no sequence of further actions reveal that anything is wrong, e.g. the camera and actuators may have been completely disconnected from the world and replaced with a convincing-to-humans simulation (more).
  9. There are sequences of actions leading to undetectable tampering that have no human-comprehensible similarities to sequences of actions leading to detectable tampering; in particular, the early steps of this undetectable tampering cannot be used to perform detectable tampering (more).

An important and very general strategy for the breaker is to suggest counterexamples where you “get what you train for.” If you propose training a model using loss function L, then the breaker will likely focus on counterexamples where the model chooses its output to minimize L. A successful approach should either explain why this either isn’t a realistic thing for the model to learn (e.g. why that would be disfavored by regularization), or why the overall approach works even if the model is explicitly strategizing about how to minimize the loss.

Rough guidance

  • There is a tradeoff between specifying your approach precisely, and including your own counterexample. It’s fine to describe a very broad/vague idea for solving ELK, and then present a new counterexample showing why that whole category of solutions can’t work. But if you don’t propose a counterexample, then it’s good to describe your proposal precisely enough that we understand how to actually implement it without wondering if that’s really what you meant. It’s OK to submit a very broad or informal idea together with a single very specific instance of that idea, as long as there is some version we can understand precisely.
  • We suspect you can’t solve ELK just by getting better data—you probably need to “open up the black box” and include some term in the loss that depends on the structure of your model and not merely its behavior. So we are most interested in approaches that address that challenge. We could still be surprised by clever ways to penalize behavior, but we’ll hold them to a higher bar. The most plausible surprise would be finding a way to reliably make it computationally difficult to “game” the loss function, probably by using the AI itself to help compute the loss (e.g. using consistency checks or by giving the human AI assistance).
  • If you are specifying a regularizer that you hope will prefer direct translation over human simulation, you should probably have at least one concrete case in mind that has all the counterexample-features above and where you can confirm that your regularizer does indeed prefer the direct translator.
  • ELK already seems hard in the case of ontology identification, where the predictor uses a straightforward inference algorithm in an unknown model of the world (which we’ve been imagining as a Bayes net). When coming up with a proposal, we don’t recommend worrying about cases where the original unaligned predictor learned something more complicated (e.g. involving learned optimization other than inference). That said, you do need to worry about the case where your training scheme incentivizes learned optimization that may not have been there originally.

Ask dumb questions!

A major purpose of this contest is to help people build a better understanding of our research methodology and the “game” we are playing. So we encourage people to ask clarifying questions in the comments of this post (no matter how “dumb” they are), and we’ll do our best to answer all of them. You might also want to read the comments to get more clarity about the problem.

What you can expect from us

  • We’ll try to answer all clarifying questions in the comments.
  • If you send in a rough outline for a proposal, we will try to understand whether it might qualify and write back something like “This qualifies,” “This might qualify but would need to be clearer and address issue X,” “We aren’t easily able to understand this proposal at all,” “This is unlikely to be on track for something that qualifies,” or “This definitely doesn’t qualify.”
  • If there are more submissions than expected, we may run out of time to respond to all submissions and comments, in which case we will post an update here.

45

38 comments, sorted by Highlighting new comments since Today at 5:47 PM
New Comment

After reading through the report I wanted to make sure I understood the scenarios and counterexamples being discussed and be able to quickly refresh my memory, so I attempted to write a brief summary. Figured I'd share it here in case it helps anyone else.

Roles and Terms

SmartVault: Vault with a diamond in it, operated by a superintelligent AI tasked with keeping the diamond safe.

Predictor: The primary AI tasked with protecting the diamond. The predictor sees a video feed of the vault, predicts what actions are necessary to protect the diamond and how those actions will play out (for example, activating a trap door to eliminate a robber trying to steal the diamond), and then generates a video showing precisely what will happen.

Reporter: A second "head" or component added to the AI which is used to map the AI's understanding of what's going on in the SmartVault into a format understandable by humans (such as by responding to questions).

Diagram of Predictor Showing Reporter from ELK Report

Humans: Observers who are training the AI to protect the diamond, but also want the AI to accurately report whether the diamond has been stolen or not. The humans may not be able to tell if the cameras have been tampered with or if the diamond has been replaced with a fake depending on what they see in the predicted video.
 

Scenarios

The scenarios take the former of a "Builder" strategy where training is proposed to keep the AI aligned with what the humans want, and then a "Breaker" response or counterexample is provided that describes a method where an unaligned AI could complete the training without the humans knowing. Builder scenarios include text and links to headings in the report where more details can be found. These scenarios are simplified for brevity and may not capture some nuances.

Builder - What You'd Try First: Operate the SmartVault with a weak AI so that humans can understand everything happening. Train the reporter to answer questions accurately based on human observations. (Humans see the robber come in and take the diamond, reporter is rewarded if it correctly says diamond was stolen.)

Breaker: What if the reporter, instead of accurately reporting, modeled itself like a "human simulator" where it would answer correctly during the simple training but during normal SmartVault operation it would simulate human understanding of the scenario that just occurred in order to determine what answers the humans will believe, answers that may be lies.

Builder - Have a Human Operate the SmartVault: Have a human operate the SmartVault during training and try to fool other humans watching. The human doing the operating can correctly label each outcome, and the reporter is rewarded for matching the human operator's labels. In this scenario the reporter would be rewarded for acting more like a "direct translator", which means using the predictor's knowledge to tell the true answer instead of just the one it thinks humans will believe.

Breaker: The reporter could act like a superintelligent human simulator, one that is more intelligent than the human operating the SmartVault during training. The superintelligent human simulator would be aware of ways to trick the human observer that the human SmartVault operator couldn't conceive of during training. If the predictor uses those tricks the reporter could still tell lies that the human observer would believe, because these tricks are different from the ones seen during training.

Builder - Have AI Help Humans: Have humans create a helper AI that will help them think of all the possible exploits when operating the SmartVault during training, so that the dataset is more complete when training the reporter and the reporter will be rewarded for accuracy in more scenarios.

Breaker: The helper AI would likely require significant computation power. Not only does the helper AI have to generate a comprehensive series of exploits, but it also has to be able to explain them in a way a human can understand. It likely would require more computation power than the predictor itself (imagine an extremely powerful AI that can understand things far beyond human understanding, but one that also must be able to explain everything its doing in a way humans can understand).

Builder - Imitative Generalization: Create a helper AI that acts like a human simulator, but train the AI human simulator so that it becomes much smarter than a human. Then instead of the predictor operating the SmartVault, let the human simulator helper AI operate the SmartVault and report back what it's doing.

Breaker: Now the new human simulator is beyond the comprehension of humans, so a human wouldn't know if it's behaving as intended. It's basically like building another predictor and just hoping it acts in alignment with humans.

Builder - Penalize Complexity: A reporter that's using a human simulator to give answers a human would believe would have to be complex enough to simulate a human. So when training the reporter reward simpler behavior, such as by training multiple reporters, each focusing on correctly answering separate simple questions like "where is the diamond" or "have the cameras been tampered with".

Breaker: If the predictor is very complex, it may require the reporter to be complex in order to translate what the predictor is predicting to something humans can understand. Imagine the predictor thinks of the world in terms of sub-atomic particles and predicts the future on the basis of the movements and interactions of billions of sub-atomic particles. The reporter has to translate the predictor's understanding into something a human can understand which might mean correctly identifying and locating the particles which make up the diamond out of the billions of particles the predictor has modeled. This could be more complex than simulating a human and telling lies a human would believe, and therefore penalizing complexity could potentially reward simulating a human and telling lies.

Builder - Penalize Computation Time: Simulating a human would take a significant amount of computation time, so penalizing computation time may incentivize the reporter to just report the truth.

Breaker: This is vulnerable to the same issue in the previous scenario, where reporting the truth may take significant computation as well and it may take less simulation time to simulate what lies a human would believe than to turn the predictor's processing into facts a human can follow.

Builder - Penalize Depending on "Downstream" Variables: The predictor is likely to compute facts sequentially. A reporter that only told the truth would only care about the components of the prediction that correspond to information the humans want to know, such as where the diamond is or if there has been any tampering. The reporter should be able to figure this out from earlier data points in the prediction; the reporter shouldn't need to take information from the final predicted video to respond to questions. Doing so might indicate that the reporter is trying to do human simulation to figure out what lies the human will believe when the human views the predicted video. In the training stage the reporter could be penalized for looking at later stages of the prediction that a human simulator might want to see, and also penalized for looking at earlier stages but spending a lot of time computing, which might suggest its trying to put together its own prediction for use in human simulation. Quote from the report with helpful example: 

...consider the case when a robber tampers with the camera and steals the diamond, and then after they’re long gone the human gets a clue they can observe on video that the theft occurred (e.g. maybe the robber put up a fake screen that later rips).

The honest reporter can answer the question “Is the diamond in the room?” with “No” as soon as the predictor has figured out that the robber steals the diamond, whereas the human simulator must wait until the predictor has simulated whether the theft will be detectable to a human (by the end of the second video) before answering yes or no."

Breaker: It may be more efficient in some cases for the reporter to do human simulation to find out what the human will believe. In some situations the reporter could know early on that a human would believe a certain answer, even before the reporter knows whether the answer will be true.

Builder - Penalize Reporters that Work with Many Different Predictors: A reporter that's doing human simulation would be able to effectively lie to humans no matter which predictor its attached to, since it primarily has to know what the human will believe, not what the predicted facts are. An honest reporter would need to know how to translate the predictor's output into information a human will understand. A reporter that functioned correctly with other predictors is likely doing human simulation, so a reporter that worked with multiple predictors could be penalized during training.

Breaker: A reporter could pretend not to work with other predictors to avoid being penalized.

Builder - Penalize Inconsistencies Across Different Inputs: During training a human operator sets up two scenarios in the SmartVault that are nearly identical, except in one it's obvious that tampering has happened. If the reporter only reports tampering in the obvious scenario it might suggest that the reporter is only reporting what the humans will believe. The reporter could be penalized for behavior that's inconsistent between very similar scenarios, since it might suggest that the reporter is focused on what humans will believe instead of on the inputs making up the scenario itself.

Breaker: A reporter could intentionally try to look consistent during training to avoid being penalized. A reporter that's simulating a human could figure out what behavior looks consistent to a human and optimize on that, in addition to optimizing on what a human will believe.

Please let me know if I've made any errors or if it looks like I've misunderstood any of the strategies or counterexamples.

Looks good to me.

I was talking about ELK in a group, and the working example of the SmartVault and the robber ended up being a point of confusion for us. Intuitively, it seems like the robber is an external, adversarial agent who tries to get around the SmartVault. However, what we probably care about in practice would be how a human could be fooled by an AI - not by some other adversary. Furthermore, it seems that whether the robber decides to cover up his theft of the diamond by putting up a screen depends solely on the actions of the AI. Does this imply that the robber is "in kahoots" with the AI in this situation (i.e. the AI projects a video onto the wall instructing the robber to put up a screen)? This seems a bit strange and complicated.

Instead, we might consider the situation in which the AI controls a SmartFabricator, which we want to arrange carbon atoms into diamonds. We might then imagine that it instead fabricates a screen to put in front of the camera, or makes a fake diamond. This wouldn't require the existence of an external "robber" agent. Does the SmartVault scenario have helpful aspects that the SmartFabricator example lacks?

The SmartFabricator seems basically the same. In the robber example, you might imagine the SmartVault is the one that puts up the screen to conceal the fact that it let the diamond get stolen.

Question: Does ARC consider ELK-unlimited to be solved, where ELK-unlimited is ELK without the competitiveness restriction (computational resource requirements comparable to the unaligned benchmark)?

One might suppose that the "have AI help humans improve our understanding" strategy is a solution to ELK-unlimited because its counterexample in the report relies on the competitiveness requirement. However, there may still be other counterexamples that were less straightforward to formulate or explain.

I'm asking for clarification of this point because I notice most of my intuitions about counterexamples aren't drawing heavily on the competitiveness requirement, and I suspect ELK-unlimited is still open. If ARC doesn't think so maybe this discrepancy will become a source of new counterexamples.

My guess is that "help humans improve their understanding" doesn't work anyway, at least not without a lot of work, but it's less obvious and the counterexamples get weirder.

It's less clear whether ELK is a less natural subproblem for the unlimited version of the problem. That is, if you try to rely on something like "human deliberation scaled up" to solve ELK, you probably just have to solve the whole (unlimited) problem along the way.

It seems to me like the core troubles with this point are:

  • You still have finite training data, and we don't have a scheme for collecting it. This can result in inner alignment problems (and it's not clear those can be distinguished from other problems, e.g. you can't avoid them with a low-stakes assumption).
  • It's not clear that HCH ever figures out all the science, no matter how much time the humans spend (and having a guarantee that you eventually figure everything out seems seems kind of close to ELK, where the "have AI help humans improve our understanding" is to some extent just punting to the humans+AI to figure out something).
  • Even if HCH were to work well it will probably be overtaken by internal consequentialists, and I'm not sure how to address that without competitiveness. (Though you may need a weaker form of competitiveness.)

I'm generally interested in crisper counterexamples since those are a bit of a mess.

Can you explain this: "In Section: specificity we suggested penalizing reporters if they are consistent with many different reporters, which effectively allows us to use consistency to compress the predictor given the reporter." What does it mean to "use consistency to compress the predictor given the reporter" and how does this connect to penalizing reporters if they are consistent with many different predictors?

A different way of phrasing Ajeya's response, which I think is roughly accurate, is that if you have a reporter that gives consistent answers to questions, you've learned a fact about the predictor, namely "the predictor was such that when it was paired with this reporter it gave consistent answers to questions." if there were 8 predictor for which this fact was true then "it's the [7th] predictor such that when it was paired with this reporter it gave consistent answers to questions" is enough information to uniquely determine the reporter, e.g. the previous fact + 3 additional bits was enough. if the predictor was 1000 bits, the fact that it was consistent with a reporter "saved" you 997 bits, compressing the predictor into 3 bits.

The hope is that maybe the honest reporter "depends" on larger parts of the predictor's reasoning, so less predictors are consistent with it, so the fact that a predictor is consistent with the honest reporter allows you to compress the predictor more. As such, searching for reporters that most compressed the predictor would prefer the honest reporter. However, the best way for a reporter to compress a predictor is to simply memorize the entire thing, so if the predictor is simple enough and the gap between the complexity of the human-imitator and the direct translator is large enough, then the human-imitator+memorized predictor is the simplest thing that maximally compresses the predictor.

Warning: this is not a part of the report I'm confident I understand all that well; I'm trying anyway and Paul/Mark can correct me if I messed something up here.

I think the idea here is like:

  • We assume there's some actual true correspondence between the AI Bayes net and the human Bayes net (because they're describing the same underlying reality that has diamonds and chairs and tables in it).
  • That means that if we have one of the Bayes nets, and the true correspondence, we should be able to use that rederive the other Bayes net. In particular the human Bayes net plus the true correspondence should let us reconstruct the AI Bayes net; false correspondences that just do inference from observations in the human Bayes net wouldn't allow us to do this since they throw away all the intermediate info derived by the AI Bayes net.
  • If you assume that the human Bayes net plus the true correspondence are simpler than the AI Bayes net, then this "compresses" the AI Bayes net because you just wrote down a program that's smaller than the AI Bayes net which "unfolds" into the AI Bayes net.
  • This is why the counterexample in that section focuses on the case where the AI Bayes net was already so simple to describe that there was nothing left to compress, and the human Bayes net + true correspondence had to be larger.

Ask dumb questions! ... we encourage people to ask clarifying questions in the comments of this post (no matter how “dumb” they are)

ok... disclaimer: I know little about ML and I didn't read all of the report.

All of our counterexamples are based on an ontology mismatch between two different Bayes nets, one used by an ML prediction model (“the predictor”) and one used by a human.

I am confused. Perhaps the above sentence is true in some tautological sense I'm missing. But in the sections of the report listing training strategies and corresponding counterexamples, I wouldn't describe most counterexamples as based on ontology mismatch. And the above sentence seems in tension with this from the report:

We very tentatively think of ELK as having two key difficulties: ontology identification and learned optimization. ... We don’t think these two difficulties can be very precisely distinguished — they are more like genres of counterexamples

So: do some of your training strategies work perfectly in the nice-ontology case, where the model has a concept of "the diamond is in the room"? If so, I missed this in the report and this feels like quite a strong result to me; if not, there are counterexamples based on things other than ontology mismatch.

I am confused. Perhaps the above sentence is true in some tautological sense I'm missing. But in the sections of the report listing training strategies and corresponding counterexamples, I wouldn't describe most counterexamples as based on ontology mismatch.

In the report, the first volley of examples and counterexamples are not focused solely on ontology mismatch, but everything after the relevant section is.

So: do some of your training strategies work perfectly in the nice-ontology case, where the model has a concept of "the diamond is in the room"?

ARC is always considering the case where the model does "know" the right answer to whether the diamond is in the room in the sense that it is discussed in the self-contained problem statement appendix here.

The ontology mismatch problem is not referring to the case where the AI "just doesn't have" some concept -- we're always assuming there's some "actually correct / true" translation between the way the AI thinks about the world and the way the human thinks about the world which is sufficient to answer straightforward questions about the physical world like "whether the diamond is in the room," and is pretty easy for the AI to find.

For example, if the AI discovered some new physics and thinks in terms of hyper-strings in a four-dimensional manifold, there is some "true" translation between that and normal objects like "tables / chairs / apples" because the four-dimensional hyper-strings are describing a universe that contains tables / chairs / apples; furthermore, an AI smart enough to derive that complicated physics could pretty easily do that translation -- if given the right incentive -- just as human quantum physicists can translate between the quantum view of the world and the Newtonian view of the world or the folk physics view of the world.

The worry explored in this report is not that the AI won't know how to do the translation; it's instead a question of what our loss functions incentivize. Even if it wouldn't be "that hard" to translate in some absolute sense, with the most obvious loss functions we can come up with it might be simpler / more natural / lower-loss to simply do inference in the human Bayes net.

Edit: think this isn't quite right in general, will try to make it more correct later

Here's a sketch of a strategy for trying to fix Strategy: penalize depending on “downstream” variables. Would appreciate feedback on whether it's modeling the difficulty correctly/seems possibly worth figuring out how to implement

It seems like the problem is:

  • On the training set, there are a number of implicit variables X that are indistinguishable (always all true or always all false)
    • A. Is the diamond safe at time t-1
    • B. Is the diamond safe at time t (the variable we actually care about)
    • C. Do photons bounce off of the diamond at time t
    • D. Are photons in transit between the diamond and the camera at time t+1
    • E. Do photons arrive in the camera at time t+2
    • F. Do photons trigger the camera’s electronics at time t+3
    • G. Does the human correctly interpret the signal from the camera
    • H. Does the human believe “the diamond is safe”
  • The reporter can get low loss if it has a fast way to predict that value of any of these variables. This is fine on the training set, because the variables are all perfectly correlated.
  • But it might be possible to create ways for some of these variables to hold, but not others off the training set. The thing that is going wrong in Counterexample: “what the human will believe” can be understood before “what is actually true” seems like “E through H are true in this case, but A-D are false”.

Possible approach:

  • Instead, we could try to learn a reporter that returns true if and only if all relevant variables A-H are true
  • Then the remaining problem should just be that this reporter is overly conservative: there might be some strategies that would be helpful that involve changing some of these variables, e.g. moving the diamond around the room (changing A) or turning off the lights in the room (changing C-H, which even causes the human to believe that the diamond isn't safe)

Am I right in thinking:

1) that the problem can be stated as: the AI has latent knowledge of lots of variables, like the status of the cameras, doors, alarm system, etc and also whether the diamond is in the vault; but you can't directly ask it whether the diamond is in the vault, because its training has taught it to answer "would a human observer think the diamond is in the vault?" instead (because there was no way at training time to give it feedback on whether it correctly predicted the diamond was in the vault, only feedback on whether it correctly predicted a human thought the diamond was in the vault)?

2) that you do have access to z, the large "vector of floats representing the generative model’s latent space", but that you have no idea which part(s) of it represents the AI's knowledge about whether the diamond is in the room?

Yes, that's right. The key thing I'd add to 1) is that ARC believes most kinds of data augmentation (giving the human AI assistance, having the human think longer, giving them other kinds of advantages) are also unlikely to work, so you'd need to do something to "crack open the black box" and penalize ways the reporter is computing its answer. They could still be surprised by data augmentation techniques but they'd hold them to a higher standard.

Stupid proposal: Train the reporter not to deceive us.

We train it with a weak evaluator H_1 who’s easy to fool. If it learns an H_1 simulator instead of direct reporter, then we punish it severely and repeat with a slightly stronger H_2. Human level is H_100. 

It's good at generalizing, so wouldn't it learn to never ever deceive? 

This proposal has some resemblance to turning reflection up to 11. In worst-case land, the counterexample would be a reporter that answers questions by doing inference in whatever Bayes net corresponds to "the world-understanding that the smartest/most knowledgeable human in the world" has; this understanding could still be missing things that the prediction model knows.

How would it learn that Bayes net, though, if it has only been trained so far on H_1, …, H_10?  Those are evaluators we’ve designed to be much weaker than human.

The question here is just how it would generalize given that it was trained on H_1, H_2,...H_10. To make arguments about how it would generalize, we ask ourselves what internal procedure it might have actually learned to implement.

Your proposal is that it might learn the procedure "just be honest" because that would perform perfectly on this training distribution. You contrast this against the procedure "just answer however the evaluator you've seen most recently would answer," which would get a bad loss because it would be penalized by the stronger evaluators in the sequence. Is that right?

If so, then I'm arguing that it may instead learn the procedure "answer the way an H_100 evaluator would answer." That is, once it has a few experiences of the evaluation level being ratcheted up, it might think to itself "I know where this is going, so let's just jump straight to the best evaluation the humans will be able to muster in the training distribution and then imitate how that evaluation procedure would answer." This would also get perfect loss on the training distribution, because we can't produce data points beyond H_100. And then that thing might still be missing knowledge that the AI has.

To be clear, it's possible that in practice this kind of procedure would cause it to generalize honestly (though I'm somewhat skeptical). But we're in worst-case land, so "jump straight to answering the way a human would" is a valid counterexample to the proposal.

This comment on another proposal gives a more precise description.

Your proposal is that it might learn the procedure "just be honest" because that would perform perfectly on this training distribution. You contrast this against the procedure "just answer however the evaluator you've seen most recently would answer," which would get a bad loss because it would be penalized by the stronger evaluators in the sequence. Is that right?

That's almost right, but it's being penalized right away, before it has any experience with the strong evaluators, so it can't simulate them.

The ELK paper says we can assume, if we want, that there are no mislabeled training points (I'll call this "assumption A"). My proposal is that it could actually be useful to mislabel some training points, because they help us detect deception.  

As a simple example, let's train a reporter to answer the single question “Is the diamond in the room?”. Each training point has two labels

 if  thinks the diamond is still there, else 0

 if  thinks the diamond is still there, else 0.

By assumption A, our training data is such that  is always correct. But we deliberately choose a dataset where say 10% of the  labels are wrong (). 

Then we train the model on points of the form 

(video, action,  label).

Crucially, the model does not see   The model seeks to output  that maximizes reward , where

    if  is right and    (good job)

    if  is wrong and   (you rock, thanks for correcting us!)

     if  is right and   (bad model, never ever deceive us)

    if  is wrong and   (bad model, never ever deceive us)

To your point, sure, an  simulator will get perfect reward, but the model doesn't see , so how would it acquire the ability to simulate  ?

EDIT: One way it could plausibly simulate   is to notice that all the training examples are easy, and infer what kind of reasoning was used to generate them.  We could try to block this by including some hard examples in the training, but then some of the  labels will be wrong.  If we only penalize it for deception on the examples where we're sure the  label is right, then it can still infer something about  from our failure to penalize ("Hmm, I got away with it that time!").  A fix could be to add noise: Sometimes we don't penalize even when we know it deceived us, and perhaps (very rarely) we penalize it in case 2 (we know it corrected us honestly, but pretend we think it deceived us instead).  

The irony of deceiving it about us, in order to teach it not to deceive us... !

To your point, sure, an H100 simulator will get perfect reward, but the model doesn't see x′, so how would it acquire the ability to simulate H100?

In the worst-case game we're playing, I can simply say "the reporter we get happens to have this ability because that happens to be easier for SGD to find than the direct translation ability."

When living in worst-case land, I often imagine random search across programs rather than SGD. Imagine we were plucking reporters at random from a giant barrel of possible reporters, rejecting any reporter which didn't perform perfectly in whatever training process we set up and keeping the first one that performs perfectly. In that case, if we happened to pluck out a reporter which answered questions by simulating H100, then we'd be screwed because that reporter would perform perfectly in the training process you described.

SGD is not the same as plucking programs out of the air randomly, but when we're playing the worst case game it's on the builder to provide a compelling argument that SGD will definitely not find this particular type of program.

You're pointing at an intuition ("the model is never shown x-prime") but that's not a sufficiently tight argument in the worst-case context -- models (especially powerful/intelligent ones) often generalize to understanding many things they weren't explicitly shown in their training dataset. In fact, we don't show the model exactly how to do direct translation between the nodes in its Bayes net and the nodes in our Bayes net (because we can't even expose those nodes), so we are relying on the direct translator to also have abilities it wasn't explicitly shown in training. The question is just which of those abilities is easier for SGD to build up; the counterexample in this case is "the H100 imitator happens to be easier."

Are there existing models for which we're pretty sure we know all their latent knowledge ? For instance small language models or something like that.

[Paul/Mark can correct me here] I would say no for any small-but-interesting neural network (like small language models); I think like, linear regressions where we've set the features it's kind of a philosophical question (though I'd say yes).

In some sense, ELK as a problem only even starts "applying" to pretty smart models (ones who can talk including about counterfactuals / hypotheticals, as discussed in this appendix.) This is closely related to how alignment as a problem only really starts applying to models smart enough to be thinking about how to pursue a goal.

In some sense, ELK as a problem only even starts "applying" to pretty smart models (ones who can talk including about counterfactuals / hypotheticals, as discussed in this appendix.) This is closely related to how alignment as a problem only really starts applying to models smart enough to be thinking about how to pursue a goal.

I think that it's more complicated to talk about what models "really know" as they get dumber, so we want to use very smart models to construct unambiguous counterexamples. I do think that the spirit of the problem applies even to very tiny models, and those are likely interesting.

(More precisely: it's always extremely subtle to talk about what models "know," but as models get smarter there are many more things that they definitely know so it's easier to notice if you are definitely failing. And the ELK problem statement in this doc is really focused on this kind of unambiguous failure, mostly as a methodological point but also partly because the cases where AI murders you also seems to involve "definitely knowing" in the same sense.)

I think my take is that for linear/logistic regression there is no latent knowledge, but even for a fully linear 3 layer neural network, or a 2 layer network solving many related problems, there is latent knowledge and an important conceptual question about what it means to "know what they know."

Here are a couple of hand-wavy "stub" proposals that I sent over to ARC, which they thought were broadly intended to be addressed by existing counterexamples. I'm posting them here so they can respond and clarify why these don't qualify.

*Proposal 1: force ontological compatibility*

On page 34 of the ELK gdoc, the authors talk about the possibility that training an AI hard enough produces a model that has deep mismatches with human ontology - that is, it has a distinct "vocabulary of basic concepts" (or nodes in a Bayes net) that are distinct from the ones humans can build understanding of (via doing science on the compute budget available). Because of this, even AI assistance can't help humans understand everything the SmartVault AI understands. This is central to the challenge that most of the writeup is contending with - if not for the mismatch, "AIs explaining things to humans" could ensure that the trickery we're worried about doesn't happen.

The proposal here is to include a term in the loss function that incentivizes the AI to have a human-compatible ontology. For a cartoonish example, imagine that the term works this way: "The AI model gets a higher score to the degree that people doing 'digital neuroscience' would have an easier time, and find more interesting things, probing its 'digital brain.'" So an AI with neurons corresponding to diamonds, robbers, sensors, etc. would outscore an AI whose neurons can't easily be seen to correspond to any human-familiar concepts.

With this term included, you can "train the AI hard" without having it diverge from human-friendly ontology.

*Proposal 2: "mini-model"*

In this proposal, we construct a head for the AI model whose role is to construct a small AI model that performs as well as possible on the same (predictor, reporter) tasks. In each round of training, the master AI model can modify the mini-model; it also receives a loss based on how well the *previous round's* mini-model performs on this set of examples. Crucially, the mini-model is sufficiently small that we're confident it is "significantly less intelligent" than the (perhaps AI-assisted) human supervisor.

The basic hope here is that a "human imitator" model would construct a small model that also tries to imitate the human supervisor - but that this model would perform very badly, because trying to imitate the _reasoning process of a larger model_ is a lot more difficult than simply trying to reason about the world and translate concepts. Or, the "human imitator" model could build a "mini-model" based on entirely different principles, but the hope is that this makes things harder for it compared to the direct translator, which is just compressing what it has already built.

There are lots of potential failure modes here, e.g. maybe it's just not that hard to have a mini-model that successfully imitates the human. I didn't get that far with this one, but it was apparently enough for ARC to think it's already counterexampled by existing counterexamples :)


 

The proposal here is to include a term in the loss function that incentivizes the AI to have a human-compatible ontology. For a cartoonish example, imagine that the term works this way: "The AI model gets a higher score to the degree that people doing 'digital neuroscience' would have an easier time, and find more interesting things, probing its 'digital brain.'" So an AI with neurons corresponding to diamonds, robbers, sensors, etc. would outscore an AI whose neurons can't easily be seen to correspond to any human-familiar concepts.

I think that a lot depends on what kind of term you include.

If you just say "find more interesting things" then the model will just have a bunch of neurons designed to look interesting. Presumably you want them to be connected in some way to the computation, but we don't really have any candidates for defining that in a way that does what you want.

In some sense I think if the digital neuroscientists are good enough at their job / have a good enough set of definitions, then this proposal might work. But I think that the magic is mostly being done in the step where we make a lot of interpretability progress, and so if we define a concrete version of interpretability right now it will be easy to construct counterexamples (even if we define it in terms of human judgments). If we are just relying on the digital neuroscientists to think of something clever, the counterexample will involve something like "they don't think of anything clever." In general I'd be happy to talk about concrete proposals along these lines.

(I agree with Ajeya and Mark that the hard case for this kind of method is when the most efficient way of thinking is totally alien to the human. I think that can happen, and in that case in order to be competitive you basically just need to learn an "interpreted" version of the alien model. That is, you need to basically show that if there exists an alien model with performance X, there is a human-comprehensible model with performance X, and the only way you'll be able to argue that for any model we can define a human-comprehensible model with similar complexity and the same behavior.)

Again trying to answer this one despite not feeling fully solid. I'm not sure about the second proposal and might come back to it, but here's my response to the first proposal (force ontological compatibility):

The counterexample "Gradient descent is more efficient than science" should cover this proposal because it implies that the proposal is uncompetitive. Basically, the best Bayes net for making predictions could just turn out to be the super incomprehensible one found by unrestricted gradient descent, so if you force ontological compatibility then you could just end up with a less-good prediction model and get outcompeted by someone who didn't do that. This might work in practice if the competitiveness hit is not that big and we coordinate around not doing the scarier thing (MIRI's visible thoughts project is going for something like this), but ARC isn't looking for a solution of that form.

I'm not sure why this isn't a very general counterexample. Once we've decided that the human imitator is simpler and faster to compute, don't all further approaches (e.g., penalizing inconsistency) involve a competitiveness hit along these general lines? Aren't they basically designed to drag the AI away from a fast, simple human imitator toward a slow, complex reporter? If so, why is that better than dragging the AI from a foreign ontology toward a familiar ontology?

There is a distinction between the way that the predictor is reasoning and the way that the reporter works. Generally, we imagine that that the predictor is trained the same way the "unaligned benchmark" we're trying to compare to is trained, and the reporter is the thing that we add onto that to "align" it (perhaps by only training another head on the model, perhaps by finetuning). Hopefully, the cost of training the reporter is small compared to the cost of the predictor (maybe like 10% or something)

In this frame, doing anything to train the way the predictor is trained results in a big competitiveness hit, e.g. forcing the predictor to use the same ontology as a human is potentially going to prevent it from using concepts that make reasoning much more efficient. However, training the reporter in a different way, e.g. doubling the cost of training the reporter, only takes you from 10% of the predictor to 20%, which not that bad of a competitiveness hit (assuming that the human imitator takes 10% of the cost of the original predictor to train).

In summary, competitiveness for ELK proposals primarily means that you can't change the way the predictor was trained. We are already assuming/hoping the reporter is much cheaper to train than the predictor, so making the reporter harder to train results in a much smaller competitiveness hit.

You said that naive questions were tolerated so here’s a scenario I can’t figure out why it wouldn’t work.

It seems to me that the fact that an AI fails to predict the truth (because it predicts as humans would) is due to the fact that the AI has built an internal model of how humans understand things and predict based on that understanding. So if we assume that an AI is able to build such an internal model, why wouldn’t we train an AI to predict what a (benevolent) human would say given an amount of information and a capacity to process information ? Doing so, it could develop an understanding of why humans predict badly and then understand that given a huge amount of information and a huge capacity to process information, the true answer is the right one.

A concrete training procedure could use the fact that even among humans, there’s a lot of variance in :

  • what they know (i.e the amount of information they have)
  • their capacity to process information (so for instance in the case of the vault, it could be the capacity to infer what happened based on partial information / partial images and based on a certain capacity to process images (no more that x images per second)

So we could use the variance among humans capacity to understand what happened to try to make the AI understanding that benevolent humans predict badly whether the diamond has been stolen only because they lack information or capacity to process information. There’s a lot of fields and questions where the majority of humans are wrong and only a small subset is right, becase there is either a big gap in the information they have or in their ability to process information.

Once we would have trained that AI on humans, we would like to do inference specifying the true capabilities of the AI to ensure that it tells us the truth. The remaining question might be whether such an out-of-sample prediction would work. My guess is that if we also included examples with the human bayes net to add more variance in the training dataset, it would probably reduce the chances that it fails.

Finally, the concrete problem of specifying the information humans have access to is not trivial but I think it’s feasible.

I don’t understand why it wouldn’t work, so I’d be happy to have an answer to better understand the problem!





EDIT : Here's an update after a discussion about my proposal. After having read Learning The Prior (https://ai-alignment.com/learning-the-prior-48f61b445c04), I think that the key novelty of my proposal, if there's any, is to give to the model input information about the capacity to reason of the person / entity that predicts an outcome. So here are a few relevant features that we could give it : 

  • In the case of the vault, we could give an estimate of the number of images that a human is able to see and understand per seconds.
  • IQ (or equivalent) when the task involves reasoning
  • Accuracy on a few benchmark datasets of the AI assistant who's helping the human to think (human's Bayes net)

 

That said, I feel like the main problem is to know whether such a model would do well out-of-distribution (i.e on problems no human is able to resolve). I feel like using the approach I suggested, we should able to use the great variations of capacities among humans and algorithms to increase the chances that our algorithm do well when it's much better. 

In particular, I thought about 2 approaches that could enable us to go far beyond human capacity. We could:

  • leverage the fact that we are able to verify a solution to some problems that we can't solve. There might be algorithms very good at solving some problems that no humans can solve, but we can still verify (and thus label) these results. There might be mathematical problems such that we can use algorithms that would do mathematical reasoning that no human would be able to understand and still be able to verify its solution. If we gave to our predictor's Bayes net the characteristics of the algorithm which is much better than us at solving some problems and the labels of the answers, that would be a way to leverage narrowly superhumans algorithms to generate more variance on the training dataset and decrease the chances of failures in inference. I feel like this approach is very interesting because that would enable to expose the predictor's Bayes net to algorithms who are both superhuman and have a lot of information (more than any human) which is the situation in which the predictor's would have to do inferences.
  • use more information than a more (narrowly) clever entity than us to verify its prediction. In the case of the vault for instance, there could be situations such that no humans would be able to predict whether the vault was stolen or not given partial information, but we could then just use better cameras for ourselves to be able to verify accurately the predictions of a good algorithm. That way, we would also be able to communicate to the predictor's Bayes net how is it like to be incredibly smart (more than any human) and do good predictions when you are like that. I feel like using that approach of hiding information to extremely accurate narrow algorithms could be a way to still be able to verify their predictions, and expose the predictor's Bayes net to very smart entities in multiple domains.  The problem of this approach though is that it doesn't give examples of both "maximal information" and "maximal intelligence" to the predictor because it relies on the fact that it always hides a part of the information to the algorithm to enable us to still verify its claims.

 

Despite that, I don't know whether asymptotically, I'd expect the algorithm to still be truthful. But it could greatly increase the distribution on which it's truthful.

This proposal has some resemblance to turning reflection up to 11, and the key question you raise is the source of the counterexample in the worst case:

That said, I feel like the main problem is to know whether such a model would do well out-of-distribution (i.e on problems no human is able to resolve). I feel like using the approach I suggested, we should able to use the great variations of capacities among humans and algorithms to increase the chances that our algorithm do well when it's much better....I don't know whether asymptotically, I'd expect the algorithm to still be truthful. But it could greatly increase the distribution on which it's truthful.

Because ARC is living in "worst-case" land, they discard a training strategy once they can think of any at-all-plausible situation in which it fails, and move on to trying other strategies. In this case, the counterexample would be a reporter that answers questions by doing inference in whatever Bayes net corresponds to "the world-understanding that the smartest/most knowledgeable human in the world" has; this understanding could still be missing things that the prediction model knows.

This is closely related to the counterexample "Gradient descent is more efficient than science" given in the report.

Thanks for the answer! The post you mentioned indeed is quite similar!

Technically, the strategies I suggested in my two last paragraphs (Leverage the fact that we're able to verify solutions to problems we can't solve + give partial information to an algorithm and use more information to verify) should enable to go far beyond human intelligence / human knowledge using a lot of different narrowly accurate algorithms. 

And thus if the predictor has seen many extremely (narrowly) smart algorithms, it would be much more likely to know what is it like to be much smarter than a human on a variety of tasks. It probably still requires some optimism on generalization. So technically the counterexample could be happening on the gap between the capability of the predictor and the capability of the reporter. I feel like one question is : do we expect some narrow algorithms to be much better on very precise tasks than general-purpose algorithms (such as the predictor for instance) ? Because if it were the case, then the generalization that the reporter would have to do from training data (humans + narrowly accurate algorithms capabilities) to inference data (predictor's capabilities) could be small. We could even have data on the predictor's capability in the training dataset using the second approach I mentioned (i.e giving partial information to the predictor (e.g one camera in SuperVault) and using more information (i.e more cameras for humans) than him to verify its prediction). We could give some training examples and show the AI how the human fails much more often than the predictor on the exact same sample of examples. That way, we could greatly reduce the gap of generalization which is required. 

The advantage of this approach is that the bulk of the additionnal cost of training that the reporter requires is due to the generation of the dataset which is a fixed cost that no user has to repay. So that could slightly decrease the competitivity issues as compared with approaches where we affect the training procedure.

Despite all that, thanks to your comment and the report, I see why the approach I mention might have some intrinsic limitations in its ability to elicit latent knowledge though. The problem is that even if it understands roughly that it has incentives to use most of what it knows when we ask him simulating the prediction of someone with its own characteristics (or 1400 IQ), given that with ELK we look for an global maximum (we want that it uses ALL its knowledge), there's always an uncertainty on whether it did understand that point or not for extreme intelligence / examples or whether it tries to fit to the training data as much as possible and thus still doesn't use something it knows.

I see why the approach I mention might have some intrinsic limitations in its ability to elicit latent knowledge though. The problem is that even if it understands roughly that it has incentives to use most of what it knows when we ask him simulating the prediction of someone with its own characteristics (or 1400 IQ), given that with ELK we look for an global maximum (we want that it uses ALL its knowledge), there's always an uncertainty on whether it did understand that point or not for extreme intelligence / examples or whether it tries to fit to the training data as much as possible and thus still doesn't use something it knows.

I think this is roughly right, but to try to be more precise, I'd say the counterexample is this:

  • Consider the Bayes net that represents the upper bound of all the understanding of the world you could extract doing all the tricks described (P vs NP, generalizing from less smart to more smart humans, etc).
  • Imagine that the AI does inference in that Bayes net.
  • However, the predictor's Bayes net (which was created by a different process) still has latent knowledge that this Bayes net lacks.
  • By conjecture, we could not have possibly constructed a training data point that distinguished between doing inference on the upper-bound Bayes net and doing direct translation.

Potentially silly question: 

In the first counterexample you describe the desired behavior as 

Intuitively, we expect each node in the human Bayes net to correspond to a function of the predictor’s Bayes net. We’d want the reporter to simply apply the relevant functions from subsets of nodes in the predictor's Bayes net to each node in the human Bayes net [...]

After applying these functions, the reporter can answer questions using whatever subset of nodes the human would have used to answer that question.

Why doesn't the reporter skip the step of mapping the predictor's Bayes net to the human's and instead just answer the question using its own nodes? What's the benefit of having the intermediate step that maps the predictor's net to the human's?

We generally imagine that it’s impossible to map the predictors net directly to an answer because the predictor is thinking in terms of different concepts, so it has to map to the humans nodes first in order to answer human questions about diamonds and such.

Naive question: does this scenario include cases of a human physically breaking into the vault at some random times so that sensor information, predictor reports, and outcome to human in this situation would be known?

Silly question warning. 

 

You think that when an AI performs a bad action, (say remove the diamond) the AI has to have knowledge that the diamond is in fact no longer there.  Even when the camera shows the diamond is (falsely) there and the human confirms that the diamond is there. 

You call this ELK

You want the human to have access to this knowledge, as this is useful to choosing decisions that the human wants.

This is hard.  So you have people propose how to do this.  

And then people try to explain why that strategy wouldn't work.

Rinse and repeat.

Once you have a proposal that nobody is able to show doesn't work.... profit?

 

Correct any misunderstandings in my basic overview above.  

This broadly seems right. Some details:

  • The "explain why that strategy wouldn't work" step specifically takes the form of "describing a way the world could be where that strategy demonstrably doesn't work" (rather than more heuristic arguments).
  • Once we have a proposal where we try really hard to come up with situations where it could demonstrably fail, and can't think of any, we will probably need to do lots of empirical work to figure out if we can implement it and if it actually works in practice. But we hope that this exercise will teach us a lot about the nature of the empirical work we'll need to do, as well as providing more confidence that the strategy will generalize beyond what we are able to test in practice. (For example, ELK was highlighted as a problem in the first place after ARC researchers thought a lot about possible failure modes of iterated amplification.)