Yep, I'd say I intuitively agree with all of that, though I'd add that if you want to specify the set of "outcomes" differently from the set of "goals", then that must mean you're implicitly defining a mapping from outcomes to goals. One analogy could be that an outcome is like a thermodynamic microstate (in the sense that it's a complete description of all the features of the universe) while a goal is like a thermodynamic macrostate (in the sense that it's a complete description of the features of the universe that the system can perceive).
This mapping from outcomes to goals won't be injective for any real embedded system. But in the unrealistic limit where your system is so capable that it has a "perfect ontology" (i.e., its perception apparatus can distinguish every outcome / microstate from every other), this mapping converges to the identity function, and the system's set of possible goals converges to its set of possible outcomes. (This is the dualistic case, e.g., AIXI and such. But plausibly, we should also expect a self-improving system to improve its own perception apparatus such that its effective goal-set becomes finer and finer with each improvement cycle. So even this partition over goals can't be treated as constant in the general case.)
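To make the microstate / macrostate analogy concrete, here's a minimal toy sketch. Everything in it is an assumption for illustration: the universe is three bits, an "outcome" is a full assignment of those bits, and a "goal" is whatever the system sees through a hypothetical `perceive` function that keeps only the bits it can resolve.

```python
from itertools import product

# Hypothetical toy universe: an outcome (microstate) is a tuple of 3 bits.
OUTCOMES = list(product([0, 1], repeat=3))

def perceive(outcome, visible):
    """Map an outcome to its goal (macrostate): keep only the features
    at the indices the system can perceive."""
    return tuple(outcome[i] for i in visible)

def goal_set(visible):
    """The set of distinct goals the system can represent."""
    return {perceive(o, visible) for o in OUTCOMES}

def is_injective(visible):
    """True only if no two outcomes collapse into the same goal."""
    return len(goal_set(visible)) == len(OUTCOMES)

coarse = goal_set(visible=(0,))         # perceives only the first bit
finer = goal_set(visible=(0, 1))        # after one "improvement cycle"
perfect = goal_set(visible=(0, 1, 2))   # the "perfect ontology" limit

print(len(coarse), len(finer), len(perfect))
print(is_injective((0,)), is_injective((0, 1, 2)))
```

With partial perception, several outcomes land on the same goal, so the map isn't injective. Once every bit is visible, the goal set coincides with the outcome set, which is the identity-map limit described above.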
Gotcha. I definitely agree with what you're saying about the effectiveness of incentive structures. And to be clear, I also agree that some of the affordances in the quote reasonably fall under "alignment": e.g., if you explicitly set a specific mission statement, that's a good tactic for aligning your organization around that specific mission statement.
But some of the other affordances aren't as clearly goal-dependent. For example, iterating quickly is an instrumentally effective strategy across a pretty broad set of goals a company might have. That (in my view) makes it closer to a capability technique than to an alignment technique. I.e., you could imagine a scenario where I succeeded in building a company that iterated quickly, but I failed to also align it around the mission statement I wanted it to have. In this scenario, my company was capable, but it wasn't aligned with the goal I wanted.
Of course, this is a spectrum. Even setting a specific mission statement is an instrumentally effective strategy across all the goals that are plausible interpretations of that mission statement. And most real mission statements don't admit a unique interpretation. So you could also argue that setting a mission statement increases the company's capability to accomplish goals that are consistent with any interpretation of it. But as a heuristic, I tend to think of a capability as something that lowers the cost to the system of accomplishing any goal (averaged across the system's goal-space with a reasonable prior). Whereas I tend to think of alignment as something that increases the relative cost to the system of accomplishing classes of goals that the operator doesn't want.
I'd be interested to hear whether you have a different mental model of the difference, and if so, what it is. It's definitely possible I've missed something here, since I'm really just describing an intuition.
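For what it's worth, here's a rough numerical sketch of the heuristic above. The goal names, the cost numbers, and the two helper functions are all made up for illustration; "relative cost" is taken here to mean the mean cost of unwanted goals divided by the mean cost of the rest, which is only one possible reading.

```python
def average_cost(costs, prior):
    """Prior-weighted average cost of accomplishing a goal."""
    return sum(prior[g] * costs[g] for g in costs)

def relative_cost(costs, unwanted):
    """Mean cost of unwanted goals divided by mean cost of the other goals."""
    bad = [c for g, c in costs.items() if g in unwanted]
    good = [c for g, c in costs.items() if g not in unwanted]
    return (sum(bad) / len(bad)) / (sum(good) / len(good))

# Hypothetical goal-space with a uniform prior and equal baseline costs.
baseline = {"mission": 10.0, "adjacent": 10.0, "off_mission": 10.0}
prior = {g: 1 / 3 for g in baseline}
unwanted = {"off_mission"}

# Capability-like change (e.g. quick iteration): every goal gets cheaper.
capable = {g: c * 0.5 for g, c in baseline.items()}

# Alignment-like change: only the unwanted goals get more expensive.
aligned = {g: (c * 2.0 if g in unwanted else c) for g, c in baseline.items()}

for name, costs in [("baseline", baseline), ("capable", capable), ("aligned", aligned)]:
    print(name, average_cost(costs, prior), relative_cost(costs, unwanted))
```

In this toy setup the capability-like change lowers the average cost but leaves the relative cost untouched, while the alignment-like change does the opposite. Real interventions will usually move both numbers at once, which is the spectrum point.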
Thanks, great post.
These include formulating and repeating a clear mission statement, setting up a system for promotions that rewards well-calibrated risk taking, and iterating quickly at the beginning of the company in order to habituate a rhythm of quick iteration cycles.
I may be misunderstanding, but wouldn't these techniques fall more under the heading of capabilities rather than under alignment? These are tactics that should increase a company's effectiveness in general, for most reasonable mission statements or products the company could have.
This is fantastic. Really appreciate both the detailed deep-dive in the document, and the summary here. This is also timely, given that teams working on superscale models with concerning capabilities haven't generally been too forthcoming with compute estimates. (There are exceptions.)
As you and Alex point out in the sibling thread, the biggest remaining fudge factors seem to be:
Nonetheless, my flying guess would be that your method is pretty much guaranteed to be right within an OOM, and probably within a factor of 2 or less. That seems pretty good! It's certainly an improvement over anything I've seen previously along these lines. Congrats!
It's simply because we each (myself more than her) have an inclination to apply a fair amount of adjustment in a conservative direction, for generic "burden of proof" reasons, rather than go with the timelines that seem most reasonable based on the report in a vacuum.
While one can sympathize with the view that the burden of proof ought to lie with advocates of shorter timelines when it comes to the pure inference problem ("When will AGI occur?"), it's worth observing that in the decision problem ("What should we do about it?") this situation is reversed. The burden of proof in the decision problem probably ought instead to lie with advocates of non-action: when one's timelines are >1 generation, it is a bit too easy to kick the can down the road in various ways, leaving one unprepared if the future turns out to move faster than we expected. Conversely, someone whose timelines are relatively short may take actions today that will leave us in a better position in the future, even if that future arrives more slowly than they originally believed.
(I don't think OpenPhil is confusing these two, just that in a conversation like this it is particularly worth emphasizing the difference.)
This is an excellent point and it's indeed one of the fundamental limitations of a public tracking approach. Extrapolating trends in an information environment like this can quickly degenerate into pure fantasy. All one can really be sure of is that the public numbers are merely lower bounds — and plausibly, very weak ones.
Yeah, great point about Gopher. We noticed the same thing and included a note to that effect in Gopher's entry in the tracker.
I agree there's reason to believe this sort of delay could become a bigger factor in the future, and may already be a factor now. If we see this pattern develop further (and if folks start publishing "model cards" more consistently like DM did, which gave us the date of Gopher's training) we probably will begin to include training date as separate from publication date. But for now, it's a possible trend to keep an eye on.
A more typical example: I can look at a chain of options on a stock, and use the prices of those options to back out market-implied probabilities for each possible stock price at expiry.
Gotcha, this is a great example. And the fundamental reasons why this works are 1) the immediate incentive that you can earn higher returns by pricing the option more correctly; combined with 2) the fact that the agents who are assigning these prices have (on a dollar-weighted-average basis) gone through multiple rounds of selection for higher returns.
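As a sanity check on the options example, here's a small sketch of one standard way to back out those probabilities: price a butterfly spread around each strike. It assumes evenly spaced strikes, a terminal price that can only land on the strike grid, and no discounting by default. The function names and the numbers are invented for illustration, and what comes out are risk-neutral probabilities rather than anyone's real-world beliefs.

```python
def call_price(strike, probs, discount=1.0):
    """European call price as the discounted expected payoff under `probs`,
    a dict mapping terminal stock price -> risk-neutral probability."""
    return discount * sum(p * max(s - strike, 0.0) for s, p in probs.items())

def implied_probs(strikes, calls, discount=1.0):
    """Back out the probability at each interior strike from a butterfly
    spread: long one call on each side, short two calls at the strike.
    Assumes evenly spaced strikes and terminal prices on the strike grid."""
    step = strikes[1] - strikes[0]
    out = {}
    for i in range(1, len(strikes) - 1):
        butterfly = calls[i - 1] - 2 * calls[i] + calls[i + 1]
        out[strikes[i]] = butterfly / (step * discount)
    return out

# Made-up "true" distribution, used only to generate a consistent option chain.
true_probs = {90: 0.2, 100: 0.5, 110: 0.3}
strikes = [80, 90, 100, 110, 120]
calls = [call_price(k, true_probs) for k in strikes]

recovered = implied_probs(strikes, calls)
print(recovered)
```

The butterfly centered on a strike pays off only if the stock finishes at that strike (under the grid assumption), so its price divided by the strike spacing is the market's probability for that outcome. With real quotes you'd also have to deal with bid/ask spreads and uneven strikes, which this ignores.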
(I wonder to what extent any selection mechanism ultimately yields agents with general reasoning capabilities, given tight enough competition between individuals in the selected population? Even if the environment doesn't start out especially complicated, if the individuals are embedded in it and are interacting with one another, after a few rounds of selection most of the complexity an individual perceives is going to be due to its competitors. Not everything is like this — e.g., training a neural net is a form of selection without competition — but it certainly seems to describe many of the more interesting bits of the world.)
Thanks for the clarifications here btw — this has really piqued my interest in selection theorems as a research angle.
Okay, then to make sure I've understood correctly: what you were saying in the quoted text is that you'll often see an economist, etc., use coherence theorems informally to justify a particular utility maximization model for some system, with particular priors and conditionals. (As opposed to using coherence theorems to justify the idea of EU models generally, which is what I'd thought you meant.) And this is a problem because the particular priors and conditionals they pick can't be justified solely by the coherence theorem(s) they cite.
The problem with VNM-style lotteries is that the probabilities involved have to come from somewhere besides the coherence theorems themselves. We need to have some other, external reason to think it's useful to model the environment using these probabilities.
To try to give an example of this: suppose I wanted to use coherence / consistency conditions alone to assign priors over the outcomes of a VNM lottery. Maybe the closest I could come to doing this would be to use maxent + transformation groups to assign an ignorance prior over those outcomes; and to do that, I'd need to additionally know the symmetries that are implied by my ignorance of those outcomes. But those symmetries are specific to the structure of my problem and are not contained in the coherence theorems themselves. So this information about symmetries would be what you would refer to as an "external reason to think it's useful to model the environment using these probabilities".
Is this a correct interpretation?
Thanks so much for the feedback!
The ability to sort by model size etc would be nice. Currently sorting is alphabetical.
Right now the default sort is actually chronological by publication date. I just added the ability to sort by model size and compute budget at your suggestion. You can use the "⇅ Sort" button in the Models tab to try it out; the rows should now sort correctly.
Also the rows with long textual information should be more to the right and the more informative/tighter/numerical columns more to the left (like "deep learning" in almost all rows, not very informative). Ideally the most relevant information would be on the initial page without scrolling.
You are absolutely right! I've just taken a shot at rearranging the columns to surface the most relevant parts up front and played around a bit with the sizing. Let me know what you think.
"Date published" and "date trained" can be quite different. Maybe worth including the latter?
That's true, though I've found the date at which a model was trained usually isn't disclosed as part of a publication (unlike parameter count and, to a lesser extent, compute cost). There is also generally an incentive to publish fairly soon after the model's been trained and characterized, so you can often rely on the model not being that stale, though that isn't universal.
Is there a particular reason you'd be interested in seeing training dates as opposed to (or in addition to) publication dates?