So, silly question that doesn't really address the point of this post (this may well just be a point of clarification, but an answer would be useful to me for earning-to-give reasons that are off-topic for this post) --
Here you claim that CDT is a generalization of decision-theories that includes TDT (fair enough!):
Here, "CDT" refers -- very broadly -- to using counterfactuals to evaluate expected value of actions. It need not mean physical-causal counterfactuals. In particular, TDT counts as "a CDT" in this sen
Thanks! This is great.
A year ago, Joaquin Phoenix made headlines when he appeared on the red carpet at the Golden Globes wearing a tuxedo with a paper bag over his head that read, "I am a shape-shifter. I can't change the world. I can only change myself."
-- GPT-3 generated news article humans found easiest to distinguish from the real deal.
... I haven't read the paper in detail but we may have done it; we may be on the verge of superhuman skill at absurdist comedy! That's not even completely a joke. Look at the sentence "I am a shape-shifter. I c... (read more)
I thought about this for longer than expected so here's an elaboration on inverse-inverse problems in the examples you provided:
Finding solutions to partial differential equations with specific boundary conditions is hard and often impossible. But we know a lot of solutions to differential equations with particular boundary conditions. If we match up those solutions with the problem at hand, we can often get a decent answer.
The direct problem: you have a function; figure out what relationships its derivatives have and it... (read more)
Can we switch to the interpolation regime early if we, before reaching the peak, tell it to keep the loss constant? Aka we are at loss l* and replace the loss function l(theta) with |l(theta)-l*| or (l(theta)-l*)^2.
Interesting! Given that stochastic gradient descent (SGD) does provide an inductive bias towards models that generalize better, it does seem like changing the loss function in this way could enhance generalization performance. Broadly speaking, SGD's bias only provides a benefit when it is searching over many possible models: it performs ba... (read more)
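To make the proposal in the question concrete, here is a minimal sketch of the modified objective, assuming a toy one-parameter loss l(theta) = theta^2. The names `plateau_loss`, `plateau_grad`, `toy_loss`, `toy_grad` and `run` are hypothetical and only illustrate the idea; this is not a claim about how it would behave on a real network.

```python
def plateau_loss(loss_value, target, squared=True):
    """Penalty that is zero exactly when the loss equals the target l*."""
    diff = loss_value - target
    return diff * diff if squared else abs(diff)


def plateau_grad(loss_value, loss_grad, target, squared=True):
    """Gradient of the penalty w.r.t. theta, via the chain rule.

    loss_grad is the gradient of the original loss l(theta).
    """
    diff = loss_value - target
    if squared:
        scale = 2.0 * diff                 # d/dl of (l - l*)^2
    else:
        scale = (diff > 0) - (diff < 0)    # sign(l - l*), a subgradient of |l - l*|
    return [scale * g for g in loss_grad]


# Toy stand-in for a training loss (an assumption for illustration only).
def toy_loss(theta):
    return theta * theta


def toy_grad(theta):
    return [2.0 * theta]


def run(theta=3.0, target=4.0, lr=0.01, steps=500):
    """Plain gradient descent on the plateau penalty instead of on l itself."""
    for _ in range(steps):
        grad = plateau_grad(toy_loss(theta), toy_grad(theta), target)
        theta -= lr * grad[0]
    return theta
```

In this toy case the parameter settles on the level set l(theta) = l* rather than at the minimum of l. Whether wandering along that level set in a real model actually moves it into the interpolation regime is exactly the open question.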
But secondly, I’m not sure about the fragility argument: that if there is basically any distance between your description and what is truly good, you will lose everything.
This seems to be a) based on a few examples of discrepancies between written-down values and real values where the written down values entirely exclude something, and b) assuming that there is a fast takeoff so that the relevant AI has its values forever, and takes over the world.
When I think of the fragility argument, I usually think in terms of Goodhart's Taxonomy. In ... (read more)
If the heuristics are optimized for "be able to satisfy requests from humans" and those requests sometimes require long-term planning, then the skill will develop. If it's only good at satisfying simple requests that don't require planning, in what sense is it superintelligent?
Yeah, that statement is wrong. I was trying to make a more subtle point about how an AI that learns long-term planning on a shorter time-frame is not necessarily going to be able to generalize to longer time-frames (but in the context of superintelligent AIs capable of doing human-level tasks, I do think it will generalize--so that point is kind of irrelevant). I agree with Rohin's response.
Thanks for replying!
This is not my belief. I think that powerful AI systems, even if they are a bunch of well developed heuristics, will be able to do super-long-term planning (in the same way that I'm capable of it, and I'm a bunch of heuristics, or Eliezer is to take your example).
Yeah, I intended that statement to be more of an elaboration on my own perspective than to imply that it represented your beliefs. I also agree that it's wrong in the context of the superintelligent AI we are discussing.
Should "I don't think" be "I do
Thanks for recording this conversation! Some thoughts:
AI development will be relatively gradual and AI researchers will correct safety issues that come up.
I was pretty surprised to read the above--most of my intuitions about AI come down to repeatedly hearing the point that safety issues are very unpredictable and high variance, and that once a major safety issue happens, it's already too late. The arguments I've seen for this (many years of Eliezer-ian explanations of how hard it is to come out on top against superintelligent agents who care a... (read more)
Well, they’re anti-correlated across different agents. But from the same agent’s perspective, they may still be able to maximize their own red-seeing, or even human red-seeing - they just won’t
Just making sure I can parse this... When I say that they're anti-correlated, I mean that the policy of maximizing X is akin to the policy of minimizing X to the extent that X and not X will at some point compete for the same instrumental resources. I will agree with the statement that an agent maximizing X who possesses many instrumental ... (read more)
Oh I see where you're coming from now. I'll admit that, when I made my earlier post, I forgot about the full implications of instrumental convergence. Specifically, the part where:
Maximizing X minimizes all Not X insofar as they both compete for the same resource pool.
Even if your resources are unusually low relative to where you're positioned in the universe, an AI will still take that away from you. Optimizing one utility function doesn't just randomly affect the optimization of other utility functions; they are anti-correlated in g... (read more)
-------------------------------------Part 1: I Respond to Your Actual Comment----------------------------------------
The explanation is a bit simpler than this. The agent has one goal, and we have other goals. It gains power to best complete its goal by taking power away from us
I don't think this explanation is in conflict with mine. Much of my explanation (ie, the "optimizing a proxy too aggressively will invalidate the assumptions that the proxy was built on") is focused on explaining why we expect proxies to become mis-specified. In the... (read more)
[Retracted my other reply due to math errors]
This is only true for the kind of things humans typically care about; this is not true for utility functions in general. That's the extra info we have.
While I generally agree that there can be utility functions that aren't subject to Goodhart, I don't think that this strictly pertains to humans. I expect that when the vast majority of agents (human or not) use scientific methods to develop a proxy for the thing they want to optimize, they will find that proxy to break down upon intense optimizatio... (read more)
Let me see if I have this...
1. Agents blindly maximize the proxies they pick if the expected value of maximizing the proxy is higher than doing anything else.
2. Goodhart's Law tells us that, in general, blindly maximizing the proxy has lower expected value than other methods that involve not doing that.
3. Because of this, we expect the difference between what we want and what we get to be bigger if we're optimizing the proxy instead of following some non-optimizing default strategy. Thus, there's a lower bound on how bad optimizing the pro... (read more)
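A small simulation can make the gap between proxy and target in the points above more tangible. This is a sketch under strong assumptions that are mine, not the post's: the true value and the measurement noise are independent standard normals, and the proxy is their sum. It only illustrates the regressional flavor of Goodhart's Law, and the function name `regressional_goodhart` is hypothetical.

```python
import random


def regressional_goodhart(n=10000, top_k=100, seed=0):
    """Select the top_k options by proxy score and compare proxy vs. true value.

    Assumes proxy = true_value + noise, both drawn from N(0, 1).
    Returns (mean proxy score, mean true value) among the selected options.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        true_value = rng.gauss(0.0, 1.0)
        proxy = true_value + rng.gauss(0.0, 1.0)
        samples.append((proxy, true_value))

    # Optimize the proxy: keep only the options that score highest on it.
    samples.sort(reverse=True)
    top = samples[:top_k]

    mean_proxy = sum(p for p, _ in top) / top_k
    mean_true = sum(t for _, t in top) / top_k
    return mean_proxy, mean_true


mean_proxy, mean_true = regressional_goodhart()
print(f"selected options: mean proxy = {mean_proxy:.2f}, mean true value = {mean_true:.2f}")
```

Under these assumptions the selected options still have above-average true value, but noticeably less than their proxy scores suggest, and the shortfall grows as selection gets more aggressive. It does not capture the stronger failure modes where heavy optimization actively breaks the proxy.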
Don't mind me; just trying to summarize some of the stuff I just processed.
If you're choosing a strategy for predicting the future based on how accurate it turns out to be, a strategy whose output influences the future in ways that make its prediction more likely will outperform a strategy whose output doesn't (all else being equal). Thus, one might think that the strategy you choose will be the strategy that most effectively balances its prediction between a) how accurate that prediction is (unconditioned on the prediction being given) and b) how... (read more)
I'm actually trying to be somewhat agnostic about the right conclusion here. I could have easily added another chapter discussing why the maximizing-surprise idea is not quite right. The moral is that the questions are quite complicated, and thinking vaguely about 'optimization processes' is quite far from adequate to understand this. Furthermore, it'll depend quite a bit on the actual details of a training procedure!