Chris_Leong

Comments

Sounds really cool. It would be useful to have some idea of the rough time you're planning to pick, so that people in other timezones can make a call about whether or not to apply.

I see some value in the framing of "general intelligence" as a binary property, but it also doesn't quite feel as though it fully captures the phenomenon. Like, it would seem rather strange to describe GPT-4 as being a 0 on the general intelligence scale.

I think maybe a better analogy would be to consider the sum of a geometric sequence.

Consider the sum a/(1 - r) for a few values of r as r increases at a steady rate:

r = 0.5: 2a
r = 0.6: 2.5a
r = 0.7: 3.3a
r = 0.8: 5a
r = 0.9: 10a
r = 1: diverges to infinity

What we see, then, is quite significant returns to increases in r, followed by a sudden divergence at r = 1.
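Here's a quick numerical sketch of the same point, just plugging values into a/(1 - r) (with a = 1 for concreteness, since a is left symbolic above):

```python
# Sum of the geometric series a + a*r + a*r^2 + ..., which equals a / (1 - r) for r < 1.
# Evaluated at a = 1 to show how the returns to increasing r accelerate and then
# blow up as r approaches 1.
def geometric_sum(a: float, r: float) -> float:
    if r >= 1:
        return float("inf")  # the series diverges
    return a / (1 - r)

for r in [0.5, 0.6, 0.7, 0.8, 0.9, 0.99, 1.0]:
    print(f"r = {r:>4}: sum = {geometric_sum(1.0, r):.2f}")
```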

(Aside: This model feels related to nuclear chain reactions, in that you can model the total production of reactions as a geometric series. However, that model doesn't just have sub-criticality and super-criticality, but also criticality itself, and I'm not sure how you'd fit criticality in here.)

In contrast, many economists want to model AI as a more traditional exponentially increasing system (i.e. smooth growth of the form x(t) = x_0 e^(kt)).
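To make the contrast explicit, here's a side-by-side of the two models as I understand them (my notation, not taken from the economists in question):

```latex
% Geometric-sum model: diverges at a finite value of the parameter r.
\[ S(r) = \sum_{t=0}^{\infty} a r^{t} = \frac{a}{1-r} \;\longrightarrow\; \infty
   \quad \text{as } r \to 1^{-} \]

% Exponential model: grows quickly, but stays finite for every finite t.
\[ x(t) = x_0 \, e^{kt} \]
```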

So I've thought about this argument a bit more and concluded that you are correct, but also that there's a potential fix to get around this objection.

I think that it's quite plausible that an agent will have an understanding of its decision mechanism that a) lets it know it will take the same action in both counterfactuals, but b) won't tell it what action it will take in this counterfactual before it makes the decision.

And in that case, I think it makes sense to conclude that Omega's prediction depends on your action, such that paying gives you the $10,000 reward.

However, there's a potential fix: we can construct a non-symmetrical version of this problem where Omega asks you for $200 instead of $100 in the tails case. Then the fact that you would pay in the heads case, combined with making decisions consistently, doesn't automatically imply that you would pay in the tails case. So I suspect that with this fix you actually would have to consider strategies instead of just making a decision purely based on this branch.

Thanks for your response. There's a lot of good material here, although some of these components like modules or language seem less central to agency, at least from my perspective. I guess you might see these as appearing slightly further down the stack?

Summary: John describes the problems of inner and outer alignment. He also describes the concept of True Names - mathematical formalisations that hold up under optimisation pressure. He suggests that having a "True Name" for optimisers would be useful if we wanted to inspect a trained system for an inner optimiser without risking missing something.

He further suggests that the concept of agency breaks down into lower-level components like "optimisation", "goals", "world models", etc. It would be possible to make further arguments about how these lower-level concepts are important for AI safety.

This might be worth a shot, although it's not immediately clear that having such powerful maths provers would accelerate alignment more than capabilities. That said, I have previously wondered myself whether there is a need to solve embedded agency problems or whether we can just delegate that to a future AGI.

Oh wow, it's fascinating to see someone actually investigating this proposal. (I had a similar idea, but only posted it in the EA meme group).

Sorry, I'm confused by the terminology: 

Thanks for the extra detail!

(Actually, I was reading a post by Mark Xu which seems to suggest that the TradingAlgorithms have access to the price history rather than the update history, as I suggested above.)

My understanding after reading this is that TradingAlgorithms generate a new trading policy after each timestep (possibly with access to the update history, though I'm unsure). Is this correct? If so, it might be worth clarifying this here, even though it becomes clearer later on.
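To make the question concrete, here's a minimal type sketch of my reading (the names are mine, and this is simplified relative to the formal definitions in the logical induction paper):

```python
from typing import Callable, Dict, List

Sentence = str
Prices = Dict[Sentence, float]   # market prices for sentences on a single day
PriceHistory = List[Prices]      # prices on days 1..n

# A trading policy for a given day: how much of each sentence to buy (positive)
# or sell (negative), possibly reacting to that day's prices.
TradingPolicy = Callable[[Prices], Dict[Sentence, float]]

# My reading: a TradingAlgorithm emits a fresh policy at each timestep, and that
# policy may depend on the price history so far (Mark Xu's framing) rather than
# on the update history (my earlier guess).
TradingAlgorithm = Callable[[int, PriceHistory], TradingPolicy]
```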
