Oliver Sourbut

Call me Oliver or Oly - I don't mind which.

I'm particularly interested in sustainable collaboration and the long-term future of value. I'd love to contribute to a safer and more prosperous future with AI! Always interested in discussions about axiology, x-risks, s-risks.

I'm currently (2022) just embarking on a PhD in AI in Oxford, and also spend time in (or in easy reach of) London. Until recently I was working as a senior data scientist and software engineer, and I've been doing occasional AI alignment research with SERI.

I enjoy meeting new perspectives and growing my understanding of the world and the people in it. I also love to read - let me know your suggestions! In no particular order, here are some I've enjoyed recently:

  • Ord - The Precipice
  • Pearl - The Book of Why
  • Bostrom - Superintelligence
  • McCall Smith - The No. 1 Ladies' Detective Agency (and series)
  • Melville - Moby-Dick
  • Abelson & Sussman - Structure and Interpretation of Computer Programs
  • Stross - Accelerando
  • Simsion - The Rosie Project (and trilogy)

Cooperative gaming is a relatively recent but fruitful interest for me. Here are some of my favourites:

  • Hanabi (can't recommend enough; try it out!)
  • Pandemic (ironic at time of writing...)
  • Dungeons and Dragons (I DM a bit and it keeps me on my creative toes)
  • Overcooked (my partner and I enjoy the foody themes and frantic realtime coordination playing this)

People who've got to know me only recently are sometimes surprised to learn that I'm a pretty handy trumpeter and hornist.


Breaking Down Goal-Directed Behaviour


I think Quintin[1] is maybe alluding to the fact that, in the limit of infinite counterfactual exploration, the gradient in sample-based policy gradient estimation will indeed push in that direction. But we never have infinite exploration (and we certainly don't have counterfactual exploration, though we come very close in simulations with resets). So in pure non-lookahead (e.g. model-free) sample-based policy gradient estimation, an action which has never been tried cannot be reinforced (except as a side effect of generalisation by function approximation).

This seems right to me and it's a nuance I've raised in a few conversations in the past. On the other hand, kind of half the point of RL optimisation algorithms is to do 'enough' exploration! And furthermore (as I mentioned under Steven's comment) I'm not confident that such simplistic RL is the one that will scale to AGI first. Cf. various impressive results from DeepMind over the years which use lots of shenanigans besides plain old sample-based policy gradient estimation (including model-based lookahead as in the Alpha and Mu gang). But maybe!

  1. This is a guess and I haven't spoken to Quintin about this - Quintin, feel free to clarify/contradict ↩︎
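The point about untried actions can be made concrete with a minimal sketch. It assumes a tabular softmax policy and a REINFORCE-style estimator; the function names, the three-action setup, and the sample rewards are all hypothetical, chosen only for illustration. Notice that the estimator only ever sees rewards for actions that were actually sampled, so whatever reward the unexplored action would have yielded cannot enter the update.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, samples):
    """Sample-based policy gradient estimate for a tabular softmax policy.

    `samples` is a list of (action, reward) pairs that were actually observed.
    Returns the average of reward * d log pi(action) / d logits.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in samples:
        for b in range(len(logits)):
            indicator = 1.0 if b == action else 0.0
            # d log pi(action) / d logit_b = 1[b == action] - pi(b)
            grad[b] += reward * (indicator - probs[b])
    return [g / len(samples) for g in grad]


logits = [0.0, 0.0, 0.0]
# Only actions 0 and 1 were ever tried; action 2 is unexplored, so its
# (possibly excellent) reward never appears anywhere in the estimate.
samples = [(0, 1.0), (1, 0.5), (0, 1.0), (1, 0.5)]
grad = reinforce_gradient(logits, samples)
print(grad)  # the entry for action 2 is negative: it is pushed down, not reinforced
```

With non-negative rewards, the untried action's logit can only be pushed down through the softmax normalisation. In this tabular setting that is the only way it moves at all; with function approximation it could also shift through generalisation.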

  1. Information inaccessibility is somehow a surmountable problem for AI alignment (and the genome surmounted it),
  2. The genome solves information inaccessibility in some way we cannot replicate for AI alignment, or
  3. The genome cannot directly address the vast majority of interesting human cognitive events, concepts, and properties. (The point argued by this essay)

In my opinion, either (1) or (3) would be enormous news for AI alignment

What do you mean by 'enormous news for AI alignment'? That either of these would be surprising to people in the field? Or that resolving that dilemma would be useful to build from? Or something else?

FWIW from my POV the trilemma isn't really one, because I agree that (2) is obviously not the case in principle (subject to enough research time!). And I further think it reasonably clear that both (1) and (3) are true in some measure. Granted you say 'at least one' must be true, but I think the framing as a trilemma suggests you want to dismiss (1) - is that right?

I'll bite those bullets (in devil's advocate style)...

  • I think about half of your bullets are probably (1), except via rough proxies (power, scamming, family, status, maybe cheating)
    • why? One clue is that people have quite specific physiological responses to some of these things. Another is that various of these are characterised by different behaviour in different species.
    • why proxies? It stands to reason: like you're pointing out here, it's hard and expensive to specify things exactly. Further, lots of animal research demonstrates hardwired proxies pointing to runtime-learned concepts
  • Sunk cost, framing, and goal conflation smell weird to me in this list - like they're the wrong type? I'm not sure what it would mean for these to be 'detected' and then the bias 'implemented'. Rather I think they emerge from failure of imagination due to bounded compute.
    • in the case of goals I think that's just how we're implemented (it's parsimonious)
      • with the possible exception of 'conscious self approval' as a differently-typed and differently-implemented sole terminal goal
      • other goals at various levels of hierarchy, strength, and temporal extent get installed as we go
  • ontological shifts are just supplementary world abstractions being installed which happen to overlap with preexisting abstractions
    • tentatively, I expect cells and atoms probably have similar representation to ghosts and spirits and numbers and ecosystems and whatnot - they're just abstractions and we have machinery which forms and manipulates them
      • admittedly this machinery is basically magic to me at this point
  • wireheading and reality/non-reality are unclear to me and I'm looking forward to seeing where you go with it
    • I suspect all imagined circumstances ('real' or non-real) go via basically the same circuitry, and that 'non-real' is just an abstraction like 'far away' or 'unlikely'
      • after all, any imagined circumstance is non-real to some extent

Another aesthetic similarity which my brain noted is between your concept of 'information loss' on inputs for layers-which-discriminate and layers-which-don't and the concept of sufficient statistics.

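A sufficient statistic $T$ is one for which the posterior is independent of the data $x$, given the statistic $T(x)$:

$$P(\theta \mid x) = P(\theta \mid T(x))$$

which has the same flavour as a network's output depending on its input only through the activations of a given layer.

In the respective cases, the statistic and the layer's activations are 'sufficient' and induce an equivalence class between inputs $x$.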
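As a small illustration of the sufficient-statistic idea, here is a sketch using coin flips, where the number of heads is the textbook sufficient statistic for a Bernoulli parameter. The grid of candidate parameters, the uniform prior, and the two datasets are assumptions made up for the example; nothing here comes from the post being discussed.

```python
def posterior(data, thetas):
    """Posterior over a grid of Bernoulli parameters under a uniform prior."""
    weights = []
    for theta in thetas:
        likelihood = 1.0
        for x in data:
            likelihood *= theta if x == 1 else (1.0 - theta)
        weights.append(likelihood)
    total = sum(weights)
    return [w / total for w in weights]


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]

# Two different datasets with the same value of the statistic (three heads
# out of five flips): they land in the same equivalence class.
data_a = [1, 1, 0, 0, 1]
data_b = [0, 1, 1, 1, 0]

post_a = posterior(data_a, thetas)
post_b = posterior(data_b, thetas)
print(post_a)
print(post_b)
```

The two posteriors agree (up to floating-point rounding) even though the raw data differ, which is the sense in which the statistic throws away only information that the downstream inference never needed.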

Regarding your empirical findings which may run counter to the question

  1. Is manifold dimensionality actually a good predictor of which solution will be found?

I wonder if there's a connection to asymptotic equipartitioning - it may be that the 'modal' (most 'voluminous' few) solution basins are indeed higher-rank, but that they are in practice so comparatively few as to contribute negligible overall volume?

This is a fuzzy tentative connection made mostly on the basis of aesthetics rather than a deep technical connection I'm aware of.

Interesting stuff! I'm still getting my head around it, but I think implicit in a lot of this is that loss is some quadratic function of 'behaviour' - is that right? If so, it could be worth spelling that out. Though maybe in a small neighbourhood of a local minimum this is approximately true anyway?

This also brings to mind the question of what happens when we're in a region with no local minimum (e.g. saddle points all the way down, or asymptoting to a lower loss, etc.)
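For the local-minimum case, the 'approximately quadratic in a small neighbourhood' intuition is just the second-order Taylor expansion. This is a sketch in terms of parameters rather than 'behaviour', and the symbols ($L$ for the loss, $\theta^*$ for the minimum, $H$ for the Hessian of $L$ at $\theta^*$) are assumed labels rather than notation from the post:

```latex
L(\theta) \approx L(\theta^*)
  + \nabla L(\theta^*)^\top (\theta - \theta^*)
  + \tfrac{1}{2} (\theta - \theta^*)^\top H \, (\theta - \theta^*),
\qquad \nabla L(\theta^*) = 0
\;\Rightarrow\;
L(\theta) \approx L(\theta^*) + \tfrac{1}{2} (\theta - \theta^*)^\top H \, (\theta - \theta^*).
```

The linear term drops out only because the gradient vanishes at the minimum, so this gives no such guarantee in regions without one.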

I think the gradient descent bit is spot on. That also looks like the flavour of natural selection, with non-infinitesimal (but really small) deltas. Natural selection consumes a proof that a particular $A$ (mutation) produces $B$ (fitness) to generate/propagate/multiply $A$.

I recently did some thinking about this and found an equivalence proof under certain conditions for the natural selection case and the gradient descent case.
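To give a feel for why small-mutation selection and gradient steps can line up, here is a toy sketch. It is not the equivalence proof mentioned above. It assumes a made-up smooth fitness landscape, mirrored pairs of small Gaussian mutations, and a selection rule that credits each mutation in proportion to its fitness advantage over its mirror image. All names and parameters are hypothetical.

```python
import random


def fitness(x):
    # Hypothetical smooth fitness landscape with its peak at (1, -2).
    return -((x[0] - 1.0) ** 2) - ((x[1] + 2.0) ** 2)


def fitness_gradient(x):
    # Analytic gradient of the fitness above, used only for comparison.
    return [-2.0 * (x[0] - 1.0), -2.0 * (x[1] + 2.0)]


def selection_step(x, sigma=0.01, pairs=500, rng=None):
    """Average shift of the population under small mirrored mutations.

    Each mutation direction is weighted by how much fitter it is than
    its mirror image, which is a crude stand-in for differential reproduction.
    """
    rng = rng or random.Random(0)
    step = [0.0] * len(x)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        plus = [xi + sigma * e for xi, e in zip(x, eps)]
        minus = [xi - sigma * e for xi, e in zip(x, eps)]
        advantage = (fitness(plus) - fitness(minus)) / (2.0 * sigma)
        for i, e in enumerate(eps):
            step[i] += advantage * e / pairs
    return step


parent = [0.0, 0.0]
step = selection_step(parent)
grad = fitness_gradient(parent)

# Positive alignment means selection moved the population uphill,
# in broadly the same direction a gradient ascent step would.
alignment = sum(s * g for s, g in zip(step, grad))
print(step, grad, alignment)
```

The selection step never looks at the gradient directly. It only consumes evidence of the form 'this mutation produced this fitness', yet the averaged result points the same way as the gradient.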

In general, I think the type signature here can indeed be soft or fuzzy or lossy and you still get consequentialism, and the 'better' the fidelity, the 'better' the consequentialism.

This post has also inspired some further thinking and conversations and refinement about the type of agency/consequentialism which I'm hoping to write up soon. A succinct intuitionistic-logic-flavoured summary is something like but there's obviously more to it than that.

This post is thoroughly excellent, a good summary and an important service!

However, the big caveat here is that evolution does not implement Stochastic Gradient Descent.

I came here to say that in fact they are quite analogous after all