David Scott Krueger

I'm more active on Twitter than LW/AF these days: https://twitter.com/DavidSKrueger

Bio from https://www.davidscottkrueger.com/:
I am an Assistant Professor at the University of Cambridge and a member of Cambridge's Computational and Biological Learning lab (CBL). My research group focuses on Deep Learning, AI Alignment, and AI safety. I’m broadly interested in work (including in areas outside of Machine Learning, e.g. AI governance) that could reduce the risk of human extinction (“x-risk”) resulting from out-of-control AI systems. Particular interests include:

  • Reward modeling and reward gaming
  • Aligning foundation models
  • Understanding learning and generalization in deep learning and foundation models, especially via “empirical theory” approaches
  • Preventing the development and deployment of socially harmful AI systems
  • Elaborating and evaluating speculative concerns about more advanced future AI systems

Author here -- yes, we got this comment from reviewers in the most recent round as well. ADS is a bit more general than performative prediction, since it applies outside of prediction contexts. Still, the two are very closely related.

On the other hand, the point of our work is something that people in the performative prediction community seem to be approaching only slowly, which is the incentive for ADS. Work on CIDs is much more related in that sense.

As a historical note: we started working on this in March or April 2018. Performative prediction was on arXiv in Feb 2020; ours was at a safety workshop in mid 2019, but not on arXiv until Sept 2020.

I think part of this has to do with growing pains in the LW/AF community... When it was smaller it was more like an ongoing discussion with a few people and signal-to-noise wasn't as important, etc. 

I'd want to separate considerations of impact on [LW as collective epistemic process] from [LW as outreach to ML researchers]


Yeah I put those in one sentence in my comment but I agree that they are two separate points.

RE impact on ML community: I wasn't thinking about anything in particular; I just think the ML community should have more respect for LW/x-safety, and stuff like that doesn't help.

Very good to know!  I guess in the context of my comment it doesn't matter as much because I only talk about others' perception.

I don't know of any other notable advances until the 2010s brought the first interesting language generation results from neural networks.

"A Neural Probabilistic Language Model" - Bengio et al. (2000? or 2003?) was cited by Turing award https://proceedings.neurips.cc/paper/2000/hash/728f206c2a01bf572b5940d7d9a8fa4c-Abstract.html

Also worth knowing about: "Generating text with recurrent neural networks" - Ilya Sutskever, James Martens, Geoffrey E Hinton (2011)

(To be clear, I think a lot of these arguments are pointing at important intuitions, and can be "rescued" via appropriate formalizations and rigorous technical work).

I think rigor and clarity are more similar than you indicate.  I mostly think of rigor as either (i) formal definitions and proofs, or (ii) experiments well described, executed, and interpreted.  I think it's genuinely hard to reach a high level of clarity about many things without (i) or (ii).  For instance, people argue about "optimization", but without referencing (hypothetical) detailed experiments or formal notions, those arguments just won't be very clear; experimental or definitional details just matter a lot, and this is very often the case in AI.  LW has historically endorsed a bunch of arguments that are basically just wrong because they have a crucial reliance on unstated assumptions (e.g. AIs will be "rational agents"), and ML looks at this and concludes people on LW are at the peak of "mount stupid".

Agree RE systemic blindspots, although the "algorithmic contribution" thing is sort of a known issue that a lot of senior people disagree with, IME.

RE appeal to authority: I mostly mentioned it because you asked for an argument and I figured I would just provide any decent ones I thought of OTMH.  But I have not provided anything close to my full thoughts on the matter, and probably won't, due to bandwidth.

There's a lot of work that could be relevant for x-risk but is not motivated by it.  Some of it is more relevant than work that is motivated by it.  An important challenge for this community (to facilitate scaling of research funding, etc.) is to move away from evaluating work based on motivations, and towards evaluating work based on technical content.
