I had some colleagues watch Ben Garfinkel's talk, "How sure are we about this AI stuff?", which, among other things, pointed out that it's often difficult to change the long-term trajectory of a technology. For instance, electricity, the printing press, and agriculture were all transformative technologies, but even if we had recognized their importance in advance, it's hard to see what we could really have changed about them in the long term.
In general, when I look at technological development/adoption, I tend to see people following local economic incentives wherever they lead, and it often seems hard to change these gradients without some serious external pressures (forceful governments, cultural taboos, etc.). I don't see that many "parallel tracks" where a farsighted agent could've set things on a different track by pulling the right lever at the right time. A counterexample is the Qwerty vs. Dvorak keyboard, where someone with enough influence may well have been able to get society to adopt the keyboard that's better from a longtermist perspective.
This leads one to look for cases of "lock-in": times where we could have plausibly taken any one of multiple paths, and this decision:
a) could have been changed by a relatively small group of farsighted agents
b) had significant effects that lasted decades or more
A lot of the best historical examples of this aren't technological--the founding of major religions, the writing of the US constitution, the Bretton Woods agreement--which is maybe some small update towards political stuff being important from a longtermist perspective.
Nevertheless, there are examples of lock-in for technological development. In a group discussion after watching Garfinkel's talk, Lin Eadarmstadt asked what examples of lock-in there might be for AI research. I think this is a really good question, because it may be one decent way of locating things we can actually change in the long term. (Of course, it's not the only way by any means, but perhaps a fruitful one.)
When I brainstormed this, it felt hard to come up with good examples, but here are two sort-of-examples:
First, there's the programming language that ML is done in. Right now, it's almost entirely Python. In some not-totally-implausible counterfactual, it's done in OCaml, where the type-checking is very strict, and hence certain software errors are less likely to happen. On this metric, Python is pretty much the least safe language for ML.
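To make the type-safety point a bit more concrete, here's a minimal sketch of the kind of error I have in mind. The function `scale` and the "factor read from a config file as a string" scenario are made up for illustration, not taken from any real ML codebase. In Python, the mistake is only noticed when the bad line actually runs, and sometimes not even then; in OCaml, float multiplication (`*.`) only accepts floats, so the analogous call would be rejected before the program runs.

```python
def scale(values, factor):
    # Hypothetical helper: nothing checks the argument types when this
    # function is defined or called.
    return [v * factor for v in values]


# Intended use: a numeric factor.
print(scale([1.0, 2.0], 3))

# Accidental use: the factor arrived as a string (say, from a config file).
# Python only raises TypeError when this multiplication actually executes.
try:
    scale([1.0, 2.0], "3")
except TypeError as err:
    print("caught at runtime:", err)

# Some mistakes never raise at all: int * str silently repeats the string.
print(scale([2, 3], "ab"))
```

The last call is arguably the worrying one: the program keeps going with nonsense values, and nothing flags it.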
Of course, even if we agree the OCaml counterfactual is better in expectation, it's hard to see how anyone could've nudged ML towards it, even in hindsight. This would've been much easier when ML was a smaller field than it is now, which is why we can say Python's been "locked in". On the other hand, I've heard murmurs about Swift attempting to replace Python, with Swift having better-than-zero type safety.
Caveats: I don't take these "murmurs" seriously, it seems very unlikely to me that AGI goes catastrophically wrong due to a lack of type safety, and I don't think it's worth the time of anyone here to worry about this. This is mostly just a hopefully illustrative example.
Second, there's the way rewards get specified. Currently, deep reinforcement learning (DRL) is usually done by specifying a reward function upfront, and having the agent figure out how to maximize it. As we know, reward functions are often hard to specify properly in complex domains, and this is one bottleneck on DRL capabilities research. Still, in my naive thinking, I can imagine a plausible scenario where DRL researchers get used to "fudging it": getting agents to sort-of-learn lots of things in a variety of relatively complex domains where the reward functions are hacked together by grad student descent, and after many years of hardware overhang have set in, someone finally figures out a way to stitch these together to get an AGI (or something "close enough" to do some serious damage).
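As a toy illustration of why hand-specified rewards are hard to get right, here's a small sketch. Everything in it is hypothetical: the 1-D corridor, the constants, and the two hard-coded policies are made up for this example, and no learning happens. It just compares the return of the intended behavior against a behavior that exploits a shaping bonus.

```python
GOAL = 5         # position the designer actually wants the agent to reach
BONUS_TILE = 2   # hypothetical "progress" tile, rewarded as a proxy
HORIZON = 30     # maximum number of steps per episode


def specified_reward(position):
    """Hand-written reward: a shaping bonus plus a payoff at the goal."""
    if position == GOAL:
        return 10.0
    if position == BONUS_TILE:
        return 1.0
    return 0.0


def run_episode(policy):
    """Roll out a policy from position 0; return (total reward, reached goal)."""
    position, total = 0, 0.0
    for _ in range(HORIZON):
        position = max(0, position + policy(position))
        total += specified_reward(position)
        if position == GOAL:
            return total, True  # the episode ends once the goal is reached
    return total, False


def go_to_goal(position):
    # Intended behavior: always step right.
    return 1


def farm_bonus(position):
    # Exploit: step off the bonus tile and back onto it, forever.
    return -1 if position == BONUS_TILE else 1


print("go_to_goal:", run_episode(go_to_goal))
print("farm_bonus:", run_episode(farm_bonus))
```

Under this reward, the bonus-farming policy collects more total reward than the policy that actually reaches the goal, so an agent that maximizes the specified reward well would prefer the wrong behavior.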
The main alternatives to reward specification are imitation learning, inverse RL, and DeepMind's reward modeling (see section 7 of this paper for a useful comparison). In my estimation, any of these approaches is probably safer than the "AGI via reward specification" path.
Of course, these don't clearly form 4 distinct tech paths, and I rate it > 40% that if AGI largely comes out of DRL, no one technique will claim all the major milestones along the way. So this is a pretty weak example of "lock-in", because I think, for instance, DRL researchers will flock to reward modeling if DeepMind unambiguously demonstrates its superiority over reward specification.
Still, I think there is an extent to which researchers become "comfortable" with research techniques, and if TensorFlow has extensive libraries for reward specification and every DRL textbook has a chapter "Heuristics for Fudging It", while other techniques are viewed as esoteric and have start-up costs to apply (and fewer libraries), then this may become a weak form of lock-in.
As I've said, those two are fairly weak examples. The former is a lock-in that happened a while ago that we probably can't change now, and it doesn't seem that important even if we could. The latter is a fairly weak form of lock-in, in that it can't withstand that much in the way of counter-incentives (compare with the Qwerty keyboard).
Still, I found it fun thinking about these, and I'm curious if people have any other ideas of potential "lock-in" for AI research? (Even if it doesn't have any obvious implications for safety).
Huh, that's a good point. Whereas it seems probably inevitable that AI research would've eventually converged on something similar to the current D(R)L paradigm, we can imagine a lot of different ways AI safety could have looked right now. Which makes sense, since the latter is still young and in a kind of pre-paradigmatic philosophical stage, with little unambiguous feedback to dictate how things should unfold (and it's far from clear when substantially more of this feedback will show up).
I can imagine an alternate timeline whe...