Theoretical Computer Science Msc student at the University of [Redacted] in the United Kingdom.
I'm an aspiring alignment theorist; my research vibes are descriptive formal theories of intelligent systems (and their safety properties) with a bias towards constructive theories.
I think it's important that our theories of intelligent systems remain rooted in the characteristics of real world intelligent systems; we cannot develop adequate theory from the null string as input.
What do you think MIRI is currently doing wrong/what should they change about their approach/general strategy?
To be clear, I enjoyed the post and am looking forward to this sequence. A point of disagreement though:
One feasible-seeming approach is "accelerating alignment," which involves leveraging AI as it is developed to help solve the challenging problems of alignment. This is not a novel idea, as it's related to previously suggested concepts such as seed AI, nanny AI, and iterated amplification and distillation (IDA).
I disagree that using AI to accelerate alignment research is particularly load bearing for the development of a practical alignment craft or really necessary.
I think we should do it to be clear — I have used ChatGPT to aid some of my writing and plan to use it more — but it's to the same extent that we use Google/Wikipedia/Word processors to do research in general. That is, I don't expect AI assistance to be load bearing enough for alignment in general to merit special distinction.
To the extent that one does expect AI to be particularly load bearing for progress on developing useful alignment craft in particular, I think they're engaging in wishful thinking and snorting too much hopium. That sounds like shying away/avoiding the hard/difficult problems of alignment. John Wentworth has said that we shouldn't do that:
Far and away the most common failure mode among self-identifying alignment researchers is to look for Clever Ways To Avoid Doing Hard Things (or Clever Reasons To Ignore The Hard Things), rather than just Directly Tackling The Hard Things.
The most common pattern along these lines is to propose outsourcing the Hard Parts to some future AI, and "just" try to align that AI without understanding the Hard Parts of alignment ourselves. ... You can save yourself several years of time and effort by actively trying to identify the Hard Parts and focus on them, rather than avoid them. Otherwise, you'll end up burning several years on ideas which don't actually leave the field better off. That's one of the big problems with trying to circumvent the Hard Parts: when the circumvention inevitably fails, we are still no closer to solving the Hard Parts. (It has been observed both that alignment researchers mostly seem to not be tackling the Hard Parts, and that alignment research mostly doesn't seem to build on itself; I claim that the latter is a result of the former.)
Mostly, I think the hard parts are things like "understand agency in general better" and "understand what's going on inside the magic black boxes". If your response to such things is "sounds hard, man", then you have successfully identified (some of) the Hard Parts.
I don't think this point should be on the list (or at least, I don't think I endorse the position implied by explicitly placing the point on the list).
I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our phone knowledge) are models for navigating that territory.
Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super Turing computation.
The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren't part of the territory itself.
I think an issue is that GPT is used to mean two things:
[See the Appendix]
The latter kind of GPT, is what I think is rightly called a "Simulator".
From @janus' Simulators (italicised by me):
It is exactly because of the existence of GPT the predictive model, that sampling from GPT is considered simulation; I don't think there's any real tension in the ontology here.
Credit for highlighting this distinction belongs to @Cleo Nardo: