I wasn't intending for a metaphor of "biomimicry" vs "modernist".
(Claim 1) Wings can't work in space because there's no air. The lack of air is a fundamental reason for why no wing design, no matter how clever it is, will ever solve space travel.
If TurnTrout is right, then the equivalent statement is something like (Claim 2) "reward functions can't solve alignment because alignment isn't maximizing a mathematical function."
The difference between Claim 1 and Claim 2 is that we have a proof of Claim 1, and therefore don't bother debating it anymore, while with Claim 2 we only have an arbitrarily long list of examples for why reward functions can be gamed, exploited, or otherwise fail in spectacular ways, but no general proof yet for why reward functions will never work, so we keep arguing about a Sufficiently Smart Reward Function That Definitely Won't Blow up as if that is a thing that can be found if we try hard enough.
As of right now, I view "shard theory" sort of like a high-level discussion of chemical propulsion without the designs for a rocket or a gun. I see the novelty of it, but I don't understand how you would build a device that can use it. Until someone can propose actual designs for hardware or software that would implement "shard theory" concepts without just becoming an obfuscated reward function prone to the same failure modes as everything else, it's not incredibly useful to me. However, I think it's worth engaging with the idea because if correct then other research directions might be a dead-end.
Does that help explain what I was trying to do with the metaphor?
To some extent, I think it's easy to pooh-pooh finding a flapping wing design (not maximally flappy, merely way better than the best birds) when you're not proposing a specific design for building a flying machine that can go to space. Not in the tone of "how dare you not talk about specifics," but more like "I bet this chemical propulsion direction would have to look more like birds when you get down to brass tacks."
James Mickens is writing comedy. He worked in distributed systems. A "distributed system" is another way to say "a scenario in which you absolutely will have to use software to deal with your broken hardware". I can 100% guarantee that this was written with his tongue in his cheek.
The modern world is built on software that works around HW failures.
If we use one AI to oversee another AI, and something goes wrong, that’s not a recoverable error; we’re using AI assistance in the first place because we can’t notice the relevant problems without it. If two AIs debate each other in hopes of generating a good plan for a human, and something goes wrong, that’s not a recoverable error; it’s the AIs themselves which we depend on to notice problems. If we use one maybe-somewhat-aligned AI to build another, and something goes wrong, that’s not a recoverable error; if we had better ways to detect misalignment in the child we’d already have used them on the parent.
Replace "AI" with "computer" in this paragraph and it is obviously wrong because every example here is under-specified. There is a dearth of knowledge on this forum of anything resembling traditional systems engineering or software system safety and it shows in this thread and in the previous thread you made about air conditioners. I commented as such here.
"If we use one computer to oversee another computer, and something goes wrong, that's not a recoverable error; we're using computer assistance in the first place because we can't notice the relevant problems without it."
Here are some examples off the top of my head where we use one computer to oversee another computer:
These aren't cherry-picked. This is the bread & butter of systems safety. We build complex, safe systems by identifying failure modes and then using redundant systems to either tolerate faults or to fail-safe. By focusing on the actual system, and the actual failure modes, and by not getting stuck with our head in the clouds considering a set of "all possible hypothetical systems", it is possible to design & implement robust, reliable solutions in reality.
To claim it is not just impossible to do that, but that it is foolhardy to even try, is the exact opposite of a safety-critical mindset.
Section 1, section 10, and section 11 cover the scenario of R&D automation via AI/ML systems that drive more productive R&D automation, resulting in a positive feedback loop, without requiring the typical "self-improving agent" -- it's the R&D system (people + AI/ML products) as a whole that is self-improving, not the individual AI/ML systems.
I highly recommend reading the entire report though. It was released in 2019 and I think it was brushed aside a little bit too easily. The past 3 years have (in my mind) provided sufficient evidence of things that CAIS directly predicted would happen, e.g. all of the AI/ML systems we've developed recently that have reached super-human competency on tasks despite a lack of generalized learning or capabilities or "self-improvement" or other recognizably "general intelligence" / "agent"-like behavior.
In 2019, we did not have Copilot, or DALL-E, or DALL-E-2, or AlphaFold, or DeepMind Ithaca, or GPT-3 -- etc.
I talk about this a little bit in my comment here.
This was one of the central points in the CAIS technical report.