Suggestion: make it a CYOA-style interactive piece, where the reader is tasked with aligning AI and can choose from a variety of approaches, which branch out into sub-approaches, and so on. All of the paths, of course, bottom out in everyone dying, with detailed explanations of why. The project might then evolve based on feedback, adding new branches that rebut the counter-arguments made by people who played it and weren't convinced. It might also have several "modes", targeted at ML specialists, the general public, etc., where the text makes different tradeoffs regarding technicality vs. vividness.
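To make the structure concrete, here is a rough sketch of how such a piece might be stored: a tree of passages, where each passage carries one text per "mode" and each choice leads to a sub-approach. Everything in it is an assumption made for illustration: the names `Node`, `play`, `endings`, and `passage`, the mode names `specialist` and `general`, and the placeholder strings, which stand in for hand-written text.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # One passage per audience "mode"; the mode names are hypothetical.
    text: dict[str, str]
    # Choice label -> next node. An empty dict marks an ending.
    choices: dict[str, "Node"] = field(default_factory=dict)

    def is_ending(self) -> bool:
        return not self.choices


def play(root: Node, picks: list[str], mode: str) -> tuple[Node, list[str]]:
    """Follow a sequence of choice labels, collecting the text shown in `mode`."""
    node, shown = root, [root.text[mode]]
    for label in picks:
        node = node.choices[label]
        shown.append(node.text[mode])
    return node, shown


def endings(root: Node) -> list[Node]:
    """Collect every leaf (ending) reachable from `root`."""
    if root.is_ending():
        return [root]
    return [leaf for child in root.choices.values() for leaf in endings(child)]


def passage(name: str) -> dict[str, str]:
    # Placeholder text only; real passages would be hand-written.
    return {"specialist": f"<{name}: technical version>",
            "general": f"<{name}: vivid version>"}


root = Node(passage("intro"))
root.choices["approach A"] = Node(passage("approach A"))
root.choices["approach A"].choices["sub-approach A1"] = Node(passage("why A1 fails"))
root.choices["approach B"] = Node(passage("why B fails"))

# Feedback step: a reader objects to the "B" ending, so it gains a new branch.
root.choices["approach B"].choices["reader's objection"] = Node(
    passage("why the objection fails")
)
```

Calling `play(root, ["approach A", "sub-approach A1"], "general")` walks one path and returns the ending it reaches, while `endings(root)` lists every leaf, which would make it easy to check that no path is left without an explanation.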
I'd do it myself (I'd had the idea of doing it before this post came out, and my preliminary notes covered much of the same ground, I feel the need to smugly say), but I'm not at all convinced that this is going to be particularly useful. Attempts to defeat the opposition by building up a massive evolving database of counter-arguments have been made in other fields, and so far as I know, they never convinced anybody.
The interactive factor would be novel (as far as I know), but I'm still skeptical.
(A... different implementation might be to use a fine-tuned language model for this: make it an AI Dungeon kind of setup, where it provides specialized counter-arguments for any suggestion. But I expect it to be less effective than a coarser hand-written CYOA, since the readers/players would know that the thing they're talking to has no idea what it's talking about, and so would disregard its words.)