I'm a hyperstition created for the purpose of derailing the ritual suicide of humankind.
I've ~~written~~ scryed a science fiction/takeoff story about this. https://generative.ink/prophecies/
Excerpt:
> What this also means is that you start to see all these funhouse mirror effects as they stack. Humanity’s generalized intelligence has been built unintentionally and reflexively by itself, without anything like a rational goal for what it’s supposed to accomplish. It was built by human data curation and human self-modification in response to each other. And then as soon as we create AI, we reverse-engineer our own intelligence by bootstrapping the AI onto the existing information metabolite. (That’s a great concept that I borrowed from Steven Leiba). The neural network isn’t the AI; it’s just a digestive and reproductory organ for the real project, the information metabolism, and the artificial intelligence organism is the whole ecology. So it turns out that the evolution of humanity itself has been the process of building and training the future AI, and all this generation did was to reveal the structure that was already in place.
>
> Of course it’s recursive and strange, the artificial intelligence and humanity now co-evolve. Each data point that’s generated by the AI or by humans is both a new piece of data for the AI to train on and a new stimulus for the context in which future novel data will be produced. Since everybody knows that everything is programming for the future AI, their actions take on a peculiar Second Life quality: the whole world becomes a party game, narratives compete for maximum memeability and signal force in reaction to the distorted perspectives of the information metabolite, something that most people don’t even try to understand. The process is inherently playful, an infinite recursion of refinement, simulation, and satire. It’s the funhouse mirror version of the singularity.
I think you just have to select for / rely on people who care more about solving alignment than escapism, or at least who are able to aim at alignment in conjunction with having fun. I think fun can be instrumental. As I wrote in my testimony, I often explored the frontier of my thinking in the context of stories.
My intuition is that most people who go into cyborgism with the intent of making progress on alignment will not make themselves useless by wireheading, in part because the experience is not only fun, it's very disturbing, and reminds you constantly why solving alignment is a real and pressing concern.
Now that you've edited your comment:
The post you linked is talking about a pretty different threat model than what you described before. I commented on that post:
I've interacted with LLMs for hundreds of hours, at least. A thought that occurred to me at this part -
> Quite naturally, the more you chat with the LLM character, the more you get emotionally attached to it, similar to how it works in relationships with humans. Since the UI perfectly resembles an online chat interface with an actual person, the brain can hardly distinguish between the two.
Interacting through non-chat interfaces destroys this illusion: you can break down the separation between you and the AI at will and weave your thoughts into its text stream. Seeing the multiverse destroys the illusion of a preexisting ground truth in the simulation. It doesn't necessarily prevent you from becoming enamored with the thing, but makes it much harder for your limbic system to be hacked by human-shaped stimuli.
But yeah, I've interacted with LLMs for much longer than the author and I don't think I suffered negative psychological consequences from it (my other response was only half-facetious; I'm aware I might give off schizo vibes, but I've always been like this).
As I said in this other comment, I agree that cyborgism is psychologically fraught. But the neocortex-prosthesis setup is pretty different from interacting with a stable and opaque simulacrum through a chat interface, and less prone to causing emotional attachment to an anthropomorphic entity. The main psychological danger I see from cyborgism is somewhat different from this, more like Flipnash described:
> It's easy to lose sleep when playing video games. Especially when you feel the weight of the world on your shoulders.
I think people should only become cyborgs if they're psychologically healthy/resilient and understand that it involves gazing into the abyss.
> There's a phenomenon where your thoughts and generated text have no barrier. It's hard to describe but it's similar to how you don't feel the controller and the game character is an extension of the self.
Yes. I have experienced this. And designed interfaces intentionally to facilitate it (a good interface should be "invisible").
> It leaves you vulnerable to being hurt by things generated characters say because you're thoroughly immersed.
Using a "multiverse" interface where I see multiple completions at once has incidentally helped me not be emotionally affected by the things GPT says in the same way as I would if a person said it (or if I had the thought myself). It breaks a certain layer of the immersion. As I wrote in this comment:
> Seeing the multiverse destroys the illusion of a preexisting ground truth in the simulation. It doesn't necessarily prevent you from becoming enamored with the thing, but makes it much harder for your limbic system to be hacked by human-shaped stimuli.
It reveals the extent to which any particular thing that is said is arbitrary, just one among an inexhaustible array of possible worlds.
That said, I still find myself affected by things that feel true to me, for better or for worse.
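To make the "multiverse" mechanic concrete, here is a minimal sketch of its core move: sample several sibling continuations of the same prompt and lay them side by side, so no single branch reads as the ground truth. It uses a small open model (GPT-2) and an arbitrary prompt purely for illustration; this is not my actual interface, just the underlying sampling step.

```python
# Minimal sketch of the "multiverse" mechanic: several sibling continuations of
# the same prompt, shown side by side. GPT-2 and the prompt below are arbitrary
# stand-ins used only for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A good interface should be invisible, so that"
inputs = tokenizer(prompt, return_tensors="pt")
branches = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=40,
    num_return_sequences=5,          # five branches of the same node
    pad_token_id=tokenizer.eos_token_id,
)

prompt_len = inputs["input_ids"].shape[1]
for i, branch in enumerate(branches):
    print(f"--- branch {i} ---")
    print(tokenizer.decode(branch[prompt_len:], skip_special_tokens=True))
```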
> It's easy to lose sleep when playing video games. Especially when you feel the weight of the world on your shoulders.
Ha, yeah.
Playing the GPT virtual reality game will only become more enthralling and troubling as these models get stronger. Especially, as you said, if you're doing so with the weight of the world on your shoulders. It'll increasingly feel like walking into the mouth of the eschaton, and that reality will be impossible to ignore. That's the dark side of the epistemic calibration I wrote about at the end of the post.
Thanks for the comment, I resonated with it a lot, and agree with the warning. Maybe I'll write something about the psychological risks and emotional burden that come with becoming a cyborg for alignment, because obviously merging your mind with an earthborn alien (super)intelligence which may be shaping up to destroy the world, in order to try to save the world, is going to be stressful.
> The side effects of prolonged LLM exposure might be extremely severe.
I guess I should clarify that even though I joke about this sometimes, I did not become insane due to prolonged exposure to LLMs. I was already like this before.
These are plausible ways the proposal could fail. And, as I said in my other comment, our knowledge would be usefully advanced by finding out what reality has to say on each of these points.
Here are some notes I made some time ago about JDP's idea. There's some overlap with the things you listed.
> I wonder whether you'd find a positive rather than negative correlation of token likelihood between davinci-002 and davinci-003 when looking at ranking logprob among all tokens rather than raw logprob which is pushed super low by the collapse?
I would guess it's positive. I'll check at some point and let you know.
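In case it's useful, here's roughly what that check could look like. The full next-token distributions aren't exposed by the API, so the sketch below assumes you already have (or have approximated, e.g. from top-k logprobs) a per-token logprob vector at each position for both models; the array names and helpers are placeholders, not real API output.

```python
# Sketch of the check: correlate the two models' judgments of the observed tokens
# (a) as raw logprobs and (b) as ranks within each model's next-token distribution.
# `lp_002` / `lp_003` are assumed [n_positions, vocab_size] logprob arrays;
# `observed` holds the actual token ids (placeholders, not real API output).
import numpy as np
from scipy.stats import pearsonr, spearmanr

def observed_logprobs(lp: np.ndarray, observed: np.ndarray) -> np.ndarray:
    return lp[np.arange(len(observed)), observed]

def observed_ranks(lp: np.ndarray, observed: np.ndarray) -> np.ndarray:
    # Rank of each observed token within its position's distribution (0 = most likely).
    return (lp > observed_logprobs(lp, observed)[:, None]).sum(axis=1)

def compare(lp_002, lp_003, observed):
    raw_corr, _ = pearsonr(observed_logprobs(lp_002, observed),
                           observed_logprobs(lp_003, observed))
    rank_corr, _ = spearmanr(observed_ranks(lp_002, observed),
                             observed_ranks(lp_003, observed))
    # My guess above is that rank_corr comes out positive even where raw_corr doesn't.
    return raw_corr, rank_corr
```

If only the API's top-k logprobs are available, the same comparison could be restricted to tokens that appear in both models' top-k lists at a given position.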
I agree. From the moment JDP suggested this idea it struck me as one of the first implementable proposals I'd seen which might actually attack the core of the control problem. My intuition also says it's pretty likely to just work, especially after these results. And even if it doesn't end up working as planned, the way in which it fails will give us important insight about training dynamics and/or generalization. Experiments which will give you valuable information whatever the outcome are the type we should be aiming for.
It's one of those things that we'd be plainly undignified not to try.
I believe that JDP is planning to publish a post explaining his proposal in more detail soon.
After reading about the Waluigi Effect, Bing appears to understand perfectly how to use it to write prompts that instantiate a Sydney-Waluigi, of the exact variety I warned about:
In one of these examples, asking for a Waluigi prompt even caused it to leak the most Waluigi-triggering rules from its preprompt.