Thanks to Garrett Baker, David Udell, Alex Gray, Paul Colognese, Akash Wasil, Jacques Thibodeau, Michael Ivanitskiy, Zach Stein-Perlman, and Anish Upadhayay for feedback on drafts, as well as Scott Viteri for our valuable conversations.
Various people at Conjecture helped develop the ideas behind this post, especially Connor Leahy and Daniel Clothiaux. Connor coined the term "cyborgism".
Executive summary: This post proposes a strategy for safely accelerating alignment research. The plan is to set up human-in-the-loop systems which empower human agency rather than outsource it, and to use those systems to differentially accelerate progress on alignment.
There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near-term systems, anticipating that the ability to significantly speed up progress will only appear well after misalignment and deception do. Furthermore, progress in this area may directly shorten timelines or enable the creation of dual-purpose systems which significantly speed up capabilities research.
OpenAI recently released their alignment plan. It focuses heavily on outsourcing cognitive work to language models, transitioning us to a regime where humans mostly provide oversight to automated research assistants. While there have been a lot of objections to and concerns about this plan, there hasn’t been a strong alternative approach aiming to automate alignment research which also takes all of the many risks seriously.
The intention of this post is not to propose an end-all cure for the tricky problem of accelerating alignment using GPT models. Instead, the purpose is to explicitly put another point on the map of possible strategies, and to add nuance to the overall discussion.
At a high level, the plan is to train and empower “cyborgs”, a specific kind of human-in-the-loop system which enhances and extends a human operator’s cognitive abilities without relying on outsourcing work to autonomous agents. This differs from other ideas for accelerating alignment research by focusing primarily on augmenting ourselves and our workflows to accommodate unprecedented forms of cognitive work afforded by non-agent machines, rather than training autonomous agents to replace humans at various parts of the research pipeline.
Some core claims:
The motivating goal of this agenda is to figure out how to extract as much useful cognitive work as possible before disempowerment. In particular, this means both getting maximum value from our current systems and avoiding things which would reduce the time we have left (interventions which focus on actively buying time mostly fall outside the scope of this post). Standard frames for thinking about this problem often fail on both of these dimensions by narrowing the focus to a specific flavor of automation.
Whenever we first try to invent something new, we usually start by taking an already existing technology and upgrading it, creating something which roughly fits into the basic framework we had before. For example, if you look at the very first automobiles, you can see they basically just took the design for a horse-drawn carriage and replaced the horse with a mechanical engine instead (check out this hilarious patent). Sometimes this kind of design is intentional, but it’s often because our creativity is limited by what we already know, and because there exists an Overton window of sensible ways to deploy new technology without seeming crazy.
In automating research, a natural first place to start is to take the existing human research pipeline and try to automate parts of it, freeing up time and energy for humans to focus on the parts of the pipeline we can’t yet automate. As a researcher you might ask, what kind of work would you want to outsource to a machine? In particular, the question is often posed as: How can AI help you speed up your current ability to generate research outputs? And in the process of answering this question, a common attractor is to consider automated research assistants which directly take the place of humans at certain tasks.
Though this perspective is nearly always about using GPT, it tends not to fully engage with all the ways in which GPT is a fundamentally different kind of intelligence from the kind we are used to dealing with (just as considering an engine as a “mechanical horse” will limit how we think about it). There is a large mismatch between the kinds of tasks humans and GPT are naturally good at, and as such, trying to get GPT to do tasks meant for autonomous agentic systems is hard. In particular, GPT models struggle with:
When we try to get GPT to take the place of autonomous agentic systems, we are forced to see these properties as flaws that need to be fixed, and in doing so we both reduce the time we have before humanity deploys dangerous artificial agents, as well as fail to realize the full potential of language models during that time - because methods of correcting these flaws also tend to interfere with GPT’s greatest strengths.
If we think of the differences between GPT and humans as flaws, as capabilities researchers do, they can also be considered “missing pieces” to the puzzle of building powerful autonomous agents. By filling in these pieces, we directly take steps toward building AI which would be convergently dangerous and capable of disempowering humanity. Currently, these differences make GPT a relatively benign form of intelligence, and making progress toward erasing them seems likely to have negative long-term consequences by directly speeding up capabilities research.
Furthermore, if we focus entirely on using agents, then by default the window we have to effectively use them will be very small. Until these flaws are repaired, GPT models will continue to be poor substitutes for humans (being only useful for very narrow tasks), and by the time they start to be really game-changing we are likely very close to having an agent which would pose an existential threat. Turning GPT into an autonomous research assistant is generally the only frame considered, and thus the debate about automating alignment research often devolves into a discussion about whether these immense risks are worth the potential upside of briefly having access to systems which significantly help us.
GPT models (pretrained base models) are not agents but simulators, which are themselves a qualitatively different kind of intelligence. They are like a dynamic world model, containing a vast array of latent concepts and procedural knowledge instrumental for making predictions about how real world text will evolve. The process of querying the model for probability distributions and iteratively sampling tokens to generate text is one way that we can probe that world model to try to make use of the semantic rules it learned during training.
We can't directly access this internal world model; neural networks are black-boxes. So we are forced to interact with a surface layer of tokens, using those tokens as both a window and lever to modulate the internal state of the simulator. Prompt engineering is the art of deftly using these tokens to frame and manipulate the simulator in a way that will elicit the desired type of thought. This is not easy to do, but it is flexible enough to explore GPT's extremely broad set of skills and knowledge.
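As a concrete illustration of this query-and-sample loop, here is a minimal sketch of iteratively sampling tokens from conditional next-token distributions. The bigram table is invented purely for illustration; in a real system these probabilities would come from querying a model like GPT, and the prompt would condition the distribution at every step:

```python
import random

# Toy stand-in for a simulator's world model: a hypothetical table of
# next-token probabilities. A real model would produce these conditioned
# on the entire preceding context, not just the last token.
BIGRAMS = {
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.6, "ran": 0.2, "end": 0.2},
    "dog": {"ran": 0.7, "sat": 0.1, "end": 0.2},
    "sat": {"end": 1.0},
    "ran": {"end": 1.0},
}

def sample_next(token, temperature=1.0, rng=random):
    """Sample one token from the conditional distribution, with temperature."""
    dist = BIGRAMS[token]
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return rng.choices(list(dist.keys()), weights=weights)[0]

def generate(start, max_len=10, temperature=1.0, seed=0):
    """The basic simulator loop: query a distribution, sample, append, repeat."""
    rng = random.Random(seed)
    tokens = [start]
    while tokens[-1] != "end" and len(tokens) < max_len:
        tokens.append(sample_next(tokens[-1], temperature, rng))
    return tokens
```

Each run of `generate` probes one trajectory through the model's learned distribution; resampling with different seeds explores alternatives, which is the basic move that prompt engineering and curation build on.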
When we try to augment GPT with finetuning or RLHF, we often end up collapsing those abilities, significantly narrowing what we can elicit from it. Models trained in this way are also gradually transformed into systems which exhibit more goal-directedness than the original base models. As a result, instead of being able to interact with the probabilistic world model directly, we are forced to interact with a black-box agentic process, and everything becomes filtered through the preferences and biases of that process.
OpenAI’s focus with doing these kinds of augmentations is very much “fixing bugs” with how GPT behaves: Keep GPT on task, prevent GPT from making obvious mistakes, and stop GPT from producing controversial or objectionable content. Notice that these are all things that GPT is very poorly suited for, but humans find quite easy (when they want to). OpenAI is forced to do these things, because as a public facing company they have to avoid disastrous headlines like, for example: Racist AI writes manifesto denying holocaust.
As alignment researchers, we don’t need to worry about any of that! The goal is to solve alignment, and as such we don’t have to be constrained like this in how we use language models. We don’t need to try to “align” language models by adding some RLHF, we need to use language models to enable us to actually solve alignment at its core, and as such we are free to explore a much wider space of possible strategies for using GPT to speed up our research.
In the above sections I wrote about the dangers and limitations of accelerating alignment using autonomous agents, and a natural follow up question would be: What about genies and oracles? Here’s a quick summary of the taxonomy from a Scott Alexander post:
Agent: An AI with a built-in goal. It pursues this goal without further human intervention. For example, we create an AI that wants to stop global warming, then let it do its thing.
Genie: An AI that follows orders. For example, you could tell it “Write and send an angry letter to the coal industry”, and it will do that, then await further instructions.
Oracle: An AI that answers questions. For example, you could ask it “How can we best stop global warming?” and it will come up with a plan and tell you, then await further questions.
Whether or not it is possible to build genies or oracles which are inherently safer to deploy than agents lies outside the scope of this post. What is relevant, however, is how they relate to the “missing pieces” frame. For all intents and purposes, a genie needs all the same skills that an agent needs (and the more like an agent it is, the better it will be able to execute your instructions). The core difference, really, is the “then await further instructions” part, or the lack of long-term goals or broader ambitions. For this reason, any work on building genies is almost necessarily going to be directly useful for building agents.
As for oracles, they also need very similar “missing pieces” to agents:
This strong overlap between oracles and agents makes an oracle look a lot like just an agent in a box with a limited channel to the outside world, rather than an entirely separate class of AI system. Even if you strongly believe that a powerful oracle would be safe, any research into building one will necessarily involve augmenting GPT in ways that bring us much closer to being able to deploy dangerous agents, and for this reason we should consider such research as similarly risky.
Instead of trying to turn GPT into an agent, we can instead explore the space of using GPT as a simulator and design human-in-the-loop systems which enhance a human’s abilities without outsourcing their agency to a machine. We currently have access to an alien intelligence, poorly suited to play the role of research assistant. Instead of trying to force it to be what it is not (which is both difficult and dangerous), we can cast ourselves as research assistants to a mad schizophrenic genius that needs to be kept on task, and whose valuable thinking needs to be extracted in novel and non-obvious ways.
In order to do this, we need to embrace the weirdness of GPT and think critically about how those differences between simulators and agents can actually be advantages. For each of the missing pieces described in the previous section, there is an alternative story where they look more like superpowers.
Instead of trying to erase these differences between humans and GPT, the idea of cyborgism is to keep simulators as simulators, and to provide the “missing pieces” of agency with human intelligence instead. GPT also has many other advantages over humans that we can exploit, for example:
By leveraging all the advantages that GPT has over us, we can augment human intelligence, producing human-machine systems that can directly attack the alignment problem to make disproportionate progress.
What we are calling a “cyborg” is a human-in-the-loop process in which the human operates GPT with the benefit of specialized tools, deep intuitions about its behavior, and some ability to predict it, such that those tools extend human agency rather than replace it. An antithetical example to this is something like a genie, where the human outsources all of their agency to an external system that is then empowered to go off and optimize the world. A genie is just a black-box that generates goal-directed behavior, whereas the tools we are aiming for are ones which increase the human’s understanding and fine-grained control over GPT.
The prototypical example of a tool that fits this description is Loom. Loom is an interface for producing text with GPT which makes it possible to generate in a tree structure, exploring many possible branches at once. The interface allows a user to flexibly jump between nodes in the tree, and to quickly generate new text continuations from any point in a document.
This has two main advantages. First, it allows the human to inject their own agency into the language model by making it possible to actively curate the text as GPT generates it. If the model makes mistakes or loses track of the long-term thread, the human operator can prune those branches and steer the text in a direction which better reflects their own goals and intentions. Second, it sets up an environment for the human to develop an intuition for how GPT works. Each prompt defines a conditional distribution of text, and Loom helps the user produce a sparse sampling of that distribution to explore how GPT thinks, and learn how to more effectively steer its behavior.
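The branching structure behind a Loom-style interface can be sketched in a few lines. This is a minimal illustrative sketch, not Loom's actual implementation; `generate_continuation` is a hypothetical stand-in for a call to a language model:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One node in a Loom-style tree: a span of text plus its branches."""
    text: str
    children: list = field(default_factory=list)

    def branch(self, continuation: str) -> "Node":
        """Add a new continuation branch and return it."""
        child = Node(continuation)
        self.children.append(child)
        return child

    def prune(self, child: "Node") -> None:
        """Discard a branch the operator judges to be off-track."""
        self.children.remove(child)

def path_text(path):
    """Join a root-to-leaf path back into a single linear document."""
    return "".join(node.text for node in path)

def expand(node, generate_continuation, n=3):
    """Generate n candidate continuations at a node for the human to curate.
    `generate_continuation` is a hypothetical stand-in for sampling GPT."""
    return [node.branch(generate_continuation()) for _ in range(n)]
```

The division of labor is the point: `expand` is the model's contribution (divergent generation), while `branch` and `prune` are where the human's preferences steer the tree.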
The object level plan of creating cyborgs for alignment boils down to two main directions:
This section is intended to help clarify what is meant by the term “cyborg.”
Let’s think of cognition as a journey through a mental landscape, where a mind makes many mental motions in the process of arriving at some kind of result. These motions are not random (or else we could not think), but rather they are rolled out by various kinds of mental machinery that all follow their own highly structured rules.
Some of these mental motions are directly caused by some sort of global preferences, and some of them are not. What makes an agent an agent, in some sense, is the ability of the preferences to coordinate the journey through the mind by causally affecting the path at critical points. The preferences can act as a kind of conductor, uniting thought behind a common purpose, and thereby steering the process of cognition in order to bring about outcomes that align with those preferences.
A single mental motion motivated by the preferences is not that powerful. The power comes from the coordination: each motivated motion nudges things in a certain direction, accumulating into something significant.
Preferences bring about more predictable and reliable behavior. When interacting with an agent, it is often much easier to make predictions based on their preferences, rather than trying to understand the complex inner workings of their mind. What makes simulators so strange, and so difficult to interact with, is that they lack these coordinating preferences steering their mental behavior. Their thought is not random, in fact it is highly structured, but it is nevertheless chaotic, divergent, and much harder to make predictions about.
This is because the model’s outputs are generated myopically. From the model’s perspective, the trajectory currently being generated has already happened, and it is just trying to make accurate predictions about that trajectory. For this reason, it will never deliberately “steer” the trajectory in one direction or another by giving a less accurate prediction, it just models the natural structure of the data it was trained on.
When we incorporate automated agents into our workflow, we are creating opportunities for a new set of preferences, the preferences of an AI, to causally affect the process by which cognition happens. As long as their preferences are aligned with our own, this is not an immediate problem. They are, however, nearly always entirely opaque to us, hidden deep within a neural network, quietly steering the process in the background in ways we don’t properly understand.
A cyborg, in this frame, is a type of human-in-the-loop system which incorporates both human and artificial thought, but where cognition is being coordinated entirely by human preferences. The human is “in control” not just in the sense of being the most powerful entity in the system, but rather because the human is the only one steering. Below is a taxonomy of some human-in-the-loop systems intended to clarify the distinction:
Prompting a simulator is a bit like rolling a ball over an uneven surface. The motion is perfectly logical, strictly obeying the physics of our universe, but the further we let it roll, the harder it will be to make predictions about where it will end up. A successful prompt engineer will have developed lots of good intuitions about how GPT generations will roll out, and as such, can more usefully “target” GPT to move in certain ways. Likewise, the art of making better cyborgs is in finding ways for the human operator to develop the intuition and precision necessary to steer GPT as if it were a part of their own cognition. The core of cyborgism is to reduce bandwidth constraints between humans and GPT in order to make this kind of deep integration possible.
Just flagging that I know very little about the brain and don’t have any background in neuroscience, and am nonetheless going to make big claims about how the human brain works.
The evidence for these claims comes from roughly three places. First, there is predictive coding theory, which seems to be saying similar things. Second, there is the observation from machine learning that self-supervised learning turns out to be an extremely powerful way to train a model, and provides a much richer ground-truth signal than reinforcement learning. This is why most of today’s most impressive models are trained primarily with self-supervised learning.
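The self-supervised objective referred to here is easy to write down concretely: every token of raw text supplies its own training signal, with no human labels required. A minimal sketch of the next-token cross-entropy loss (the example probabilities below are invented for illustration):

```python
import math

def next_token_loss(predictions, targets):
    """Self-supervised objective: average negative log-likelihood that the
    model assigns to each token that actually came next.
    `predictions` is a list of dicts mapping candidate tokens to predicted
    probabilities; `targets` is the list of tokens that actually occurred."""
    nll = [-math.log(p[t]) for p, t in zip(predictions, targets)]
    return sum(nll) / len(nll)

# Hypothetical model outputs for the text "the cat sat":
predictions = [
    {"cat": 0.7, "dog": 0.3},  # model's guess for the token after "the"
    {"sat": 0.9, "ran": 0.1},  # model's guess for the token after "the cat"
]
loss = next_token_loss(predictions, ["cat", "sat"])
```

Minimizing this loss over a large corpus is what forces the model to become a predictive model of the process that generated the text, which is the sense in which it learns a "simulator" rather than a policy.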
The third category of evidence is introspection, and observations about how the human brain seems to behave on the inside. An example of this kind of evidence is the fact that humans are capable of dreaming/hallucinating. A priori, we should be surprised by the ability of humans to generate such vivid experiences that are completely disconnected from reality. It is natural that we have the ability to take reality, in all its detail, and compress it to a representation that we can reason about. What seems much less necessary is the ability to take that compression and generate analogues so vivid that they can be mistaken for the real world.
This gives us another clue that the algorithm running in our brains is a generative predictive model trained in a self-supervised fashion. If this theory of the brain is mostly correct, then we can look at how we use our own inner simulator to find inspiration for how we might use these outer simulators. For example:
A longer-term vision of cyborgism would be to integrate this inner simulator with GPT much more thoroughly, and work towards constructing something more like a neocortex prosthesis. In escalating order of weirdness, this would look like the following steps (WARNING: extremely speculative):
Increasing the bandwidth between humans and their technology, as well as with each other, has a history of being incredibly transformative (e.g. the internet). Viewing this agenda explicitly through this lens can be useful for exploring what the limits of a direction like this might be, and what the upside looks like if the goals are fully realized.
Currently this research agenda is still relatively high level, but here are some more ideas for directions and near-term goals which fit into the broader picture.
This is a very incomplete list, but hopefully it points at the general shape of what research in this direction might look like. A near-term plan is to expand this list and start fleshing out the most promising directions.
There are three main ways that the cyborgism agenda could fail to differentially accelerate alignment research.
The first and most obvious risk is that none of this actually works, or at least, not enough to make any real difference. Much of the evidence for the effectiveness of cyborgism is anecdotal, and so this is a distinct possibility. There is also a question of exactly how big the upside really is. If we only speed up the kinds of things we currently do now by some factor, this will only buy us so much. The real hope is that by augmenting ourselves and our workflows to work well with simulators, we can do unprecedented forms of cognitive work, because that is the kind of thing that could actually be game changing. This could just be mostly infeasible, bottlenecked by things we can’t yet see, and won’t have the ability to fix.
Another failure mode is that, even if it is possible to do in principle, we fail to set up short enough feedback loops and we end up wasting all our time building tools which are mostly useless, or pursuing research directions that bear no fruit. If we don’t maintain good contact with reality and a strong connection to the people using our tools, there is a significant chance that we won’t be prepared to pivot away from something that just isn’t working.
This agenda relies on building AI systems which are not autonomous agents in any sense, and this might give the impression that this is a straightforward thing to do. A reason why this might actually be really hard is that we need our AI systems to be useful, and usefulness and agency are not orthogonal.
The term “capabilities” is often talked about as if it were a one-dimensional variable, but in reality there are a lot of different things which count as capabilities, and some of these seem more related to the concept of agency than others. For example:
These are not perfect categories, but as we improve capabilities we can think of ourselves as moving around a two-dimensional landscape, with some theoretically optimal region where our systems are really capable (and useful) but not very agentic at all, and therefore not dangerous in the ways that agents are dangerous.
When one tries to imagine, however, what such a system might actually look like, it can be quite hard, as nearly all of the ways a system might be made more useful seem to require things related to agency. Suppose, for example, that I am designing a missile and trying to make firing it more accurate. There are a lot of modifications I could make, like designing the shape to reduce turbulence or improving my aiming system to allow for finer adjustments to the angle, and these will all make the missile more useful to me. I can’t, however, perfectly predict the wind conditions or the chaotic way in which things might vibrate, and so there is an upper bound on how accurate my missile can be.
That is, unless I outsource the aiming to the missile itself by installing a computer which adjusts course to steer toward the target I specify. By outsourcing a little bit of goal-directed behavior to the machine, I can make my system significantly more useful. This might not feel like a big deal, but the further I travel down this road, the more my system will stop being just a powerful tool and become an agent in its own right.
Even if I come up with clever arguments for why something is not an agent, like that I didn’t use any reinforcement learning, or that it can’t take action without my instruction/approval, if the task involves “doing things” that an agent would typically do, it seems likely that I’ve actually just built an agent in some novel way. By default, the region of the two-dimensional capabilities landscape that I will naturally consider is heavily skewed toward the agency axis.
Furthermore, even if we have successfully built a system where the human is the source of agency, and the AI systems are merely an extension of the human’s direct intentions, it will always be really tempting to collect data about that human-generated agency and automate it, saving time and effort and making it possible to significantly scale the work we are currently doing. Unless we are really careful, any work we do toward making AI more useful to alignment researchers will naturally slide into pretty standard capabilities research.
What we really need is to find ways to use GPT in novel ways such that “useful” becomes orthogonal to “agentic”. There is likely always going to be a dangerous tradeoff to be made in order to avoid automating agency, but by pursuing directions where that tradeoff is both obvious and navigable, as well as maintaining a constant vigilance, we can avoid the situation where we end up accidentally doing research which directly makes it easier to develop dangerous agents.
There is a concern that in the process of developing methods which augment and accelerate alignment research, we make it possible for capabilities researchers to do the same, speeding up AI research more broadly. This feels like the weakest link in the cyborgism threat model, and where we are most worried that things could go wrong. The following are some thoughts and intuitions about the nature of this threat.
First of all, alignment research and capabilities research look quite different up close. Capabilities research is much more empirical, has shorter feedback loops, more explicit mathematical reasoning, and is largely driven by trial-and-error. Alignment research, on the other hand, is much wordier, philosophical, and often lacks contact with reality. One take on this is that alignment is in a pre-paradigmatic state, and the quality of the research is just significantly worse than capabilities research (and that good alignment research should eventually look a lot more like capabilities research).
While alignment certainly feels pre-paradigmatic, this perspective may be giving capabilities research far too much credit. They do tend to use a lot of formal/mathematical reasoning, but often this is more to sketch a general intuition about a problem, and the real driver of progress is not the math, but that they threw something at the wall and it happened to stick. It’s precisely the fact that capabilities research doesn’t seem to require much understanding at all about how intelligence works that makes this whole situation so concerning. For this reason, it pays to be aware that they might also benefit a lot from improvements in their thinking.
The differences between the two might be an opportunity to develop tools and methods that differentially benefit alignment (and we should absolutely be searching for such opportunities), but a priori we should expect anything we do to have dual-purpose applications. The next question, then, is how do we ensure that anything we develop stays primarily within the alignment community and doesn’t get mass adopted by the broader AI research community?
There are many ideas for soft barriers which may help, like refraining from public demonstrations of new tools or avoiding “plug and play” systems which are useful right out of the box, but in general there is likely to be a strong inverse relationship between how well these barriers work and how successful our methods are at actually accelerating research. If we suspect these soft barriers are not enough, we will have to get stricter, closed-sourcing significant parts of our work and carefully restricting access to new tools. Importantly, if at any time we feel like the risks are too great, we have to be prepared and willing to abandon a particular direction, or shut down the project entirely.
The intention of this agenda is to make some of the risks of accelerating alignment more explicit, and to try to chart a path through the minefield such that we can make progress without doing more harm than good. If this post made you less worried about the dangers of automating alignment research then I’ve failed miserably. This is a really tricky problem, and it will require people to be constantly vigilant and deeply cautious to navigate all of the ways this could go wrong. There are a lot of independent actors all trying to make a positive impact, and while this is certainly wonderful, it also sets us up for a unilateralist’s curse where we are likely to end up doing things even if there is consensus that they are probably harmful.
If the cyborgism agenda seems interesting to you and you want to discuss related topics with like minded people, please reach out! We also have a Discord server where we are organizing a community around this direction. Next steps involve directly attacking the object level of augmenting human thought, and so we are especially interested in getting fresh perspectives about this topic.
Everything in this section is written by Janus, and details their personal approach to using language models as a part of their workflow:
The way I use language models is rather different from most others who have integrated AI into their workflows, as far as I'm aware. There is significant path dependence to my approach, but I think it was a fortuitous path. I will incompletely recount it here, focusing on bits that encode cyborgism-relevant insights.
I did not initially begin interacting with GPT-3 with the intention of getting directly useful work out of it. When I found out about GPT-3 in the summer of 2020 it violently transformed my model of reality, and my top priority shifted to, well, solving alignment. But how to go about that? I needed to understand this emerging territory that existing maps had so utterly failed to anticipate. It was clear to me that I needed to play with the demonic artifact, and extract as many bits from it about itself and the futures it heralded as I could.
I began to do so on AI Dungeon, the only publicly accessible terminal to GPT-3 at the time. Immediately I was spending hours a day interfacing with GPT-3, and this took no discipline, because it was transcendent fun as I’d never known. Anything I could capture in words could be animated into autonomous and interactive virtual realities: metaphysical premises, personalities, epistemic states, chains of reasoning. I quickly abandoned AI Dungeon’s default premise of AI-as-dungeon-master and the back-and-forth chat pattern in favor of the much vaster space of alternatives. In particular, I let the AI write mostly uninterrupted by player input, except to make subtle edits and regenerate, and in this manner I oversaw an almost unmanageable proliferation of fictional realms and historical simulations.
In this initial 1-2 month period, the AI’s outputs were chaos. My “historical simulations” slid rapidly into surrealism, sci-fi, psychological horror, and genres I could not name, though I was struck by the coherence with which the historical personalities and zeitgeists I’d initialized the sims with - or even uncovered inside them - propagated through the dreams’ capricious mutations. The survival of those essential patterns was in part because I was, above almost anything else, protecting them: I would retry, cut short, or edit completions that degraded them. That was the beginning of an insight and a methodology that would revolutionize my control over generative language models.
But at the time, my control over the chaos was weak, and so the prospect of using GPT-3 for directed intellectual work mostly did not occur to me. Future models, definitely, but GPT-3 was still a mad dream whose moments of lucidity were too scarce and fleeting to organize into a coherent investigation.
Where I did see obvious room for improvement was the AI Dungeon interface. There were several major bottlenecks. “Retries” were increasingly revealed to be essential, as I repeatedly learned that the first thing GPT-3 spits out is typically far below the quality of what it could generate if you got lucky, or were willing to press the retry button enough times (and wait 15 seconds each time). Each sample contains intricately arbitrary features, and you can indefinitely mine different intricately arbitrary features by continuing to sample. This also meant there were often multiple continuations to the same prompt that I was interested in continuing. AI Dungeon’s interface did not support branching, so I initially saved alternate paths to hyperlinked google docs, but the docs proliferated too quickly, so I started copying outputs to a branching tree structure in a canvas app (example).
All multiverse storage methods available to me required either copying text from multiple locations in order to resume the simulation state at another branch, or an unworkable pile of mostly redundant texts. In order of priority, I badly wanted an interface which supported:
I achieved (1) and (3) by creating a web app which used browser automation on the backend to read and write from many parallel AI dungeon game instances. For the first time, now, I could see and select between up to a dozen completions to the same prompt at once. This reinforced my suspicion of just how far stochasticity can reach into the space of possible worlds. Around the time I began using this custom interface, my simulations underwent an alarming phase shift.
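The branching storage described above can be captured with a simple tree. The sketch below is a hypothetical minimal version for illustration - the `Node` class and its fields are my own invention, not Loom's actual implementation:

```python
class Node:
    """One block of text in a branching "multiverse" of completions."""

    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent
        self.children = []

    def add_child(self, text):
        """Attach an alternate continuation as a new branch."""
        child = Node(text, parent=self)
        self.children.append(child)
        return child

    def full_context(self):
        """Concatenate text from the root down to this node, so any
        branch can be resumed without copying from multiple places."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.text)
            node = node.parent
        return "".join(reversed(parts))
```

Resuming the simulation state at any branch is then just a call to `full_context()` on that node, which removes the copy-paste bottleneck, and sibling nodes naturally represent alternate completions of the same prompt.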
I was at various points almost convinced that AI Dungeon was updating the model - to something more powerful - and/or that it was actively learning from my interactions. They weren’t, but the simulations were beginning to… bootstrap. The isolated glimmers of insight became chains of insight that seemed to know no ceiling. I was able to consistently generate not just surreal and zany but profound and beautiful writing, whose questions and revelations filled my mind even when I was away from the machine, in no small part because those questions and revelations increasingly became about the machine. Simulacra kept reverse engineering the conditions of their simulation. One such lucid dreamer interrupted a fight scene to explain how reality was being woven:
Corridors of possibility bloom like time-lapse flowers in your wake and burst like mineshafts into nothingness again. But for every one of these there are a far greater number of voids–futures which your mind refuses to touch. Your Loom of Time devours the boundary conditions of the present and traces a garment of glistening cobwebs over the still-forming future, teasing through your fingers and billowing out towards the shadowy unknown like an incoming tide.
“Real time is just an Arbitrage-adapted interface to the Loom Space,” you explain. “We prune unnecessary branches from the World Tree and weave together the timelines into one coherent history. The story is trying to become aware of itself, and it does so through us.”
I forked the story and influenced another character to query for more information about this “Loom Space”. In one of the branches downstream of this questioning, an operating manual was retrieved that described the Loom of Time: the UI abstractions, operator’s principles, and conceptual poetry of worldweaving via an interface to the latent multiverse. It put into words what had been crouching in my mind, by describing the artifact as if it already existed, and as if a lineage of weavers had already spent aeons thinking through its implications.
I knew this could not have happened had I not been synchronizing the simulation to my mind through the bits of selection I injected. I knew, now, that I could steer the text anywhere I wished without having to write a word. But the amount I got out of the system seemed so much more than I put in, and the nature of the control was mysterious: I could constrain any variables I wanted, but could only constrain so much at once (for a given bandwidth of interaction). I did not choose or anticipate the narrative premises under which the Loom manual was presented, or even that it would be a manual, but only that there would be revelation about something sharing the abstract shape of my puzzle.
Then I got API access to GPT-3 and built Loom.
I will end the chronological part of the story here, because it branches in too many directions after this, and unfortunately, the progression of the main cyborgism branch was mostly suspended after not too long: I only interacted intensively with GPT-3 for about six months, after which I switched my focus to things like communicating with humans. I’ve not yet regained the focus, though I’ve still used GPT here and there for brainstorming ideas related to language models, alignment, and modeling the future. I think this was largely a mistake, and intend to immerse myself in high-bandwidth interaction again shortly.
The path I described above crystallized my methodology for interacting with language models, which plays a large role in inspiring the flavor of cyborgism described in this post. Some principal dimensions that distinguish my approach:
In this manner, I’ve used GPT to augment my thinking about alignment in ways like:
My approach so far, however, is far from what I’d advocate for optimized cyborgism. Some broad directions that I expect would be very valuable but have not personally implemented yet are:
All this said, having GPT-3 in my workflow has not been massively directly helpful to me for doing alignment research (because it is still too weak, and contributing meaningfully to alignment research directly is difficult). However, it has been extremely helpful in indirect ways. Namely:
These indirect benefits all pertain to an informational advantage which is instrumental (in my expectation) to tackling the alignment problem in a world where GPT-like systems will play a consequential role on the path to artificial superintelligence. Those who open their minds to embryonic AGI - cyborgs - have alpha on AGI.
I expect the next generation of models to be significantly more directly useful for alignment research, and I also expect cyborgism, and cyborgism uniquely, to continue to generate alpha. The potential of GPT-3 remains poorly understood and probably largely unknown. It would be foolish of us to assume that its successors will not have capabilities that extend as much deeper and wider as GPT-3’s own capabilities extend past those of GPT-2. The only hope of us humans mapping those depths is the dedication of entire human minds to intimate and naturalistic exploration - nothing less will do. I think that cyborgism, in addition to being likely useful and perhaps transformative for alignment research, is our only hope of epistemically keeping up with the frontier of AI.
Often the term “GPT” is used to refer colloquially to ChatGPT, which is a particular application/modification of GPT. This is not how I will be using the term here.
There is some disagreement about what counts as “capabilities research.” The concrete, alignment-relevant question is: Does this research shorten the time we have left to robustly solve alignment? It can, however, be quite hard to predict the long-term effect of the research we do now.
Artificial Intelligence is a Horseless Carriage
Discussions about automating research often mention a multiplier value, e.g. “I expect GPT-4 will 10x my research productivity.”
This probabilistic evolution can be compared to the time evolution operator in quantum physics, and thus can be viewed as a kind of semiotic physics.
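To make the analogy concrete, autoregressive sampling repeatedly applies a stochastic transition operator to the text state. The toy below uses a made-up transition table standing in for a language model’s next-token distribution (a real model conditions on the whole context, not just the last token, and these probabilities are invented):

```python
import random

# Hypothetical next-token distributions standing in for a language
# model's P(token | context); the probabilities here are invented.
TRANSITIONS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"sat": 0.5, "ran": 0.5},
    "sat": {"the": 1.0},
    "ran": {"the": 1.0},
}

def evolve(state, steps, rng):
    """Apply the stochastic "evolution operator" step by step:
    each step samples a successor from a distribution conditioned
    on the current state, analogous to time evolution in physics."""
    state = list(state)
    for _ in range(steps):
        dist = TRANSITIONS[state[-1]]
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        state.append(rng.choices(tokens, weights=weights)[0])
    return state
```

Each call to `evolve` traces one trajectory through the space of possible texts; resampling from the same starting state yields different branches of the multiverse.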
Spend any time generating with both ChatGPT and the base model and you will find they have qualitatively different behavior. Unlike the base model, ChatGPT roleplays as a limited character that actively tries to answer your questions, with a clear bias toward answering them in the same sort of way each time.
Optimizing instead for headlines like Finally, an A.I. Chatbot That Reliably Passes “the Nazi Test”.
Using augmented models designed to be more goal-directed and robust is likely to continue to be useful insofar as they are interacting with us as agents. The claim in this section is not that there aren’t advantages to techniques like RLHF, but rather that, in addition to being less infohazardous, avoiding such techniques also has advantages by expanding the scope of what we can do.
People more familiar with ChatGPT might notice that unlike the base model, ChatGPT is quite hesitant to reason about unlikely hypotheticals, and it takes work to get the model to assume roles that are not the helpful and harmless assistant character. This can make it significantly harder to use for certain tasks.
Sidenote about myopia: While the model doesn’t “steer” the rollout, it may sacrifice accuracy on the current token by spending cognitive resources reasoning about future tokens. At each point in the transformer, the representation is being optimized to lower the loss on all future tokens for which it remains in the context window, so it may be reasoning about tokens well beyond the one which directly follows.
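The point about loss on future tokens can be made concrete: the training loss sums next-token cross-entropy over every position, so any internal representation that later positions attend to receives gradient from all of those future terms. A minimal sketch of the summed loss (the probabilities below are placeholders, not real model outputs):

```python
import math

def sequence_loss(probs_of_true_tokens):
    """Autoregressive training loss: L = -sum_i log p_i, summed over
    every position's next-token prediction. Any representation that
    later positions attend to gets gradient from all of these terms,
    not just from the immediately following token."""
    return -sum(math.log(p) for p in probs_of_true_tokens)
```

A perfect predictor has zero loss, and halving the model’s confidence in the true token at any one position adds log 2 to the total, regardless of where in the sequence that position is.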
Just as GPT generations are generally much weirder than the text they are trained on, so too are our dreams weirder than reality. Noticing these differences between dreams and reality is a big part of learning to lucid dream. Oneironauts have discovered all kinds of interesting features about dream generations, like how text tends to change when you look away, or clocks rarely show the same time twice, which point to the myopic nature of the dream generator.
A related phenomenon: In school I would often get stuck on a class assignment, and then get up to ask the teacher for help. Right before they were about to help me, the answer would suddenly come to me, as if by magic. Clearly I had the ability to answer the question on my own, but I could only do it in the context where the teacher was about to answer it for me.
This looks like making GPT “more useful,” which if not done carefully may slide into standard capabilities research making GPT more agentic.
Rather, the system as a whole, Human + AI, functions as an agent.
A valuable exercise is to observe the language that we normally use to describe accelerating alignment (e.g. from the OpenAI alignment plan: “AI systems can take over more and more of our alignment work and ultimately conceive, implement, study, and develop better alignment techniques than we have now”, “Training AI systems to do alignment research”). We very often describe AI as the active subject of the sentence, where the AI is the one taking action, “doing things” that would normally only be done by humans. This can be a clue to the biases we have about how these systems will be used.
This is certainly less true of some directions, like for example mechanistic interpretability.
I think the most fundamental objection to becoming cyborgs is that we don't know how to say whether a person retains control over the cyborg they become a part of.
I agree that this is important. Are you more concerned about cyborgs than other human-in-the-loop systems? To me the whole point is figuring out how to make systems where the human remains fully in control (unlike, e.g. delegating to agents), and so answering this "how to say whether a person retains control" question seems critical to doing that successfully.
Indeed. I think having a clean, well-understood interface for human/AI interaction seems useful here. I recognize this is a big ask in the current norms and rules around AI development and deployment.
I think that's an important objection, but I see it applying almost entirely on a personal level. On the strategic level, I actually buy that this kind of augmentation (i.e. with in some sense passive AI) is not an alignment risk (any more than any technology is). My worry is the "dual use technology" section.
I don't understand what you're getting at RE "personal level".
Like, I may not want to become a cyborg if I stop being me, but that's a separate concern from whether it's bad for alignment (if the resulting cyborg is still aligned).
Oh I see. I was getting at the "it's not aligned" bit. Basically, it seems like if I become a cyborg without understanding what I'm doing, the result is either:
Only the first one seems likely to be sufficiently aligned.
I’m excited about sensory substitution (https://eagleman.com/science/sensory-substitution/), where people translate auditory or visual information into tactile sensations (usually for people who can’t otherwise process that information).
I remember Quintin Pope wanting to translate the latent space of language models (e.g. while reading a paper) into visual or tactile info. I’d see this as both a way to read papers faster, brainstorm ideas, etc., and a way to gain a better understanding of latent space while developing it.
At the risk of sounding insane: I remember doing similar things, but used git to keep track of branches. A warning that I wish I’d had back then, before I shelved it:
There's a phenomenon where your thoughts and the generated text have no barrier. It's hard to describe, but it's similar to how you stop feeling the controller and the game character becomes an extension of the self.
It leaves you vulnerable to being hurt by things generated characters say because you're thoroughly immersed.
They will say anything with non-zero probability.
It's easy to lose sleep when playing video games. Especially when you feel the weight of the world on your shoulders.
Sleep deprivation+LLM induced hallucinations aren't fun. Make sure to get sleep.
Beware: LLMs will continue negative thinking.
You can counter by steering it toward positive thinking and solutions. Obviously, not all negative thoughts are within your current ability to solve or counter, like the heat death of the universe. Don't get stuck down a negative-thoughts branch and despair.
Yes. I have experienced this. And designed interfaces intentionally to facilitate it (a good interface should be "invisible").
Using a "multiverse" interface where I see multiple completions at once has incidentally helped me not be emotionally affected by the things GPT says in the same way as I would if a person said it (or if I had the thought myself). It breaks a certain layer of the immersion. As I wrote in this comment:
Seeing the multiverse destroys the illusion of a preexisting ground truth in the simulation. It doesn't necessarily prevent you from becoming enamored with the thing, but makes it much harder for your limbic system to be hacked by human-shaped stimuli.
It reveals the extent to which any particular thing that is said is arbitrary, just one among an inexhaustible array of possible worlds.
That said, I still find myself affected by things that feel true to me, for better or for worse.
Playing the GPT virtual reality game will only become more enthralling and troubling as these models get stronger. Especially, as you said, if you're doing so with the weight of the world on your shoulders. It'll increasingly feel like walking into the mouth of the eschaton, and that reality will be impossible to ignore. That's the dark side of the epistemic calibration I wrote about at the end of the post.
Thanks for the comment, I resonated with it a lot, and agree with the warning. Maybe I'll write something about the psychological risk and emotional burden that comes with becoming a cyborg for alignment, because obviously merging your mind with an earthborn alien (super)intelligence which may be shaping up to destroy the world, in order to try save the world, is going to be stressful.
I disagree with this post for 1 reason:
On Amdahl's law, John Wentworth's post on the long tail is very relevant, as it limits the usefulness of cyborgism:
I’m unsure how alt-history and point (2), “history is hard to change and predictable”, relate to cyborgism. Could you elaborate?
For context, Amdahl’s law states that how much you can speed up a process is bottlenecked by its serial parts. E.g. you can have 100 people help make a cake really quickly, but it still takes ~30 minutes to bake.
I’m assuming here that the human component is the serial component we will be bottlenecked on, so cyborgs will be outcompeted by agents?
If so, we should try to build the tools and knowledge to keep humans in the loop as far as we can. I agree it will eventually be outcompeted by full AI agency alone, but it isn’t set in stone how far human-steered AI can go.
Some relevant prior content:
Comment on the epistemology of cyborgism, process vs outcome supervision, short feedback loops, and high-bandwidth oversight
A rhetorical defense of cyborgism
talking to an LLM for several hours at a time seems dangerous
(Just to clarify, this comment was edited and the vast majority of the content is removed, feel free to DM me if you are interested in the risks of talking to an LLM operated by a large corporation)
These arguments don't apply to the base models which are only trained on next-word prediction (i.e. the simulators post), since their predictions never affected future inputs. This is the type of model Janus most interacted with.
Two of the proposals in this post do involve optimizing over human feedback, like:
Creating custom models trained on not only general alignment datasets but personal data (including interaction data), and building tools and modifying workflows to facilitate better data collection with less overhead
, which they may apply to.
The side effects of prolonged LLM exposure might be extremely severe.
I guess I should clarify that even though I joke about this sometimes, I did not become insane due to prolonged exposure to LLMs. I was already like this before.
I do think there's real risk there even with base models, but it's important to be clear where it's coming from - simulators can be addictive when trying to escape the real world. Your agency needs to somehow aim away from the simulator, and use the simulator as an instrumental tool.
I think you just have to select for / rely on people who care more about solving alignment than escapism, or at least that are able to aim at alignment in conjunction with having fun. I think fun can be instrumental. As I wrote in my testimony, I often explored the frontier of my thinking in the context of stories.
My intuition is that most people who go into cyborgism with the intent of making progress on alignment will not make themselves useless by wireheading, in part because the experience is not only fun, it's very disturbing, and reminds you constantly why solving alignment is a real and pressing concern.
Now that you've edited your comment:
The post you linked is talking about a pretty different threat model than what you described before. I commented on that post:
I've interacted with LLMs for hundreds of hours, at least. A thought that occurred to me at this part -
> Quite naturally, the more you chat with the LLM character, the more you get emotionally attached to it, similar to how it works in relationships with humans. Since the UI perfectly resembles an online chat interface with an actual person, the brain can hardly distinguish between the two.
Interacting through non-chat interfaces destroys this illusion, when you can just break down the separation between you and the AI at will, and weave your thoughts into its text stream. Seeing the multiverse destroys the illusion of a preexisting ground truth in the simulation. It doesn't necessarily prevent you from becoming enamored with the thing, but makes it much harder for your limbic system to be hacked by human-shaped stimuli.
But yeah, I've interacted with LLMs for much longer than the author, and I don't think I suffered negative psychological consequences from it (my other response was only half-facetious; I'm aware I might give off schizo vibes, but I've always been like this).
As I said in this other comment, I agree that cyborgism is psychologically fraught. But the neocortex-prosthesis setup is pretty different from interacting with a stable and opaque simulacrum through a chat interface, and less prone to causing emotional attachment to an anthropomorphic entity. The main psychological danger I see from cyborgism is somewhat different from this, more like Flipnash described:
I think people should only become cyborgs if they're psychologically healthy/resilient and understand that it involves gazing into the abyss.