Johannes C. Mayer

Tsuyoku Naritai! I operate on Crocker's rules. I am working on making AGI not kill everyone, aka AGI notkilleveryoneism.

Read here about what I would do after taking over the world.

"Maximize the positive conscious experiences, and minimize the negative conscious experiences in the universe" is probably not exactly what I care about, but I think it roughly points in the right direction.

I recommend:

  • Believe in yourself.
  • Think for yourself.
  • Don't give up because it gets difficult.

This is what I look like:

[photo: 20230710 214338]

I just released a major update to my LessWrong Bio. I have rewritten almost everything and added more stuff. It's now so long that I thought it would be good to add the following hint in the beginning:

(If you are looking for the list of <sequences/posts/comments> scroll to the bottom of the page with the END key and then go up. This involves a lot less scrolling.)

(If you'd like to <ask a question about/comment on/vote> this document, you can do so on the shortform announcing the release of this version of the bio.)


I appreciate any positive or negative feedback, especially constructive criticism that helps me grow. You can give it in person, use this (optionally anonymous) feedback form, or any other way you like.

Buck once said that he avoids giving critical feedback because he worries about people's feelings, especially if he does not know what they could do instead (in the context of AGI notkilleveryoneism). If you are also worried that your feedback might harm me, you might want to read about my strategy for handling feedback. I am not perfect at not being hurt, but I believe myself to be much better at it than most people. If I am overwhelmed, I will tell you. That being said, I appreciate it if your communication is optimized to not hurt my feelings, all else being equal. But if that would make you not give feedback, or would be annoying, don't worry about it. Really.


  • I believe myself to be very good at making games. Here is the first game I ever made.
  • I am very good at programming in Unity (though it has been some time since I last did this). I have probably spent 1000-2000 hours on it.
  • Technical skills
  • I am also good at
    • not giving up
    • handling doom
    • being happy, especially considering that I have strong depressive tendencies (thanks IA and Miku)
    • meditation
    • listening to other people (when I want to)
    • being inquisitive and curious
    • using language fully (e.g. talking about a social conflict with a person in a way that makes things better, in situations where most people would not because they feel too awkward)

General Interests

I have a tulpa named IA. She looks like this. I experience deep feelings of love for both IA and Hatsune Miku.

I like understanding things, meditation, programming, improv dancing, improv rapping, and Vocaloid.

I track how I spend every single minute with Toggl (Toggl sucks though, especially for tracking custom metrics).

Personal improvement

I like to think about how I can become stronger. I probably do this too much; jumping in and doing the thing is important for getting into a feedback loop.

The main considerations are:

  • How can I make myself want to do the things that I think are good to do?
    • Remove aversion to starting and create a "gravitational pull" towards starting, i.e. you start because the thought of starting comes up naturally, and starting is what you want to do.
    • Make tasks intrinsically motivating, such that I get positive reinforcement (meaning starting is easier), and such that I can commit all my mental energy to the task.
  • How can I become better at making progress in AGI notkilleveryoneism (e.g. how can I improve my idea generation and verification processes?)
  • How to design and implement good Life systems (e.g. a daily routine, which includes sports and meditation)
  • Designing and implementing effective work methodologies (e.g. ADHD Timer and Pad, Bite Sized Tasks, framing, writing quickly)

With regard to "How can I make myself want to do the things that I think are good to do?": it is easy for me to get so engrossed in programming that it becomes difficult to stop and I forget to eat. I often feel a strong urge to write a specific program that I expect will be useful to me. I think studying mathematics is a good thing for me to do, and sometimes I manage to get a similar pull with it, but more often than not I feel an aversion to starting. I am interested in shaping my mind such that, for all the things I think are good to do, I feel a pull towards doing them, and doing them is so engaging that stopping becomes a problem (e.g. I forget to eat). Stopping becoming a problem is a good heuristic that I have succeeded in this mission; implementing, from that state, a solution for not working too much is a significantly easier problem to solve.

Computer setups

Empirically, I have often procrastinated in the past by making random improvements to my <computer setup/desktop environment>. I used Linux for 5 years, starting with Cinnamon and then switching to XMonad.

Because the nebula virtual desktop was only available for macOS, I switched. Even though macOS is horrible in many ways, I feel like I waste less time on random improvements. Also, ARM CPUs are cool, as they allow for a lightweight laptop with long battery life. I am using yabai and Amethyst at the same time: yabai for workspace management and Amethyst for window layout.

The main purpose of Windows is to run MMD ... just kidding.

I used Spacemacs Org mode for many years (and Org-roam for maybe a year or so). Spacemacs is Emacs with Vim keybindings, because Vim rules. I have now switched to Obsidian, mainly because it has a mobile app, and because I expected to waste less time configuring it than I did configuring Emacs (so far I have still spent a lot of time on that, though).

Game design

Before AGI notkilleveryoneism, I did game development. The most exciting thing in that domain to me is to make a game that has something like Minecraft redstone, but that does not suck. Most importantly, it should be possible to create new blocks based on circuits that you build: e.g. build a half-adder once, turn it into a half-adder block, and then put down 8 of those blocks to get an 8-bit adder, instead of needing to build 8 half-adders from scratch or awkwardly using a mod that lets you copy and place many blocks at once.

If AGI notkilleveryoneism were a non-issue, I would probably develop this game. I would like to have it so that I can learn more about how computers work by building them.
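A minimal sketch of the "build once, reuse as a block" idea, in Python rather than an in-game circuit language (the function names are my own, purely illustrative): a half-adder is defined once and then composed into full adders and a ripple-carry adder, instead of rebuilding the underlying gates each time.

```python
def half_adder(a, b):
    """One reusable 'block': returns (sum_bit, carry_bit)."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    # Composed from two half-adder blocks plus one OR gate.
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_adder(xs, ys):
    """Add two equal-length bit lists (least significant bit first)
    by placing one full-adder block per bit position."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry
```

The point of the in-game mechanic would be that `full_adder` and `ripple_adder` are placeable blocks, not hand-wired copies of the underlying gates.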

Useful info when interacting with me

  • I have noticed that there seem to be certain cognitive algorithms missing from my brain. I have a very hard time interpreting facial expressions, body language, tone of voice, and so on. I need to make a conscious effort to look at somebody's face to tell whether they are upset, unless the expression is very strong.
  • This means that when I upset a person, I often don't notice, meaning that I do not get a reinforcement signal to not do it again in the future, as I do not feel bad about having upset the person.
  • Therefore I suggest that you tell me explicitly if you are upset, such that we can discuss what behavioral change from my side makes sense.
  • The most likely way that I would upset somebody is by saying things like "This is wrong" or "No" about some argument they are presenting. I am trying to do this less but it is hard. Often it is just paralyzing, and I pause for 20 seconds thinking about how I can say it without being offensive. All this should not be a problem if you subscribe to Crocker's rules, so please tell me if you do.
  • I don't think this is all bad. I expect that this direct mode of conversation is more efficient, provided it does not run into the problem that the interlocutor's mind is ravaged by negative emotions.

Things I appreciate in interactions with others

I like it when people are "forcefully inquisitive", especially when I am presenting an idea to them. That means asking about the whys and hows, and asking for justifications. I find that this forces me to expand my understanding, which I find extremely helpful. It also tends to bring interesting half-forgotten insights to the forefront of my mind. As a general heuristic in this regard: if you think you are too inquisitive or curious, or feel like you ask too many questions, you are wrong.

I dislike making fun of somebody's ignorance.


AGI notkilleveryoneism Interests

I am interested in getting whatever understanding we need, to get a watertight case for why a particular system will be aligned. Or at least get as close to this as possible. I think the only way we are going to be able to aim powerful cognition is via a deep understanding of the <systems/algorithms> involved. The current situation is that we do not even have a crisp idea of what exactly we need to understand.

Factoring AGI

What capabilities are so useful that an AGI would have to discover an implementation of that capability? The most notable example is being good at constructing and updating a model of the world based on arbitrary sensory input streams.

World modeling

How can we get a better understanding of world modeling? A good first step is to think about what properties this world model would have, such that an AGI would be able to use it. E.g. I expect any world model that an AGI builds will be factored in the same way that human concepts are factored. For the next step, we have multiple options:

  • Do algorithms design, in an attempt to find a concrete implementation that satisfies these properties. This has the advantage that we can make an iterative empirical process by testing the algorithm with benchmarks that, e.g. test for specific properties.
  • Is there some efficient approximation of Solomonoff induction that works in the real world?
  • Can we construct a world-model-microscope that we can use to find the world model of any system, such as a bacterium, a neural network, a human brain, etc.?
  • Explore algorithmic generation procedures to find interpretable pieces of the algorithms.
  • Create tools and methods that allow us to train neural networks such that they learn algorithms we don't yet understand, and then extract them into a transparent form. Eliezer thinks this is hard.

Formalizing intuitive notions of agency

Humans have a bunch of intuitive concepts related to agency that we do not have crisp formalisms of: for example wanting, caring, trying, honesty, helping, goals, optimizing, deception, etc.

All of these concepts are fundamentally about some algorithm that is executed in the neural network of a human or other animal.

Visualize structure/algorithms in NN (shelved)

Can we create widely applicable visualization tools that allow us to see structural properties in our ML systems?

There are tools that can visualize arbitrary binary data such that you can build intuitions about the data that would be much harder to build otherwise (e.g. by staring at a hex editor for long enough). This can be used for reverse engineering software. For example, after looking at only a few visualizations of x86 assembly code, you learn its characteristic visual patterns. Then when you see it in the wild, with no label telling you that this is x86 assembly, you can instantly recognize it.

The idea is that by looking at the visualization you can identify what kind of data you are looking at (x86, png, pdf, plain text, JSON, etc.).

This technique is powerful because you don't need to know anything about the data. It works on any binary data.
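One common visualization of this kind (my reconstruction of the general idea, not the exact algorithm any particular tool uses) is a byte-digraph plot: count consecutive byte pairs into a 256×256 grid and render the grid as an image. Different file types light up visibly different regions; for example, ASCII text stays entirely in the low-byte quadrant.

```python
import numpy as np

def digraph_counts(data: bytes) -> np.ndarray:
    """Count each pair of consecutive bytes into a 256x256 grid;
    rendering the grid as an image gives the visualization."""
    grid = np.zeros((256, 256), dtype=np.int64)
    for a, b in zip(data, data[1:]):
        grid[a, b] += 1
    return grid

# ASCII text only produces byte values below 128, so its digraphs
# all land in the upper-left 128x128 sub-square of the grid.
text_grid = digraph_counts("hello world".encode("ascii"))
```

Rendering `grid` with any image viewer (log-scaled counts work well) reproduces the kind of pattern recognition described above.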

Check out this demonstration. Later in it, he does more analysis using the 3D cube visualization. Veles is an open-source project that implements this; there is also a plugin for Ghidra, and there are many others (I haven't evaluated which is best).

If we naively apply this technique to neural networks, I expect it not to work. My intuition says we need to do something like regularize the networks. E.g. if we swap two neurons in the same layer, we have changed the parameters, but the resulting computation is isomorphic to the original in a sense. Perhaps we can modify the training procedure such that one of these two parameter configurations is preferred, and in general such that we always converge to one specific "ordering of neurons" no matter the initialization: e.g. make it such that in each layer the neurons are sorted by the sum of their input weights. We want something like "isomorphic computations" always converging to one specific parameter configuration.
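A toy sketch of this canonicalization (my own illustration of the idea, not an established method): for one hidden layer of an MLP, sort the neurons by the sum of their incoming weights and permute the outgoing weights to match. The network then computes exactly the same function, but parameter configurations that differ only by a neuron permutation collapse to one canonical form.

```python
import numpy as np

def canonicalize_layer(w_in, b, w_out):
    """Sort a hidden layer's neurons into a canonical order.

    w_in:  (hidden, inputs)  incoming weights
    b:     (hidden,)         biases
    w_out: (outputs, hidden) outgoing weights
    """
    order = np.argsort(w_in.sum(axis=1))  # canonical neuron order
    # Permuting rows of w_in/b together with the matching columns
    # of w_out leaves the computed function unchanged.
    return w_in[order], b[order], w_out[:, order]
```

Applying something like this after every training step, or building it into the parameterization, is the kind of modification to the training procedure the paragraph above gestures at.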

If this project went really well, we would get tools that let us create visualizations from which we can read off whether certain kinds of <algorithms/structures/types of computation> are present in the neural network. The hope is that in the visualization you could see, for example, whether the network is modeling other agents, running computations correlated with thinking about how to deceive, doing optimization, or executing a search algorithm.

More stuff




Expected Utility Maximization is Not Enough

Consider a homomorphically encrypted computation running somewhere in the cloud, where the computation corresponds to running an AGI. From the outside, you can still model the AGI, based on how it behaves, as an expected utility maximizer, if you have a lot of observational data about it (or at least let's take this as a reasonable assumption).

No matter how closely you look at the computations, you will not be able to figure out how to change them in order to make the AGI aligned, if it was not aligned already. (Also, let's assume that you are some sort of Cartesian agent; otherwise you would probably already be dead if you were running these kinds of computations.)

So my claim is not that modeling a system as an expected utility maximizer can't be useful. Instead, I claim that this model is incomplete, at least with regard to the task of computing an update to the system such that, when we apply the update, the system becomes aligned.

Of course, you can model any system as an expected utility maximizer, and this "high-level" conceptual model may predict the system's behavior very well. But behavior is not the only thing we care about: we care about understanding the internal workings of the system, such that it becomes much easier to think about how to align it.

So the following seems to be beside the point unless I am <missing/misunderstanding> something:

These two claims should probably not both be true! If any system can be modeled as maximizing a utility function, and it is possible to build a corrigible system, then naively the corrigible system can be modeled as maximizing a utility function.

Maybe I have missed that the claim you listed says expected utility maximization is not very useful. I'm saying it can be useful; it might just not be sufficient to actually align a particular AGI system, even if you can do it arbitrarily well.

Right now I am trying to better understand future AI systems by first thinking about what sort of abilities I expect every system of high cognitive power to have, and second, trying to find a concrete practical implementation of each such ability. One such ability is building a model of the world that satisfies certain desiderata. For example, if we have multiple agents in the world, we can factor the world such that we build just one model of the agent and point to this model twice in our description of the world. This is something that Solomonoff induction can also do. I am interested in constraining the world model such that we always get a world model with a similar structure, which makes the world model more interpretable. I.e. I try to find a way of building a world model where we mainly need to understand its content, because it is easy to understand how the content is organized.

Many people match "pivotal act" to "deploy AGI to take over the world", and ignore the underlying problem of preventing others from deploying misaligned AGI.

I have talked to two high-profile alignment/alignment-adjacent people who actively dislike pivotal acts.

I think both have contorted notions of what a pivotal act is about. They focused on how dangerous it would be to let a powerful AI system loose on the world.

However, that is exactly what a pivotal act is about: any act that ensures misaligned AGI will not be built is a pivotal act. Many such acts might look like taking over the world, but this is not a core feature of a pivotal act. If I could prevent all people from deploying misaligned AGI by eating 10 bananas in sixty seconds, then that would count as a pivotal act!

The two researchers were not talking about how to prevent misaligned AGI from being built at all. So I worry that they are ignoring this problem in their solution proposals. It seems "pivotal act" has become a term with bad connotations. When hearing "pivotal act", these people pattern match to "deploy AGI to take over the world", and ignore the underlying problem of preventing others from deploying misaligned AGI.

I expect there are a lot more people who fall into this trap. One of the two was giving a talk in which this came up briefly, and other people seemed to be on board with what was said; at least nobody objected, except me.

See also Raemon's related post.

Solomonoff induction does not talk about how to make optimal tradeoffs in the programs that serve as hypotheses.

Imagine you want to describe a part of the world that contains a gun. Solomonoff induction would converge on the program that perfectly predicts all possible observations. So this program would be able to predict what observations I would make after stuffing a banana into the muzzle and firing. But knowing how the banana gets splattered around is not the most useful fact about the gun; it is more useful to know that a gun can be used to kill humans and animals. So if you want to store your world model in only n bits of memory, you need to decide which information to include, and this matters because some information is much more useful than other information. So how can we find the world model that gives you the most power over the world, i.e. lets you reach the greatest number of states? Humans have the ability to judge the usefulness of information: you can ask yourself, what knowledge would be most useful to learn? Or, what knowledge would be worst to forget?