I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality.
Longer bio: www.lesswrong.com/posts/aG74jJkiPccqdkK3c/the-lesswrong-team-page-under-construction#Ben_Pace___Benito
I've been thinking lately that picturing an AI catastrophe is helped a great deal by visualising a world where critical systems in society are run by software. I was spending a while trying to summarise and analyse Paul's "What Failure Looks Like", which led me this way. I think that properly imagining such a world is immediately scary, because software can handle edge cases badly (like automated market traders causing major crashes), so that's already a big deal. Then you add ML in, and can talk about how crazy it is to hand critical systems over to code we do not understand and cannot make simple adjustments to; then you're already hitting catastrophes. Once you then argue that ML can become superintelligent, everything goes from "global catastrophe" to "obvious end of the world", but the first steps are already pretty helpful.
While Paul's post helps a lot, it still takes a fair bit of effort for me to concretely visualise the scenarios he describes, and I would be excited for people to take the time to detail what it would look like to hand critical systems over to software – for which systems would this happen, why would we do it, who would be the decision-makers, what would it feel like from the average citizen's vantage point, etc. A smaller version of Hanson's Age of Em project, just asking the question "Which core functions in society (food, housing, healthcare, law enforcement, governance, etc) are amenable to tech companies building solutions for, and what would it look like for society to transition to having 1%, 10%, 50%, and 90% of core functions automated with 1) human-coded software, 2) machine learning, 3) human-level general AI?"
For now I'm going to assume that COVID was a "natural" pandemic and didn't e.g. escape from a Chinese lab where it was being studied. If actually COVID was a failure of biosecurity, that would be a significant update for me.
If your epistemic state is "I'm very confident (90%+) that this was a natural pandemic" then you can make some money betting on this.
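To make the betting logic concrete, here is a minimal sketch of the expected-value calculation. All the numbers (your credence, the market's implied probability, the stake) are hypothetical, chosen only to illustrate why a 90%+ credence against lower market odds is profitable in expectation:

```python
# Hypothetical numbers: your credence vs. the probability implied by betting odds.
my_p = 0.90      # your probability that the pandemic was natural (assumed)
market_p = 0.75  # probability implied by the market's odds (assumed)
stake = 100      # amount wagered

# At fair odds for market_p, a winning $stake bet pays stake / market_p.
payout_if_win = stake / market_p
expected_profit = my_p * payout_if_win - stake
print(round(expected_profit, 2))  # positive => +EV under your beliefs
```

If your credence matched the market's (my_p == market_p), the expected profit would be zero; the edge comes entirely from the gap between the two probabilities.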
I fixed it. In our editor, use cmd-4/ctrl-4 to do LaTeX, not dollar signs. (The thing you did would work in the markdown editor – you can go into settings to change to that editor if you'd like.)
Ah, I see, that makes sense.
If I think about how good the consequences of an action are, I try to think about what I expect to happen if I take that action (ie the outcome), and I think about how likely that outcome is to have various properties that I care about, since I don't know exactly what the outcome will be with certainty... I need to consider how good and how likely various consequences are, and take the expectation of the 'how good' with respect to the 'how likely'.
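The "expectation of the 'how good' with respect to the 'how likely'" above is just the standard expected-utility sum. A minimal sketch, with entirely made-up outcomes, probabilities, and utilities:

```python
# Toy expected-utility calculation: E[U] = sum over outcomes of p(o) * u(o).
# The outcome names, probabilities, and utilities are illustrative only.
outcomes = {
    "great": {"p": 0.2, "u": 10.0},
    "fine":  {"p": 0.7, "u": 3.0},
    "bad":   {"p": 0.1, "u": -5.0},
}

# Probabilities should sum to 1 for this to be a proper expectation.
assert abs(sum(o["p"] for o in outcomes.values()) - 1.0) < 1e-9

expected_utility = sum(o["p"] * o["u"] for o in outcomes.values())
print(round(expected_utility, 2))  # 0.2*10 + 0.7*3 + 0.1*(-5)
```

Choosing between actions then amounts to computing this sum for each action's outcome distribution and taking the largest.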
I don't understand JB yet, but when I introspected just now, my experience of decision-making doesn't have any separation between beliefs and values, so I think I disagree with the above. I'll try to explain why by describing my experience. (Note: Long comment below is just saying one very simple thing. Sorry for length. There's a one-line tl;dr at the end.)
Right now I'm considering doing three different things. I can go and play a videogame that my friend suggested we play together, I can do some LW work with my colleague, or I can go play some guitar/piano. I feel like the videogame isn't very fun right now because I think the one my friend suggested is not that interesting of a shared experience. I feel like the work is fun because I'm excited about publishing the results of the work, and the work itself involves a kind of cognition I enjoy. And playing piano is fun because I've been skilling up a lot lately and I'm going to accompany some of my housemates in some Hamilton songs.
Now, I know some likely ways that what seems valuable to me might change. There are other videogames I've played lately that have been really fascinating and rewarding to play together, that involve problem-solving where two people can be creative together. I can imagine the work turning out to be not the fun part but the boring parts. I can imagine that I've found no traction (skill-up) in playing piano, or that we're going to use a recorded soundtrack rather than my playing for the songs we're learning.
All of these to me feel like updates in my understanding of what events are reachable to me; this doesn't feel like changing my utility evaluation of the events. The event of "play videogame while friend watches bored" could change to "play videogame while creatively problem-solving with friend". The event of "gain skill in piano and then later perform songs well with friends" could change to "struggle to do something difficult and sound bad and that's it".
If I think about changing my utility function, I expect that would feel more like... well, I'm not sure. My straw version is "I creatively solve problems with my friend on a videogame, but somehow that's objectively bad so I will not do it". That's where some variable in the utility function changed while all the rest of the facts about my psychology and reality stay the same. This doesn't feel to me like my regular experience of decision-making.
But, maybe that's not the idea. The idea is like if I had some neurological change, perhaps I become more of a sociopath and stop feeling empathy and everyone just feels like objects to me rather than alive. Then a bunch of the social experiences above would change, they'd lose any experience of things like vicarious enjoyment and pleasure of bonding with friends. Perhaps that's what VNM is talking about in my experience.
I think that some of the standard "updates to my ethics / utility function" ideas that people discuss often don't feel like this to me. Like, some people say that reflecting on population ethics leads them to change their utility function and start to care about the far future. That's not my experience – for me it's been things like the times in HPMOR when Harry thinks about civilizations of the future, what they'll be like/think, and how awesome they can be. It feels real to me, like a reachable state, and this is what has changed a lot of my behaviour, in contrast with changing some variable in a function of world-states that's independent from my understanding of what events are achievable.
To be clear, sometimes I describe my experience more like the sociopath example, where my fundamental interests/values change. I say things like "I don't enjoy videogames as much as I used to" or "These days I value honesty and reliability a lot more than politeness", and there is a sense there where I now experience the same events very differently. "I had a positive meeting with John" might now be "I feel like he was being evasive about the topic we were discussing". The things that are salient to me change. And I think that the language of "my values have changed" is often an effective one for communicating that – even if my experience does not match beliefs|utility, any sufficiently coherent agent can be described this way, and it is often easy to help others model me by describing my values as having changed.
But I think my internal experience is more that I made substantial updates about what events I'm moving towards, and the event "We had a pleasant interaction which will lead to us working effectively together" has changed to "We were not able to say the possibly unwelcome facts of the matter, which will lead to a world where we don't work effectively together". So internally it feels like an update about what events are reachable, even though someone from the outside who doesn't understand my internal experience might more naturally say "It seems like Ben is treating the same event differently now, so I'll model him as having changed his values".
tl;dr: While I often talk separately about what actions I/you/we could take and how valuable those actions are, internally when I'm 'evaluating' the actions, I'm just trying to visualise what they are, and there is no second step of running my utility function on those visualisations.
As I say, I'm not sure I understand JB, so perhaps this is also inconsistent with it. I just read your comment and noticed it didn't match my own introspective experience, so I thought I'd share my experience.
I've curated this. This seems to me like an important conceptual step in understanding agency; the subjective view is very interesting and surprising to me. This has been written up very clearly, I expect people to link back to this post quite a lot, and I'm really excited to read more posts on this. Thanks a lot Abram.
Thinking more, I think there are good arguments for taking actions that as a by-product induce anthropic uncertainty; this is the standard Hansonian situation where you build lots of ems of yourself to do bits of work and then turn them off.
But I still don't agree with the people in the situation you describe because they're optimising over their own epistemic state, I think they're morally wrong to do that. I'm totally fine with a law requiring future governments to rebuild you / an em of you and give you a nice life (perhaps as a trade for working harder today to ensure that the future world exists), but that's conceptually analogous to extending your life, and doesn't require causing you to believe false things. You know you'll be turned off and then later a copy of you will be turned on, there's no anthropic uncertainty, you're just going to get lots of valuable stuff.
I just don’t think it’s a good decision to make, regardless of the math. If I’m nearing the end of the universe, I prefer to spend all my compute maximising fun / searching for a way out. Trying to run simulations so that I no longer know if I’m about to die seems like a dumb use of compute. I can bear the thought of dying, dude; there are better uses of that compute. You’re not saving yourself, you’re just intentionally making yourself confused because you’re uncomfortable with the thought of death.
Another big reason why (a version of it) makes sense is that the simulation is designed for the purpose of inducing anthropic uncertainty in someone at some later time in the simulation. e.g. if the point of the simulation is to make our AGI worry that it is in a simulation, and manipulate it via probable environment hacking, then the simulation will be accurate and lawful (i.e. un-tampered-with) until AGI is created.
Ugh, anthropic warfare, feels so ugly and scary. I hope we never face that sh*t.
I don't buy that it makes sense to induce anthropic uncertainty. It makes sense to spend all of your compute to run emulations that are having awesome lives, but it doesn't make sense to cause yourself to believe false things.