I know very little about this area, but I suspect that a writeup like this classic explanation of Godel Incompleteness might be a step in the right direction: Godel incompleteness.
I meant this:
Shard Question: How does the human brain ensure alignment with its values, and how can we use that information to ensure the alignment of an AI with its designers' values?
which does indeed beg the question in the standard sense of the phrase.
My point is that there is very much no alignment between different values! They are independent at best and contradictory in many cases. The appearance of coherent values is an illusion, a rationalization. The difference in values sometimes leads to catastrophic Fantasia-like outcomes on the margins (e.g. people with addiction don't want to be on drugs but are), but most of the time it results in mild akrasia (I am writing this instead of doing something that makes me money). This fable seems like a good analogy: http://max.mmlc.northwestern.edu/mdenner/Demo/texts/swan_pike_crawfish.htm
That seems like a useful decomposition! Point 2 seems to beg the question: why assume that the brain can "ensure alignment with its values", as opposed to, say, synthesizing an illusion of values by aggregating data from various shards?
Just a small remark
Open a blank google doc, set a one hour timer, and start writing out your case for why AI Safety is the most important problem to work on
Not "why", but "whether" is the first step. Otherwise you end up being a clever arguer.
Value extrapolation is thus necessary for AI alignment. It is also almost sufficient, since it allows AIs to draw correct conclusions from imperfectly defined human data.
I am missing something... The idea of correctly extrapolating human values is basically the definition of Eliezer's original proposal, CEV. In fact, it's right there in the name. What is the progress over the last decade?
I'm confused... What you call the "Pure Reality" view seems to work just fine, no? (I think you had a different name for it, pure counterfactuals or something.) What do you need counterfactuals/Augmented Reality for? Presumably for making decisions thanks to "having a choice" in this framework, right? In the pure reality framework, in the "student and the test" example, one would dispassionately calculate what kind of student algorithm passes the test, without talking about making a decision to study or not to study. Same with Newcomb's problem, of course: one just looks at what kind of agents end up with a given payoff. So... why pick the AR view over the PR view? What's the benefit?
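To make the PR framing concrete, here is a minimal toy sketch (my own illustration; the perfect predictor and the $1,000,000 / $1,000 payoffs are the standard Newcomb setup, but the function and agent names are hypothetical). Note that no "decision" appears anywhere: we just enumerate agent algorithms and tabulate what happens to each.

```python
# A toy "pure reality" treatment of Newcomb's problem: instead of modeling
# a decision, enumerate candidate agent algorithms and compute the outcome
# for each one. All names here are my own hypothetical illustration.

def newcomb_payoff(agent):
    """A perfect predictor simulates the agent and fills the boxes accordingly."""
    prediction = agent()  # the predictor runs the agent's algorithm
    opaque_box = 1_000_000 if prediction == "one-box" else 0
    choice = agent()      # the agent, being the same algorithm, acts identically
    if choice == "one-box":
        return opaque_box
    return opaque_box + 1_000  # two-boxing also takes the transparent $1000

agents = {
    "one-boxer": lambda: "one-box",
    "two-boxer": lambda: "two-box",
}

# No choice point anywhere: just look at which agent ends up with which payoff.
for name, agent in agents.items():
    print(f"{name}: ${newcomb_payoff(agent):,}")
```

Running this simply reports that the one-boxer algorithm ends up with $1,000,000 and the two-boxer with $1,000, which is the whole PR story: outcomes as a function of agent type, no deliberation required.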
First, I really like this shift in thinking, partly because it moves the needle toward an anti-realist position, where you don't even need to postulate an external world (you probably don't see it that way, despite saying "Everything is a subjective preference evaluation").
Second, I wonder if you need an even stronger restriction: not just computable, but efficiently computable, given that it's the agent that is doing the computation, not some theoretical AIXI. This would probably also change "too easily" in "those expectations aren't (too easily) exploitable to Dutch-book" to "efficiently". Maybe it should be more restrictive still, to avoid diminishing returns from trying to squeeze out every last bit of utility by spending a lot of compute.
Feel free to let me know either way, even if you find that the posts seem totally wrong or missing the point.
My answer is a rather standard compatibilist one, the algorithm in your brain produces the sensation of free will as an artifact of an optimization process.
There is nothing you can do about it (you are executing an algorithm, after all), but your subjective perception of free will may change as you interact with other algorithms, like me or Jessica or whoever. There aren't really any objective intentional "decisions", only our perception of them. Therefore decision theories are just byproducts of all these algorithms executing. It doesn't matter, though, because you have no choice but to feel that decision theories are important.
So, watch the world unfold before your eyes, and enjoy the illusion of making decisions.
I wrote about this over the last few years:
Well written. Do you have a few examples of pivoting when it becomes apparent that the daily grind no longer optimizes for solving the problem?