A few thoughts:
So economic progress may not accurately represent technological progress, meaning that if we use this framing we may get caught up in a bunch of economic debates instead of debates about capacity.
Yeah, sorry, that's a typo, fixed now.
Hey Vojta, thanks so much for your thoughts.
I feel slightly worried about going too deep into discussions along the lines of "Vojta reacts to Chris' claims about what other LW people argue against hypothetical 1-boxing CDT researchers from classical academia that they haven't met" :D.
Fair enough. Especially since this post isn't so much about the way people currently frame their arguments but an attempt to persuade people to reframe the discussion around comparability.
My take on how to do counterfactuals correctly is that this is not a property of the world, but of your mental models
I feel similarly. I've explained my reasons for believing this in the Co-operation Game, Counterfactuals are an Answer, not a Question and Counterfactuals as a matter of Social Convention.
According to this view, counterfactuals only make sense if your model contains uncertainty...
I would frame this slightly differently and say that this is the paradigmatic case which forms the basis of our initial definition. I think the example of numbers can be instructive here. The first numbers to be defined are the counting numbers: 1, 2, 3, 4... It is then convenient to add fractions, then zero, then negative numbers and eventually we extend to the complex numbers. In each case we've slightly shifted the definition of what a number is and this choice is solely determined by convention. Of course, convention isn't arbitrary, but determined by what is natural.
Similarly, the cases where there is actual uncertainty provide the initial domain over which we define counterfactuals. And we can then try to extend this as you are doing above. I see this as a very promising approach.
A lot of what you are saying there aligns with my most recent research direction (Counterfactuals as a matter of Social Convention), although it's unfortunately stalled with coronavirus and my focus being mostly on attempting to write up my ideas from the AI safety program. There seem to be a bunch of properties that make a situation more or less likely to be accepted by humans as a valid counterfactual. I think it would be viable to identify the main factors, with the actual weighting being decided by each human. This would acknowledge both the subjective, constructed nature of counterfactuals and the objective elements with real implications that keep this from being a completely arbitrary choice. I would be keen to discuss further/bounce ideas off each other if you'd be up for it.
Finally, when some counterfactual would be inconsistent with our model, we might take it for granted that we are supposed to relax M in some manner
This sounds very similar to the erasure approach I was previously promoting, but have shifted away from. Basically, when I started thinking about it, I realised that only allowing counterfactuals to be constructed by erasing information didn't match how humans actually use counterfactuals.
Second, when doing counterfactuals, we might take it for granted that you are to replace the actual observation history o by some alternative o′
This is much more relevant to how I think now.
I think that "a typical AF reader" uses a model in which "a typical CDT adherent" can deliberate, come to the one-boxing conclusion, and find 1M in the box, making the options comparable for "typical AF readers". I think that "a typical CDT adherent" uses a model in which "CDT adherents" find the box empty while one-boxers find it full, thus making the options incomparable
I think that's an accurate framing of where they are coming from.
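To make the comparability point a bit more concrete, here is a rough toy sketch. It assumes a formalisation of my own rather than anything from the thread: a model is just a table listing which (action, payoff) outcomes it treats as reachable for each type of agent. The names `af_reader_model`, `cdt_adherent_model` and `comparable` are hypothetical, and the $1,000 figure for the transparent box is the usual Newcomb convention rather than something stated above.

```python
# Toy formalisation (an assumption, not from the thread): each model lists the
# (action, payoff) outcomes it treats as reachable for each type of agent.
af_reader_model = {
    # A CDT adherent can deliberate their way to one-boxing and find the 1M.
    "cdt_adherent": {("one-box", 1_000_000), ("two-box", 1_000)},
}
cdt_adherent_model = {
    # CDT adherents find the opaque box empty whatever they do...
    "cdt_adherent": {("one-box", 0), ("two-box", 1_000)},
    # ...while the full box is only ever found by a different kind of agent.
    "one_boxer": {("one-box", 1_000_000)},
}


def comparable(model, agent, outcome_a, outcome_b):
    """Two outcomes are comparable for an agent if the model treats both as
    reachable by that same agent."""
    reachable = model.get(agent, set())
    return outcome_a in reachable and outcome_b in reachable


full_box = ("one-box", 1_000_000)
two_box = ("two-box", 1_000)

print(comparable(af_reader_model, "cdt_adherent", full_box, two_box))     # True
print(comparable(cdt_adherent_model, "cdt_adherent", full_box, two_box))  # False
```

On this toy reading, the disagreement is not about which payoff is larger, but about whether both outcomes belong to the same agent's option set in the first place.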
The third question I didn't understand.
What was unclear? I made one typo where I said an EDT agent would smoke when I meant they wouldn't smoke. Is it clearer now?
Ah, I think I now get where you are coming from
I guess what is confusing me is that you seem to have provided a reason why we shouldn't just care about high-level functional behaviour (because this might miss correlations between the low-level components), then in the next sentence you're acting as though this is irrelevant?
"First and foremost: why do we care about validity of queries on correlations between the low-level internal structures of the two agent-instances? Isn’t the functional behavior all that’s relevant to the outcome? Why care about anything irrelevant to the outcome?" - I don't follow what you are saying here
Yes (see thread with Abram Demski).
Hmm, yeah this could be a viable theory. Anyway, to summarise the argument I make in Is Backwards Causation Necessarily Absurd?, I point out that since physics is pretty much reversible, instead of A causing B, it seems as though we could also imagine B causing A and time going backwards. In this view, it would be reasonable to say that one-boxing (backwards-)caused the box to be full in Newcomb's problem. I only sketched the theory because I don't have enough physics knowledge to evaluate it. But the point is that we can give a justification for a non-standard model of causality.
So my usage (of free will) seems pretty standard.
Not quite. The way you are using it doesn't necessarily imply real control, it may be imaginary control.
All word definitions are determined in large part by social convention
True. Maybe I should clarify what I'm suggesting. My current theory is that there are multiple reasonable definitions of counterfactual and it comes down to social norms as to what we accept as a valid counterfactual. However, it is still very much a work in progress, so I wouldn't be able to provide more than vague details.
The "possible worlds" represented by this uncertainty may be logically inconsistent, in ways the agent can't determine before making the decision.
I guess my point was that this notion of counterfactual isn't strictly a material conditional due to the principle of explosion. It's a "para-consistent material conditional" by which I mean the algorithm is limited in such a way as to prevent this explosion.
Policy-dependent source code does this; one's source code depends on one's policy.
Hmm... good point. However, were you flowing this all the way back in time? For example, if you change someone's source code, you'd also have to change the person who programmed them.
I think UDT makes sense in "dualistic" decision problems
What do you mean by dualistic?
I found parts of your framing quite original and I'm still trying to understand all the consequences.
Firstly, I'm also opposed to characterising the problem in terms of logical counterfactuals. I've argued before that Counterfactuals are an Answer Not a Question, although maybe it would have been clearer to say that they are a Tool Not a Question instead. If we're talking strictly, it doesn't make sense to ask what maths would be like if 1+1=3 as it doesn't, but we can construct a para-consistent logic where it makes sense to do something analogous to pretending 1+1=3. And so maybe one form of "logical counterfactual" could be useful for solving these problems, but that doesn't mean asking what logical counterfactuals are, as though they were ontologically basic, as though they were in the territory not the map, as though they were a single unified concept, makes sense.
Secondly, "free will" is such a loaded word that using it in a non-standard fashion simply obscures and confuses the discussion. Nonetheless, I think you are touching upon an important point here. I have a framing which I believe helps clarify the situation. If there's only one possible decision, this gives us a Trivial Decision Problem. So to have a non-trivial decision problem, we'd need a model containing at least two decisions. If we actually did have libertarian free will, then our decision problems would always be non-trivial. However, in the absence of this, the only way to avoid triviality would be to augment the factual with at least one counterfactual.
Counterfactual non-realism: Hmm... I see how this could be a useful concept, but the definition given feels a bit vague. For example, recently I've been arguing in favour of what counts as a valid counterfactual being at least partially a matter of social convention. Is that counterfactual non-realism?
Further, it seems a bit strange to associate material conditionals with counterfactual non-realism. Material conditionals only provide the outcome when we have a consistent counterfactual. So, either a) we believe in libertarian free will, or b) we use something like the erasure approach to remove information such that we have multiple consistent possibilities (see https://www.lesswrong.com/posts/BRuWm4GxcTNPn4XDX/deconfusing-logical-counterfactuals). Proof-based UDT doesn't quite use material conditionals, it uses a paraconsistent version of them instead. Although, maybe I'm just being too pedantic here. In any case, we can find ways of making paraconsistent logic behave as expected in any scenario; however, it would require a separate ground. That is, it isn't enough that the logic merely seems to work; we should also be able to provide a separate reason for why using a paraconsistent logic in that way is good.
Also, another approach which kind of aligns with counterfactual non-realism is to say that given the state of the universe at any particular time we can determine the past and future and that there are no counterfactuals beyond those we generate by imagining state Y at time T instead of state X. So, to imagine counterfactually taking action Y we replace the agent doing X with another agent doing Y and flow causation both forwards and backwards. (See this post for more detail). It could be argued that these count as counterfactuals, but I'd align it with counterfactual non-realism as it doesn't have decision counterfactuals as separate ontological elements.
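As a rough illustration of swapping the state at time T and flowing causation in both directions, here is a minimal sketch. It assumes a made-up, perfectly reversible toy dynamics (a particle with position `x` and constant velocity `v`); `step`, `unstep` and `history` are hypothetical names chosen for this example, not anything from the posts discussed.

```python
def step(state):
    """Advance the toy universe by one tick (position moves by velocity)."""
    x, v = state
    return (x + v, v)


def unstep(state):
    """Exact inverse of step: recover the state one tick earlier."""
    x, v = state
    return (x - v, v)


def history(state_at_t, t, horizon):
    """Reconstruct the states at times 0..horizon from the state at time t,
    flowing the dynamics both backwards and forwards from that point."""
    states = {t: state_at_t}
    s = state_at_t
    for k in range(t - 1, -1, -1):      # flow backwards to time 0
        s = unstep(s)
        states[k] = s
    s = state_at_t
    for k in range(t + 1, horizon + 1):  # flow forwards to the horizon
        s = step(s)
        states[k] = s
    return [states[k] for k in range(horizon + 1)]


# Factual world: state X at time T=2. Counterfactual: state Y at the same time.
factual = history((4, 2), t=2, horizon=4)
counterfactual = history((4, -1), t=2, horizon=4)
print(factual)
print(counterfactual)
```

In this toy world, replacing the state at T changes the past as well as the future, which is the sense in which nothing extra is needed beyond the dynamics and the choice of which state to imagine at T.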
Policy-dependent source code - this is actually a pretty interesting framing. I've always defaulted to thinking about counterfactuals in terms of actions, but when we're talking about things in terms of problems like Counterfactual Mugging, characterising counterfactuals in terms of policy might be more natural. It's strange that this feels fresh to me - I mean UDT takes this approach - but I never considered the possibility of non-UDT policy counterfactuals. I guess from a philosophical perspective it makes sense to first consider whether policy-dependent source code makes sense and then, if it does, further ask whether UDT makes sense.
I would be keen to run a webinar on Logical Counterfactuals