Counterfactability

Alex Flint (3y)

If I understand you correctly, the reason that this notion of counterfactable connects with what we normally call a counterfactual is that when an event screens off its own history, it's easy to consider other "values" of the "variable" underlying that event without running into any logical contradiction with other events ("values of other variables") that we're holding fixed.

For example, if I try to consider what would have happened if there had been a snow storm in Vermont last night, while holding fixed the particular weather patterns observed in Vermont and surrounding areas on the preceding day, then I'm in kind of a tricky spot. On the one hand, I'm treating the weather patterns from the previous day as fixed (and they did not in fact give rise to a snow storm in Vermont last night), and yet I'm also trying to "consider" a snow storm in Vermont. The closer I look into this, the more confused I'm going to get, and in the end I'll find that this notion of "consider that a snow storm took place in Vermont last night" is a bit ill-defined.

What I would like to say is: let's consider a snow storm in Vermont last night; in order to do that let's forget everything that would mess with that consideration.

My question for you is: in the world we live in, the full causal history of any real event contains almost the whole history of Earth from the time of the event backwards, because the Earth is so small relative to the distance light covers in a fraction of a second, and everything that could have interacted with the event is part of the history of that event. So in practice, won't all counterfactable events need to be more or less a full specification of the whole state of the world at a certain point in time?

Scott Garrabrant (3y)

Yeah, remember the above is all for updateless agents, which are already computationally intractable. For updateful agents, we will want to talk about conditional counterfactability. For example, if you and I are in a prisoner's dilemma, we could condition on all the stuff that happened prior to us being put in separate cells, and given this condition, the histories are much smaller.

Also, we could do all of our reasoning up to a high-level world model, which makes histories more reasonably sized.

Also, we could think of counterfactability as a spectrum. Some events are especially hard to reason about, because there are lots of different ways we could have done it, and we can selectively add details to make an event more and more counterfactable, meaning it approximately screens off its history from that which you care about.
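One way to make "approximately screens off" concrete, offered as an illustrative sketch and not as the definition used in the post, is to measure how much information the history still carries about the thing you care about once the event is known. The toy model below is entirely hypothetical: a history `h` with four equally likely values, a quantity of interest `f = h % 2`, and two candidate events that are functions of the history. The helper names `conditional_mutual_information` and `build_joint` are made up for this example.

```python
from collections import defaultdict
from math import log2


def conditional_mutual_information(joint):
    """Return I(H; F | E) in bits for a joint distribution {(h, e, f): prob}."""
    p_e = defaultdict(float)
    p_he = defaultdict(float)
    p_ef = defaultdict(float)
    for (h, e, f), p in joint.items():
        p_e[e] += p
        p_he[(h, e)] += p
        p_ef[(e, f)] += p
    total = 0.0
    for (h, e, f), p in joint.items():
        if p > 0:
            total += p * log2(p * p_e[e] / (p_he[(h, e)] * p_ef[(e, f)]))
    return total


def build_joint(event_of_history):
    """Toy world (an assumption): h uniform on {0,1,2,3}, f is the parity of h."""
    joint = defaultdict(float)
    for h in range(4):
        e = event_of_history(h)  # the event is some description of the history
        f = h % 2                # the thing we care about
        joint[(h, e, f)] += 0.25
    return dict(joint)


# Coarse event: omits the detail (parity) that matters downstream.
coarse = conditional_mutual_information(build_joint(lambda h: h // 2))

# Detailed event: the same event with the relevant detail added.
detailed = conditional_mutual_information(build_joint(lambda h: (h // 2, h % 2)))

print(f"coarse event:   I(H; F | E) = {coarse:.3f} bits")
print(f"detailed event: I(H; F | E) = {detailed:.3f} bits")
```

In this toy model the coarse event leaves one full bit of dependence between history and outcome, while the detailed event leaves none, so adding the right detail moves the event from one end of the spectrum to the other. Conditional mutual information is only one possible stand-in for "how far from screening off"; other measures could serve equally well.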

Alex Flint (3y)

Regarding your point on ELK: to make the output of the opaque machine learning system counterfactable, wouldn't it be sufficient to include the whole program trace? By program trace I mean the results of all the intermediate computations performed along the way. Yet including a program trace wouldn't help us much if we don't know what function of that program trace will tell us, for example, whether the machine learning system is deliberately deceiving us.

So yes, it's necessary to have an information set that includes the relevant information, but isn't the main part of the (ELK) problem to determine what function of that information corresponds to the particular latent variable that we're looking for?

Scott Garrabrant (3y)

I agree; this is why I said I was being sloppy in conflating the output with our understanding of the output. We want our understanding of the output to screen off the history.

AI ALIGNMENT FORUM