All of gjm's Comments + Replies

The linked article is interesting, and also suggests that it's not as simple as "The good solution is to add more Black people to the training dataset," because the issue isn't simply "our system sometimes misclassifies people as animals", it's "our system sometimes misclassifies people as animals, and one not-so-rare case of this happens to line up with an incredibly offensive old racist slur" -- and that last bit is a subtle fact about human affairs that there's no possible way the system could have learned from looking at labelled samples of images. The dat...

Thanks! (I would not have guessed correctly.)

It would add some possibly-useful context to this review if you explained why you came to it with an axe to grind. (Just as race is both possibly-useful information and a possible source of prejudice to correct for, so also with your prior prejudices about this book.)

Much of the dialogue about AI Safety I encounter in off-the-record conversations seems to me like it's not grounded in reality. I repeatedly hear (what I feel to be) a set of shaky arguments that both shut down conversation and are difficult to validate empirically. The shaky argument is as follows:

1. Machine learning is rapidly growing more powerful. If trends continue it will soon eclipse human performance.
2. Machine learning equals artificial intelligence equals world optimizer.
3. World optimizers can easily turn the universe into paperclips by accident.
4. Therefore we need to halt machine learning advancement until the abstract philosophical + mathematical puzzle of AI alignment is solved.

I am not saying this line of reasoning is what AI researchers believe or that it's mainstream (among the rationality/alignment communities), or even that it's wrong. The argument annoys me for the same reason a popular-yet-incoherent political platform annoys me: I have encountered badly-argued versions of the idea too many times.

I agree with #1, though I quibble that "absolute power" should be distinguished from "sample efficiency", and I quibble about how we'll get to superintelligence. (I am bearish on applying the scaling hypothesis to existing architectures.) I agree with #3 in theory, but theory is often very different from practice. I disagree with #2 because it relies on the tautological equivalence of two definitions. I can imagine superintelligent machines that aren't world optimizers. Without #2 the argument falls apart: it might be easy to build a superintelligence but hard to build a world optimizer.

I approached The Alignment Problem with the (incorrect) prior that it would be more vague abstract arguments untethered from technical reality. Instead, the book was dominated by ideas that

OK, I get it. (Or at least I think I do.) And, duh, indeed it turns out (as you were too polite to say in so many words) that I was distinctly confused.

So: Using ordinary conditionals in planning your actions commits you to reasoning like "If (here in the actual world it turns out that) I choose to smoke this cigarette, then that makes it more likely that I have the weird genetic anomaly that causes both desire-to-smoke and lung cancer, so I'm more likely to die prematurely and horribly of lung cancer, so I shouldn't smoke it", which makes wrong decisions....
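A minimal sketch of the smoking-lesion reasoning above, with made-up numbers (every probability and utility below is a hypothetical assumption). The "ordinary conditional" value treats the chosen action as evidence about the lesion; for contrast, the second value is computed as a causal intervention that leaves the lesion's prior untouched. That interventional value is just one familiar (CDT-style) stand-in for a counterfactual, not necessarily the notion anyone in this thread endorses.

```python
# Hypothetical smoking-lesion model: a lesion causes both the urge to smoke and
# cancer; smoking itself has no causal effect on cancer. All numbers are made up.
P_LESION = 0.2
P_SMOKE_GIVEN_LESION = {True: 0.9, False: 0.1}
P_CANCER_GIVEN_LESION = {True: 0.8, False: 0.05}
U_SMOKE = 10.0     # enjoyment from smoking
U_CANCER = -100.0  # cost of cancer


def utility(smoke: bool, cancer: bool) -> float:
    return (U_SMOKE if smoke else 0.0) + (U_CANCER if cancer else 0.0)


def p_lesion_given_action(smoke: bool) -> float:
    """Bayes' rule: what choosing this action says about the lesion."""
    def likelihood(lesion: bool) -> float:
        p = P_SMOKE_GIVEN_LESION[lesion]
        return p if smoke else 1.0 - p

    with_lesion = P_LESION * likelihood(True)
    without_lesion = (1.0 - P_LESION) * likelihood(False)
    return with_lesion / (with_lesion + without_lesion)


def expected_utility(smoke: bool, p_lesion: float) -> float:
    """Expected utility of an action, given a probability for the lesion."""
    total = 0.0
    for lesion, p_l in ((True, p_lesion), (False, 1.0 - p_lesion)):
        p_c = P_CANCER_GIVEN_LESION[lesion]
        total += p_l * (p_c * utility(smoke, True) + (1.0 - p_c) * utility(smoke, False))
    return total


def conditional_value(smoke: bool) -> float:
    """E[U | action]: the action is treated as evidence about the lesion."""
    return expected_utility(smoke, p_lesion_given_action(smoke))


def interventional_value(smoke: bool) -> float:
    """Value under intervention: the lesion keeps its prior probability."""
    return expected_utility(smoke, P_LESION)


if __name__ == "__main__":
    for name, value in (("conditional", conditional_value), ("interventional", interventional_value)):
        choice = max((True, False), key=value)
        print(f"{name}: {'smoke' if choice else 'abstain'}")
```

With these numbers, conditioning recommends abstaining (smoking is bad news about the lesion), while the interventional value recommends smoking, since in this model smoking causes nothing bad.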

Abram Demski · 2 karma · 2y
I'm glad I pointed out the difference between linguistic and DT counterfactuals, then! I'm not at all suggesting that we use proof-based DT in this way. It's just a model. I claim that it's a pretty good model -- that we can often carry over results to other, more complex, decision theories. However, if we wanted to, then yes, I think we could... I agree that if we add beliefs as axioms, the axioms have to be perfectly consistent. But if we use probabilistic beliefs, those probabilities don't have to be perfectly consistent; just the axioms saying which probabilities we have. So, for example, I could use a proof-based agent to approximate a logical-induction-based agent, by looking for proofs about what the market expectations are. This would be kind of convoluted, though.

I agree that much of what's problematic about the example I gave is that the "inner" counterfactuals are themselves unclear. I was thinking that this makes the nested counterfactual harder to make sense of (exactly because it's unclear what connection there might be between them) but on reflection I think you're right that this isn't really about counterfactual nesting and that if we picked other poorly-defined (non-counterfactual) propositions we'd get a similar effect: "If it were morally wrong to eat shellfish, would humans Really Truly Have Free Will?"...

Abram Demski · 2 karma · 2y
All the various reasoning behind a decision could involve material conditionals, probabilistic conditionals, logical implication, linguistic conditionals (whatever those are), linguistic counterfactuals, decision-theoretic counterfactuals (if those are indeed different as I claim), etc etc etc. I'm not trying to make the broad claim that counterfactuals are somehow involved. The claim is about the decision algorithm itself. The claim is that the way we choose an action is by evaluating a counterfactual ("what happens if I take this action?"). Or, to be a little more psychologically realistic, the cached values which determine which actions we take are estimated counterfactual values.

What is the content of this claim? A decision procedure is going to have (cached-or-calculated) value estimates which it uses to make decisions. (At least, most decision procedures work that way.) So the content of the claim is about the nature of these values.

If the values act like Bayesian conditional expectations, then the claim that we need counterfactuals to make decisions is considered false. This is the claim of evidential decision theory (EDT).

If the values are still well-defined for known-false actions, then they're counterfactual. So, a fundamental reason why MIRI-type decision theory uses counterfactuals is to deal with the case of known-false actions. However, academic decision theorists have used (causal) counterfactuals for completely different reasons (IE because they supposedly give better answers). This is the claim of causal decision theory (CDT).

My claim in the post, of course, is that the estimated values used to make decisions should match the EDT expected values almost all of the time, but should not be responsive to the same kinds of reasoning which the EDT values are responsive to, so should not actually be evidential.

It sounds like you've kept a really strong assumption of EDT in your head; so strong that you couldn't even imagine why non-evidential
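One way to see the "known-false actions" point: a conditional expectation E[U | a] is a ratio with P(a) in the denominator, so it is undefined when the agent is certain it won't take a. The sketch below is a hypothetical toy model with a made-up outcome table; it builds the agent's beliefs as P(action) × P(utility | action) with no confounding, so the conditional value and the model-based value agree whenever P(a) > 0, and only the model-based value survives when P(a) = 0.

```python
from fractions import Fraction

# Hypothetical world model: for each action, a distribution over utilities.
OUTCOME_MODEL = {
    "left": {10: Fraction(1, 2), 0: Fraction(1, 2)},
    "right": {7: Fraction(1)},
}


def beliefs(policy):
    """Joint P(action, utility) = P(action) * P(utility | action); no confounding assumed."""
    return {
        (action, u): policy[action] * p
        for action, dist in OUTCOME_MODEL.items()
        for u, p in dist.items()
    }


def conditional_value(joint, action):
    """EDT-style E[U | action]; None when P(action) = 0, since 0/0 is undefined."""
    p_action = sum(p for (a, _), p in joint.items() if a == action)
    if p_action == 0:
        return None
    return sum(p * u for (a, u), p in joint.items() if a == action) / p_action


def model_value(action):
    """Counterfactual-style value read off the world model; defined for every action."""
    return sum(p * u for u, p in OUTCOME_MODEL[action].items())


# The agent is certain it will take "left", so "right" is a known-false action.
certain_left = {"left": Fraction(1), "right": Fraction(0)}
joint = beliefs(certain_left)
```

Here conditioning on "right" returns None, while the model still reports that "right" would be worth 7, more than the 5 the agent expects from "left".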

I never found Stalnaker's thesis at all plausible, not because I'd thought of the ingenious little calculation you give but because it just seems obviously wrong intuitively. But I suppose if you don't have any presuppositions about what sort of notion an implication is allowed to be, you don't get to reject it on those grounds. So I wasn't really entitled to say "Pr(A|B) is not the same thing as Pr(B=>A) for any particular notion of implication", since I hadn't thought of that calculation.

Anyway, I have just the same sense of obvious wrongness about th...

Abram Demski · 2 karma · 2y
Yeah, interesting. I don't share your intuition that nested counterfactuals seem funny. The example you give doesn't seem ill-defined due to the nesting of counterfactuals. Rather, the antecedent doesn't seem very related to the consequent, which generally has a tendency to make counterfactuals ambiguous. If you ask "if calcium were always ionic, would Nixon have been elected president?" then I'm torn between three responses:

1. "No" because if we change chemistry, everything changes.
2. "Yes" because counterfactuals keep everything the same as much as possible, except what has to change; maybe we're imagining a world where history is largely the same, but some specific biochemistry is different.
3. "I don't know" because I am not sure what connection between the two you are trying to point at with the question, so I don't know how to answer.

In the case of your Bach example, I'm similarly torn. On the one hand, if we imagine some weird connection between the ages of Bach and Mozart, we might have to change a lot of things. On the other hand, counterfactuals usually try to keep things fixed if there's not a reason to change them. So the intention of the question seems pretty unclear, which, in my mind, has little to do with the specific nested form of your question.

More importantly, perhaps, I think Stalnaker and other philosophers can be said to be investigating linguistic counterfactuals; their chief concern is formalizing the way humans naively talk about things, in a way which gives more clarity but doesn't lose something important. My chief concern is decision-theoretic counterfactuals, which are specifically being used to plan/act. This imposes different requirements.

The philosophy of linguistic counterfactuals is complex, of course, but personally I really feel that I understand fairly well what linguistic counterfactuals are and how they work. My picture probably requires a little exposition to be comprehensible, but to state it as si

How confident are you that the "right" counterfactual primitive is something like your C(A|B) meaning (I take it) "if B were the case then A would be the case"?

The alternative I have in mind assimilates counterfactual conditionals to conditional probabilities rather than to logical implications, so in addition to your existing Pr(A|B)=... meaning "if B is the case, then here's how strongly I expect A to be the case" there's Prc(A|B)=... meaning "if B were the case -- even though that might require the world to be different from how it actually is -- then h...

Abram Demski · 2 karma · 2y
Ah, I wasn't strongly differentiating between the two, and was actually leaning toward your proposal in my mind. The reason I was not differentiating between the two was that the probability of C(A|B) behaves a lot like the probabilistic value of Prc(A|B). I wasn't thinking of nearby-world semantics or anything like that (and would contrast my proposal with such a proposal), so I'm not sure whether the C(A|B) notation carries any important baggage beyond that.

However, I admit it could be an important distinction; C(A|B) is itself a proposition, which can feature in larger compound sentences, whereas Prc(A|B) is not itself a proposition and cannot feature in larger compound sentences. I believe this is the real crux of your question; IE, I believe there aren't any other important consequences of the choice, besides whether we can build larger compound expressions out of our counterfactuals.

Part of why I was not strongly differentiating the two was because I was fond of Stalnaker's Thesis, according to which P(A|B) can itself be regarded as the probability of some proposition, namely a nonstandard notion of implication (IE, not material conditional, but rather 'indicative conditional'). If this were the case, then we could safely pun between P(A->B) and P(B|A), where "->" is the nonstandard implication. Thus, I analogously would like for P(C(A|B)) to equal Prc(A|B).

HOWEVER, Stalnaker's thesis is dead in philosophy, for the very good reason that it seemingly supports the chain of reasoning

Pr(B|A) = Pr(A->B) = Pr(A->B|B)Pr(B) + Pr(A->B|~B)Pr(~B) = Pr(B|A&B)Pr(B) + Pr(B|A&~B)Pr(~B) = Pr(B).

Some attempts to block this chain of reasoning (by rejecting Bayes) have been made, but it seems pretty damning overall. So, similarly, my idea that P(C(A|B))=Prc(A|B) is possibly deranged, too.
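A quick numerical check of why the chain is damning. Its last two steps say that the thesis, applied within the distributions conditioned on B and on ~B, forces Pr(B|A) = Pr(B), i.e. A and B would always be independent. So any joint distribution in which A and B are dependent gives a counterexample; the numbers below are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint distribution over worlds (A is true?, B is true?).
P = {
    (True, True): Fraction(3, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(2, 10),
    (False, False): Fraction(4, 10),
}


def pr(event):
    return sum(p for world, p in P.items() if event(world))


def cond(event, given):
    """Pr(event | given) by the ratio definition."""
    return pr(lambda w: event(w) and given(w)) / pr(given)


def A(w):
    return w[0]


def B(w):
    return w[1]


def not_B(w):
    return not w[1]


def A_and_B(w):
    return A(w) and B(w)


def A_and_not_B(w):
    return A(w) and not_B(w)


# Second-to-last expression in the chain: Pr(B|A&B)Pr(B) + Pr(B|A&~B)Pr(~B), i.e. Pr(B).
chain_result = cond(B, A_and_B) * pr(B) + cond(B, A_and_not_B) * pr(not_B)
actual = cond(B, A)  # Pr(B|A), which the chain claims equals chain_result
```

Here the chain gives 1/2 while Pr(B|A) is 3/4, so the thesis cannot hold for this distribution.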

I'm not 100% sure I am understanding your terminology. What does it mean to "play stag against (stag,stag)" or to "defect against cooperate/cooperate"?

If your opponent is not in any sense a utility-maximizer then I don't think it makes sense to talk about your opponent's utilities, which means that it doesn't make sense to have a payout matrix denominated in utility, which means that we are not in the situation of my second paragraph above ("The meaning generally assumed in game theory...").

We might be in the situation of my last-but-two paragraph ("Or may...

Alex Turner · 2 karma · 2y
Let π_i(σ) = σ'_i be player i's response function to strategy profile σ. Given some strategy profile (like stag/stag), player i selects a response. I mean "response" in terms of "best response" - I don't necessarily mean that there's an iterated game. This captures all the relevant "outside details" for how decisions are made.

I don't think I understand where this viewpoint is coming from. I'm not equating payoffs with VNM-utility, and I don't think game theory usually does either - for example, the maxmin payoff solution concept does not involve VNM-rational expected utility maximization. I just identify payoffs with "how good is this outcome for the player", without also demanding that π_i always select a best response. Maybe it's Boltzmann rational, or maybe it just always selects certain actions (regardless of their expected payouts).

There exist two payoff functions. I think I want to know how impact-aligned one player is with another: how do the player's actual actions affect the other player (in terms of their numerical payoff values)? I think (c) is closest to what I'm considering, but in terms of response functions, not actual iterated games.

Sorry, I'm guessing this probably still isn't clear, but this is the reply I have time to type right now and I figured I'd send it rather than nothing.
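A minimal sketch of the response-function framing, assuming hypothetical stag-hunt payoffs (4 for mutual stag, 2 for mutual hare, 3 for hunting hare alone, 0 for hunting stag alone). A response function here is any rule mapping the current profile σ to the player's own next action; a best responder is one such rule and a player who always picks hare is another, so the payoff table stays fixed while the behaviour varies.

```python
from typing import Callable, Dict, Tuple

Action = str
Profile = Tuple[Action, Action]  # (player 0's action, player 1's action)
ACTIONS = ("stag", "hare")

# Hypothetical stag-hunt payoffs: PAYOFF[profile] = (payoff to player 0, payoff to player 1).
PAYOFF: Dict[Profile, Tuple[int, int]] = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (2, 2),
}

# A response function pi_i takes (i, sigma) and returns player i's new action sigma'_i.
ResponseFn = Callable[[int, Profile], Action]


def with_action(sigma: Profile, i: int, action: Action) -> Profile:
    """Copy of profile sigma with player i's action replaced."""
    return (action, sigma[1]) if i == 0 else (sigma[0], action)


def best_response(i: int, sigma: Profile) -> Action:
    """Maximize player i's own payoff, holding the other player's action fixed."""
    return max(ACTIONS, key=lambda a: PAYOFF[with_action(sigma, i, a)][i])


def always_hare(i: int, sigma: Profile) -> Action:
    """A response function that ignores payoffs entirely."""
    return "hare"


def respond(policies: Tuple[ResponseFn, ResponseFn], sigma: Profile) -> Profile:
    """Apply both players' response functions to the same profile."""
    return (policies[0](0, sigma), policies[1](1, sigma))
```

Against (stag, stag), two best responders stay at (stag, stag), but if player 1's response function always picks hare the profile becomes (stag, hare), giving player 0 a payoff of 0 and player 1 a payoff of 3. That drop in player 0's payoff is roughly the kind of effect on the other player that the comment asks about.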

I think "X and Y are playing a game of stag hunt" has multiple meanings.

The meaning generally assumed in game theory when considering just a single game is that the outcomes in the game matrix are utilities. In that case, I completely agree with Dagon: if on some occasion you prefer to pick "hare" even though you know I will pick "stag", then we are not actually playing the stag hunt game. (Because part of what it means to be playing stag hunt rather than some other game is that we both consider (stag,stag) the best outcome.)

But there are some other situat...

Alex Turner · 2 karma · 2y
Thanks for the thoughtful response. It seems to me like you're assuming that players must respond rationally, or else they're playing a different game, in some sense. But why? The stag hunt game is defined by a certain set of payoff inequalities holding in the game. Both players can consider (stag,stag) the best outcome, but that doesn't mean they have to play stag against (stag, stag). That requires further rationality assumptions (which I don't think are necessary in this case). If I'm playing against someone who always defects against cooperate/cooperate, versus against someone who always cooperates against cooperate/cooperate, am I "not playing iterated PD" in one of those cases?
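The claim that stag hunt is defined by payoff inequalities can be checked mechanically. This sketch assumes one common formulation for symmetric 2×2 stag hunts, A > B ≥ D > C, where A is the payoff for (stag, stag), B for hunting hare while the other hunts stag, C for hunting stag while the other hunts hare, and D for (hare, hare); the numbers are hypothetical. The inequality check says nothing about how anyone will play; a best-response assumption only enters in the Nash-equilibrium check.

```python
from itertools import product

ACTIONS = ("stag", "hare")

# Hypothetical symmetric payoffs: U[(my action, their action)] is my payoff.
U = {
    ("stag", "stag"): 4,  # A
    ("hare", "stag"): 3,  # B
    ("hare", "hare"): 2,  # D
    ("stag", "hare"): 0,  # C
}


def is_stag_hunt(u) -> bool:
    """Check the defining inequalities A > B >= D > C."""
    a, b = u[("stag", "stag")], u[("hare", "stag")]
    c, d = u[("stag", "hare")], u[("hare", "hare")]
    return a > b >= d > c


def pure_nash_equilibria(u):
    """Profiles where neither player gains by unilaterally switching actions."""
    equilibria = []
    for mine, theirs in product(ACTIONS, repeat=2):
        # Symmetric game: player 2's payoff at (mine, theirs) is u[(theirs, mine)].
        p1_ok = all(u[(mine, theirs)] >= u[(alt, theirs)] for alt in ACTIONS)
        p2_ok = all(u[(theirs, mine)] >= u[(alt, mine)] for alt in ACTIONS)
        if p1_ok and p2_ok:
            equilibria.append((mine, theirs))
    return equilibria
```

With these payoffs both (stag, stag) and (hare, hare) are pure equilibria, which is what makes the game a coordination problem.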

Inappropriately highbrow proof of #4 (2d Sperner's lemma):

This proves a generalization: any number of dimensions, and any triangulation of the simplex in question. So, the setup is as follows. We have an n-dimensional simplex, defined by n+1 points in n-dimensional space. We colour the vertices with n+1 different colours. Then we triangulate it -- chop it up into smaller simplexes -- and we extend our colouring somehow in such a way that the vertices on any face (note: a face is the thing spanned by any subset of the vertices) of the big simplex are c...
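The argument above is general (any dimension, any triangulation). As a sanity check of the 2-D case only, the sketch below uses one specific triangulation, the standard grid subdivision of a triangle into n² small triangles (a hypothetical choice), colours the lattice points randomly subject to the usual Sperner boundary condition, and counts the small triangles that receive all three colours. Sperner's lemma says that count is always odd; the code only samples colourings, so it illustrates the statement rather than proving it.

```python
import random


def sperner_colouring(n: int, rng: random.Random):
    """Randomly colour lattice points (i, j) with i + j <= n, obeying Sperner's rule.

    Corners: (0, 0) -> 0, (n, 0) -> 1, (0, n) -> 2. A point may use colour k only
    if its barycentric weight on corner k is positive, so points on a face of the
    big triangle only use the colours of that face's corners.
    """
    colour = {}
    for i in range(n + 1):
        for j in range(n + 1 - i):
            weights = (n - i - j, i, j)
            allowed = [k for k, w in enumerate(weights) if w > 0]
            colour[(i, j)] = rng.choice(allowed)
    return colour


def small_triangles(n: int):
    """Yield the n * n small triangles of the standard grid triangulation."""
    for i in range(n):
        for j in range(n - i):
            yield ((i, j), (i + 1, j), (i, j + 1))  # upward-pointing triangle
            if i + j <= n - 2:
                yield ((i + 1, j), (i, j + 1), (i + 1, j + 1))  # downward-pointing


def count_fully_coloured(n: int, colour) -> int:
    """Number of small triangles whose vertices use all three colours."""
    return sum(1 for tri in small_triangles(n) if {colour[v] for v in tri} == {0, 1, 2})


if __name__ == "__main__":
    colouring = sperner_colouring(6, random.Random(0))
    print(count_fully_coloured(6, colouring))
```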

This doesn't (I think) really have much to do with randomness as such. The relevant thing about R is that it's shared information that a hypothetical adversary doesn't get to see.

If isn't chosen adversarially, then our players don't care about pessimizing over but about something like an average over , and then R isn't needed. Or, if they are ultra-cautious people who universally care about worst cases, then they don't care about expectation w.r.t. R but about the worst case as R varies, and then R doesn't he...

Paul Christiano · 3 karma · 5y
Sometimes you want to prove a theorem like "The algorithm works well." You generally need randomness if you want to find algorithms that work without strong assumptions on the environment, whether or not there is really an adversary (who knows what kinds of correlations exist in the environment, whether or not you call them an "adversary"). A Bayesian might not like this, because they'd prefer to prove theorems like "The algorithm works well on average for a random environment drawn from the prior the agents use," for which randomness is never useful. But specifying the true prior is generally hideously intractable. So a slightly wiser Bayesian might want to prove statements like "The algorithm works well on average for a random environment drawn from the real prior," where the "real prior" is some object that we can talk about but have no explicit access to. And now the wiser Bayesian is back to needing randomness.
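A toy illustration of the point about randomness as hidden information (a hypothetical guessing game, not taken from the post): an algorithm guesses a bit, its guess may depend on a random seed R, and success means matching the bit. An adversary who knows the algorithm but not R can hold a deterministic guesser to 0 success, while a guesser that reads its guess off R guarantees 1/2. If the adversary can see R, that guarantee disappears; if the bit is drawn at random with no adversary, both algorithms do equally well. So in this toy, R only helps with worst-case guarantees against something that doesn't see it.

```python
from fractions import Fraction

BITS = (0, 1)
SEEDS = range(2)  # R is drawn uniformly from these seeds


def always_zero(r):
    """Deterministic guesser: ignores R."""
    return 0


def use_seed(r):
    """Randomized guesser: its guess is determined by the private seed R."""
    return r % 2


def worst_case_success(algorithm):
    """Adversary knows the algorithm but not R: it picks the bit that minimizes
    the success rate averaged over R."""
    def success(bit):
        return Fraction(sum(algorithm(r) == bit for r in SEEDS), len(SEEDS))
    return min(success(bit) for bit in BITS)


def worst_case_if_seed_visible(algorithm):
    """Adversary sees R first and picks the worst bit for each seed."""
    worst = sum(min(int(algorithm(r) == bit) for bit in BITS) for r in SEEDS)
    return Fraction(worst, len(SEEDS))


def average_case_success(algorithm):
    """No adversary: the bit is drawn uniformly at random, independently of R."""
    hits = sum(algorithm(r) == bit for r in SEEDS for bit in BITS)
    return Fraction(hits, len(SEEDS) * len(BITS))
```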
Jessica Taylor · 2 karma · 5y
This seems basically right. As discussed in the conclusion, there are reasons to care about worst-case performance other than literal adversaries.