# All of Diffractor's Comments + Replies

Stuart_Armstrong's Shortform

Sounds like a special case of crisp infradistributions (i.e., all partial probability distributions have a unique associated crisp infradistribution).

Given some partial probability distribution, we can consider the (nonempty) set of probability distributions that agree with it wherever it is defined. This set is convex (clearly, a mixture of two probability distributions which agree with it about the probability of an event will also agree with it about the probability of that event).
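The convexity claim can be written out in one line. This is only a sketch: the symbols $Q$ (the partial probability distribution), $\mathcal{E}$ (the events where it is defined), $\mu$, $\nu$ (two distributions agreeing with it) and $p$ (the mixing weight) are names assumed here for illustration, since the original notation did not survive.

```latex
% Assumed notation: Q is defined on the events in \mathcal{E};
% \mu and \nu both agree with Q there; p \in [0,1] is a mixing weight.
\[
  \mu(A) = \nu(A) = Q(A) \ \text{ for all } A \in \mathcal{E}
  \quad\Longrightarrow\quad
  \bigl(p\mu + (1-p)\nu\bigr)(A) = p\,Q(A) + (1-p)\,Q(A) = Q(A).
\]
```

So any mixture of two members of the set is again a member, which is all that convexity asks for.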

Introduction To The Infra-Bayesianism Sequence

You're completely right that hypotheses with unconstrained Murphy get ignored because you're doomed no matter what you do, so you might as well optimize for just the other hypotheses where what you do matters. Your "-1,000,000 vs -999,999 is the same sort of problem as 0 vs 1" reasoning is good.

Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the "inf" part of the definition of expected value, and writing actual equations...
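A minimal sketch of why a hypothesis with unconstrained Murphy drops out of the decision, assuming the value of a policy is the prior-weighted average of each hypothesis's worst-case expected utility. The policy names, utilities, prior weights and the `FLOOR` constant are all hypothetical toy numbers, not anything from the sequence.

```python
# Hypothetical toy numbers; nothing here comes from the original posts.
FLOOR = -1_000_000.0  # utility Murphy can force when nothing constrains it

# Each hypothesis is a list of environments; each environment maps a policy
# to its expected utility. Murphy picks the worst environment per hypothesis.
constrained = [
    {"left": 1.0, "right": 0.0},
    {"left": 0.8, "right": 0.2},
]
unconstrained = [{"left": FLOOR, "right": FLOOR}]


def policy_value(policy, weighted_hypotheses):
    """Prior-weighted average of each hypothesis's worst-case expected utility."""
    return sum(
        weight * min(env[policy] for env in envs)
        for weight, envs in weighted_hypotheses
    )


def best_policy(weighted_hypotheses, policies=("left", "right")):
    """Policy with the highest worst-case value."""
    return max(policies, key=lambda p: policy_value(p, weighted_hypotheses))


only_constrained = [(1.0, constrained)]
with_doom = [(0.5, constrained), (0.5, unconstrained)]

choice_without_doom = best_policy(only_constrained)
choice_with_doom = best_policy(with_doom)
# The doomed hypothesis shifts both policies by the same amount,
# so the gap between them is set by the constrained hypothesis alone.
gap = policy_value("left", with_doom) - policy_value("right", with_doom)
```

In this toy case the unconstrained hypothesis subtracts the same constant from every policy, so the ranking of policies is unchanged.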

Introduction To The Infra-Bayesianism Sequence

There's actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things; I can send you a draft if you want. "Belief Functions and Decision Theory" is a post that hasn't held up over time nearly as well as "Basic Inframeasure Theory".

1 DanielFilan 17d: Thanks for the offer, but I don't think I have room for that right now.

Introduction To The Infra-Bayesianism Sequence

If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.

For the concave expectation functionals: Well, there's another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. I.e., all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment...

2 Rohin Shah 18d: Sorry, I meant the combination of best-case reasoning (sup instead of inf) and the anti-Nirvana trick. In that case the agent goes "Murphy won't mispredict, since then I'd get -infinity reward which can't be the best that I do". Hmm, that makes sense, I think? Perhaps I just haven't really internalized the learning aspect of all of this.

Introduction To The Infra-Bayesianism Sequence

Maximin, actually. You're maximizing your worst-case result.

It's probably worth mentioning that "Murphy" isn't an actual foe where it makes sense to talk about destroying resources lest Murphy use them, it's just a personification of the fact that we have a set of options, any of which could be picked, and we want to get the highest lower bound on utility we can for that set of options, so we assume we're playing against an adversary with perfectly opposite utility function for intuition. For that last paragraph, translating it back out from the "Murphy" t...

0 awenonian 11d: I'm glad to hear that the question of what hypotheses produce actionable behavior is on people's minds. I modeled Murphy as an actual agent, because I figured a hypothesis like "A cloaked superintelligence is operating in the area that will react to your decision to do X by doing Y" is always on the table, and is basically a template for allowing Murphy to perform arbitrary action Y. I feel like I didn't quite grasp what you meant by "a constraint on Murphy is picked according to this probability distribution/prior, then Murphy chooses from the available options of the hypothesis they picked" But based on your explanation after, it sounds like you essentially ignore hypotheses that don't constrain Murphy, because they act as an expected utility drop on all states, so it just means you're comparing -1,000,000 and -999,999, instead of 0 and 1. For example, there's a whole host of hypotheses of the form "A cloaked superintelligence converts all local usable energy into a hellscape if you do X", and since that's a possibility for every X, no action X is graded lower than the others by its existence. That example is what got me thinking, in the first place, though. Such hypotheses don't lower everything equally, because, given other Laws of Physics, the superintelligence would need energy to hell-ify things. So arbitrarily consuming energy would reduce how bad the outcomes could be if a perfectly misaligned superintelligence was operating in the area. And, given that I am positing it as a perfectly misaligned superintelligence, we should both expect it to exist in the environment Murphy chooses (what could be worse?) and expect any reduction of its actions to be as positive of changes as a perfectly aligned superintelligence's actions could be, since preventing a maximally detrimental action should match, in terms of Utility, enabling a maximally beneficial action. Therefore, entropy-bombs.
Thinking about it more, assuming I'm not still making a mistake, this might jus...

Belief Functions And Decision Theory

So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.

Yes, your current understanding is correct, it's rebuilding probability theory in more generality to be sui...

2 Alex Flint 3mo: Ah this is helpful, thank you. So let's say I'm estimating the position of a train on a straight section of track as a single real number and I want to do an update each time I receive a noisy measurement of the train's position. Under the theory you're laying out here I might have, say, three Gaussians N(0, 1), N(1, 10), N(4, 6), and rather than updating a single pdf over the position of the train, I'm updating measures associated with each of these three pdfs. Is that roughly correct? (I realize this isn't exactly a great example of how to use this theory since train positions are perfectly realizable, but I just wanted to start somewhere familiar to me.) Do you by chance have any worked examples where you go through the update procedure for some concrete prior and observation? If not, do you have any suggestions for what would be a good toy problem where I could work through an update at a very concrete level?

Less Basic Inframeasure Theory

So, we've also got an analogue of KL-divergence for crisp infradistributions.

We'll be using  and  for crisp infradistributions, and  and  for probability distributions associated with them.  will be used for the KL-divergence of infradistributions, and  will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as

I'm not entirely sure why it's like this, but it has the basic properties yo...

John_Maxwell's Shortform

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids and KBOs and comets in the Oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.

1 John Maxwell 5mo: That's possible, but I'm guessing that it's not hard for a superintelligent AI to suddenly swallow an entire system using something like gray goo.

Introduction to Cartesian Frames

I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.

Needed: AI infohazard policy

So, here are some considerations (not an actual policy).

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia:

> Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine ar...

2 Ofer Givoli 7mo: Publishing under a pseudonym may end up being counterproductive due to the Streisand effect. Identities behind many pseudonyms may suddenly be publicly revealed following a publication on some novel method for detecting similarities in writing style between texts.

1 Vanessa Kosoy 7mo: Regarding making a policy ahead of time, I think we can have an evolving model of what ingredients are missing to get transformative AI, and some rule of thumb that says how dangerous your result is, given how much progress it makes towards each ingredient (relevant but clearly insufficient < might or might not be sufficient < plausibly a full solution), how concrete/actionable it is (abstract idea < impractical method < practical method) and how original/surprising it is (synthesis of ideas in the field < improvement on idea in the field < application of idea outside the field < completely out of the blue). One problem is, the model itself might be an infohazard. This consideration pushes towards making the guidelines secret in themselves, but that would make it much harder to debate and disseminate them. Also, the new result might have major implications for the model. So, yes, certainly there is no replacement for the inside view, but I still feel that we can have guidelines that help focus on the right considerations.

0 Davidmanheim 7mo: OpenAI's phased release of GPT2 seems like a clear example of exactly this. And there is a forthcoming paper looking at the internal deliberations around this from Toby Shevlane [https://www.law.ox.ac.uk/people/toby-shevlane], in addition to his extant work [https://dl.acm.org/doi/10.1145/3375627.3375815] on the question of how disclosure potentially affects misuse.

Introduction To The Infra-Bayesianism Sequence

Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.
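A small sketch of the difference between the two kinds of maximin, under made-up numbers. The policies `cautious` and `balanced`, the environments `env_a` and `env_b`, and every probability and utility below are hypothetical, chosen only so the two criteria disagree.

```python
# Hypothetical toy setup: each (policy, environment) pair yields a lottery,
# written as a list of (probability, utility) pairs.
lotteries = {
    "cautious": {
        "env_a": [(1.0, 0.3)],
        "env_b": [(1.0, 0.3)],
    },
    "balanced": {
        "env_a": [(0.9, 0.9), (0.1, 0.0)],
        "env_b": [(0.75, 0.8), (0.25, 0.0)],
    },
}


def expected_value(lottery):
    """Average utility of a single lottery."""
    return sum(prob * utility for prob, utility in lottery)


def worst_outcome(policy):
    """Maximin over outcomes: the single worst utility that can occur."""
    return min(u for lottery in lotteries[policy].values() for _, u in lottery)


def worst_expected_value(policy):
    """Maximin over expected value: the worst environment's average utility."""
    return min(expected_value(lottery) for lottery in lotteries[policy].values())


pick_by_outcome = max(lotteries, key=worst_outcome)
pick_by_expectation = max(lotteries, key=worst_expected_value)
```

Here maximin over outcomes settles for `cautious` because `balanced` has some chance of a zero, while maximin over expected value picks `balanced`, which does acceptably on average in both environments.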

Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)

Basic Inframeasure Theory

Yeah, looking back, I should probably fix the m- part and have the signs being consistent with the usual usage where it's a measure minus another one, instead of the addition of two signed measures, one a measure and one a negative measure. May be a bit of a pain to fix, though, the proof pages are extremely laggy to edit.

Wikipedia's definition can be matched up with our definition by fixing a partial order where  iff there's a  that's a sa-measure s.t. , and this generalizes to any closed c...

Basic Inframeasure Theory

We go to the trouble of sa-measures because it's possible to add a sa-measure to an a-measure, and get another a-measure where the expectation values of all the functions went up, while the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.

Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.

Admittedly, you could intersect with the cone of a-measures again at the end (as we do in the next post...

(A -> B) -> A

I found a paper about this exact sort of thing. Escardo and Oliva call that type signature a "selection functional", and the type signature (A -> B) -> B is called a "quantification functional", and there's several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (ie, if has type signature , and has type signature , then has type signature ...
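A rough sketch of what those two type signatures look like in code, and of one way to combine two selection functionals into one over pairs. The helper names (`argmax_over`, `argmin_over`, `quantifier`, `pair`) and the payoff table are assumptions made for illustration; this follows the general idea of a product of selection functions rather than any specific definition from the paper.

```python
def argmax_over(options):
    """Selection functional (A -> B) -> A: pick an option maximizing the payoff."""
    return lambda payoff: max(options, key=payoff)


def argmin_over(options):
    """Selection functional (A -> B) -> A: pick an option minimizing the payoff."""
    return lambda payoff: min(options, key=payoff)


def quantifier(selection):
    """Turn a selection functional into a quantification functional (A -> B) -> B."""
    return lambda payoff: payoff(selection(payoff))


def pair(first, second):
    """Combine two selection functionals into one that selects a pair of moves.

    The second player replies optimally (by its own criterion) to each first
    move, and the first player chooses knowing that reply.
    """
    def combined(payoff):
        def reply(x):
            return second(lambda y: payoff((x, y)))
        x = first(lambda x: payoff((x, reply(x))))
        return (x, reply(x))
    return combined


# Hypothetical 2x2 game: the first player maximizes, the second minimizes.
table = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 4}
game = pair(argmax_over([0, 1]), argmin_over([0, 1]))
play = game(table.__getitem__)
value = quantifier(game)(table.__getitem__)
```

With this table the combined selection functional returns the backward-induction play `(1, 0)`, which is the game-theory flavor mentioned above.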

Counterfactual Induction

Oh, I see what the issue is. Propositional tautology given means , not . So yeah, when A is a boolean that is equivalent to via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to via boolean logic alone (although it may be possible to infer by other means), then the denominator isn't necessarily small.

Counterfactual Induction

Yup, a monoid, because and , so it acts as an identity element, and we don't care about the order. Nice catch.

You're also correct about what propositional tautology given A means.

1 Gurkenglas 1y: Then that minimum does not make a good denominator because it's always extremely small. It will pick phi to be as powerful as possible to make L small, aka set phi to bottom. (If the denominator before that version is defined at all, bottom is a propositional tautology given A.)

Dutch-Booking CDT

(lightly edited restatement of email comment)

Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities". This is the cosmic ray problem, where the agent has the choice between two paths and slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".

There will be 0 utili...

1 Abram Demski 2y: (lightly edited version of my original email reply to above comment; note that Diffractor was originally replying to a version of the Dutch-book which didn't yet call out the fact that it required an assumption of nonzero probability on actions.) I agree that this Dutch-book argument won't touch probability zero actions, but my thinking is that it really should apply in general to actions whose probability is bounded away from zero (in some fairly broad setting). I'm happy to require an epsilon-exploration assumption to get the conclusion. Your thought experiment raises the issue of how to ensure in general that adding bets to a decision problem doesn't change the decisions made. One thought I had was to make the bets always smaller than the difference in utilities. Perhaps smaller Dutch-books are in some sense less concerning, but as long as they don't vanish to infinitesimal, seems legit. A bet that's desirable at one scale is desirable at another. But scaling down bets may not suffice in general. Perhaps a bet-balancing scheme to ensure that nothing changes the comparative desirability of actions as the decision is made? For your cosmic ray problem, what about: You didn't specify the probability of a cosmic ray. I suppose it should have probability higher than the probability of exploration. Let's say 1/million for cosmic ray, 1/billion for exploration. Before the agent makes the decision, it can be given the option to lose .01 util if it goes right, in exchange for +.02 utils if it goes right & cosmic ray. This will be accepted (by either a CDT agent or EDT agent), because it is worth approximately +.01 util conditioned on going right, since cosmic ray is almost certain in that case. Then, while making the decision, cosmic ray conditioned on going right looks very unlikely in terms of CDT's causal expectations.
We give the agent the option of getting .001 util if it goes right, if it also agrees to lose .02 conditioned on going right & cosmic ray. CDT agr...
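The arithmetic behind the two bets in the reply above can be checked directly. This is a sketch under two assumptions that are not in the original text: the agent only ever goes right because of a cosmic ray or an exploration step (treated as mutually exclusive), and CDT's causal expectation of a cosmic ray, given that it chooses right, is just the base rate. The variable names are made up for illustration.

```python
# Probabilities suggested in the reply.
p_cosmic_ray = 1e-6
p_exploration = 1e-9

# Assumption: going right happens only via a cosmic ray or an exploration
# step, and the two causes are treated as mutually exclusive.
p_right = p_cosmic_ray + p_exploration
p_ray_given_right = p_cosmic_ray / p_right

# Bet 1 (offered before deciding): pay 0.01 if right, win 0.02 if right & ray.
bet1_value_given_right = -0.01 + 0.02 * p_ray_given_right

# Bet 2 (offered while deciding): win 0.001 if right, pay 0.02 if right & ray.
# Assumption: CDT weighs the ray at its base rate here, not the conditional.
bet2_value_under_cdt = 0.001 - 0.02 * p_cosmic_ray

# Combined payoff of both bets in the two ways of ending up on the right path.
net_if_right_and_ray = (-0.01 + 0.02) + (0.001 - 0.02)
net_if_right_no_ray = -0.01 + 0.001
```

Under these assumptions each bet looks favorable at the moment it is offered, yet an agent holding both is down 0.009 whenever it goes right, with or without the cosmic ray.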
Cooperative Oracles

It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality.

The definition of an almost stratified Pareto optimum was adapted from this, and was...

1 Vanessa Kosoy 2y: "the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto Optimality." Why? Condition 3 implies that U_{RO,j} \leq U_{RO',j}. So, together with condition 2, we get that U_{RO,j} \leq U_{RO',j} for any j. That precisely means that this is a Pareto improvement in the usual sense.

Beliefs at different timescales

My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .

Beliefs at different timescales

The notation for the sum operator is unclear. I'd advise writing the sum as and using an subscript inside the sum so it's clearer what is being substituted where.

1 Nisan 2y: The sum isn't over i, though, it's over all possible tuples of length n−1. Any ideas for how to make that more clear?

Asymptotic Decision Theory (Improved Writeup)

Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have be low while is high, so the argument still goes through.

Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the ori...

1 Jessica Taylor 2y: Yes, the continuity condition on embedders in the ADT paper would eliminate the embedder I meant. Which means the answer might depend on whether ADT considers discontinuous embedders. (The importance of the continuity condition is that it is used in the optimality proof; the optimality proof can't apply to chicken for this reason).

Asymptotic Decision Theory (Improved Writeup)

> in the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.

Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, an...

2 Jessica Taylor 2y: The fact that we take the limit as epsilon goes to 0 means the evil problem can't be constructed, even if randomization is not allowed. (The proof in the ADT paper doesn't work, but that doesn't mean something like it couldn't possibly work) You're right, this is an error in the proof, good catch. Re chicken: The interpretation of the embedder that I meant is "opponent only uses the embedder where it is up against [whatever policy you plugged in]". This embedder does not get knocked down by the reality filter. Let Et be the embedder. The logical inductor expects Ut to equal the crash/crash utility, and it also expects Et(⌈ADTϵ⌉) to equal the crash/crash utility. The expressions Ut and Et(⌈ADTϵ⌉) are provably equal, so of course the logical inductor expects them to be the same, and the reality check passes. The error in your argument is that you are embedding actions rather than agents. The fact that NeverSwerveBot and ADT both provably always take the straight action does not mean the embedder assigns them equal utilities.

Asymptotic Decision Theory (Improved Writeup)

I got an improved reality-filter that blocks a certain class of environments that lead conjecture 1 to fail, although it isn't enough to deal with the provided chicken example and lead to a proof of conjecture 1. (the subscripts will be suppressed for clarity)

Instead of the reality-filter for being

it is now

This doesn't just check whether reality is recovered on average, it also checks whether all the "plausible conditionals" line up as well. Some of the con...

Reflective AIXI and Anthropics

I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operating.

Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing som...

1 interstice 3y: Well, it COULD be the case that the K-complexity of the memory-erased AIXI environment is lower, even when it learns that this happened. The reason for this is that there could be many possible past AIXI's who have their memory erased/altered and end up in the same subjective situation. Then the memory-erasure hypothesis can use the lowest K-complexity AIXI who ends up with these memories. As the AIXI learns more it can gradually piece together which of the potential past AIXI's it actually was and the K-complexity will go back up again. EDIT: Oh, I see you were talking about actually having a RANDOM memory in the sense of a random sequence of 1s and 0s. Yeah, but this is no different than AIXI thinking that any random process is high K-complexity. In general, and discounting merging, the memory-altering subroutine will increase the complexity of the environment by a constant plus the complexity of whatever transformation you want to apply to the memories.

Reflective AIXI and Anthropics

Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and -20+2 in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"

Something else feels slightly off, but I can'...

1 Diffractor 3y: I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operating. Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing some other simple environment, you would intuitively expect the AIXI to act as if it's in the simple environment for a brief time before gaining enough information to conclude that things have changed and rederive the new rules of where it is.

Asymptotic Decision Theory (Improved Writeup)

Yup, I meant counterfactual mugging. Fixed.

Asymptotic Decision Theory (Improved Writeup)

I think I remember the original ADT paper showing up on agent foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT for exploration. I'll edit for clarity.

3 Jessica Taylor 3y: OK, I helped invent ADT so I know it conceptually came after. (I don't think it was "shortly after"; logical EDT was invented very shortly after logical inductors, in early 2016, and ADT was in late 2016). I think you should link to the ADT paper in the intro section so people know what you're talking about.

Reflective AIXI and Anthropics

I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.

The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover desired behavior.

Cooperative Oracles

Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.

Probabilistic Tiling (Preliminary Attempt)

Ah, the formal statement was something like "if the policy A isn't the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds"

Yeah, I am assuming fast feedback that it is resolved on day n.

What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all Turing machines, we're doing policy searc...

1 Alex Mennen 3y: Wouldn't the set of all action sequences have lower description length than some large finite set of policies? There's also the potential problem that all of the policies in the large finite set you're searching over could be quite far from optimal.

Probabilistic Tiling (Preliminary Attempt)

First: That notation seems helpful. Fairness of the environment isn't present by default, it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it's still unfair in the sense that the expected utility of feeding acti...

1 Alex Mennen 3y: Ok, understood on the second assumption. U is not a function to [0,1], but a function to the set of [0,1]-valued random variables, and your assumption is that this random variable is uncorrelated with certain claims about the outputs of certain policies. The intuitive explanation of the third condition made sense; my complaint was that even with the intended interpretation at hand, the formal statement made no sense to me. I'm pretty sure you're assuming that ϕ is resolved on day n, not that it is resolved eventually. Searching over the set of all Turing machines won't halt in a reasonably short amount of time, and in fact won't halt ever, since the set of all Turing machines is non-compact. So I don't see what you mean when you say that the computation is not extremely long.

Complete Class: Consequentialist Foundations

Pretty much that, actually. It doesn't seem too irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn't an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.

The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it's an improvement (from the perspective of the prior) for the nosy neighbors/values that care about other world...

1 Vladimir Slepnev 3y: Are you talking about something like this? "I'm grateful to HAL for telling me that cows have feelings. Now I'm pretty sure that, even if HAL had a glitch and mistakenly told me that cows are devoid of feeling, eating them would still be wrong." That's valid reasoning. The right way to formalize it is to have two worlds, one where eating cows is okay and another where eating cows is not okay, without any "nosy preferences". Then you receive probabilistic evidence about which world you're in, and deal with it in the usual way.

An environment for studying counterfactuals

If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
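To make the "spurious underestimate" concrete, here is a minimal epsilon-greedy sketch. Everything in it is a hypothetical setup: two Bernoulli levers with win probabilities 0.3 and 0.7, where the better lever happened to pay nothing on its first pull. The function name `run_bandit` and its parameters are made up for illustration.

```python
import random


def run_bandit(epsilon, steps=5000, seed=0):
    """Epsilon-greedy on two Bernoulli levers; returns (pull counts, estimates)."""
    rng = random.Random(seed)
    win_prob = [0.3, 0.7]   # lever 1 is actually better
    counts = [1, 1]         # one prior pull each
    values = [1.0, 0.0]     # lever 1 got unlucky, so it looks worthless
    pulls = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                            # explore
        else:
            arm = max(range(2), key=lambda a: values[a])      # exploit
        reward = 1.0 if rng.random() < win_prob[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # running mean
        pulls[arm] += 1
    return pulls, values


greedy_pulls, greedy_values = run_bandit(epsilon=0.0)
explore_pulls, explore_values = run_bandit(epsilon=0.1)
```

With `epsilon=0.0` the agent never touches lever 1 again, so the bad estimate is never corrected; with a little exploration the estimate recovers and lever 1 ends up pulled most of the time.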

2 Alex Mennen 3y: The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That's different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.

Optimal and Causal Counterfactual Worlds

Yeah, when I went back and patched up the framework of this post to be less logical-omniscience-y, I was able to get , but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires to assign probability 1 to .

A Loophole for Self-Applicative Soundness

I found an improved version by Pavel, that gives a way to construct a proof of from that has a length of . The improved version is here.

There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from to instantiating a such that . Pretty much all reasonable theorem provers have this.

The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn't much of a restriction in practice, because NBG set theory, which prov

A Loophole for Self-Applicative Soundness

Caught a flaw with this proposal in the currently stated form, though it is probably patchable.

When unpacking a proof, at some point the sentence will be reached as a conclusion, which is a false statement.

0 Sam Eisenstat 3y: I misunderstood your proposal, but you don't need to do this work to get what you want. You can just take each sentence □nϕ→ϕ as an axiom, but declare that this axiom takes n symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms ψϕ,n→(□nϕ→ϕ) and ψϕ,n with ψϕ,n very long.

Smoking Lesion Steelman

I think that in that case, the agent shouldn't smoke, and CDT is right, although there is side-channel information that can be used to come to the conclusion that the agent should smoke. Here's a reframing of the provided payoff matrix that makes this argument clearer. (also, your problem as stated should have 0 utility for a nonsmoker imagining the situation where they smoke and get killed)

Let's say that there is a kingdom which contains two types of people, good people and evil people, and a person doesn't necessarily know which type they are. There is a

Musings on Exploration

A: While that is a really interesting note that I hadn't spotted before, the standard formulation of exploration steps in logical inductor decision theory involves infinite exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn't make it explore less.

B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of "attempting to cross the b

A Difficulty With Density-Zero Exploration

Update: This isn't really an issue, you just need to impose an assumption that there is some function such that , and is computable in time polynomial in , and you always find out whether exploration happened on turn after days.

This is just the condition that there's a subsequence where good feedback is possible, and is discussed significantly in section 4.3 of the logical induction paper.

If there's a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there's infinite exploration st

Distributed Cooperation

If you drop the Pareto-improvement condition from the cell rank, and just have "everyone sorts things by their own utility", then you won't necessarily get a Pareto-optimal outcome (within the set of cell center-points), but you will at least get a point where there are no strict Pareto improvements (no points that leave everyone better off).
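
A minimal Python sketch of the distinction drawn above, under the assumption that outcomes are just tuples of per-player utilities; the point set and function names are made up for illustration. It separates points with no strict Pareto improvement (nobody can make everyone better off) from Pareto-optimal points (nobody can help someone without hurting someone else).

```python
from typing import List, Tuple

Point = Tuple[float, ...]  # one utility per player (illustrative encoding)


def strictly_improves(q: Point, p: Point) -> bool:
    """True if q leaves every player strictly better off than p."""
    return all(qi > pi for qi, pi in zip(q, p))


def pareto_improves(q: Point, p: Point) -> bool:
    """True if q is at least as good for everyone and strictly better for someone."""
    at_least_as_good = all(qi >= pi for qi, pi in zip(q, p))
    better_for_someone = any(qi > pi for qi, pi in zip(q, p))
    return at_least_as_good and better_for_someone


def no_strict_improvement(points: List[Point], p: Point) -> bool:
    """p survives if no point in the set is strictly better for everyone."""
    return not any(strictly_improves(q, p) for q in points)


def pareto_optimal(points: List[Point], p: Point) -> bool:
    """p survives if no point in the set is a Pareto improvement on it."""
    return not any(pareto_improves(q, p) for q in points)


# Hypothetical cell center-points for a 2-player game.
points = [(1.0, 3.0), (2.0, 3.0), (3.0, 1.0), (0.5, 0.5)]
weak = [p for p in points if no_strict_improvement(points, p)]
strong = [p for p in points if pareto_optimal(points, p)]
```

Here `(1.0, 3.0)` has no strict Pareto improvement, yet `(2.0, 3.0)` helps player one at no cost to player two, so it appears in `weak` but not in `strong`.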

The difference between the two is... let's say we've got a 2-player 2-move game that in utility-space, makes some sort of quadrilateral. If the top and right edges join at 90 degrees, the Pareto-frontier would be the p

Further Progress on a Bayesian Version of Logical Uncertainty

Intermediate update:

The handwavy argument about how you'd get propositional inconsistency in the limit of imposing the constraint of "the string cannot contain and and and... and "

is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either or leads to a violation of the constraint.

So, running the prior far enough forward leads to the probability distribution being nearly certain that

Delegative Inverse Reinforcement Learning

A summary that might be informative to other people: Where does the requirement on the growth rate of the "rationality parameter" come from?

Well, the expected loss of the agent comes from two sources: making a suboptimal choice on its own, and incurring a loss from consulting a not-fully-rational advisor. The policy of the agent is basically "defer to the advisor when the expected loss over all time of acting (relative to the optimal move by an agent who knew the true environment) is too high". Too high, in this case, cashes out as "higher than
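
The deferral rule described in this comment can be sketched roughly as follows. This is a one-shot simplification rather than the algorithm from the post: the `posterior`, `loss`, and `threshold` inputs are hypothetical, and the actual cutoff (which the comment ties to the rationality parameter) is left as a free parameter.

```python
from typing import Dict, Optional


def choose_action(
    posterior: Dict[str, float],
    loss: Dict[str, Dict[str, float]],
    threshold: float,
) -> Optional[str]:
    """Return the agent's own best action, or None to mean "defer to the advisor".

    posterior[env]    : credence that `env` is the true environment
    loss[env][action] : loss of `action` in `env`, relative to the move an
                        agent who knew `env` would make
    threshold         : hypothetical cutoff for "too high"
    """
    actions = next(iter(loss.values())).keys()
    # Expected loss of each action under the agent's current uncertainty.
    expected = {
        a: sum(posterior[env] * loss[env][a] for env in posterior)
        for a in actions
    }
    best = min(expected, key=expected.get)
    # Act alone only when even the best action is cheap enough in expectation.
    return best if expected[best] <= threshold else None


# Toy example with two candidate environments.
posterior = {"safe": 0.5, "trap": 0.5}
loss = {
    "safe": {"left": 0.0, "right": 1.0},
    "trap": {"left": 10.0, "right": 0.0},
}
acts_alone = choose_action(posterior, loss, threshold=1.0)
defers = choose_action(posterior, loss, threshold=0.1)
```

With the loose threshold the agent picks `"right"` on its own; with the tight one it returns `None` and hands the decision to the advisor.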

Delegative Inverse Reinforcement Learning

I don't believe that was defined anywhere, but we "use the definition" in the proof of Lemma 1.

As far as I can tell, it's a set of (j,y) pairs, where j is the index of a hypothesis, and y is an infinite history string, rather like the set .

How do the definitions of and differ?

0 Vanessa Kosoy 3y: Hi Alex! The definition of $h^{!k}$ makes sense for any $h$; that is, the superscript $!k$ in this context is a mapping from finite histories to sets of pairs, as you said. In the line in question we just apply this mapping to $x_{:n}$, where $x$ is a bound variable coming from the expected value. I hope this helps?
Delegative Inverse Reinforcement Learning

What is , in the context of the proof of Lemma A? I don't believe it was defined anywhere else.

Delegative Inverse Reinforcement Learning

By the stated definitions, "v-avoidable event" is pretty much trivial when the event doesn't lead to lasting utility loss. The conditions on "v-avoidable event" are basically:

The agent's policy converges to optimality.

There's a sublinear function D(t) where the agent avoids the event with probability 1 for D(t) time, in the limit as t goes to infinity.

By this definition, "getting hit in the face with a brick before round 3" is an avoidable event, even when the sequence of policies leads to the agent getting hit in the face with a brick on round 2 with certa

Predictable Exploration

Hm, I got the same result from a different direction.

It's possible to view a policy of the form "I'll compute X and respond based on what X outputs" as... tying your output to X, in a sense. Logical link formation, if you will.

And policies of the form "I'll compute X and respond in a way that makes that output of X impossible/improbable" (can't always do this) correspond to logical link cutting.

And with this, we see what the chicken rule in MUDT/exploration in LIDT is doing. It's systematically cutting

0 Abram Demski 3y: Thinking about this more, I think there's an important disanalogy between trying to make policy decisions with earlier market states vs smaller proof-searches. In Agent Simulates Predictor, we can use an earlier market state to decide our policy, because the earlier market state can trust the predictor to make the right predictions, even if the predictor is using a more powerful logic (since logical inductors can learn to boundedly trust more powerful logics). However, with proof-based DTs, no analogous move is possible. Consider a version of Agent Simulates Predictor in which Omega searches for a proof that you one-box in PA+Con(PA); if one is found, Omega fills the $1M box. Otherwise, not. Omega has T1 time to think. The agent has T2 time to think, T2 >> T1. The agent reasons in PA. If the agent refused to use all its time, and only ran for T0 << T1 time, but still had enough time to find interesting proofs, then it could reason as follows: "If I one-box, then there is a short proof that I one-box which Omega can find. So I get $1M." It may not know if PA+Con(PA) is sound, but that doesn't matter; the agent just has to ensure that there is a proof which Omega will find. It wouldn't find any proofs leading to higher utility than this, so it would one-box and get $1M. Unfortunately, I don't see any way to harness the shorter proof-search to choose a policy which would get the $1M in this case but choose to think longer in other cases where that's beneficial. We might want the agent to reason: "If I stop and one-box right now, Omega will be able to prove that I one-box, and I'll get $1M. If I wait longer, Omega won't be able to prove what I do, so I'll at most be able to get $100. So, I'll stop now and one-box." However, this reasoning would have to take place at a proof-length in which several things hold at once:
* The agent can prove that it's still "early" enough that its action would be provable to Omega if it acted now.
* It's "late" enough that the ag
0 Abram Demski 3y: Agreed! I'm reading this as "You want to make decisions as early as you can, because when you decide one of the things you can do is decide to put the decision off for later; but when you make a decision later, you can't decide to put it earlier." And "logical time" here determines whether others can see your move when they decide to make theirs. You place yourself upstream of more things if you think less before deciding. Here's where I'm saying "just use the chicken rule again, in this stepped-back reasoning". It likely re-introduces versions of the same problems at the higher level, but perhaps iterating this process as many times as we can afford is in some sense the best we can do.
Smoking Lesion Steelman III: Revenge of the Tickle Defense

What does the Law of Logical Causality say about CON(PA) in Sam's probabilistic version of the troll bridge?

My intuition is that in that case, the agent would think CON(PA) is causally downstream of itself, because the distributions of actions conditional on CON(PA) and on ¬CON(PA) are different.

Can we come up with any example where the agent thinking it can control CON(PA) (or any other thing that enables accurate predictions of its actions) actually gets it into trouble?

0 Abram Demski 3y: I agree, my intuition is that LLC asserts that the troll, and even CON(PA), is downstream. And, it seems to get into trouble because it treats it as downstream. I also suspect that Troll Bridge will end up formally outside the realm where LLC can be justified by the desire to make ratifiability imply CDT=EDT. (I'm working on another post which will go into that more.)
Cooperative Oracles: Stratified Pareto Optima and Almost Stratified Pareto Optima

It looks legitimate, actually.

Remember, is set-valued, so if , . In all other cases, . is a nonempty convex set-valued function, so all that's left is to show the closed graph property. If the limiting value of is something other than 0, the closed graph property holds, and if the limiting value of is 0, the closed graph property holds because .

0 Vanessa Kosoy 4y: Hi Alex! I agree that the multimap you described is Kakutani and gives the correct fair set, but in the OP it says that if $r_{i-1} = 0$ then $f(r) = r_i$, not $f(r) = [0,1]$. Maybe I am missing something about the notation?
All Mathematicians are Trollable: Divergence of Naturalistic Logical Updates

Quick question: It is possible to drive the probability of x down arbitrarily far by finding a bunch of proofs of the form "x implies y" where y is a theorem. But the exact same argument applies to not x.

If the theorem-prover always finds a proof of the form "not x implies y" immediately afterwards, the probability wouldn't converge, but it would fluctuate within a certain range, which looks good enough.

What, if any, conditions need to be imposed on the theorem prover to confine the probabilities assigned to an unprovable statement to a range that is narrower than (0, 1)?