You're completely right that hypotheses with unconstrained Murphy get ignored: you're doomed no matter what you do, so you might as well optimize for just the other hypotheses, where what you do matters. Your "-1,000,000 vs -999,999 is the same sort of problem as 0 vs 1" reasoning is sound.
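A minimal sketch of that point with invented numbers (the policy names, hypothesis names, and prior are placeholders): a hypothesis where Murphy is unconstrained hands every policy the same floor, so it shifts every score by the same constant and drops out of the comparison.

```python
# Toy numbers (not from the post). Each hypothesis reports the worst-case ("Murphy") utility
# of each policy; the agent scores a policy by a prior-weighted sum of these worst cases.
prior = {"doomed": 0.3, "h1": 0.4, "h2": 0.3}

worst_case = {
    # In the "doomed" hypothesis Murphy is unconstrained, so every policy gets the same floor.
    "policy_A": {"doomed": -1_000_000, "h1": 0.0, "h2": 5.0},
    "policy_B": {"doomed": -1_000_000, "h1": 1.0, "h2": 3.0},
}

def score(policy):
    return sum(prior[h] * worst_case[policy][h] for h in prior)

# The doomed hypothesis adds the same -300,000 to both scores, so it cannot change the argmax:
# the comparison reduces to the hypotheses where the policy actually moves the worst case.
print(max(worst_case, key=score))  # -> "policy_A" (0.4*0 + 0.3*5 beats 0.4*1 + 0.3*3)
```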
Again, you are making the serious mistake of trying to think about Murphy verbally, rather than thinking of Murphy as the personification of the "inf" part of the definition of expected value, and writing actual equations. ...
There's actually an upcoming post going into more detail on what the deal is with pseudocausal and acausal belief functions, among several other things; I can send you a draft if you want. "Belief Functions and Decision Theory" is a post that hasn't held up nearly as well to time as "Basic Inframeasure Theory".
If you use the Anti-Nirvana trick, your agent just goes "nothing matters at all, the foe will mispredict and I'll get -infinity reward" and rolls over and cries since all policies are optimal. Don't do that one, it's a bad idea.
For the concave expectation functionals: Well, there's another constraint or two, like monotonicity, but yeah, LF duality basically says that you can turn any (monotone) concave expectation functional into an inframeasure. I.e., all risk aversion can be interpreted as having radical uncertainty over some aspects of how the environment ...
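To spell out the easy direction for the crisp case (a sketch only, where the inframeasure is just a closed convex set H of probability distributions; the general case uses a-measures plus an additive constant), worst-case expectation is automatically monotone and concave:

```latex
% Crisp case: define the expectation functional as a worst case over the set H.
E_H[f] := \inf_{\mu \in H} \mathbb{E}_{\mu}[f]

% Monotonicity: if f \le g pointwise, then \mathbb{E}_{\mu}[f] \le \mathbb{E}_{\mu}[g]
% for every \mu \in H, so E_H[f] \le E_H[g].

% Concavity: for \lambda \in [0,1], the inf of a sum is at least the sum of the infs, so
E_H[\lambda f + (1-\lambda) g]
  = \inf_{\mu \in H}\left(\lambda\,\mathbb{E}_{\mu}[f] + (1-\lambda)\,\mathbb{E}_{\mu}[g]\right)
  \ge \lambda E_H[f] + (1-\lambda) E_H[g].

% LF duality supplies the converse: a monotone concave expectation functional of this kind
% can be written as an inf over a suitable set of (a-)measures.
```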
Maximin, actually. You're maximizing your worst-case result.
It's probably worth mentioning that "Murphy" isn't an actual foe where it makes sense to talk about destroying resources lest Murphy use them; it's just a personification of the fact that we have a set of options, any of which could be picked, and we want to get the highest lower bound on utility we can for that set of options, so we assume we're playing against an adversary with a perfectly opposite utility function, as an aid to intuition. For that last paragraph, translating it back out from the "Murphy" t...
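In equation form (a standard maximin statement rather than a quote from the sequence), "Murphy" is just the inf over the option set, and the agent maximizes that lower bound:

```latex
\pi^{*} \in \operatorname*{argmax}_{\pi}\ \inf_{e \in \mathcal{E}}\ \mathbb{E}_{\pi, e}[U]
% \mathcal{E} is the set of options/environments the "adversary" ranges over.
```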
So, first off, I should probably say that a lot of the formalism overhead involved in this post in particular feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic Inframeasure Theory" still looks pretty good at this point and is worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.
Yes, your current understanding is correct; it's rebuilding probability theory in more generality to be sui...
So, we've also got an analogue of KL-divergence for crisp infradistributions.
We'll be using and for crisp infradistributions, and and for probability distributions associated with them. will be used for the KL-divergence of infradistributions, and will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as
I'm not entirely sure why it's like this, but it has the basic properties yo...
Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party does it first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids and KBOs and comets in the Oort cloud, and the higher level of dispersal would lead to preemptive total elimination being less viable.
I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.
So, here are some considerations (not an actual policy)
It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.
First, a chunk from Wikipedia
Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article ...
Maximin over outcomes would lead to the agent devoting all its efforts towards avoiding the worst outcomes, sacrificing overall utility, while maximin over expected value pushes towards policies that do acceptably on average in all of the environments that it may find itself in.
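To illustrate the difference with made-up numbers (the policy and environment names are invented): maximin over raw outcomes only looks at the single worst thing that could possibly happen, while maximin over expected value looks at the worst per-environment average.

```python
# Toy example: two environments, two policies. Each entry is a list of
# (probability, utility) outcomes for that policy in that environment.
outcomes = {
    "cautious": {
        "env1": [(1.0, -1.0)],
        "env2": [(1.0, -1.0)],
    },
    "sensible": {
        "env1": [(0.99, 10.0), (0.01, -5.0)],
        "env2": [(0.90, 3.0), (0.10, -5.0)],
    },
}

def expected_value(dist):
    return sum(p * u for p, u in dist)

def maximin_over_outcomes(table):
    # Judge a policy by its worst single outcome across all environments, ignoring probability.
    return max(table, key=lambda pi: min(u for dist in table[pi].values() for _, u in dist))

def maximin_over_expected_value(table):
    # Judge a policy by its worst per-environment expected value.
    return max(table, key=lambda pi: min(expected_value(d) for d in table[pi].values()))

print(maximin_over_outcomes(outcomes))        # "cautious": it only cares about the -5 tail
print(maximin_over_expected_value(outcomes))  # "sensible": it does acceptably on average everywhere
```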
Regarding "why listen to past me", I guess to answer this question I'd need to ask about your intuitions on Counterfactual mugging. What would you do if it's one-shot? What would you do if it's repeated? If you were told about the problem beforehand, would you pay money for a commitment mechanism to make future-you pay up the money if asked? (for +EV)
Yeah, looking back, I should probably fix the m- part and have the signs be consistent with the usual usage, where it's a measure minus another one instead of the addition of two signed measures (one a measure and one a negative measure). May be a bit of a pain to fix, though; the proof pages are extremely laggy to edit.
Wikipedia's definition can be matched up with our definition by fixing a partial order where iff there's a that's an sa-measure s.t. , and this generalizes to any closed c...
We go to the trouble of sa-measures because it's possible to add an sa-measure to an a-measure and get another a-measure where the expectation values of all the functions went up, while the new a-measure we landed at would be impossible to make by adding an a-measure to an a-measure.
Basically, we've gotta use sa-measures for a clean formulation of "we added all the points we possibly could to this set", getting the canonical set in your equivalence class.
Admittedly, you could intersect with the cone of a-measures again at the end (as we do in the next post...
I found a paper about this exact sort of thing. Escardó and Oliva call that type signature a "selection functional", and the type signature is called a "quantification functional", and there are several interesting things you can do with them, like combining multiple selection functionals into one in a way that looks reminiscent of game theory. (i.e., if has type signature , and has type signature , then has type signature ...
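Here's a hedged Python rendering of that machinery (the function names, the argmax choice, and the toy payoff are mine; the type signatures and the product construction follow Escardó and Oliva's selection-function papers):

```python
# A selection functional over a finite domain X has type (X -> R) -> X: given a valuation,
# it picks an element. The corresponding quantification functional has type (X -> R) -> R.

def argmax_selection(domain):
    return lambda p: max(domain, key=p)

def quantifier_of(selection):
    # Turn a selection functional into its quantification functional.
    return lambda p: p(selection(p))

def product(eps_x, eps_y):
    # Binary product of selection functionals: pick x assuming y will be chosen optimally
    # for that x, then pick y optimally given the chosen x (two-move backward induction).
    def eps(p):  # p : (x, y) -> R
        a = eps_x(lambda x: p((x, eps_y(lambda y: p((x, y))))))
        b = eps_y(lambda y: p((a, y)))
        return (a, b)
    return eps

moves = ["cooperate", "defect"]
payoff = lambda xy: {"cooperate": 2, "defect": 1}[xy[0]] + {"cooperate": 2, "defect": 1}[xy[1]]

joint = product(argmax_selection(moves), argmax_selection(moves))
print(joint(payoff))                 # ('cooperate', 'cooperate')
print(quantifier_of(joint)(payoff))  # 4, the value the combined selection attains
```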
Oh, I see what the issue is. Propositional tautology given means , not . So yeah, when A is a boolean that is equivalent to via boolean logic alone, we can't use that A for the exact reason you said, but if A isn't equivalent to via boolean logic alone (although it may be possible to infer by other means), then the denominator isn't necessarily small.
Yup, a monoid, because and , so it acts as an identity element, and we don't care about the order. Nice catch.
You're also correct about what propositional tautology given A means.
(lightly edited restatement of email comment)
Let's see what happens when we adapt this to the canonical instance of "no, really, counterfactuals aren't conditionals and should have different probabilities": the cosmic ray problem, where the agent has the choice between two paths; it slightly prefers taking the left path, but its conditional on taking the right path is a tiny slice of probability mass that's mostly composed of stuff like "I took the suboptimal action because I got hit by a cosmic ray".
There will be 0 utili...
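A toy numerical version of the problem (all numbers invented): conditioning on the rarely-taken right path mostly conditions on the cosmic-ray explanation for taking it, so the conditional expected utility ends up nowhere near the counterfactual one.

```python
# Hypothetical numbers illustrating why conditioning on a rarely-taken action
# mostly conditions on the *reason* it was taken (e.g. a cosmic ray).
p_ray = 1e-6                    # probability of a cosmic-ray malfunction
p_right_given_ray = 1.0         # a malfunction forces the right path
p_right_given_no_ray = 1e-9     # deliberately choosing the right path is even rarer
u_right_normal = 9              # utility of the right path, business as usual
u_right_ray = -100              # utility if the ray also scrambles later behaviour

p_right = p_ray * p_right_given_ray + (1 - p_ray) * p_right_given_no_ray
p_ray_given_right = p_ray * p_right_given_ray / p_right
conditional_eu = p_ray_given_right * u_right_ray + (1 - p_ray_given_right) * u_right_normal
counterfactual_eu = u_right_normal  # "what if I just took the right path" ignores the ray

print(p_ray_given_right)   # ~0.999: the conditional is dominated by the cosmic-ray story
print(conditional_eu)      # ~ -99.9, nowhere near the counterfactual value of 9
```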
It actually is a weakening. Because all changes can be interpreted as making some player worse off if we just use standard Pareto optimality, the second condition means that more changes count as improvements, as you correctly state. The third condition cuts down on which changes count as improvements, but the combination of conditions 2 and 3 still has some changes being labeled as improvements that wouldn't be improvements under the old concept of Pareto optimality.
The definition of an almost stratified Pareto optimum was adapted from this, and was...
My initial inclination is to introduce as the space of events on turn , and define and then you can express it as .
The notation for the sum operator is unclear. I'd advise writing the sum as and using an subscript inside the sum so it's clearer what is being substituted where.
Wasn't there a fairness/continuity condition in the original ADT paper that if there were two "agents" that converged to always taking the same action, then the embedder would assign them the same value? (more specifically, if , then ) This would mean that it'd be impossible to have be low while is high, so the argument still goes through.
Although, after this whole line of discussion, I'm realizing that there are enough substantial differences between the ori...
In the ADT paper, the asymptotic dominance argument is about the limit of the agent's action as epsilon goes to 0. This limit is not necessarily computable, so the embedder can't contain the agent, since it doesn't know epsilon. So the evil problem doesn't work.
Agreed that the evil problem doesn't work for the original ADT paper. In the original ADT paper, the agents are allowed to output distributions over moves. I didn't like this because it implicitly assumes that it's possible for the agent to perfectly randomize, an...
I got an improved reality-filter that blocks a certain class of environments that lead conjecture 1 to fail, although it isn't enough to deal with the provided chicken example and lead to a proof of conjecture 1. (the subscripts will be suppressed for clarity)
Instead of the reality-filter for being
it is now
This doesn't just check whether reality is recovered on average; it also checks whether all the "plausible conditionals" line up as well. Some of the con...
I figured out what feels slightly off about this solution. For events like "I have a long memory and accidentally dropped a magnet on it", it intuitively feels like describing your spot in the environment and the rules of your environment is much lower K-complexity than finding a Turing machine/environment that starts by giving you the exact (long) scrambled sequence of memories that you have, and then resumes normal operation.
Although this also feels like something nearby is actually desired behavior. If you rewrite the tape to be describing som...
Not quite. If taking bet 9 is a prerequisite to taking bet 10, then AIXI won't take bet 9, but if bet 10 gets offered whether or not bet 9 is accepted, then AIXI will be like "ah, future me will take the bet, and wind up with 10+ in the heads world and -20+2 in the tails world. This is just a given. I'll take this +15/-15 bet as it has net positive expected value, and the loss in the heads world is more than counterbalanced by the reduction in the magnitude of loss for the tails world"
Something else feels slightly off, but I can't...
Yup, I meant counterfactual mugging. Fixed.
I think I remember the original ADT paper showing up on agent foundations forum before a writeup on logical EDT with exploration, and my impression of which came first was affected by that. Also, the "this is detailed in this post" was referring to logical EDT for exploration. I'll edit for clarity.
I actually hadn't read that post or seen the idea anywhere before writing this up. It's a pretty natural resolution, so I'd be unsurprised if it was independently discovered before. Sorry about being unable to assist.
The extra penalty to describe where you are in the universe corresponds to requiring sense data to pin down *which* star you are near, out of the many stars, even if you know the laws of physics, so it seems to recover desired behavior.
Giles Edkins coded up a thing which lets you plug in numbers for a 2-player, 2-move game payoff matrix and it automatically displays possible outcomes in utility-space. It may be found here. The equilibrium points and strategy lines were added later in MS Paint.
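For anyone who wants to reproduce something similar, here's a rough sketch (not the actual tool; the payoff numbers are placeholders) of sweeping independent mixed strategies and scattering the resulting expected-utility pairs in utility-space:

```python
import numpy as np
import matplotlib.pyplot as plt

# payoffs[(row_move, col_move)] = (row player's utility, column player's utility)
payoffs = {
    (0, 0): (3.0, 3.0),
    (0, 1): (0.0, 4.0),
    (1, 0): (4.0, 0.0),
    (1, 1): (1.0, 1.0),
}

points = []
for p in np.linspace(0, 1, 101):         # row player's probability of move 0
    for q in np.linspace(0, 1, 101):     # column player's probability of move 0
        probs = {(0, 0): p * q, (0, 1): p * (1 - q),
                 (1, 0): (1 - p) * q, (1, 1): (1 - p) * (1 - q)}
        u1 = sum(probs[m] * payoffs[m][0] for m in probs)
        u2 = sum(probs[m] * payoffs[m][1] for m in probs)
        points.append((u1, u2))

xs, ys = zip(*points)
plt.scatter(xs, ys, s=1)
plt.xlabel("player 1 expected utility")
plt.ylabel("player 2 expected utility")
plt.show()
```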
Ah, the formal statement was something like "if the policy A isn't the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds"
Yeah, I am assuming fast feedback that it is resolved on day .
What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all Turing machines, we're doing policy search...
First: That notation seems helpful. Fairness of the environment isn't present by default; it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it's still unfair in the sense that the expected utility of feeding acti...
Pretty much that, actually. It doesn't seem too irrational, though. Upon looking at a mathematical universe where torture was decided upon as a good thing, it isn't an obvious failure of rationality to hope that a cosmic ray flips the sign bit of the utility function of an agent in there.
The practical problem with values that care about other mathematical worlds, however, is that if the agent you built has a UDT prior over values, it's an improvement (from the perspective of the prior) for the nosy neighbors/values that care about other world...
If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploration into suboptimal outcomes to prevent spurious underestimates of the value associated with a lever?
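A minimal epsilon-greedy sketch (standard algorithm, invented parameters) of the failure mode exploration guards against: with no exploration, a spurious underestimate of a lever's value is never revisited.

```python
import random

def run(epsilon, steps=10_000, seed=0):
    rng = random.Random(seed)
    true_means = [0.4, 0.6]                 # lever 1 is actually better
    counts, estimates = [0, 0], [0.0, 0.0]
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                               # explore
        else:
            arm = max(range(2), key=lambda a: estimates[a])      # exploit current estimates
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return total / steps

# With epsilon = 0, lever 1 never gets tried, so its estimate stays at the initial 0.0
# (a spurious underestimate that is never corrected) and the agent is stuck near 0.4.
print(run(epsilon=0.0))
# With epsilon = 0.1, the underestimate gets revisited and average reward approaches
# roughly 0.9 * 0.6 + 0.1 * 0.5 = 0.59 in the long run.
print(run(epsilon=0.1))
```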
Yeah, when I went back and patched up the framework of this post to be less logical-omniscience-y, I was able to get , but 2 is a bit too strong to be proved from 1, because my framing of 2 is just about probability disagreements in general, while 1 requires to assign probability 1 to .
I found an improved version by Pavel that gives a way to construct a proof of from that has a length of . The improved version is here.
There are restrictions to this result, though. One is that the C-rule must apply to the logic. This is just the ability to go from ∃x: φ(x) to instantiating a constant c such that φ(c). Pretty much all reasonable theorem provers have this.
The second restriction is that the theory must be finitely axiomatizable. No axiom schemas allowed. Again, this isn't much of a restriction in practice, because NBG set theory, which prov...
Caught a flaw with this proposal in the currently stated form, though it is probably patchable.
When unpacking a proof, at some point the sentence will be reached as a conclusion, which is a false statement.
I think that in that case, the agent shouldn't smoke, and CDT is right, although there is side-channel information that can be used to come to the conclusion that the agent should smoke. Here's a reframing of the provided payoff matrix that makes this argument clearer. (also, your problem as stated should have 0 utility for a nonsmoker imagining the situation where they smoke and get killed)
Let's say that there is a kingdom which contains two types of people, good people and evil people, and a person doesn't necessarily know which type they are. There is a ...
A: While that is a really interesting note that I hadn't spotted before, the standard formulation of exploration steps in logical inductor decision theory involves infinite exploration steps over all time, so even though an agent of this type would be able to inductively learn from what other agents do in different decision problems in less time than it naively appears, that wouldn't make it explore less.
B: What I intended with the remark about Thompson sampling was that troll bridge functions on there being two distinct causes of "attempting to cross the bridge"...
Update: This isn't really an issue; you just need to impose an assumption that there is some function such that , and is computable in time polynomial in , and you always find out whether exploration happened on turn after days.
This is just the condition that there's a subsequence where good feedback is possible, and is discussed significantly in section 4.3 of the logical induction paper.
If there's a subsequence B (of your subsequence of interest, A) where you can get good feedback, then there's infinite exploration steps...
If you drop the Pareto-improvement condition from the cell rank, and just have "everyone sorts things by their own utility", then you won't necessarily get a Pareto-optimal outcome (within the set of cell center-points), but you will at least get a point where there are no strict Pareto improvements (no points that leave everyone better off).
The difference between the two is... let's say we've got a 2-player, 2-move game that, in utility-space, makes some sort of quadrilateral. If the top and right edges join at 90 degrees, the Pareto-frontier would be the p...
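A quick sketch of that distinction, using a filled square of outcome points as a stand-in for the quadrilateral (dominance defined in the usual way):

```python
# The square [0, 2] x [0, 2] of outcomes (placeholder points): on the top and right edges there
# is no *strict* Pareto improvement (nothing makes both players strictly better off), yet only
# the corner is Pareto-optimal in the usual weak sense.
points = [(x / 10, y / 10) for x in range(21) for y in range(21)]

def strictly_dominates(a, b):
    return all(ai > bi for ai, bi in zip(a, b))

def weakly_dominates(a, b):
    return all(ai >= bi for ai, bi in zip(a, b)) and any(ai > bi for ai, bi in zip(a, b))

no_strict_improvement = [p for p in points if not any(strictly_dominates(q, p) for q in points)]
pareto_optimal = [p for p in points if not any(weakly_dominates(q, p) for q in points)]

print(pareto_optimal)               # just the corner: [(2.0, 2.0)]
print(len(no_strict_improvement))   # 41: the whole top edge plus the whole right edge
```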
Intermediate update:
The handwavy argument about how you'd get propositional inconsistency in the limit of imposing the constraint of "the string cannot contain and and and... and "
is less clear than I thought. The problem is that, while the prior may learn that that constraint applies as it updates on more sentences, that particular constraint can get you into situations where adding either or leads to a violation of the constraint.
So, running the prior far enough forward leads to the probability distribution being nearly certain that ...
A summary that might be informative to other people: Where does the requirement on the growth rate of the "rationality parameter" come from?
Well, the expected loss of the agent comes from two sources. Making a suboptimal choice on its own, and incurring a loss from consulting a not-fully-rational advisor. The policy of the agent is basically "defer to the advisor when the expected loss over all time of acting (relative to the optimal move by an agent who knew the true environment) is too high". Too high, in this case, cashes out as "higher than ...
I don't believe that was defined anywhere, but we "use the definition" in the proof of Lemma 1.
As far as I can tell, it's a set of (j,y) pairs, where j is the index of a hypothesis, and y is an infinite history string, rather like the set .
How do the definitions of and differ?
What is , in the context of the proof of Lemma A? I don't believe it was defined anywhere else.
By the stated definitions, "v-avoidable event" is pretty much trivial when the event doesn't lead to lasting utility loss. The conditions on "v-avoidable event" are basically:
The agent's policy converges to optimality.
There's a sublinear function D(t) where the agent avoids the event with probability 1 for D(t) time, in the limit as t goes to infinity.
By this definition, "getting hit in the face with a brick before round 3" is an avoidable event, even when the sequence of policies leads to the agent getting hit in the face with a brick on round 2 with certainty...
Hm, I got the same result from a different direction.
(probably very confused/not-even-wrong thoughts ahead)
It's possible to view a policy of the form "I'll compute X and respond based on what X outputs" as... tying your output to X, in a sense. Logical link formation, if you will.
And policies of the form "I'll compute X and respond in a way that makes that output of X impossible/improbable" (can't always do this) correspond to logical link cutting.
And with this, we see what the chicken rule in MUDT/exploration in LIDT is doing. It's systematically cutting ...
What does the Law of Logical Causality say about CON(PA) in Sam's probabilistic version of the troll bridge?
My intuition is that, in that case, the agent would think CON(PA) would be causally downstream of itself, because the distributions of actions conditional on CON(PA) and on ¬CON(PA) are different.
Can we come up with any example where the agent thinking it can control CON(PA) (or any other thing that enables accurate predictions of its actions) actually gets it into trouble?
It looks legitimate, actually.
Remember, is set-valued, so if , . In all other cases, . is a nonempty convex set-valued function, so all that's left is to show the closed graph property. If the limiting value of is something other than 0, the closed graph property holds, and if the limiting value of is 0, the closed graph property holds because .
Quick question: It is possible to drive the probability of x down arbitrarily far by finding a bunch of proofs of the form "x implies y" where y is a theorem. But the exact same argument applies to not x.
If the theorem-prover always finds a proof of the form "not x implies y" immediately afterwards, the probability wouldn't converge, but it would fluctuate within a certain range, which looks good enough.
What, if any, conditions need to be imposed on the theorem prover to confine the probabilities assigned to an unprovable statement to a range that is narrower than (0, 1)?
Sounds like a special case of crisp infradistributions (i.e., all partial probability distributions have a unique associated crisp infradistribution).
Given some Q, we can consider the (nonempty) set of probability distributions equal to Q where Q is defined. This set is convex (clearly, a mixture of two probability distributions which agree with Q about the probability of an event will also agree with Q about the probability of an event).
Convex (compact) sets of probability distributions = crisp infradistributions. ...
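A small numerical check of that convexity claim (the events and numbers are invented): two full distributions that agree with a partial assignment Q still agree with Q after mixing.

```python
outcomes = ["a", "b", "c", "d"]
Q = {("a", "b"): 0.5}   # Q only pins down the probability of the event {a, b}

def agrees_with_Q(dist):
    return all(abs(sum(dist[o] for o in event) - p) < 1e-9 for event, p in Q.items())

p1 = {"a": 0.5, "b": 0.0, "c": 0.25, "d": 0.25}
p2 = {"a": 0.1, "b": 0.4, "c": 0.5,  "d": 0.0}
lam = 0.3
mix = {o: lam * p1[o] + (1 - lam) * p2[o] for o in outcomes}

# The mixture assigns the event {a, b} probability lam*0.5 + (1-lam)*0.5 = 0.5, as required.
print(agrees_with_Q(p1), agrees_with_Q(p2), agrees_with_Q(mix))  # True True True
```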