...the problem of how to choose one's IBH prior. (If the solution was something like "it's subjective/arbitrary" that would be pretty unsatisfying from my perspective.)
It seems clear to me that the prior is subjective. Like with Solomonoff induction, I expect there to exist something like the right asymptotic for the prior (i.e. an equivalence class of priors under the equivalence relation where and are equivalent when there exists some s.t. and ), but not a unique correct prior, just...
...I'm still comfortable sticking with "most are wide open".
Allow me to rephrase. The problems are open, that's fair enough. But, the gist of your post seems to be: "Since coming up with UDT, we ran into these problems, made no progress, and are apparently at a dead end. Therefore, UDT might have been the wrong turn entirely." On the other hand, my view is: Since coming up with those problems, we made a lot of progress on agent theory within the LTA, which has implications for those problems among other things, and so far this progress seems to only r...
I'll start with Problem 4 because that's the one where I feel closest to the solution. In your 3-player Prisoner's Dilemma, infra-Bayesian hagglers[1] (IBH agents) don't necessarily play CCC. Depending on their priors, they might converge to CCC or CCD or another Pareto-efficient outcome[2]. Naturally, if the first two agents have identical priors then e.g. DCC is impossible, but CCD still is. Whereas, if all 3 have the same prior they will necessarily converge to CCC. Moreover, there is no "best choice of prior": different choices do better in differen...
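To make "Pareto-efficient outcome" concrete, here is a minimal sketch that enumerates the pure outcomes of a 3-player Prisoner's Dilemma and keeps those not Pareto-dominated by another pure outcome. The payoff rule (`B` per cooperator for everyone, minus `COST` for each cooperator) is a hypothetical choice made for illustration and is not taken from the original discussion. It says nothing about which of these outcomes IBH agents would actually converge to.

```python
from itertools import product

# Hypothetical payoffs (an assumption, not from the original discussion):
# every player receives B per cooperator, and each cooperator pays COST.
B, COST = 2, 3  # B < COST < 3 * B makes defection individually tempting

def payoffs(profile):
    """Return the payoff tuple for a profile such as ('C', 'C', 'D')."""
    cooperators = profile.count("C")
    return tuple(B * cooperators - (COST if move == "C" else 0) for move in profile)

def dominates(u, v):
    """True if u Pareto-dominates v: nobody worse off, somebody better off."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

PROFILES = list(product("CD", repeat=3))

def pareto_efficient():
    """Pure profiles that no other pure profile Pareto-dominates."""
    return sorted(
        "".join(p)
        for p in PROFILES
        if not any(dominates(payoffs(q), payoffs(p)) for q in PROFILES)
    )

print(pareto_efficient())  # ['CCC', 'CCD', 'CDC', 'DCC']
```

Under these assumed payoffs, every outcome with at most one defector is Pareto-efficient, while outcomes with two or three defectors are dominated by CCC.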
The way I see it, all of these problems are reducible to (i) understanding what's up with the monotonicity principle in infra-Bayesian physicalism and (ii) completing a new and yet unpublished research direction (working title: "infra-Bayesian haggling") which shows that IB agents converge to Pareto efficient outcomes[1]. So, I wouldn't call them "wide open".
Sometimes, but there are assumptions, see child comment for more details.
First, I think that the theory of agents is a more useful starting point than metaphilosophy. Once we have a theory of agents, we can build models, within that theory, of agents reasoning about philosophical questions. Such models would be answers to special cases of metaphilosophy. I'm not sure we're going to have a coherent theory of "metaphilosophy" in general, distinct from the theory of agents, because I'm not sure that "philosophy" is an especially natural category[1].
Some examples of what that might look like:
Here is the sketch of a simplified model for how a metacognitive agent deals with traps.
Consider some (unlearnable) prior over environments, s.t. we can efficiently compute the distribution over observations given any history . For example, any prior over a small set of MDP hypotheses would qualify. Now, for each , we regard as a "program" that the agent can execute and form beliefs about. In particular, we have a "metaprior" consisting of metahypotheses: hypotheses-about-programs.
For ...
Jobst Heitzig asked me whether infra-Bayesianism has something to say about the absent-minded driver (AMD) problem. Good question! Here is what I wrote in response:
...Philosophically, I believe that it is only meaningful to talk about a decision problem when there is also some mechanism for learning the rules of the decision problem. In ordinary Newcombian problems, you can achieve this by e.g. making the problem iterated. In AMD, iteration doesn't really help because the driver doesn't remember anything that happened before. We can consider a version of iter
Physicalist agents see themselves as inhabiting an unprivileged position within the universe. However, it's unclear whether humans should be regarded as such agents. Indeed, monotonicity is highly counterintuitive for humans. Moreover, historically human civilization struggled a lot with accepting the Copernican principle (and is still confused about issues such as free will, anthropics and quantum physics which physicalist agents shouldn't be confused about). This presents a problem for superimitation.
What if humans are actually cartesian agents? Then, it...
Until now I believed that a straightforward bounded version of the Solomonoff prior cannot be the frugal universal prior because Bayesian inference under such a prior is NP-hard. One reason it is NP-hard is the existence of pseudorandom generators. Indeed, Bayesian inference under such a prior distinguishes between a pseudorandom and a truly random sequence, whereas a polynomial-time algorithm cannot distinguish between them. It also seems plausible that, in some sense, this is the only obstacle: it was established that if one-way functions don't exist (wh...
I have a question about the conjecture at the end of Direction 17.5. Let be a utility function with values in and let be a strictly monotonous function. Then and have the same maxima. can be non-linear, e.g. . Therefore, I wonder if the condition should be weaker.
No, because it changes the expected value of the utility function under various distributions.
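A minimal sketch of why this is so: a strictly increasing transformation keeps the same best outcome, but it can reverse the ranking of lotteries. The transformation `square` (u ↦ u² on [0, 1]) and the two lotteries are hypothetical examples chosen for illustration; the example in the quoted question is not shown in this excerpt.

```python
def expected(f, lottery):
    """Expected value of f(utility) for a lottery given as (utility, probability) pairs."""
    return sum(p * f(u) for u, p in lottery)

def identity(u):
    return u

def square(u):
    # Hypothetical strictly increasing transformation on [0, 1].
    return u ** 2

# Two made-up lotteries over utility values in [0, 1].
sure_thing = [(0.6, 1.0)]            # utility 0.6 with certainty
gamble = [(0.0, 0.5), (1.0, 0.5)]    # fair coin between utility 0 and 1

def prefers_sure_thing(f):
    """True if the sure thing has higher expected f-utility than the gamble."""
    return expected(f, sure_thing) > expected(f, gamble)

print(prefers_sure_thing(identity))  # True:  0.6  > 0.5
print(prefers_sure_thing(square))    # False: 0.36 < 0.5
```

The best single outcome (utility 1) is the same under both functions, but the preference between the two lotteries flips.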
...Moreover, I ask myself if it is possible to modify by a smal
...Here’s a plausible human circular preference. You won a prize! Your three options are: (A) 5 lovely plates, (B) 5 lovely plates and 10 ugly plates, (C) 5 OK plates.
No one has done this exact experiment to my knowledge, but plausibly (based on discussion of a similar situation in Thinking Fast And Slow chapter 15) this is a circular preference in at least some people: When people see just A & B, they'll pick B because "it's more stuff, I can always keep the ugly ones as spares or use them for target practice or whatever". When they see just B & C, t
I propose the axioms A1-A3 together with
B2. If then for any we have
B3. If and , then for any we have
I suspect that these imply C4.
Maybe I am confused by what you mean by . I thought it was the state space, but that isn't consistent with in your post which was defined over ?
I'm not entirely sure what you mean by the state space. is a state space associated specifically with the utility function. It has nothing to do with the state space of the environment. The reward function in the OP is , not . I slightly abused notation by defining in the parent comment. Let's say it's and is...
Good idea!
Fix some alphabet . Here's how you make an automaton that checks that the input sequence (an element of ) is a subsequence of some infinite periodic sequence with period . For every in , let be an automaton that checks whether the symbols in the input sequences at places s.t. are all equal (its number of states is ). We can modify it to make a transducer that produces its unmodified input sequence if the test passes and if the test fails. It also produces when the input is . We then chain ...
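As a rough illustration of the chained construction, here is a sketch that operates on finite strings instead of automata. It rests on several assumptions, because the symbols are missing from the excerpt: the period is called `n`, each stage checks the positions `i` with `i % n == k`, and `FAIL` is a hypothetical stand-in for the failure symbol. It also treats the input as a contiguous window of the periodic sequence starting at position 0.

```python
FAIL = None  # hypothetical stand-in for the failure symbol

def positions_agree(seq, n, k):
    """Check that all symbols at positions i with i % n == k are equal."""
    return len({seq[i] for i in range(k, len(seq), n)}) <= 1

def stage(seq, n, k):
    """One link of the chain: pass the input through unchanged, or emit FAIL.

    A FAIL input is passed along as FAIL, mirroring the transducer behaviour.
    """
    if seq is FAIL:
        return FAIL
    return seq if positions_agree(seq, n, k) else FAIL

def check_period(seq, n):
    """Chain the n stages; the input is accepted if no stage emitted FAIL."""
    for k in range(n):
        seq = stage(seq, n, k)
    return seq is not FAIL

print(check_period("abcabca", 3))  # True
print(check_period("abcabd", 3))   # False
```

Each stage only needs to remember the first symbol seen in its residue class, which is why the per-stage state can stay small.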
This is not a typo.
I'm imagining that we have a program that outputs (i) a time discount parameter , (ii) a circuit for the transition kernel of an automaton and (iii) a circuit for a reward function (and, ii+iii are allowed to have a shared component to save computation time complexity). The utility function is defined by
where is defined recursively by
For the contrived reward function you suggested, we would never have . But for other reward functions, it is possible that . Which is exactly why this framework rejects the contrived reward function in favor of those other reward functions. And also why this framework considers some policies unintelligent (despite the availability of the contrived reward function) and other policies intelligent.
Up to light editing, the following was written by me during the "Finding the Right Abstractions for healthy systems" research workshop, hosted by Topos Institute in January 2023. However, I invented the idea before.
In order to allow (the set of programs) to be infinite in IBP, we need to define the bridge transform for infinite . At first, it might seem can be allowed to be any compact Polish space, and the bridge transform should only depend on the topology on , but that runs into problems. Instead, the right structure on for defining the bridge t...
The following was written by me during the "Finding the Right Abstractions for healthy systems" research workshop, hosted by Topos Institute in January 2023. However, I invented the idea before.
Here's an elegant diagrammatic notation for constructing new infrakernels out of given infrakernels. There is probably some natural category-theoretic way to think about it, but at present I don't know what it is.
By “infrakernel” we will mean a continuous mapping of the form , where and are compact Polish spaces and is the space of credal sets (i.e. close...
My framework discards such contrived reward functions because it penalizes for the complexity of the reward function. In the construction you describe, we have . This corresponds to (no/low intelligence). On the other hand, policies with (high intelligence) have the property that for the which "justifies" this . In other words, your "minimal" overhead is very large from my point of view: to be acceptable, the "overhead" should be substantially negative.
The post is still largely up-to-date. In the intervening year, I mostly worked on the theory of regret bounds for infra-Bayesian bandits, and haven't made much progress on open problems in infra-Bayesian physicalism. On the other hand, I also haven't found any new problems with the framework.
The strongest objection to this formalism is the apparent contradiction between the monotonicity principle and the sort of preferences humans have. While my thinking about this problem evolved a little, I am still at a spot where every solution I know requires biting a...
First, the notation makes no sense. The prior is over hypotheses, each of which is an element of . is the notation used to denote a single hypothesis.
Second, having a prior just over doesn't work since both the loss function and the counterfactuals depend on .
Third, the reason we don't just start with a prior over , is because it's important which prior we have. Arguably, the correct prior is the image of a simplicity prior over physicalist hypotheses by the bridge transform. But, come to think about it, it might be about the sa...
deserves a little more credit than you give it. To interpret the claim correctly, we need to notice and are classes of decision problems, not classes of proof systems for decision problems. You demonstrate that for a fixed proof system it is possible that generating proofs is easier than verifying proofs. However, if we fix a decision problem and allow any valid (i.e. sound and complete) proof system, then verifying cannot be harder than generating. Indeed, let be some proof system and an algorithm for generating proofs (i.e. an algorithm t...
First, no, the AGI is not going to "employ complex heuristics to ever-better approximate optimal hypotheses update". The AGI is going to be based on an algorithm which, as a mathematical fact (if not proved then at least conjectured), converges to the correct hypothesis with high probability. Just like we can prove that e.g. SVMs converge to the optimal hypothesis in the respective class, or that particular RL algorithms for small MDPs converge to the correct hypothesis (assuming realizability).
Second, there's the issue of non-cartesian attacks ("hacking t...
I don't think the argument on hacking relied on the ability to formally verify systems. Formally verified systems could potentially skew the balance of power to the defender side, but even if they don't exist, I don't think balance is completely skewed to the attacker.
My point was not about the defender/attacker balance. My point was that even short-term goals can be difficult to specify, which undermines the notion that we can easily empower ourselves by short-term AI.
...Of course we need to understand how to define "long term" and "short term" here. O
Thanks for the responses Boaz!
Our claim is that one can separate out components - there is the predictable component which is non stationary, but is best approximated with a relatively simple baseline, and the chaotic component, which over the long run is just noise. In general, highly complex rules are more sensitive to noise (in fact, there are theorems along these lines in the field of Analysis of Boolean Functions), and so in the long run, the simpler component will dominate the accuracy.
I will look into analysis of boolean functions, thank you. How...
IIUC the thesis of this article rests on several interrelated claims:
I wish to address these claims one by one.
This is an erroneous application of chaos theory IMO. The core observation of chaos theory is, that in many dynamical systems with compa...
Hi Vanessa,
Let me try to respond (note the claim numbers below are not the same as in the essay, but rather as in Vanessa's comment):
Claim 1: Our claim is that one can separate out components - there is the predictable component which is non stationary, but is best approximated with a relatively simple baseline, and the chaotic component, which over the long run is just noise. In general, highly complex rules are more sensitive to noise (in fact, there are theorems along these lines in the field of Analysis of Boolean Functions), and so in the long run, the...
Even if we did make a goal program, it's still unknown how to build an AGI that is motivated to compute it, or to follow the goals it outputs.
Actually, it is (to a 0th approximation) known how to build an AGI that is motivated to compute it: use infra-Bayesian physicalism. The loss function in IBP already has the semantics "which programs should run". Following the goal it outputs is also formalizable within IBP, but even without this step we can just have utopia inside the goal program itself[1].
We should be careful to prevent the inhabitants of th
P.S.
I think that in your example, if a person is given a button that can save a person on a different planet from being tortured, they will have a direct incentive to press the button, because the button is a causal connection in itself, and consciously reasoning about the person on the other planet is a causal[1] connection in the other direction. That said, a person still has a limited budget of such causal connections (you cannot reason about a group of arbitrarily many people, with fixed non-zero amount of paying attention to the individual details of ...
I'm curious what is the evidence you see that this is false as a description of the values of just about every human, given that
First, you can consider preferences that are impartial but sublinear in the number of people. So, you can disagree with Nate's room analogy without the premise "stuff only matters if it adds to my own life and experiences".
Second, my preferences are indeed partial. But even that doesn't mean "stuff only matters if it adds to my own life and experiences". I do think that stuff only matters (to me) if it's in some sense causally connected to my life and experiences. More details here.
Third, I don't know what you mean by "good". The questions that I unders...
and, i'd guess that one big universe is more than twice as Fun as two small universes, so even if there were no transaction costs it wouldn't be worth it. (humans can have more fun when there's two people in the same room, than one person each in two separate rooms.)
This sounds astronomically wrong to me. I think that my personal utility function gets close to saturation with a tiny fraction of the resources in a universe-shard. Two people in one room is better than two people in separate rooms, yes. But, two rooms with a trillion people each is virtually t...
But, two rooms with a trillion people each is virtually the same as one room with two trillion. The returns on interactions with additional people fall off exponentially past the Dunbar number.
You're conflating "would I enjoy interacting with X?" with "is it good for X to exist?". Which is almost understandable given that Nate used the "two people can have more fun in the same room" example to illustrate why utility isn't linear in population. But this comment has an IMO bizarre amount of agreekarma (26 net agreement, with 11 votes), which makes me wonder if...
A major impediment in applying RL theory to any realistic scenario is that even the control problem[1] is intractable when the state space is exponentially large (in general). Real-life agents probably overcome this problem by exploiting some special properties of real-life environments. Here are two strong candidates for such properties:
A question that often comes up in discussion of IRL: are agency and values purely behavioral concepts, or do they depend on how the system produces its behavior? The cartesian measure of agency I proposed seems purely behavioral, since it only depends on the policy. The physicalist version seems less so since it depends on the source code, but this difference might be minor: this role of the source is merely telling the agent "where" it is in the universe. However, on closer examination, the physicalist is far from purely behaviorist, and this is true e...
The spectrum you're describing is related, I think, to the spectrum that appears in the AIT definition of agency where there is dependence on the cost of computational resources. This means that the same system can appear agentic from a resource-scarce perspective but non-agentic from a resource-abundant perspective. The former then corresponds to the Vingean regime and the latter to the predictable regime. However, the framework does have a notion of prior and not just utility, so it is possible to ascribe beliefs to Vingean agents. I think it makes sense...
There seems to be an even more elegant way to define causal relationships between agents, or more generally between programs. Starting from a hypothesis , for , we consider its bridge transform . Given some subset of programs we can define then project to [1]. We can then take bridge transform again to get some . The factor now tells us which programs causally affect the manifestation of programs in . Notice that by Proposition 2.8 in the IBP article, when we just get all pro...
The problem of future unaligned AI leaking into human imitation is something I wrote about before. Notice that IDA-style recursion helps a lot, because instead of simulating a process going deep into the external timeline's future, you're simulating a "groundhog day" where the researcher wakes up over and over at the same external time (more realistically, the restart time is drifting forward with the time outside the simulation) with a written record of all their previous work (but no memory of it). There can still be a problem if there is a positive proba...
I think it's a terrible idea to automatically adopt an equilibrium notion which incentivises the players to come up with increasingly nasty threats as fallback if they don't get their way. And so there seems to be a good chunk of remaining work to be done, involving poking more carefully at the CoCo value and seeing which assumptions going into it can be broken.
I'm not convinced there is any real problem here. The intuitive negative reaction we have to this "ugliness" is because of (i) empathy and (ii) morality. Empathy is just a part of the utility fun...
This is a fascinating result, but there is a caveat worth noting. When we say that e.g. AlphaGo is "superhuman at go" we are comparing it to humans who (i) spent years training on the task and (ii) were selected for being the best at it among a sizable population. On the other hand, with next token prediction we're nowhere near that amount of optimization on the human side. (That said, I also agree that optimizing a model on next token prediction is very different from optimizing it for text coherence would be, if we could accomplish the latter.)
The short answer is, I don't know.
The long answer is, here are some possibilities, roughly ordered from "boring" to "weird":
The problem is that if implies that creates but you consider a counterfactual in which doesn't create then you get an inconsistent hypothesis i.e. a HUC which contains only 0. It is not clear what to do with that. In other words, the usual way of defining counterfactuals in IB (I tentatively named it "hard counterfactuals") only makes sense when the condition you're counterfactualizing on is something you have Knightian uncertainty about (which seems safe to assume if this condition is about your own future action but not safe to assume in genera...
it would be the best possible model of this type, at the task of language modeling on data sampled from the same distribution as MassiveText
Transformers are Turing complete, so "model of this type" is not much of a constraint. On the other hand, I guess it's theoretically possible that some weight matrices are inaccessible to current training algorithms no matter how much compute and data we have. It seems also possible that the scaling law doesn't go on forever, but phase-transitions somewhere (maybe very far) to a new trend which goes below the "irreducible" term.
Here is a way to construct many learnable undogmatic ontologies, including such with finite state spaces.
A deterministic partial environment (DPE) over action set A and observation set O is a pair (D,ϕ) where D⊆(O×A)∗ and ϕ:D→O s.t.
DPEs are equipped with a natural partial order. Namely, (D,ϕ)≤(E,ψ) when D⊆E and ϕ=ψ|D.
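The order relation can be sketched in a few lines. The representation is an assumption made for illustration: a DPE (D,ϕ) is modelled as a Python dict whose keys are the histories in D (tuples of (observation, action) pairs) and whose values are ϕ's outputs. The conditions on D and ϕ that follow "s.t." are not shown in this excerpt, so the sketch does not enforce them.

```python
def leq(small, big):
    """(D, phi) <= (E, psi) iff D is a subset of E and phi equals psi restricted to D.

    Each DPE is a dict: history -> predicted next observation (hypothetical encoding).
    """
    return small.keys() <= big.keys() and all(small[h] == big[h] for h in small)

# Toy DPEs with made-up observation and action labels.
empty_history = ()
one_step = (("o0", "a1"),)

coarse = {empty_history: "o0"}
fine = {empty_history: "o0", one_step: "o1"}
conflicting = {empty_history: "o9"}

print(leq(coarse, fine))       # True: fine extends coarse
print(leq(fine, coarse))       # False: fine's domain is larger
print(leq(conflicting, fine))  # False: they disagree on the empty history
```

Two DPEs that disagree on a shared history are incomparable, so the order is genuinely partial.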
Let S ...