I think the way I would rule out my counterexample is by strengthening A3 to if and then there is ...
Q2: No. Counterexample: Suppose there's one outcome such that all lotteries are equally good, except for the lottery that puts probability 1 on that outcome, which is worse than the others.
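In symbols (my own notation for the counterexample above, with $x^*$ the distinguished outcome and $\delta_{x^*}$ the lottery putting probability 1 on it):

```latex
L \sim L' \;\; \text{for all lotteries } L, L' \neq \delta_{x^*},
\qquad
L \succ \delta_{x^*} \;\; \text{for all } L \neq \delta_{x^*}.
```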
It would kind of use assumption 3 inside step 1, but inside the syntax, rather than in the metalanguage. That is, step 1 involves checking that the number encoding "this proof" does in fact encode a proof of C. This can't be done if you never end up proving C.
One thing that might help make clear what's going on is that you can follow the same proof strategy, but replace "this proof" with "the usual proof of Löb's theorem", and get another valid proof of Löb's theorem, that goes like this: Suppose you can prove that []C->C, and let n be the number encodi...
If that's how it works, it doesn't lead to a simplified cartoon guide for readers who'll notice missing steps or circular premises; they'd have to first walk through Löb's Theorem in order to follow this "simplified" proof of Löb's Theorem.
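For readers who want the reference point, here is a sketch of the usual proof of Löb's theorem that the alternative proof above piggybacks on (writing $\Box$ for the provability predicate, i.e. the [] above):

```latex
\text{Assume } PA \vdash \Box C \to C.
\begin{array}{ll}
1. & PA \vdash \psi \leftrightarrow (\Box\psi \to C) \quad \text{(diagonal lemma)}\\
2. & PA \vdash \Box\psi \to \Box(\Box\psi \to C) \quad \text{(1, necessitation and distribution)}\\
3. & PA \vdash \Box\psi \to (\Box\Box\psi \to \Box C) \quad \text{(2, distribution)}\\
4. & PA \vdash \Box\psi \to \Box\Box\psi \quad \text{(internal necessitation)}\\
5. & PA \vdash \Box\psi \to \Box C \quad \text{(3, 4)}\\
6. & PA \vdash \Box\psi \to C \quad \text{(5, assumption)}\\
7. & PA \vdash \psi \quad \text{(1, 6)}\\
8. & PA \vdash \Box\psi \quad \text{(7, necessitation)}\\
9. & PA \vdash C \quad \text{(6, 8)}
\end{array}
```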
It sounds to me like, in the claim "deep learning is uninterpretable", the key word in "deep learning" that makes this claim true is "learning", and you're substituting the similar-sounding but less true claim "deep neural networks are uninterpretable" as something to argue against. You're right that deep neural networks can be interpretable if you hand-pick the semantic meanings of each neuron in advance and carefully design the weights of the network such that these intended semantic meanings are correct, but that's not what deep learning is. The other t...
This seems related in spirit to the fact that time is only partially ordered in physics as well. You could even use special relativity to make a model for concurrency ambiguity in parallel computing: each processor is a parallel worldline, detecting and sending signals at points in spacetime that are spacelike-separated from when the other processors are doing these things. The database follows some unknown worldline, continuously broadcasts its contents, and updates its contents when it receives instructions to do so. The set of possible ways that the pro...
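As a toy illustration of the analogy (hypothetical helper, in 1+1 dimensions): two events whose order is ambiguous in the concurrency sense correspond to spacelike-separated points, which is a one-line check.

```python
def spacelike_separated(event_a, event_b, c=1.0):
    """Events are (t, x) pairs. If they are spacelike-separated, no signal can
    connect them, so different observers (processors) can legitimately
    disagree about which one happened first."""
    dt = event_b[0] - event_a[0]
    dx = event_b[1] - event_a[1]
    return abs(dx) > c * abs(dt)

# Example: two writes issued "at the same time" on far-apart processors.
print(spacelike_separated((0.0, 0.0), (1.0, 5.0)))  # True: order is frame-dependent
```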
This post claims that having the necessary technical skills probably means grad-level education, and also that you should have a broad technical background. While I suppose these claims are probably both true, it's worth pointing out that there's a tension between them, in that PhD programs typically aim to develop narrow skillsets, rather than broad ones. Often the first year of a PhD program will focus on acquiring a moderately broad technical background, and then rapidly get progressively more specialized, until you're writing a thesis, at which point w...
the still-confusing revised slogan that all computable functions are continuous
For anyone who still finds this confusing, I think I can give a pretty quick explanation of this.
The reason I'd imagine it might sound confusing is that you can think of what seem like simple counterexamples. E.g. you can write a short function in your favorite programming language that takes a floating-point real as input, and returns 1 if the input is 0, and returns 0 otherwise. This appears to be a computation of the indicator function for {0}, which is discontinuous. But it ...
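To make the apparent counterexample concrete, and to show where it breaks down, here is a sketch (names are mine; the resolution is that floats are a discrete set, while a computable real is usually given as a stream of approximations):

```python
def indicator_of_zero_float(x: float) -> int:
    # Trivially computable on floats: floats form a discrete set, so equality
    # with 0.0 is decidable. This is not the indicator function on the reals.
    return 1 if x == 0.0 else 0

def indicator_of_zero_real(approx) -> int:
    # Here a real is given as approx(n), a rational within 2**-n of the true
    # value. We can eventually certify that the input is nonzero, but no
    # finite number of approximations can certify that it is exactly 0, so on
    # input 0 this loops forever: the discontinuous indicator is not computable.
    n = 0
    while True:
        if abs(approx(n)) > 2 ** -n:
            return 0
        n += 1
```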
I think the assumption that multiple actions have nonzero probability in the context of a deterministic decision theory is a pretty big problem. If you come up with a model for where these nonzero probabilities are coming from, I don't think your argument is going to work.
For instance, your argument fails if these nonzero probabilities come from epsilon exploration. If the agent is forced to take every action with probability epsilon, and merely chooses which action to assign the remaining probability to, then the agent will indeed purchase the contract fo...
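For concreteness, the action distribution under epsilon exploration as described above looks like this (a minimal sketch; the names are mine):

```python
def epsilon_exploration_distribution(n_actions, chosen, eps):
    """Every action gets probability eps; the agent only chooses where the
    remaining 1 - n_actions * eps of probability mass goes."""
    assert 0 < eps and n_actions * eps <= 1
    probs = [eps] * n_actions
    probs[chosen] += 1 - n_actions * eps
    return probs
```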
OK, here's my position.
As I said in the post, the real answer is that this argument simply does not apply if the agent knows its action. More generally: the argument applies precisely to those actions to which the agent ascribes positive probability (directly before deciding). So, it is possible for agents to maintain a difference between counterfactual and evidential expectations. However, I think it's rarely normatively correct for an agent to be in such a position.
Even though the decision procedure of CDT is deterministic, this does not mean that agents...
I don't see the connection to the Jeffrey-Bolker rotation? There, to get the shouldness coordinate, you need to start with the epistemic probability measure, and multiply it by utility; here, utility is interpreted as a probability distribution without reference to a probability distribution used for beliefs.
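To spell out the coordinates I have in mind (as I understand the rotation), each event $A$ gets a probability coordinate and a shouldness coordinate,

```latex
\big(\,P(A),\;\; P(A)\,V(A)\,\big),
```

where $V(A)$ is the expected utility given $A$; both coordinates are additive over disjoint events, which is what makes rotating them sensible.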
For individual ML models, sure, but not for classes of similar models. E.g. GPT-3 presumably was more expensive to train than GPT-2 as part of the cost of getting better results. For each of the proposals in the OP, training costs constrain how complex a model you can train, which in turn would affect performance.
I'm concerned about Goodhart's law on the acceptability predicate causing severe problems when the acceptability predicate is used in training. Suppose we take some training procedure that would otherwise result in an unaligned AI, and modify the training procedure by also including the acceptability predicate in the loss function during training. This results in an end product that has been trained to appear to satisfy the intended version of the acceptability predicate. One way that could happen is if it actually does satisfy what was intended by...
Is there a difference between training competitiveness and performance competitiveness? My impression is that, for all of these proposals, however many resources you've already put into training, putting more resources into training will continue to improve performance. If this is the case, then whether a factor influencing competitiveness is framed as affecting the cost of training or as affecting the performance of the final product, either way it's just affecting the efficiency with which putting resources towards training leads to good perfor...
My impression is that, for all of these proposals, however many resources you've already put into training, putting more resources into training will continue to improve performance.
I think this is incorrect. Most training setups eventually flatline, or close to it (e.g. see AlphaZero's ELO curve), and need algorithmic or other improvements to do better.
This makes Savage a better comparison point, since the Savage axioms are more similar to the VNM framework while also trying to construct probability and utility together with one representation theorem.
Sure, I guess I just always talk about VNM instead of Savage because I never bothered to learn how Savage's version works. Perhaps I should.
As a representation theorem, this makes VNM weaker and JB stronger: VNM requires stronger assumptions (it requires that the preference structure include information about all these probability-distribution compari...
In the Savage framework, an outcome already encodes everything you care about.
Yes, but if you don't know which outcome is the true one, so you're considering a probability distribution over outcomes instead of a single outcome, then it still makes sense to speak of the probability that the true outcome has some feature. This is what I meant.
So the computation which seems to be suggested by Savage is to think of these maximally-specified outcomes, assigning them probability and utility, and then combining those to get expected utility. This seems...
I agree that the considerations you mentioned in your example are not changes in values, and didn't mean to imply that that sort of thing is a change in values. Instead, I just meant that such shifts in expectations are changes in probability distributions, rather than changes in events, since I think of such things in terms of how likely each of the possible outcomes are, rather than just which outcomes are possible and which are ruled out.
It seems to me that the Jeffrey-Bolker framework is a poor match for what's going on in peoples' heads when they make value judgements, compared to the VNM framework. If I think about how good the consequences of an action are, I try to think about what I expect to happen if I take that action (ie the outcome), and I think about how likely that outcome is to have various properties that I care about, since I don't know exactly what the outcome will be with certainty. This isn't to say that I literally consider probability distributions ...
I think we're going to have to back up a bit. Call the space of outcomes and the space of Turing machines . It sounds like you're talking about two functions, and . I was thinking of as the utility function we were talking about, but it seems you were thinking of .
You suggested should be computable but should not be. It seems to me that should certainly be computable (with the caveat that it might be a partial function, rather than a total function), as computation is the only thing Turing...
I'm not sure what it would mean for a real-valued function to be enumerable. You could call a real-valued function $f$ enumerable if there's a program that takes $x$ as input and enumerates the rationals that are less than $f(x)$, but I don't think this is what you want, since presumably if a Turing machine halting can generate a positive amount of utility that doesn't depend on the number of steps taken before halting, then it could generate a negative amount of utility by halting as well.
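A sketch of what that notion of "enumerable" would amount to (illustrative names of my own; `f_approx(x, n)` is assumed to return a rational within `2**-n` of $f(x)$):

```python
from fractions import Fraction
from itertools import count

def enumerate_rationals_below(f_approx, x):
    """Yield, eventually, every rational q with q < f(x).
    This is the 'enumerable' (lower-semicomputable) reading described above."""
    yielded = set()
    for n in count(1):
        # f_approx(x, n) - 2**-n is certified to be <= f(x), so anything
        # strictly below it is certified to be < f(x).
        bound = f_approx(x, n) - Fraction(1, 2 ** n)
        # emit all small-denominator rationals known so far to lie below f(x)
        for denom in range(1, n + 1):
            for numer in range(-n * denom, n * denom + 1):
                q = Fraction(numer, denom)
                if q < bound and q not in yielded:
                    yielded.add(q)
                    yield q
```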
I think accepting the type of reasoning you g...
we need not assume there are "worlds" at all. ... In mathematics, it brings to mind pointless topology.
I don't think the motivation for this is quite the same as the motivation for pointless topology, which is designed to mimic classical topology in a way that Jeffrey-Bolker-style decision theory does not mimic VNM-style decision theory. In pointless topology, a continuous map from a locale $X$ to a locale $Y$ is a function from the lattice of open sets of $Y$ to the lattice of open sets of $X$. So a similar thing here would be to treat a utility funct...
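Concretely, that map on open-set lattices is required to be a frame homomorphism, preserving finite meets and arbitrary joins:

```latex
h : \mathcal{O}(Y) \to \mathcal{O}(X), \qquad
h(U \wedge V) = h(U) \wedge h(V), \qquad
h\left(\bigvee_i U_i\right) = \bigvee_i h(U_i).
```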
Theorem: Fuzzy beliefs (as in https://www.alignmentforum.org/posts/Ajcq9xWi2fmgn8RBJ/the-credit-assignment-problem#X6fFvAHkxCPmQYB6v ) form a continuous DCPO. (At least I'm pretty sure this is true. I've only given proof sketches so far)
The relevant definitions:
A fuzzy belief over a set $X$ is a concave function $\phi : \Delta X \to [0,1]$ such that ... (where $\Delta X$ is the space of probability distributions on $X$). Fuzzy beliefs are partially ordered by ...
Ok, I see what you mean about independence of irrelevant alternatives only being a real coherence condition when the probabilities are objective (or otherwise known to be equal because they come from the same source, even if there isn't an objective way of saying what their common probability is).
But I disagree that this makes VNM only applicable to settings in which all sources of uncertainty have objectively correct probabilities. As I said in my previous comment, you only need there to exist some source of objective probabilities, and you can then ...
I think you're underestimating VNM here.
only two of those four are relevant to coherence. The main problem is that the axioms relevant to coherence (acyclicity and completeness) do not say anything at all about probability
It seems to me that the independence axiom is a coherence condition, unless I misunderstand what you mean by coherence?
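For reference, the axiom in question: for all lotteries $A$, $B$, $C$ and all $p \in (0,1]$,

```latex
A \succeq B \iff pA + (1-p)C \;\succeq\; pB + (1-p)C .
```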
correctly point out problems with VNM
I'm curious what problems you have in mind, since I don't think VNM has problems that don't apply to similar coherence theorems.
VNM utility stipulates that agents h...
I do, however, believe that the single step cooperate-defect game which they use to come up with their factors seems like a very simple model for what will be a very complex system of interactions. For example, AI development will take place over time, and it is likely that the same companies will continue to interact with one another. Iterated games have very different dynamics, and I hope that future work will explore how this would affect their current recommendations, and whether it would yield new approaches to incentivizing cooperation.
It may be diff...
I object to the framing of the bomb scenario on the grounds that low probabilities of high stakes are a source of cognitive bias that trips people up for reasons having nothing to do with FDT. Consider the following decision problem: "There is a button. If you press the button, you will be given $100. Also, pressing the button has a very small (one in a trillion trillion) chance of causing you to burn to death." Most people would not touch that button. Using the same payoffs and probabilities in a scenario to challenge FDT thus exploits cognitive bi...
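The point being that the expected-utility arithmetic overwhelmingly favors pressing: even if you price your own death at $10^{12}$ dollars (an arbitrary illustrative number, not something from the original scenario), the expected loss from pressing is

```latex
10^{-24} \times 10^{12} \text{ dollars} = 10^{-12} \text{ dollars} \ll 100 \text{ dollars},
```

and yet most people still won't press, which is the bias the bomb framing trades on.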
I don't know if I'm a simulation or a real person.
A possible response to this argument is that the predictor may be able to accurately predict the agent without explicitly simulating them. A possible counter-response to this is to posit that any sufficiently accurate model of a conscious agent is necessarily conscious itself, whether the model takes the form of an explicit simulation or not.
I think the counterfactuals used by the agent are the correct counterfactuals for someone else to use while reasoning about the agent from the outside, but not the correct counterfactuals for the agent to use while deciding what to do. After all, knowing the agent's source code, if you see it start to cross the bridge, it is correct to infer that its reasoning is inconsistent, and you should expect to see the troll blow up the bridge. But while deciding what to do, the agent should be able to reason about purely causal effects of its counterfact...
The agent could be programmed to have a certain hard-coded ontology rather than searching through all possible hypotheses weighted by description length.
Are you worried about leaks from the abstract computational process into the real world, leaks from the real world into the abstract computational process, or both? (Or maybe neither and I'm misunderstanding your concern?)
There will definitely be tons of leaks from the abstract computational process into the real world; just looking at the result is already such a leak. The point is that the AI should have no incentive to optimize such leaks, not that the leaks don't exist, so the existence of additional leaks that we didn't know about shoul...
What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all Turing machines, we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on).
Wouldn't the set of all action sequences have lower description length than some large finite set of policies? There's also the potential problem that all of the policies in the large finite set you're searching over could be quite far from optimal.
Ok, understood on the second assumption. is not a function to , but a function to the set of -valued random variables, and your assumption is that this random variable is uncorrelated with certain claims about the outputs of certain policies. The intuitive explanation of the third condition made sense; my complaint was that even with the intended interpretation at hand, the formal statement made no sense to me.
I'm pretty sure you're assuming that is resolved on day , not that it is resolved eventually.
Searching over the set of all ...
This model seems very fatalistic, I guess? It seems somewhat incompatible with an agent that has preferences. (Perhaps you're suggesting we build an AI without preferences, but it doesn't sound like that.)
Ok, here's another attempt to explain what I meant. Somewhere in the platonic realm of abstract mathematical structures, there is a small world with physics quite a lot like ours, containing an AI running on some idealized computational hardware, and trying to arrange the rest of the small world so that it has some desired property. Human...
I suggest stating the result you're proving before giving the proof.
You have some unusual notation that I think makes some of this unnecessarily confusing. Instead of this underlined vs non-underlined thing, you should have two different functions, where the first maps action sequences to utilities, and the second maps a pair consisting of an action and a future policy to the utility of the action sequence that begins with that action and continues with the action sequence generated by that policy. Your first assumption ...
The model I had in mind was that the AI and the toy world are both abstract computational processes with no causal influence from our world, and that we are merely simulating/spectating on both the AI itself and the toy world it optimizes. If the AI messes with people simulating it so that they end up simulating a similar AI with more compute, this can give it more influence over these people's simulation of the toy world the AI is optimizing, but it doesn't give the AI any more influence over the abstract computational process that it (another a...
I agree. I didn't mean to imply that I thought this step would be easy, and I would also be interested in more concrete ways of doing it. It's possible that creating a hereditarily restricted optimizer along the lines I was suggesting could end up being approximately as difficult as creating an aligned general-purpose optimizer, but I intuitively don't expect this to be the case.
There should be a chat icon on the bottom-right of the screen on Alignment Forum that you can use to talk to the admins (unless only people who have already been approved can see this?). You can also comment on LW (Alignment Forum posts are automatically crossposted to LW), and ask the admins to make it show up on Alignment Forum afterwards.
I don't think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify are correspondingly rarely occurring, and thus less useful to predict.
Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying "a simulation that can support life" is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.
Oh yes, I see. That does cut the complexity overhead down a lot.
Once you've specified the agent, it just samples randomly from the di...
I didn't mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.
This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.
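One way to make this precise (a sketch, assuming $S$ can be computed from a description of the circumstance $C$ in which predicting it is useful):

```latex
K(S) \le K(C) + O(1),
```

so the prior weight $2^{-K(S)}$ assigned to $S$ is at least a constant multiple of $2^{-K(C)}$, and $S$ can't be useful far out of proportion to its complexity.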
I think decision problems with incomplete information are a better model in which to measure optimization power than deterministic decision problems with complete information are. If the agent knows exactly what payoffs it would get from each action, it is hard to explain why it might not choose the optimal one. In the example I gave, the first agent could have mistakenly concluded that the .9-utility action was better than the 1-utility action while making only small errors in estimating the consequences of each of its actions, while the second agent would need to make large errors in estimating the consequences of its actions in order to think that the .1-utility action was better than the 1-utility action.
I'm not convinced that the probability of S' could be pushed up to anything near the probability of S. Specifying an agent that wants to trick you into predicting S' rather than S with high probability when you see their common prefix requires specifying the agency required to plan this type of deception (which should be quite complicated), and specifying the common prefix of S and S' as the particular target for the deception (which, insofar as it makes sense to say that S is the "correct" continuation of the prefix, should h...
The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That's different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.
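A minimal sketch of the many-round setting I mean (illustrative code; `pull` and the parameter names are mine): early exploratory pulls buy information that later greedy pulls cash in on, which is exactly what a one-shot problem can't do.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds, eps=0.1):
    """Epsilon-greedy play of a multi-armed bandit.
    pull(arm) returns a stochastic reward; estimates built up in early rounds
    are exploited in later rounds."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    history = []
    for _ in range(n_rounds):
        if random.random() < eps or 0 in counts:
            arm = random.randrange(n_arms)  # explore
        else:
            means = [totals[i] / counts[i] for i in range(n_arms)]
            arm = max(range(n_arms), key=lambda i: means[i])  # exploit
        reward = pull(arm)
        counts[arm] += 1
        totals[arm] += reward
        history.append((arm, reward))
    return history
```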
Some undesirable properties of C-score:
It depends on how the space of actions is represented. If a set of very similar actions that achieve the same utility for the agent is merged into one action, this will change the agent's C-score.
It does not depend on the magnitudes of the agent's preferences, only on their orderings. Compare 2 agents: the first has 3 available actions, which would give it utilities 0, .9, and 1, respectively, and it picks the action that would give it utility .9. The second has 3 available actions, which would give it uti...
A related question is whether it is possible to design an algorithm for strong AI based on simple mathematical principles, or whether any strong AI will inevitably be an enormous kludge of heuristics designed by trial and error. I think that we have some empirical support for the former, given that humans evolved to survive in a certain environment but succeeded in using their intelligence to solve problems in very different environments.
I don't understand this claim. It seems to me that human brains appear to be "an enormous kludge of heuristics designed by trial and error". Shouldn't the success of humans be evidence for the latter?
Oh, derp. You're right.