All Posts

Friday, March 27th 2020

No posts for March 27th 2020
Shortform
3 · Ben Pace · 2d

Hot take: the actual resolution to the simulation argument is that most advanced civilizations don't make loads of simulations. Two things make this plausible:

* Firstly, it only matters if they make unlawful simulations. If they make lawful simulations, then it doesn't matter whether you're in a simulation or a base reality: all of your decision theory and incentives are essentially the same, and you want to make the same decisions in all of the universes. So you can make lots of lawful simulations; that's fine.
* Secondly, they will strategically choose not to make too many unlawful simulations (to the level where the things inside are actually conscious), because doing so would induce anthropic uncertainty over themselves. If the decision-theoretic answer is not to induce anthropic uncertainty over yourself about whether you're in a simulation, then by TDT everyone will choose not to make unlawful simulations.

I think this is probably wrong in lots of ways, but I didn't stop to figure them out.

Tuesday, March 24th 2020

No posts for March 24th 2020
Shortform
4 · Vanessa Kosoy · 5d

Learning theory distinguishes between two types of settings: realizable and agnostic (non-realizable). In a realizable setting, we assume that there is a hypothesis in our hypothesis class that describes the real environment perfectly. We are then concerned with the sample complexity and computational complexity of learning the correct hypothesis. In an agnostic setting, we make no such assumption. We therefore consider the complexity of learning the best approximation of the real environment. (Or, the best reward achievable by some space of policies.)

In offline learning and certain varieties of online learning, the agnostic setting is well-understood. However, in more general situations it is poorly understood. The only agnostic result for long-term forecasting that I know is Shalizi 2009 [https://projecteuclid.org/euclid.ejs/1256822130]; however, it relies on ergodicity assumptions that might be too strong. I know of no agnostic result for reinforcement learning.

Quasi-Bayesianism was invented to circumvent the problem. Instead of considering the agnostic setting, we consider a "quasi-realizable" setting: there might be no perfect description of the environment in the hypothesis class, but there are some incomplete descriptions. But, so far I haven't studied quasi-Bayesian learning algorithms much, so how do we know it is actually easier than the agnostic setting? Here is a simple example to demonstrate that it is.

Consider a multi-armed bandit, where the arm space is [0,1]. First, consider the following realizable setting: the reward is a deterministic function r:[0,1]→[0,1] which is known to be a polynomial of degree at most d. In this setting, learning is fairly easy: it is enough to sample d+1 arms in order to recover the reward function and find the optimal arm. It is a special case of the general observation that learning is tractable when the hypothesis space is low-dimensional in the appropriate sense. Now, consider a closely related agnostic setting.
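The realizable polynomial-bandit claim above is easy to sanity-check in code. Below is a minimal sketch (my own illustration, not from the post): it assumes a hypothetical degree-3 reward polynomial `true_r` as a stand-in, queries d+1 = 4 distinct arms, recovers the coefficients by solving the Vandermonde system, and reads off the optimal arm from the endpoints of [0,1] and the real roots of the derivative.

```python
# Minimal sketch of the realizable case: if r: [0,1] -> [0,1] is an unknown
# polynomial of degree at most d, then d+1 distinct queries determine it
# exactly, and the optimum on [0,1] sits at an endpoint or at a root of r'.
# The concrete polynomial below is an arbitrary stand-in for illustration.
import numpy as np

d = 3
true_r = np.polynomial.Polynomial([0.2, 1.1, -2.0, 1.0])  # hypothetical reward, coeffs low -> high degree

# Query d+1 distinct arms and observe their (deterministic) rewards.
arms = np.linspace(0.0, 1.0, d + 1)
rewards = true_r(arms)

# Recover r by solving the (d+1) x (d+1) Vandermonde system.
coeffs = np.linalg.solve(np.vander(arms, d + 1, increasing=True), rewards)
r_hat = np.polynomial.Polynomial(coeffs)

# The maximum of a polynomial on [0,1] is attained at an endpoint
# or at a real root of its derivative inside the interval.
crit = r_hat.deriv().roots()
crit = crit[np.isreal(crit)].real
crit = crit[(crit >= 0.0) & (crit <= 1.0)]
candidates = np.concatenate(([0.0, 1.0], crit))
best = candidates[np.argmax(r_hat(candidates))]
print(f"optimal arm ~ {best:.3f}, reward ~ {r_hat(best):.3f}")
```

Sampling d+1 distinct arms is what makes the Vandermonde matrix invertible, which is exactly the "low-dimensional hypothesis space" point: the number of queries scales with the dimension of the hypothesis class, not with the size of the arm space.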

Monday, March 23rd 2020

Shortform
9 · Vanessa Kosoy · 6d

I have [https://www.alignmentforum.org/posts/Ajcq9xWi2fmgn8RBJ/the-credit-assignment-problem#X6fFvAHkxCPmQYB6v] repeatedly [https://www.alignmentforum.org/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform#TzkG7veQAMMRNh3Pg] argued [https://www.alignmentforum.org/posts/3qXE6fK47JhSfkpnB/do-sufficiently-advanced-agents-use-logic#fEKc88NbDWZavkW9o] for a departure from pure Bayesianism that I call "quasi-Bayesianism". But, coming from a LessWrong-ish background, it might be hard to wrap your head around the fact that Bayesianism is somehow deficient. So, here's another way to understand it, using Bayesianism's own favorite trick: Dutch booking!

Consider a Bayesian agent Alice. Since Alice is Bayesian, ey never randomize: ey just follow a Bayes-optimal policy for eir prior, and such a policy can always be chosen to be deterministic. Moreover, Alice always accepts a bet if ey can choose which side of the bet to take: indeed, at least one side of any bet has non-negative expected utility.

Now, Alice meets Omega. Omega is very smart, so ey know more than Alice, and moreover ey can predict Alice. Omega offers Alice a series of bets. The bets are specifically chosen by Omega s.t. Alice would pick the wrong side of each one. Alice takes the bets and loses, indefinitely. Alice cannot escape eir predicament: ey might know, in some sense, that Omega is cheating em, but there is no way within the Bayesian paradigm to justify turning down the bets.

A possible counterargument is that we don't need to depart far from Bayesianism to win here. We only need to somehow justify randomization, perhaps by something like infinitesimal random perturbations of the belief state (like with reflective oracles). But, in a way, this is exactly what quasi-Bayesianism does: a quasi-Bayes-optimal policy is in particular Bayes-optimal when the prior is taken to be in Nash equilibrium of the associated zero-sum game. However, Bayes-optimality underspecifies the policy: not every optimal reply to a Nash equilibrium
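To make the Dutch-booking story concrete, here is a toy simulation (my own illustrative sketch, not from the post). Omega "predicts" an agent by running a copy of it: against the deterministic Bayesian rule the prediction is exact and Alice loses every bet, whereas a coin-flipping agent's copy uses independent random bits, so it roughly breaks even. The names alice, randomizer, omega_round, and wealth_after are hypothetical, introduced only for this illustration.

```python
# Toy simulation of the Dutch-booking story: Alice deterministically takes
# whichever side of an even-stakes bet has non-negative expected utility
# under her prior; Omega simulates the agent and fixes the outcome so the
# predicted side loses. A randomizing agent cannot be pinned down this way.
import random

random.seed(0)

def alice(prior_e: float) -> str:
    """Deterministic Bayesian rule: bet on E iff E has non-negative EU at even stakes."""
    return "E" if 2 * prior_e - 1 >= 0 else "not-E"

def randomizer(prior_e: float) -> str:
    """An agent that flips a coin; Omega's copy of it draws independent random bits."""
    return random.choice(["E", "not-E"])

def omega_round(agent):
    """Omega draws a bet, predicts the agent by simulating it, and makes the predicted side lose."""
    prior_e = random.random()       # the agent's credence in this round's event E
    predicted = agent(prior_e)      # Omega's simulation of the agent
    outcome = "not-E" if predicted == "E" else "E"
    return prior_e, outcome

def wealth_after(agent, rounds=1000):
    w = 0.0
    for _ in range(rounds):
        prior_e, outcome = omega_round(agent)
        w += 1.0 if agent(prior_e) == outcome else -1.0   # the real agent's choice
    return w

print("deterministic Alice:", wealth_after(alice))        # loses every single bet
print("randomizing agent:  ", wealth_after(randomizer))   # roughly breaks even
```

The only thing the randomizer buys is unpredictability: Omega's copy can match its distribution but not its realized choice, which is the same role the infinitesimal perturbations of the belief state play in the counterargument above.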

