I operate by Crocker's rules.
I won't deliberately, derisively spread something just because you tried to point out an infohazard.
it is very interpretable to humans
Misunderstanding: I expect we can't construct a counterfactual planner because we can't pick out the compute core in the black-box learned model.
And my Eliezer's problem with counterfactual planning is that the plan may start by unleashing a dozen memetic, biological, technological, magical, political and/or untyped existential hazards on the world which then may not even be coordinated correctly when one of your safeguards takes out one of the resulting silicon entities.
If you don't wish to reply to Eliezer, I'm an other and also ask what incoherence allows what corrigibility. I expect counterfactual planning to fail for want of basic interpretability. It would also coherently plan about the planning world - my Eliezer says we might as well equivalently assume superintelligent musings about agency to drive human readers mad.
Someone needs to check if we can use ML to guess activations in one set of neurons from activations in another set of neurons. The losses would give straightforward estimates of such statistical quantities as mutual information. Generating inputs that have the same activations in a set of neurons illustrates what the set of neurons does. I might do this myself if nobody else does.
Suppose the bridge is safe iff there's a proof that the bridge is safe. Then you would forbid the reasoning "Suppose I cross. I must have proven it's safe. Then it's safe, and I get 10. Let's cross.", which seems sane enough in the face of Löb.
Translating to a tree of natural language descriptions and back lets you
Having the thing write papers is merely an existence proof of embedded agency being irrelevant except for deconfusion.
Intelligent agents causally responsible for your existence.
What do you mean you can think of this, I told this to you :D
Suppose instead of a timeline with probabilistic events, the coalition experiences the full tree of all possible futures - but we translate everything to preserve behavior. Then beliefs encode which timelines each member cares about, and bets trade influence (governance tokens) between timelines.
In category theory, one learns that good math is like kabbalah, where nothing is a coincidence. All short terms ought to mean something, and when everything fits together better than expected, that is a sign that one is on the right track, and that there is a pattern to formalize. and can be replaced by and . I expect that the latter formation is better because it is shorter. Its only direct effect would be that you would write instead of , so the previous sentence must cash out as this being a good thing. Indeed, it points out a direction in which to generalize. How does your math interact with quantilization? I plan to expand when I've had time to read all links.
I am not sure if I understand the question.
pi has form afV, V has form mfV, f is a long reused term. Expand recursion to get afmfmf... and mfmfmf.... Define E=fmE and you get pi=aE without writing f twice. Sure, you use V a lot but my intuition is that there should be some a priori knowable argument for putting the definitions your way or your theory is going to end up with the wrong prior.
Damn it, I came to write about the monad1 then saw the edit. You may want to add it to this list, and compare it with the other entries.
Here's a dissertation and blog post by Jared Tobin on using (X -> R) -> R
with flat reals to represent usual distributions in Haskell. He appears open to get hired.
Maybe you want a more powerful type system? I think Coq allows constructing that subtype of a type which satisfies a property. Agda's cubical type theory places a lot of emphasis for its for the unit interval. Might dependent types be enough express lipschitzness and concavity?
1: Spotted it during literature search on pushforwards to measure the distribution of the vector of all activations in a neural network for one input, given the known distribution of inputs to a GAN generator that outputs inputs to the first network. Which I started modeling (as ((Neurons -> R) -> R) -> R
) between you giving me that DT book and first reading about the technique in this post :).
Oh, I wasn't expecting you to have addressed the issue! 10.2.4 says L wouldn't be S if it were calculated from projected actions instead of given actions. How so? Mightn't it predict the given actions correctly?
You're right on all counts in your last paragraph.