Summary
We now resume your regularly scheduled LessWrong tradition of decision theory posting.
Just the first and last posts will be on the Alignment Forum, and the whole thing will be linked together.
Epistemic Status: This is mostly just recapping old posts so far. If you're a decision-theory veteran, new stuff only starts arising in the "Computational Intractability" section and further down.
You may have heard of a thing called Updateless Decision Theory. It's been discussed for over a decade by now, but progress on rendering it down into something that could maybe someday be run on a computer has been very slow. Last winter, I decided to try helping out Abram and Martin with their efforts at actually formalizing it into some sort of algorithm that nice things could be...
I don't expect this post to contain anything novel. But from talking to others it seems like some of what I have to say in this post is not widely known, so it seemed worth writing.
In this post I'm defining superposition as: A representation with more features than neurons, achieved by encoding the features as almost orthogonal vectors in neuron space.
One reason to expect superposition in neural nets (NNs) is that, for large $n$, $\mathbb{R}^n$ has many more than $n$ almost orthogonal directions. On the surface, this seems obviously useful for the NN to exploit. However, superposition is not magic. You don't actually get to put in more information; the gain you get from having more feature directions has to be paid for some other way.
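A small numerical sketch of the "many more than $n$ almost orthogonal directions" claim, assuming features are placed as random Gaussian unit vectors (an illustrative choice, not necessarily how an NN would place them). It packs twice as many feature directions as there are dimensions and reports the worst pairwise |cosine|. The sizes are arbitrary and kept small so the pure-Python loop runs quickly.

```python
import math
import random


def unit_vector(n, rng):
    """Sample a direction uniformly at random on the unit sphere in R^n."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def max_abs_cosine(vectors):
    """Largest |cos(angle)| over all distinct pairs of unit vectors."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            cos = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(cos))
    return worst


rng = random.Random(0)
n = 200      # neurons (dimensions)
m = 2 * n    # features: twice as many as neurons
features = [unit_vector(n, rng) for _ in range(m)]
worst = max_abs_cosine(features)
print(f"{m} features in {n} dimensions, max |cos| = {worst:.3f}")
```

The worst-case cosine stays well below 1 even though $m > n$, but it is not zero: that residual overlap is the interference that has to be paid for somewhere.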
All the math in this post is...
The math in the post is super hand-wavey, so I don't expect the result to be exactly correct. However, in your example, $l$ up to 100 should be OK, since there is no superposition. 2.7 is almost 2 orders of magnitude off, which is not great.
Looking into what is going on: I'm basing my results on the Johnson–Lindenstrauss lemma, which gives an upper bound on the interference. In the post I'm assuming that the actual interference is of the same order of magnitude as this upper bound. This assumption clearly fails in your example since the interference betw...
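A rough illustration (my own sketch, not the post's calculation) of how an estimate tuned to almost-orthogonal placement can be badly off when there is no superposition. With fewer features than neurons, exactly orthogonal directions exist, so the actual interference can be zero, while randomly placed directions still overlap at roughly the $1/\sqrt{n}$ scale. The sizes `n = 100`, `m = 50` are arbitrary assumptions.

```python
import math
import random


def max_interference(vectors):
    """Largest |cosine| between distinct feature vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot) / (norms[i] * norms[j]))
    return worst


def random_features(m, n, rng):
    """m random Gaussian directions in R^n (almost, but not exactly, orthogonal)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def basis_features(m, n):
    """First m standard basis vectors: exactly orthogonal, only possible if m <= n."""
    assert m <= n
    return [[1.0 if k == i else 0.0 for k in range(n)] for i in range(m)]


rng = random.Random(0)
n, m = 100, 50  # fewer features than neurons: no superposition needed
random_worst = max_interference(random_features(m, n, rng))
basis_worst = max_interference(basis_features(m, n))
print(f"random placement:  max interference {random_worst:.3f}")
print(f"orthonormal basis: max interference {basis_worst:.3f}")
```

In this regime the random-placement number is a poor guide to what a network could actually achieve, which is the kind of gap that appears when no superposition is needed.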
Recently someone either suggested to me (or maybe told me they or someone were going to do this?) that we should train AI on legal texts to teach it human values. Ignoring the technical problem of how to do this, I'm pretty sure legal texts are not the right training data. But at the time, I could not clearly put into words why. Today's SMBC explains it for me:
Saturday Morning Breakfast Cereal - Law (smbc-comics.com)
Law is not a good representation or explanation of most of what we care about, because it's not trying to be. Law is mainly focused on the c...
(Note: I wrote this with editing help from Rob and Eliezer. Eliezer's responsible for a few of the paragraphs.)
A common confusion I see in the tiny fragment of the world that knows about logical decision theory (FDT/UDT/etc.) is that people think LDT agents are genial and friendly toward each other.[1]
One recent example is Will Eden’s tweet about how maybe a molecular paperclip/squiggle maximizer would leave humanity a few stars/galaxies/whatever on game-theoretic grounds. (And that's just one example; I hear this suggestion bandied around pretty often.)
I'm pretty confident that this view is wrong (alas), and based on a misunderstanding of LDT. I shall now attempt to clear up that confusion.
To begin, a parable: the entity Omicron (Omega's little sister) fills box A with $1M and box B with...
If you can put uploaded human-level agents with evolved-organism preferences in your simulations, you can just win outright (e.g. by having them spend subjective millennia doing FAI research for you). If you can’t, that will be a very obvious difference between your simulations and the real world.
I disagree. If your simulation is perfectly realistic, the simulated humans might screw up at alignment and create an unfriendly superintelligence, for much the same reason real humans might.
Also, if the space of goals that evolution + culture can...
Ok, I misunderstood. (See also my post on the relation between local and global optimality, and another post on coordinating local decisions using MCMC)