Week Of Sunday, November 3rd 2019

Week Of Sun, Nov 3rd 2019

Frontpage Posts

Shortform [Beta]

1[comment deleted]16d (13/?)
Global phase.
Consider a quantum system that's even smaller than a qubit. Something with only
one possible measurement outcome, instead of two. Let's call it "quantum nit",
or "qunit", by analogy with "unit" in typed programming, which means a type with
one element.
A qunit has only one possible mixed state, with the density matrix ((1)). But
its pure states are more complicated: a one-qunit pure state is a number whose
norm is 1, so either 1 or -1 (or more if we allow complex numbers). The same is
true for an N-qunit pure state - it's also a single number whose norm is 1! The
reason is that the tensor product of N 1-dimensional spaces is 1-dimensional,
not N-dimensional. So the joint state of all qunits in the world is a single
number, which is called "global phase".
Now imagine we have a system with a bunch of qubits, and take its tensor product
with a qunit. We can move a scalar factor from one side of a tensor product to
the other, and that shouldn't change anything. So global phase can't affect any
measurement, it's just an extra degree of freedom in the theory.

116d (12/?)
The uncertainty principle, or something like it.
When you measure a qubit cos(φ)|0> + sin(φ)|1>, the result has variance
(1-cos(4φ))/4. (I'm skipping over the trig calculations here and below.) If you
have a choice between either measuring the qubit, or rotating it by an angle ψ
and then measuring it, then the sum of variances of the two operations is at
least (1-|cos(2ψ)|)/2. In particular, if ψ=π/4, the sum of variances is at least
1/2.
So no matter what state a qubit is in, and how precisely you know its state,
there are two possible measurements such that the sum of variances of their
results is at least 1/2. The reason is that the bases corresponding to these
measurements are at an angle to each other, not aligned. Apparently in the real
world, an object's "position" and "momentum" are two such measurements, so
there's a limit on their joint precision.
You can also carry out one measurement and then the other, but that doesn't help
- after the first measurement you have variance in the second. Moreover, after
the second you have fresh variance in the first. This lets you get an infinite
stream of fair coinflips from a single qubit: start with |0>, rotate by π/4,
measure, rotate by π/4, measure...

112d It seems useful to consider agents that reason in terms of an unobservable
ontology, and may have uncertainty over what this ontology is. In particular, in
Dialogic RL
[https://www.alignmentforum.org/posts/dPmmuaz9szk26BkmD/vanessa-kosoy-s-shortform#Wi65Ahs9abL63gPSe]
, the user's preferences are probably defined w.r.t. an ontology that is
unobservable by the AI (and probably unobservable by the user too) which the AI
has to learn (and the user is probably uncertain about emself). However,
onotlogies are more naturally thought of as objects in a category than as
elements in a set. The formalization of an "ontology" should probably be a POMDP
or a suitable Bayesian network. A POMDP involves an arbitrary set of states, so
it's not an element in a set, and the class of POMDPs can be naturally made into
a category. Therefore, there is need for defining the notion of a probability
measure over a category. Of course we can avoid this by enumerating the states,
considering the set of all possible POMDPs w.r.t. this enumeration and then
requiring the probability measure to be invariant w.r.t. state relabeling.
However, the category theoretic point of view seems more natural, so it might be
worth fleshing out.
Ordinary probably measures are defined on measurable spaces. So, first we need
to define the analogue of "measurable structure" (σ-algebra) for categories. Fix
a category C. Denote Meas the category of measurable spaces. A measurable
structure on C is then specified by providing a Grothendick fibration
[https://ncatlab.org/nlab/show/Grothendieck+fibration] B:MFC→Meas and an
equivalence E:B−1(pt)→C. Here, B−1(pt) stands for the essential fiber
[https://ncatlab.org/nlab/show/essential+fiber] of B over the one point space
pt∈Meas. The intended interpretation of MFC is, the category of families of
objects in C indexed by measurable spaces. The functor B is supposed to extract
the base (index space) of the family. We impose the following conditions on MFC
and B:
Given A∈

Week Of Sunday, October 27th 2019

Week Of Sun, Oct 27th 2019

Frontpage Posts

Shortform [Beta]Load More (5/11)

620d I've been thinking a lot about 'parallel economies' recently. One of the main
differences between 'slow takeoff' and 'fast takeoff' predictions is whether AI
is integrated into the 'human civilization' economy or constructing a separate
'AI civilization' economy. Maybe it's worth explaining a bit more what I mean by
this: you can think of 'economies' as collections of agents who trade with each
other. Often it will have a hierarchical structure, and where we draw the lines
are sort of arbitrary. Imagine a person who works at a company and participates
in its internal economy, and the company participates in national and global
economies, and the person participates in those economies as well. A better
picture has a very dense graph with lots of nodes and links between groups of
nodes whose heaviness depends on the number of links between nodes in those
groups.
As Adam Smith argues, the ability of an economy to support specialization of
labor depends on its size. If you have an island with a single inhabitant, it
doesn't make sense to fully employ a farmer (since a full-time farmer can
generate much more food than a single person could eat), for a village with 100
inhabitants it doesn't make sense to farm more than would feed a hundred mouths,
and so on. But as you make more and more of a product, investments that have a
small multiplicative payoff become better and better, to the point that a planet
with ten billion people will have massive investment in farming specialization
that make it vastly more efficient per unit than the village farming system. So
for much of history, increased wealth has been driven by this increased
specialization of labor, which was driven by the increased size of the economy
(both through population growth and decreased trade barriers widening the links
between economies until they effectively became one economy).
One reason to think economies will remain integrated is because increased size
benefits all actors in the economy on net; a

522d One challenge for theories of embedded agency over Cartesian theories is that
the 'true dynamics' of optimization (where a function defined over a space
points to a single global maximum, possibly achieved by multiple inputs) are
replaced by the 'approximate dynamics'. But this means that by default we get
the hassles associated with numerical approximations, like when integrating
differential equations. If you tell me that you're doing Euler's Method on a
particular system, I need to know lots about the system and about the particular
hyperparameters you're using to know how well you'll approximate the true
solution. This is the toy version of trying to figure out how a human reasons
through a complicated cognitive task; you would need to know lots of details
about the 'hyperparameters' of their process to replicate their final result.
This makes getting guarantees hard. We might be able to establish what the
'sensible' solution range for a problem is, but establishing what algorithms can
generate sensible solutions under what parameter settings seems much harder.
Imagine trying to express what the set of deep neural network parameters are
that will perform acceptably well on a particular task (first for a particular
architecture, and then across all architectures!).

219d (11/?)
Superdense coding.
Alice is told two classical bits of information and sends a qubit to Bob, who
can then recover the two classical bits. Again, it relies on Alice and Bob
sharing a prepared pair beforehand. It's the opposite of quantum teleportation,
where Alice sends two classical bits and Bob can recover a qubit.
First let's talk about bases. This is the usual basis: |00>, |01>, |10>, |11>.
This is the Bell basis: (|00> + |11>)/√2, (|00> - |11>)/√2, (|10> + |01>)/√2,
(|10> - |01>)/√2. Check for yourself that each two vectors are orthogonal to
each other. To "measure a state in a different basis" means to apply a rotation
from one basis into another, then measure. For example, if you have a state and
you know that it's one of the Bell basis states, you can figure out which one,
by rotating into the usual basis and measuring.
One cool thing about the Bell basis is that you can change any basis vector into
any other basis vector by operations on the first qubit only. For example,
rotating the first qubit by 90 degrees turns (|00> + |11>)/√2 into (|10> -
|01>)√2. Flipping the first qubit gives (|10> + |01>)√2, and so on.
Now superdense coding is easy. Alice and Bob start by sharing two halves of a
Bell state. Then depending on which two classical bits Alice needs to send, she
either leaves the state alone or rotates into one of the other three, by
operating only on her qubit. Then she sends her qubit to Bob, who now has both
qubits and can rotate them into the usual basis and measure them, recovering the
two classical bits.

220d (10/?)
Quantum teleportation.
It's not a very good name, maybe we should call it "sending qubits over a
classical channel using a quantum one-time pad". The idea is that Alice and Bob
are far apart from each other and have a prepared pair of qubits, and Alice also
has some qubit in unknown state. She can measure it in a clever way and send
some classical bits to Bob, which allows him to recreate the unknown qubit. Both
qubits of the prepared pair are spent in the process.
Let's say the unknown qubit is a|0>+b|1>, and the prepared pair is (|00> +
|11>)/√2. So the whole state is (a|0> + b|1>) ⊗ (|00> + |11>)/√2 = (a|000> +
a|011> + b|100> + b|111>)/√2. Alice can access the first two qubits and Bob can
access the third.
1. Alice flips the second qubit depending on the first, leading to (a|000> +
a|011> + b|110> + b|101>)/√2.
2. Alice applies a combined reflection and rotation to the first qubit, which
is known as Hadamard gate: |0> becomes (|0> + |1>)/√2, while |1> becomes
(|0> - |1>)/√2. So the whole state becomes (a|000> + a|100> + a|011> +
a|111> + b|010> - b|110> + b|001> - b|101>)/2.
3. We can rewrite that as ((|00> ⊗ (a|0> + b|1>)) + (|01> ⊗ (a|1> + b|0>)) +
(|10> ⊗ (a|0> - b|1>)) + (|11> ⊗ (a|1> - b|0>)))/2.
4. Notice something funny? In the above state, measuring the first two qubits
gives you exactly enough information to know which operation to apply to the
third qubit to get a|0> + b|1>. So if Alice measures both her qubits and
sends the two classical bits to Bob, he can reconstruct the unknown qubit on
his end. And if Carol eavesdrops on the message without having access to
Bob's qubit, she won't learn anything, because the four possible messages
have probability 1/4 each, regardless of a and b.
If the unknown qubit was entangled with something else, that also survives the
transfer, though proving that requires a bit more calculation.

220d (9/?)
Black box functions and Grover's algorithm.
Say we want our quantum computer to make calls to a black box function f(x) from
bit strings to bits. There's a couple ways to formalize that. Phase oracle: |x>
becomes -|x> if f(x), otherwise |x>. Binary oracle: |x> ⊗ |b> becomes |x> ⊗
|f(x) xor b>. These definitions only say what happens to basis vectors, and are
extended to other vectors by linearity.
We can convert from a binary oracle to a phase oracle by using an extra qubit.
Start with a state |x> ⊗ (|0> - |1>)/√2 and apply a binary oracle, we end up
with (1/√2) ⋅ (|x1> - |x0> if f(x), otherwise |x0> - |x1>). But that's exactly
(-|x> if f(x), otherwise |x>) ⊗ (|0> - |1>)/√2. So |x> has been changed as if by
a phase oracle, and the extra qubit was left unchanged.
Grover's algorithm: we have a black box function from N-bit strings to bits,
which returns true for only one possible input. We want to find that input.
Classically that requires O(2^N) calls to the black box, because you can't do
much better than brute force. With a quantum algorithm we can get O(√2^N) which
is much faster.
Let's say X is the input we're trying to find, which is one of the basis vectors
of the space. Let's also define a vector S = (|0> + |1>)/√2 ⊗ ... ⊗ (|0> +
|1>)/√2, which is the sum of all 2^N basis vectors with equal amplitudes.
1. Initialize state as S
2. Apply phase oracle, corresponding to reflection through the plane
perpendicular to X
3. Apply "Grover operator", corresponding to reflection through the plane
perpendicular to S and then central symmetry around 0
4. Repeat steps 2 and 3 a bunch of times
To understand it geometrically, draw the 2D plane containing the vectors X and
S. Their dot product is 1/√2^N, which is a small positive number, so the angle
between them is a bit less than 90 degrees. If our state starts out as S, the
net effect of (2) and (3) is a slight rotation toward X, and each time we repeat
it, we