Alice is told two classical bits of information and sends a qubit to Bob, who can then recover the two classical bits. Again, it relies on Alice and Bob sharing a prepared pair beforehand. It's the opposite of quantum teleportation, where Alice sends two classical bits and Bob can recover a qubit.

First let's talk about bases. This is the usual basis: |00>, |01>, |10>, |11>. This is the Bell basis: (|00> + |11>)/√2, (|00> - |11>)/√2, (|10> + |01>)/√2, (|10> - |01>)/√2. Check for yourself that each two vectors are orthogonal to each other. To "measure a state in a different basis" means to apply a rotation from one basis into another, then measure. For example, if you have a state and you know that it's one of the Bell basis states, you can figure out which one, by rotating into the usual basis and measuring.

One cool thing about the Bell basis is that you can change any basis vector into any other basis vector by operations on the first qubit only. For example, rotating the first qubit by 90 degrees turns (|00> + |11>)/√2 into (|10> - |01>)√2. Flipping the first qubit gives (|10> + |01>)√2, and so on.

Now superdense coding is easy. Alice and Bob start by sharing two halves of a Bell state. Then depending on which two classical bits Alice needs to send, she either leaves the state alone or rotates into one of the other three, by operating only on her qubit. Then she sends her qubit to Bob, who now has both qubits and can rotate them into the usual basis and measure them, recovering the two classical bits.

It's not a very good name, maybe we should call it "sending qubits over a classical channel using a quantum one-time pad". The idea is that Alice and Bob are far apart from each other and have a prepared pair of qubits, and Alice also has some qubit in unknown state. She can measure it in a clever way and send some classical bits to Bob, which allows him to recreate the unknown qubit. Both qubits of the prepared pair are spent in the process.

Let's say the unknown qubit is a|0>+b|1>, and the prepared pair is (|00> + |11>)/√2. So the whole state is (a|0> + b|1>) ⊗ (|00> + |11>)/√2 = (a|000> + a|011> + b|100> + b|111>)/√2. Alice can access the first two qubits and Bob can access the third.

Alice flips the second qubit depending on the first, leading to (a|000> + a|011> + b|110> + b|101>)/√2.

Alice applies a combined reflection and rotation to the first qubit, which is known as Hadamard gate: |0> becomes (|0> + |1>)/√2, while |1> becomes (|0> - |1>)/√2. So the whole state becomes (a|000> + a|100> + a|011> + a|111> + b|010> - b|110> + b|001> - b|101>)/2.

We can rewrite that as ((|00> ⊗ (a|0> + b|1>)) + (|01> ⊗ (a|1> + b|0>)) + (|10> ⊗ (a|0> - b|1>)) + (|11> ⊗ (a|1> - b|0>)))/2.

Notice something funny? In the above state, measuring the first two qubits gives you exactly enough information to know which operation to apply to the third qubit to get a|0> + b|1>. So if Alice measures both her qubits and sends the two classical bits to Bob, he can reconstruct the unknown qubit on his end. And if Carol eavesdrops on the message without having access to Bob's qubit, she won't learn anything, because the four possible messages have probability 1/4 each, regardless of a and b.

If the unknown qubit was entangled with something else, that also survives the transfer, though proving that requires a bit more calculation.

Say we want our quantum computer to make calls to a black box function f(x) from bit strings to bits. There's a couple ways to formalize that. Phase oracle: |x> becomes -|x> if f(x), otherwise |x>. Binary oracle: |x> ⊗ |b> becomes |x> ⊗ |f(x) xor b>. These definitions only say what happens to basis vectors, and are extended to other vectors by linearity.

We can convert from a binary oracle to a phase oracle by using an extra qubit. Start with a state |x> ⊗ (|0> - |1>)/√2 and apply a binary oracle, we end up with (1/√2) ⋅ (|x1> - |x0> if f(x), otherwise |x0> - |x1>). But that's exactly (-|x> if f(x), otherwise |x>) ⊗ (|0> - |1>)/√2. So |x> has been changed as if by a phase oracle, and the extra qubit was left unchanged.

Grover's algorithm: we have a black box function from N-bit strings to bits, which returns true for only one possible input. We want to find that input. Classically that requires O(2^N) calls to the black box, because you can't do much better than brute force. With a quantum algorithm we can get O(√2^N) which is much faster.

Let's say X is the input we're trying to find, which is one of the basis vectors of the space. Let's also define a vector S = (|0> + |1>)/√2 ⊗ ... ⊗ (|0> + |1>)/√2, which is the sum of all 2^N basis vectors with equal amplitudes.

Initialize state as S

Apply phase oracle, corresponding to reflection through the plane perpendicular to X

Apply "Grover operator", corresponding to reflection through the plane perpendicular to S and then central symmetry around 0

Repeat steps 2 and 3 a bunch of times

To understand it geometrically, draw the 2D plane containing the vectors X and S. Their dot product is 1/√2^N, which is a small positive number, so the angle between them is a bit less than 90 degrees. If our state starts out as S, the net effect of (2) and (3) is a slight rotation toward X, and each time we repeat it, we rotate again by the same angle. The angle is 2⋅arcsin(1/√2^N), which is proportional to 2/√2^N, so after (pi/4)⋅√2^N iterations we're pretty close to X. Then we can measure and get X with high probability.

I glossed over implementing the Grover operator, but that shouldn't be too hard, since from the first half of this comment it should be kinda clear how to reflect a vector through a plane.

The last comment didn't have enough rules for calculating with mixed states. Here's some more:

To measure a mixed state, the probabilities of all bit strings are on the main diagonal of the density matrix. For example, the pure state |0> has density matrix ((1,0),(0,0)), so measuring it leads to 0 with certainty. The mixed state “|0> or |1> with probability 1⁄2 each” has density matrix ((1/2,0),(0,1/2)), so the probability of each outcome is 1/2. And the pure state ( |0> + |1> ) / √2 has density matrix ((1/2,1/2),(1/2,1/2)), so the probability of each outcome is also 1/2.

To apply a reversible logical operation to a mixed state, recall that each possible bit string corresponds to a basis vector. So applying a permutation to the possible bit strings is as simple as swapping around some rows and columns in the density matrix. For example, if we have a qubit |0> with density matrix ((1,0),(0,0)) and want to negate it, we need to swap the two rows and swap the two columns, leading to density matrix ((0,0),(0,1)).

You also can apply plain old probabilistic operations to a mixed state, like flip a coin and do something depending on the result. To do that, take a weighted sum of density matrices. For example, if we start with a qubit |0> and want to negate it depending on a coinflip, we take a weighted sum of ((1,0),(0,0)) and ((0,0),(0,1)), which leads to ((1/2,0),(0,1/2)).

To rotate a single qubit in a mixed state by an angle, the calculation is a bit involved. First you divide the density matrix into 2x2 blocks, according to which rows and columns correspond to bit strings that differ on only that qubit. Then in each block you perform a change of basis to ((cos(phi), -sin(phi)), (sin(phi), cos(phi))). For the simple case of phi = 45 degrees, each block ((a,b),(c,d)) becomes ((a+b+c+d,a+b-c-d),(a-b+c-d,a-b-c+d))/2. For example, the density matrix ((1,0),(0,0)) representing the pure state |0> becomes the matrix ((1/2,1/2),(1/2,1/2)) representing the pure state ( |0> + |1> ) / √2, as expected.

I just thought of a simple way to explain tensors. Imagine a linear function that accepts two numbers and returns a number, let's call it f(x,y). Except there are two ways to imagine it:

Linear in both arguments combined: f(1,2)+f(1,3)=f(2,5). Every such function has the form f(x,y)=ax+by for some a and b, so the space of such functions is 2-dimensional. We say that the Cartesian product of R^1 and R^1 is R^2, because 1+1=2.

Linear in each argument when the other is fixed: f(1,2)+f(1,3)=f(1,5). Every such function has the form f(x,y)=axy for some a, so the space of such functions is 1-dimensional. We say that the tensor product of R^1 and R^1 is R^1, because 1*1=1.

In this case the tensor product is lower dimensional than the Cartesian product. But if we take say R^3 and R^3, then the Cartesian product will be R^6 and the tensor product will be R^9, because it will have separate coefficients a_(ij)*x_i*y_j.

When you measure a qubit cos(φ)|0> + sin(φ)|1>, the result has variance (1-cos(4φ))/4. (I'm skipping over the trig calculations here and below.) If you have a choice between either measuring the qubit, or rotating it by an angle ψ and then measuring it, then the sum of variances of the two operations is at least (1-|cos(2ψ)|)/2. In particular, if ψ=π/4, the sum of variances is at least 1/2.

So no matter what state a qubit is in, and how precisely you know its state, there are two possible measurements such that the sum of variances of their results is at least 1/2. The reason is that the bases corresponding to these measurements are at an angle to each other, not aligned. Apparently in the real world, an object's "position" and "momentum" are two such measurements, so there's a limit on their joint precision.

You can also carry out one measurement and then the other, but that doesn't help - after the first measurement you have variance in the second. Moreover, after the second you have fresh variance in the first. This lets you get an infinite stream of fair coinflips from a single qubit: start with |0>, rotate by π/4, measure, rotate by π/4, measure...

Let's say you have a state ( |0> + |1> ) / √2 and want to rotate it by 45 degrees and measure it, expecting to get "1" with certainty. But a bug in your equipment affects some other unrelated qubit, which was |0> before and becomes |0 xor your qubit> afterward. So now you have ( |00> + |11> ) / √2. Now rotating your qubit leads to ( |00> + |10> + |11> - |01> ) / 2, and measuring gives "0" or "1" with probability 1/2 each. Some unrelated qubit being xor'ed with yours caused your qubit to become mixed.

In theory, this is recoverable: find the stray qubit and xor it with yours again. But what if you accidentally measured it? Then you'd have to xor your memory with the qubit, which sounds impractical. Worse, what if you measured it and learned about the problem only in case of "1"? This is why quantum computation is so fragile.

As a warmup, imagine the state ( |00> + |01> ) / √2 gets changed by a quantum operation into ( |00> + |11> ) / √2. The operation is "flip the first bit if the second is 1". After that, if we have only the first qubit to play with, it's statistically indistinguishable from a single qubit in the mixed state "|0> or |1> with probability 1/2 each". Parts of a pure state, considered in isolation, are mixed states.

Mixed states involve probabilities, which makes us think of measurement. And indeed, the operation above can be seen as either "use the first qubit to measure the second" or "entangle the first qubit with the second". When you're inside a quantum system, it looks like measurement; when you're outside and something inside is doing the measuring, it looks like entanglement. This lets us prove theorems using only linear operations, without worrying about measurement as a special case.

Moreover, every mixed state is a part of some pure state. Recall that every mixed state is an ellipsoid. We can rotate it to be axis-aligned, leading to a density matrix with nonnegative entries on the diagonal and zero everywhere else. These entries are the probabilities of getting each bit string, a.k.a. basis vector. Let's say each basis vector |x_i> has probability p_i. Now we can make a bigger system with twice the dimensions, and build a superposition of basis vectors |x_i> ⊗ |x_i> with amplitudes √p_i, all other vectors get zero. That's a pure state, and the first half of its qubits give our original mixed state, just like the first qubit of the pure state ( |00> + |11> ) / √2 gave us the mixed state "|0> or |1> with probability 1/2 each". This lets us prove theorems using only pure states, without worrying about mixed states.

With the preliminaries out of the way...

No-cloning theorem: let's say we have a qubit in unknown pure state and another qubit prepared as |0>. Then we can't copy the first into the second. Proof: assume we can. Let's say |x> is the rest of the universe. Then |00x> and |10x> should become |00y> and |11z> for some y and z. Then by linearity, (( |0> + |1> ) / √2) ⊗ |0x> becomes ( |00y> + |11z> ) / √2. But then measuring our qubits always gives either "00" or "11", so they aren't independent copies of ( |0> + |1> ) / √2 as we wanted. Done.

No-deleting theorem: let's say we have two qubits in identical unknown pure states. Then we can't erase one of them to |0> while leaving the other and the rest of the universe unchanged. Proof: if we could, we could do it in reverse and achieve cloning. Done.

I'm not sure these theorems and proofs are exactly right. Just looked up the rough theorem statements on Wikipedia and worked out the proofs here in the comment draft. If there are errors, I'm sure people will point them out.

(11/?)

Superdense coding.

Alice is told two classical bits of information and sends a qubit to Bob, who can then recover the two classical bits. Again, it relies on Alice and Bob sharing a prepared pair beforehand. It's the opposite of quantum teleportation, where Alice sends two classical bits and Bob can recover a qubit.

First let's talk about bases. This is the usual basis: |00>, |01>, |10>, |11>. This is the Bell basis: (|00> + |11>)/√2, (|00> - |11>)/√2, (|10> + |01>)/√2, (|10> - |01>)/√2. Check for yourself that each two vectors are orthogonal to each other. To "measure a state in a different basis" means to apply a rotation from one basis into another, then measure. For example, if you have a state and you know that it's one of the Bell basis states, you can figure out which one, by rotating into the usual basis and measuring.

One cool thing about the Bell basis is that you can change any basis vector into any other basis vector by operations on the first qubit only. For example, rotating the first qubit by 90 degrees turns (|00> + |11>)/√2 into (|10> - |01>)√2. Flipping the first qubit gives (|10> + |01>)√2, and so on.

Now superdense coding is easy. Alice and Bob start by sharing two halves of a Bell state. Then depending on which two classical bits Alice needs to send, she either leaves the state alone or rotates into one of the other three, by operating only on her qubit. Then she sends her qubit to Bob, who now has both qubits and can rotate them into the usual basis and measure them, recovering the two classical bits.

(10/?)

Quantum teleportation.

It's not a very good name, maybe we should call it "sending qubits over a classical channel using a quantum one-time pad". The idea is that Alice and Bob are far apart from each other and have a prepared pair of qubits, and Alice also has some qubit in unknown state. She can measure it in a clever way and send some classical bits to Bob, which allows him to recreate the unknown qubit. Both qubits of the prepared pair are spent in the process.

Let's say the unknown qubit is a|0>+b|1>, and the prepared pair is (|00> + |11>)/√2. So the whole state is (a|0> + b|1>) ⊗ (|00> + |11>)/√2 = (a|000> + a|011> + b|100> + b|111>)/√2. Alice can access the first two qubits and Bob can access the third.

Alice flips the second qubit depending on the first, leading to (a|000> + a|011> + b|110> + b|101>)/√2.

Alice applies a combined reflection and rotation to the first qubit, which is known as Hadamard gate: |0> becomes (|0> + |1>)/√2, while |1> becomes (|0> - |1>)/√2. So the whole state becomes (a|000> + a|100> + a|011> + a|111> + b|010> - b|110> + b|001> - b|101>)/2.

We can rewrite that as ((|00> ⊗ (a|0> + b|1>)) + (|01> ⊗ (a|1> + b|0>)) + (|10> ⊗ (a|0> - b|1>)) + (|11> ⊗ (a|1> - b|0>)))/2.

Notice something funny? In the above state, measuring the first two qubits gives you exactly enough information to know which operation to apply to the third qubit to get a|0> + b|1>. So if Alice measures both her qubits and sends the two classical bits to Bob, he can reconstruct the unknown qubit on his end. And if Carol eavesdrops on the message without having access to Bob's qubit, she won't learn anything, because the four possible messages have probability 1/4 each, regardless of a and b.

If the unknown qubit was entangled with something else, that also survives the transfer, though proving that requires a bit more calculation.

(9/?)

Black box functions and Grover's algorithm.

Say we want our quantum computer to make calls to a black box function f(x) from bit strings to bits. There's a couple ways to formalize that. Phase oracle: |x> becomes -|x> if f(x), otherwise |x>. Binary oracle: |x> ⊗ |b> becomes |x> ⊗ |f(x) xor b>. These definitions only say what happens to basis vectors, and are extended to other vectors by linearity.

We can convert from a binary oracle to a phase oracle by using an extra qubit. Start with a state |x> ⊗ (|0> - |1>)/√2 and apply a binary oracle, we end up with (1/√2) ⋅ (|x1> - |x0> if f(x), otherwise |x0> - |x1>). But that's exactly (-|x> if f(x), otherwise |x>) ⊗ (|0> - |1>)/√2. So |x> has been changed as if by a phase oracle, and the extra qubit was left unchanged.

Grover's algorithm: we have a black box function from N-bit strings to bits, which returns true for only one possible input. We want to find that input. Classically that requires O(2^N) calls to the black box, because you can't do much better than brute force. With a quantum algorithm we can get O(√2^N) which is much faster.

Let's say X is the input we're trying to find, which is one of the basis vectors of the space. Let's also define a vector S = (|0> + |1>)/√2 ⊗ ... ⊗ (|0> + |1>)/√2, which is the sum of all 2^N basis vectors with equal amplitudes.

Initialize state as S

Apply phase oracle, corresponding to reflection through the plane perpendicular to X

Apply "Grover operator", corresponding to reflection through the plane perpendicular to S and then central symmetry around 0

Repeat steps 2 and 3 a bunch of times

To understand it geometrically, draw the 2D plane containing the vectors X and S. Their dot product is 1/√2^N, which is a small positive number, so the angle between them is a bit less than 90 degrees. If our state starts out as S, the net effect of (2) and (3) is a slight rotation toward X, and each time we repeat it, we rotate again by the same angle. The angle is 2⋅arcsin(1/√2^N), which is proportional to 2/√2^N, so after (pi/4)⋅√2^N iterations we're pretty close to X. Then we can measure and get X with high probability.

I glossed over implementing the Grover operator, but that shouldn't be too hard, since from the first half of this comment it should be kinda clear how to reflect a vector through a plane.

(6/?)

The last comment didn't have enough rules for calculating with mixed states. Here's some more:

To measure a mixed state, the probabilities of all bit strings are on the main diagonal of the density matrix. For example, the pure state |0> has density matrix ((1,0),(0,0)), so measuring it leads to 0 with certainty. The mixed state “|0> or |1> with probability 1⁄2 each” has density matrix ((1/2,0),(0,1/2)), so the probability of each outcome is 1/2. And the pure state ( |0> + |1> ) / √2 has density matrix ((1/2,1/2),(1/2,1/2)), so the probability of each outcome is also 1/2.

To apply a reversible logical operation to a mixed state, recall that each possible bit string corresponds to a basis vector. So applying a permutation to the possible bit strings is as simple as swapping around some rows and columns in the density matrix. For example, if we have a qubit |0> with density matrix ((1,0),(0,0)) and want to negate it, we need to swap the two rows and swap the two columns, leading to density matrix ((0,0),(0,1)).

You also can apply plain old probabilistic operations to a mixed state, like flip a coin and do something depending on the result. To do that, take a weighted sum of density matrices. For example, if we start with a qubit |0> and want to negate it depending on a coinflip, we take a weighted sum of ((1,0),(0,0)) and ((0,0),(0,1)), which leads to ((1/2,0),(0,1/2)).

To rotate a single qubit in a mixed state by an angle, the calculation is a bit involved. First you divide the density matrix into 2x2 blocks, according to which rows and columns correspond to bit strings that differ on only that qubit. Then in each block you perform a change of basis to ((cos(phi), -sin(phi)), (sin(phi), cos(phi))). For the simple case of phi = 45 degrees, each block ((a,b),(c,d)) becomes ((a+b+c+d,a+b-c-d),(a-b+c-d,a-b-c+d))/2. For example, the density matrix ((1,0),(0,0)) representing the pure state |0> becomes the matrix ((1/2,1/2),(1/2,1/2)) representing the pure state ( |0> + |1> ) / √2, as expected.

I just thought of a simple way to explain tensors. Imagine a linear function that accepts two numbers and returns a number, let's call it f(x,y). Except there are two ways to imagine it:

Linear in both arguments combined: f(1,2)+f(1,3)=f(2,5). Every such function has the form f(x,y)=ax+by for some a and b, so the space of such functions is 2-dimensional. We say that the Cartesian product of R^1 and R^1 is R^2, because 1+1=2.

Linear in each argument when the other is fixed: f(1,2)+f(1,3)=f(1,5). Every such function has the form f(x,y)=axy for some a, so the space of such functions is 1-dimensional. We say that the tensor product of R^1 and R^1 is R^1, because 1*1=1.

In this case the tensor product is lower dimensional than the Cartesian product. But if we take say R^3 and R^3, then the Cartesian product will be R^6 and the tensor product will be R^9, because it will have separate coefficients a_(ij)*x_i*y_j.

Edit: no point asking this question here.

(12/?)

The uncertainty principle, or something like it.

When you measure a qubit cos(φ)|0> + sin(φ)|1>, the result has variance (1-cos(4φ))/4. (I'm skipping over the trig calculations here and below.) If you have a choice between either measuring the qubit, or rotating it by an angle ψ and then measuring it, then the sum of variances of the two operations is at least (1-|cos(2ψ)|)/2. In particular, if ψ=π/4, the sum of variances is at least 1/2.

So no matter what state a qubit is in, and how precisely you know its state, there are two possible measurements such that the sum of variances of their results is at least 1/2. The reason is that the bases corresponding to these measurements are at an angle to each other, not aligned. Apparently in the real world, an object's "position" and "momentum" are two such measurements, so there's a limit on their joint precision.

You can also carry out one measurement and then the other, but that doesn't help - after the first measurement you have variance in the second. Moreover, after the second you have fresh variance in the first. This lets you get an infinite stream of fair coinflips from a single qubit: start with |0>, rotate by π/4, measure, rotate by π/4, measure...

(8/?)

Decoherence.

Let's say you have a state ( |0> + |1> ) / √2 and want to rotate it by 45 degrees and measure it, expecting to get "1" with certainty. But a bug in your equipment affects some other unrelated qubit, which was |0> before and becomes |0 xor your qubit> afterward. So now you have ( |00> + |11> ) / √2. Now rotating your qubit leads to ( |00> + |10> + |11> - |01> ) / 2, and measuring gives "0" or "1" with probability 1/2 each. Some unrelated qubit being xor'ed with yours caused your qubit to become mixed.

In theory, this is recoverable: find the stray qubit and xor it with yours again. But what if you accidentally measured it? Then you'd have to xor your memory with the qubit, which sounds impractical. Worse, what if you measured it and learned about the problem only in case of "1"? This is why quantum computation is so fragile.

(7/?)

No-cloning and no-deleting theorems.

As a warmup, imagine the state ( |00> + |01> ) / √2 gets changed by a quantum operation into ( |00> + |11> ) / √2. The operation is "flip the first bit if the second is 1". After that, if we have only the first qubit to play with, it's statistically indistinguishable from a single qubit in the mixed state "|0> or |1> with probability 1/2 each". Parts of a pure state, considered in isolation, are mixed states.

Mixed states involve probabilities, which makes us think of measurement. And indeed, the operation above can be seen as either "use the first qubit to measure the second" or "entangle the first qubit with the second". When you're inside a quantum system, it looks like measurement; when you're outside and something inside is doing the measuring, it looks like entanglement. This lets us prove theorems using only linear operations, without worrying about measurement as a special case.

Moreover, every mixed state is a part of some pure state. Recall that every mixed state is an ellipsoid. We can rotate it to be axis-aligned, leading to a density matrix with nonnegative entries on the diagonal and zero everywhere else. These entries are the probabilities of getting each bit string, a.k.a. basis vector. Let's say each basis vector |x_i> has probability p_i. Now we can make a bigger system with twice the dimensions, and build a superposition of basis vectors |x_i> ⊗ |x_i> with amplitudes √p_i, all other vectors get zero. That's a pure state, and the first half of its qubits give our original mixed state, just like the first qubit of the pure state ( |00> + |11> ) / √2 gave us the mixed state "|0> or |1> with probability 1/2 each". This lets us prove theorems using only pure states, without worrying about mixed states.

With the preliminaries out of the way...

No-cloning theorem: let's say we have a qubit in unknown pure state and another qubit prepared as |0>. Then we can't copy the first into the second. Proof: assume we can. Let's say |x> is the rest of the universe. Then |00x> and |10x> should become |00y> and |11z> for some y and z. Then by linearity, (( |0> + |1> ) / √2) ⊗ |0x> becomes ( |00y> + |11z> ) / √2. But then measuring our qubits always gives either "00" or "11", so they aren't independent copies of ( |0> + |1> ) / √2 as we wanted. Done.

No-deleting theorem: let's say we have two qubits in identical unknown pure states. Then we can't erase one of them to |0> while leaving the other and the rest of the universe unchanged. Proof: if we could, we could do it in reverse and achieve cloning. Done.

I'm not sure these theorems and proofs are exactly right. Just looked up the rough theorem statements on Wikipedia and worked out the proofs here in the comment draft. If there are errors, I'm sure people will point them out.