AI ALIGNMENT FORUM

Vikrant Varma

Research Engineer at DeepMind.

Comments

Mechanistic anomaly detection and ELK
Vikrant Varma (3y ago)

To add some more concrete counter-examples:

  • Deceptive reasoning is causally upstream of train output variance (e.g. because the model has read ARC's post on anomaly detection), so it is included in π.
  • An alien philosophy explains train output variance; unfortunately, it also has a notion of object permanence we wouldn't agree with, which the (AGI) robber exploits.
Knowledge is not just mutual information
Vikrant Varma (3y ago)

Thanks for this sequence!

I don't understand why the computer case is a counterexample for mutual information. Doesn't it depend on your priors (which don't know anything about the other background noise interacting with photons)?

Take the example of a one-time pad: given two random bit strings A and B and C = A ⊕ B, learning C tells you nothing about A unless you already have some information about B. So I(C; A) = 0 when B is uniform and independent of A.
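The one-time-pad claim can be checked numerically. This is a minimal sketch assuming single bits for A and B, a uniform A, and B independent of A; `mutual_information` and `one_time_pad_joint` are hypothetical helper names, not from any library. With a uniform key, I(C; A) comes out to 0 bits; with a key known in advance, C reveals A completely (1 bit).

```python
import itertools
import math
from collections import defaultdict


def mutual_information(joint):
    """Return I(X; Y) in bits for a joint distribution given as {(x, y): p}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    mi = 0.0
    for (x, y), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            mi += p * math.log2(p / (px[x] * py[y]))
    return mi


def one_time_pad_joint(p_b_is_one):
    """Hypothetical helper: joint distribution of (C, A), where A is a uniform bit,
    B is an independent bit with P(B = 1) = p_b_is_one, and C = A XOR B."""
    joint = defaultdict(float)
    for a, b in itertools.product([0, 1], repeat=2):
        p_a = 0.5
        p_b = p_b_is_one if b == 1 else 1 - p_b_is_one
        c = a ^ b
        joint[(c, a)] += p_a * p_b
    return dict(joint)


print(mutual_information(one_time_pad_joint(0.5)))  # uniform key: prints 0.0
print(mutual_information(one_time_pad_joint(0.0)))  # key known to be 0: prints 1.0
```

Any P(B = 1) strictly between 0 and 0.5 gives a value strictly between 0 and 1 bit, so how much C tells you about A depends entirely on what you already believe about B.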

"Over time, the photons bouncing off the object being sought and striking other objects will leave an imprint in every one of those objects that will have high mutual information with the position of the object being sought."

If our prior were very certain about every factor that could interact with photons, then the resulting imprints would indeed have high mutual information. But it seems like you can rescue mutual information here by saying that our prior is uncertain about these other factors, so the resulting imprints are noisy as well.

On the other hand, it seems correct that an entity that did have a more certain prior over the interacting factors would see photon imprints as accumulating knowledge (for example, on photographic film).
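A toy model of the last two paragraphs, under illustrative assumptions not taken from the post: the object's position is a single uniform bit X, and an imprint is Y = X XOR N, where the noise bit N ~ Bernoulli(flip_prob) stands in for the other interacting factors the observer's prior is uncertain about. This is a binary symmetric channel, for which I(X; Y) = 1 - H(flip_prob). The function names are hypothetical.

```python
import math


def binary_entropy(p):
    """H(p) in bits for a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def imprint_information(flip_prob):
    """Hypothetical helper: I(position; imprint) in bits for a uniform position bit
    seen through a binary symmetric channel with the given flip probability."""
    return 1.0 - binary_entropy(flip_prob)


# A maximally uncertain prior over the noise (0.5) gives 0 bits; a fully certain
# prior (0.0) gives the full 1 bit, as with an imprint on photographic film.
for flip_prob in [0.5, 0.25, 0.1, 0.01, 0.0]:
    print(f"P(flip) = {flip_prob:<5} I(position; imprint) = {imprint_information(flip_prob):.3f} bits")
```

The same physical imprint carries anywhere from 0 to 1 bit about the position, depending only on how sharply the observer's prior pins down the noise.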

Posts

  • MONA: Three Month Later - Updates and Steganography Without Optimization Pressure (3mo)
  • JumpReLU SAEs + Early Access to Gemma 2 SAEs (1y)
  • Improving Dictionary Learning with Gated Sparse Autoencoders (1y)
  • [Full Post] Progress Update #1 from the GDM Mech Interp Team (1y)
  • [Summary] Progress Update #1 from the GDM Mech Interp Team (1y)
  • Discussion: Challenges with Unsupervised LLM Knowledge Discovery (2y)
  • Explaining grokking through circuit efficiency (2y)
  • Refining the Sharp Left Turn threat model, part 2: applying alignment techniques (3y)
  • Threat Model Literature Review (3y)
  • Clarifying AI X-risk (3y)