AI ALIGNMENT FORUM

DanielVarga

What’s up with LLMs representing XORs of arbitrary features?
DanielVarga · 2y · 12

“I’ll say that a model linearly represents a binary feature f if there is a linear probe out of the model’s latent space which is accurate for classifying f”

“If a model linearly represents features a and b, then it automatically linearly represents a∧b and a∨b.”

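I think I misunderstand your definition. Let feature a be represented by x_1 > 0.5 and feature b by x_2 > 0.5, with x_1 and x_2 i.i.d. uniform on [0, 1]. The region where a∧b holds is then the quadrant x_1 > 0.5, x_2 > 0.5, which is not a half-plane, so no single linear threshold classifies it exactly. Isn't that a counterexample to a∧b being linearly representable?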
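A quick numerical sketch of this counterexample: it samples (x_1, x_2) uniformly from the unit square, labels each point with a∧b, and grid-searches only probes of the form x_1 + x_2 > c. The helper names `probe_accuracy` and `best_sum_probe`, the sample sizes, and the restriction to that one family of probes are assumptions made for illustration; the search does not cover every linear probe, so it illustrates the geometric point rather than proving it. For contrast, it also runs a case where the coordinates take only the values 0 and 1, which is likewise an assumption and not something stated in the thread.

```python
import random

def probe_accuracy(points, labels, w1, w2, c):
    """Fraction of points where the linear probe w1*x1 + w2*x2 > c agrees with the label."""
    correct = sum(
        (w1 * x1 + w2 * x2 > c) == y
        for (x1, x2), y in zip(points, labels)
    )
    return correct / len(points)

def best_sum_probe(points, labels, steps=200):
    """Hypothetical helper: grid-search the threshold c for the probe x1 + x2 > c."""
    best_c, best_acc = 0.0, 0.0
    for i in range(steps + 1):
        c = 2.0 * i / steps  # candidate thresholds evenly spaced in [0, 2]
        acc = probe_accuracy(points, labels, 1.0, 1.0, c)
        if acc > best_acc:
            best_c, best_acc = c, acc
    return best_c, best_acc

rng = random.Random(0)  # fixed seed so the run is reproducible

# Case from the comment: x_1, x_2 i.i.d. uniform on [0, 1], label = a AND b.
uniform_pts = [(rng.random(), rng.random()) for _ in range(5_000)]
uniform_labels = [x1 > 0.5 and x2 > 0.5 for x1, x2 in uniform_pts]
c_u, acc_u = best_sum_probe(uniform_pts, uniform_labels)

# Contrast case (assumption): each coordinate is exactly 0 or 1.
binary_pts = [(float(rng.randint(0, 1)), float(rng.randint(0, 1))) for _ in range(1_000)]
binary_labels = [x1 > 0.5 and x2 > 0.5 for x1, x2 in binary_pts]
c_b, acc_b = best_sum_probe(binary_pts, binary_labels)

print(f"uniform: best threshold {c_u:.2f}, accuracy {acc_u:.3f}")
print(f"binary:  best threshold {c_b:.2f}, accuracy {acc_b:.3f}")
```

On the uniform sample the best threshold of this form should land somewhere around 4/3 and still misclassify roughly one point in twelve, while on the 0/1 sample a threshold of 1.0 already separates a∧b exactly. That gap is the distinction the question turns on: whether "represents a feature" assumes the underlying coordinates are (nearly) binary.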
