AI ALIGNMENT FORUM

Linda Linsefors

Hi, I am a physicist, an effective altruist, and an AI safety student/researcher.

Posts
  • Circuits in Superposition 2: Now with Less Wrong Math (33 points · 3mo · 0 comments)
  • AI Safety Camp 10 (15 points · 11mo · 6 comments)
  • Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025) (5 points · 1y · 1 comment)
  • AISC9 has ended and there will be an AISC10 (31 points · 1y · 0 comments)
  • Some costs of superposition (22 points · 2y · 8 comments)
  • AI Safety Camp 2024 (6 points · 2y · 0 comments)
  • Projects I would like to see (possibly at AI Safety Camp) (7 points · 2y · 5 comments)
  • Apply to lead a project during the next virtual AI Safety Camp (7 points · 2y · 0 comments)
  • AI Safety Camp, Virtual Edition 2023 (17 points · 3y · 2 comments)
  • How I think about alignment (12 points · 3y · 7 comments)
Wikitag Contributions

  • Outer Alignment (2 years ago, +9/−80)
  • Inner Alignment (2 years ago, +13/−84)
  • Linda Linsefors's Shortform (2 points · 6y · 36 comments)
Comments

Linda Linsefors's Shortform
Linda Linsefors · 24d

Estimated MSE loss for three different ways of embedding features into neurons, when there are more possible features than neurons.

I've typed up some math notes on how much MSE loss we should expect for random embeddings, and for some alternative embeddings, when you have more features than neurons. I don't have a good sense of how legible this is to anyone but me.

Note that neither of these embeddings is optimal. I believe that the optimal embedding for minimising MSE loss is to store the features in almost-orthogonal directions, which is similar to a random embedding but can be optimised further. But I also believe that MSE loss doesn't prefer this solution very strongly, which means that when there are other trade-offs, MSE loss might not be enough to incentivise superposition.

This does not mean we should not expect superposition in real networks:

  1. Many networks use other loss functions, e.g. cross-entropy.
  2. Even if the loss is MSE on the final output, this does not mean MSE is the right loss for modelling the dynamics in the middle of the network.

 

Setup and notation

  • T features
  • D neurons
  • z active features

Assuming: 

  • $z \ll D < T$

True feature values:

  • $y = 1$ for active features
  • $y = 0$ for inactive features

 

Using random embedding directions (superposition)

Estimated values:

  • $\hat y = a + \epsilon$  where  $E[\epsilon^2] = (z-1)a^2/D$  for active features
  • $\hat y = \epsilon$  where  $E[\epsilon^2] = z\,a^2/D$  for inactive features
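To see where these noise terms come from (assuming unit-norm random embedding directions $e_i \in \mathbb{R}^D$): two independent random unit vectors satisfy $E[(e_i \cdot e_j)^2] = 1/D$, and reading off feature $i$ picks up an interference term $a\,(e_i \cdot e_j)$ from each active feature $j \neq i$, so

$$E[\epsilon^2] = a^2 \sum_{j \neq i,\ j\ \mathrm{active}} E[(e_i \cdot e_j)^2] = \begin{cases} (z-1)\,a^2/D & \text{for active } i \\ z\,a^2/D & \text{for inactive } i \end{cases}$$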

Total Mean Squared Error (MSE)

$$\mathrm{MSE}_{\mathrm{rand}} = z\left((1-a)^2 + \frac{(z-1)a^2}{D}\right) + (T-z)\,\frac{z a^2}{D} \;\approx\; z(1-a)^2 + \frac{zT}{D}\,a^2$$
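Filling in the minimisation step: differentiating the approximate form with respect to $a$ and setting the result to zero gives

$$\frac{d}{da}\,\mathrm{MSE}_{\mathrm{rand}} \approx -2z(1-a) + \frac{2zT}{D}\,a = 0.$$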

This is minimised by

$$a = \frac{D}{T+D}$$

making the MSE

$$\mathrm{MSE}_{\mathrm{rand}} = \frac{zT}{T+D} = z\left(1 - \frac{D}{T+D}\right)$$

 

One feature per neuron

We embed a single feature in each neuron, and the rest of the features are simply not represented.

Estimated values:

  • $\hat y = y$ for represented features
  • $\hat y = 0$ for non-represented features

Total Mean Squared Error (MSE)

Each active feature is unrepresented with probability $(T-D)/T$ and then contributes an error of $1$, giving

$$\mathrm{MSE}_{\mathrm{single}} = z\,\frac{T-D}{T}$$

 

One neuron per feature

We embed each feature in a single neuron, so that several features share each neuron.

  • $\hat y = a \sum y$  where the sum is over all features that share the same neuron

We assume that the probability of co-activated features on the same neuron is small enough to ignore. We also assume that every neuron is used at least once. Then for each active neuron, the expected number of inactive features that will be wrongfully activated is $T/D - 1 = (T-D)/D$, giving the MSE loss for this case as

$$\mathrm{MSE}_{\mathrm{multi}} = z\left((1-a)^2 + \left(\frac{T}{D} - 1\right)a^2\right)$$

We can already see that this is smaller than $\mathrm{MSE}_{\mathrm{rand}}$ for any given $a$, but let's also calculate what the minimum value is. $\mathrm{MSE}_{\mathrm{multi}}$ is minimised by

$$a = \frac{D}{T}$$
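(by the same differentiation step as before: $-2z(1-a) + 2z\left(\frac{T}{D} - 1\right)a = 0$, which solves to $a = D/T$)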

making the MSE

$$\mathrm{MSE}_{\mathrm{multi}} = z\left(1 - \frac{D}{T}\right)$$
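To sanity-check these estimates, here is a minimal Monte Carlo sketch. The specific values of T, D, z, the trial count, and the modulo assignment of features to neurons are illustrative assumptions, not from the note above; the readout rules and scale factors are the ones derived above.

```python
import numpy as np

# Monte Carlo sanity check of the three MSE estimates above.
# T, D, z, and n_trials are illustrative values with z << D < T.
rng = np.random.default_rng(0)
T, D, z = 2000, 100, 5
n_trials = 200

def sample_active():
    """Random sparse input: z of the T features active (y=1), rest 0."""
    y = np.zeros(T)
    y[rng.choice(T, size=z, replace=False)] = 1.0
    return y

# Random (superposition) embedding: unit vectors, readout scale a = D/(T+D).
E = rng.standard_normal((T, D))
E /= np.linalg.norm(E, axis=1, keepdims=True)
a_rand = D / (T + D)

# One neuron per feature: feature i lives on neuron i % D, readout a = D/T.
neuron_of = np.arange(T) % D
a_multi = D / T

mse_rand = mse_single = mse_multi = 0.0
for _ in range(n_trials):
    y = sample_active()

    h = E.T @ y                       # neuron activations (superposition)
    mse_rand += np.sum((a_rand * (E @ h) - y) ** 2)

    y_single = np.where(np.arange(T) < D, y, 0.0)   # features D..T-1 dropped
    mse_single += np.sum((y_single - y) ** 2)

    acts = np.bincount(neuron_of, weights=y, minlength=D)
    mse_multi += np.sum((a_multi * acts[neuron_of] - y) ** 2)

print(f"rand:   sim {mse_rand / n_trials:.2f}  vs predicted {z * T / (T + D):.2f}")
print(f"single: sim {mse_single / n_trials:.2f}  vs predicted {z * (T - D) / T:.2f}")
print(f"multi:  sim {mse_multi / n_trials:.2f}  vs predicted {z * (1 - D / T):.2f}")
```

For these settings (T much larger than D) the three losses all land near $z$ and near each other, consistent with the point above that MSE loss alone may not prefer superposition very strongly.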
Linda Linsefors's Shortform
Linda Linsefors · 2mo

I was pretty sure this exists, maybe even built into LW. It seems like an obvious thing, and there are lots of parts of LW that for some reason are hard to find from the front page. Googling "lesswrong dictionary" yielded:

https://www.lesswrong.com/w/lesswrong-jargon 

https://www.lesswrong.com/w/r-a-z-glossary

https://www.lesswrong.com/posts/fbv9FWss6ScDMJiAx/appendix-jargon-dictionary 

Linda Linsefors's Shortform
Linda Linsefors · 2mo*

In Defence of Jargon

People used to say (maybe they still do? I'm not sure) that we should use less jargon to increase the accessibility of writings on LW, i.e. make them easier for outsiders to read.

I think this is mostly a confused take. The underlying problem is inferential distance. Getting rid of the jargon is actually unhelpful, since it hides the fact that there is an inferential distance.

When I want to explain physics to someone and I don't know what they already know, I start by listing relevant physics jargon and asking them which words they know. This is a super quick way to find out what concepts they already have, and it lets me know what level I should start at. This works great in Swedish, since in Swedish most physics words are distinct from ordinary words, but unfortunately it doesn't work as well in English, which means I have to probe a bit deeper than just checking whether they recognise the words.

Jargon isn't typically just a synonym for some common word, and when it is, I predict that it didn't start out that way, but that the original meaning was destroyed by too many people not bothering to learn the word properly. This is because people invent new words (jargon) when they need a new word to point to a new concept that didn't already have one.

I've seen some posts by people who are not native to LW trying to fit in and be accepted by using LW jargon, without bothering to understand the underlying concepts, or even seeming to notice that this is something they're supposed to do. The result is very jarring: rather than making the post read like a typical LW post, their misuse of LW jargon makes it extra obvious that they are not natives. Edit to add: This clearly illustrates that the jargon isn't just synonyms for words/concepts they already know.

The way to make LW more accessible is to embrace jargon, as a clear signal of assumed prior knowledge of some concept, and also to have a dictionary, so people can look up words they don't know. I think this is also more or less what we're already doing, because it's kind of the obvious thing to do.

There is basically zero risk that the people wanting less jargon will win this fight, because jargon is just too useful for communication, and humans really like communication, especially nerds. But maybe it would be marginally helpful for more people to have an explicit model of what jargon is and what it's for, which is my justification for this quick take.

Circuits in Superposition: Compressing many small neural networks into one
Linda Linsefors · 6mo

According to my calculation, this embedding will result in too much compounding noise. I get the same noise results as you for one layer, but the noise grows too much from layer to layer.

However, Lucius suggested a different embedding, which seems to work. 

We'll have some publication on this eventually. If you want to see the details sooner, you can message me.

Simple versus Short: Higher-order degeneracy and error-correction
Linda Linsefors · 7mo

Since Bayesian statistics is both fundamental and theoretically tractable

What do you mean by "tractable" here?

Natural Latents: The Math
Linda Linsefors · 9mo*

In standard form, a natural latent is always approximately a deterministic function of X. Specifically: $\Lambda(X) \approx \prod_i \left(x' \mapsto P[X_i = x'_i \mid X_{\bar{i}}]\right)$.

What does the arrow mean in this expression?

AI Safety Camp 10
Linda Linsefors · 11mo

You can find their preferred contact info in each document, in the Team section.

AI Safety Camp 10
Linda Linsefors · 11mo

Yes there are, sort of...

You can apply to as many projects as you want, but you can only join one team. 

The reason for this is that when we've let people join more than one team in the past, they usually ended up not having time for both and dropped out of one of the projects.

What this actually means:

When you join a team you're making a promise to spend 10 or more hours per week on that project. When we say you're only allowed to join one team, what we're saying is that you're only allowed to make this promise to one project.

However, you are allowed to help out other teams with their projects, even if you're not officially on the team.

AI Safety Camp 10
Linda Linsefors · 11mo

@Samuel Nellessen 
Thanks for answering Gunnar's question.

But also, I'm a bit nervous that posting their email here directly in the comments is too public, i.e. easy for spam-bots to find. 

AI Safety Camp 10
Linda Linsefors · 11mo

If the research lead wants to be contactable, their contact info is in their project document, under the "Team" section. Most (or all, I'm not sure) research leads have some contact info.
