All of Ulisse Mini's Comments + Replies

Was considering saving this for a followup post but it's relatively self-contained, so here we go.

Why are huge coefficients sometimes okay? Let's start by looking at norms per position after injecting a large vector at position 20.

This graph is explained by LayerNorm. Before using the residual stream, each block applies a LayerNorm:

# transformer block forward() in GPT2
x = x + self.attn(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))

If x has very large magnitude, then the block doesn't change it much relative to its magnitude. Additionally, attention is run on the norm... (read more)
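A minimal sketch of why this happens: LayerNorm output is (nearly) invariant to the overall scale of its input, so downstream attention/MLP layers see almost the same thing whether or not a huge vector was injected. (Gain/bias parameters and batch dimensions are omitted here for clarity; `layer_norm` is a hypothetical helper, not the GPT-2 implementation.)

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize to zero mean, unit variance (gain/bias omitted for clarity)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

np.random.seed(0)
x = np.random.randn(8)
big = 1000 * x  # residual stream after injecting a large-coefficient vector

# Scaling x by 1000 barely changes the LayerNorm output, so the block's
# attention and MLP receive almost identical inputs in both cases.
print(np.allclose(layer_norm(x), layer_norm(big), atol=1e-3))  # → True
```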

3Alex Turner19d
Thanks for writing this up, I hadn't realized this. One conclusion I'm drawing is: If the values in the modified residual streams aren't important to other computations in later sequence positions, then a large-coefficient addition will still lead to reasonable completions. 

Are most uncertainties we care about logical rather than informational? All empirical ML experiments are pure computations a Bayesian superintelligence could do in its head. How much of our uncertainty comes from computational limits in practice, versus actual information bottlenecks?

Random thought: Perhaps you could carefully engineer gradient starvation in order to "avoid generalizing" and defeat the Discrete modes of prediction example. You'd only need to delay it until reflection, then the AI can solve the successor AI problem.

In general: hack our way towards value-preserving reflectivity before values drift from "Diamonds" -> "What's labeled as a diamond by humans" (or, for truthfulness, from "Telling the truth" -> "What the human thinks is true").

I think school is huge in preventing people from becoming smart and curious. I spent 1-2 years where I hardly studied at all and mostly played video games - I wish I hadn't wasted that time, but when I quit I did so of my own free will. I think there's a huge difference between discipline imposed from the outside vs. the inside, and getting to the latter is worth a lot.


I'm unsure which parts of my upbringing were cruxes for unschooling working. You should probably read a book or something rather than taking my (very abnormal) opinion. I just know how it went for me :)

Epistemic status: personal experience.

I'm unschooled and think it's clearly better, even if you factor in my parents being significantly above average in parenting. Optimistically, school is babysitting: people learn nothing there while wasting most of their childhood. Pessimistically, it's actively harmful, teaching people to hate learning and build antibodies against education.

Here's a good documentary made by someone who's been in and out of school. I can't give detailed criticism since I (thankfully) never had to go to school.

EDIT: As for what the alternat... (read more)

I would very much assume that you have a strong genetic disposition to be smart and curious. Do you think unschooling would work acceptably well for kids who are not smart and curious?

If the title is meant to be a summary of the post, I think that would be analogous to someone saying "nuclear forces provide an untapped wealth of energy". It's true, but the reason the energy is untapped is because nobody has come up with a good way of tapping into it.

The difference is people have been trying hard to harness nuclear forces for energy, while people have not been trying hard to research humans for alignment in the same way. Even relative to the size of the alignment field being far smaller, there hasn't been a real effort as far as I can... (read more)

2Logan Riggs Smith1y
To add, Turntrout does state: so the doc Ulisse provided is a decent write-up about just that, but there are more official posts intended to be published.

I think even without point #4 you don't necessarily get an AI maximizing diamonds. Heuristically, it feels to me like you're bulldozing open problems without understanding them (e.g. ontology identification by training with multiple models of physics, getting it not to reward-hack by explicit training, etc.) all of which are vulnerable to a deceptively aligned model (just wait till you're out of training to reward-hack). Also, every time you say "train it by X so it learns Y" you're assuming alignment (e.g. "digital worlds where the sub-atomic physics is d... (read more)