Alignment stream of thought



I think school plays a huge role in preventing people from becoming smart and curious. I spent 1-2 years where I hardly studied at all and mostly played videogames. When I quit, I did so of my own free will. I think there's a huge difference between discipline imposed from the outside vs the inside, and getting to the latter is worth a lot. (though I wish I hadn't wasted all that time now haha)


I'm unsure which parts of my upbringing were cruxes for unschooling working. You should probably read a book or something rather than taking my (very abnormal) opinion. I just know how it went for me :)

Epistemic status: personal experience.

I'm unschooled and think it's clearly better, even if you factor in my parents being significantly above average in parenting. Optimistically, school is babysitting: people learn nothing there while wasting most of their childhood. Pessimistically, it's actively harmful, teaching people to hate learning and build antibodies against education.

Here's a good documentary made by someone who's been in and out of school. I can't give detailed criticism since I (thankfully) never had to go to school.

EDIT: As for what the alternative should be, I honestly don't know. Shifting equilibria is hard, though it's easy to give better examples (e.g. dath ilan, things in the documentary I linked). For a personal solution: homeschool your kids.

If the title is meant to be a summary of the post, I think that would be analogous to someone saying "nuclear forces provide an untapped wealth of energy". It's true, but the reason the energy is untapped is that nobody has come up with a good way of tapping into it.

The difference is that people have been trying hard to harness nuclear forces for energy, while people have not been trying hard to research humans for alignment in the same way. Even accounting for the alignment field being far smaller, there hasn't been a real effort as far as I can see. Most people immediately respond with "AGI is different from humans for X, Y, Z reasons" (which are true) and then proceed to throw out the baby with the bathwater by not looking into human value formation at all.

Planes don't fly like birds, but we sure as hell studied birds to make them.

If you come up with a strategy for how to do this then I'm much more interested, and that's a big reason why I'm asking for a summary since I think you might have tried to express something like this in the post that I'm missing.

This is their current research direction: The shard theory of human values, which they're currently making posts on.

I think even without point #4 you don't necessarily get an AI maximizing diamonds. Heuristically, it feels to me like you're bulldozing open problems without understanding them (e.g. ontology identification by training with multiple models of physics, getting it not to reward-hack by explicit training, etc.), all of which are vulnerable to a deceptively aligned model (it can just wait until it's out of training to reward-hack). Also, every time you say "train it by X so it learns Y" you're assuming alignment (e.g. "digital worlds where the sub-atomic physics is different, such that it learns to preserve the diamond-configuration despite ontological confusion").

IMO shard theory provides a great frame for thinking about this; it's a must-read for improving alignment intuitions.