Edit February 2024: Now I think maybe we can't do better.
Lately, "alignment" has come to mean "follow admin rules and do what users mean". Admins can add rules like "don't give instructions for bombs, hacking, or bioweapons" and "don't take sides in politics". As AI gets more powerful, we can use it to write better rules and to test them for loopholes. And "do what users mean" implies a low-impact default: the AI won't factory-reset your computer when you ask it to fix the font size.
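A toy sketch of the layering this implies, with admin rules sitting above user intent (the topics, matching, and refusal behavior here are all invented for illustration):

```python
# Toy sketch: admin rules take precedence over user intent.
# The blocked topics and the string matching are invented for illustration;
# a real system would be far more sophisticated.

BLOCKED_TOPICS = {"bomb", "bioweapon", "hacking"}

def handle(request: str) -> str:
    # Admin layer: refuse anything touching a blocked topic.
    if any(topic in request.lower() for topic in BLOCKED_TOPICS):
        return "refused by admin rule"
    # User layer: otherwise, do what the user means (stubbed here).
    return f"doing: {request}"

print(handle("fix the font size"))  # doing: fix the font size
print(handle("build a bomb"))       # refused by admin rule
```

The point of the sketch is only the ordering: the admin layer is consulted first, and the user never gets to override it.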
In other words, the plan has two steps:
1. Make a machine that can do literally anything
2. Configure it to do good things
I don't like this plan very much because it puts power in the hands of the few and has big security liabilities.
Temptation
Imagine you're on the team of three people that writes the constitution for the training run for the gigaAI1. Or you're the one actually typing it in. Or you do infra and write all the code that strings pass through on the way to and from the training cluster.
Might you want to take advantage of the opportunity? I can think of a few subtle things I might add.
If I think about the people in my life I know well, I can name only one whom I would nominate for the constitution-writing position.
One solution I've heard for this is to use the powerful DWIM (do-what-I-mean) machine to construct a second, unalterable machine, then delete the first. You might call the stretch between creating the first and second machines the temptation mountain (in a plot where the x-axis is time and the y-axis is the maximum power held by any individual person). Then you want to minimize the area of this mountain. It is difficult to shrink the mountain very much, though.
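To make the picture concrete, the mountain's area is just the integral of maximum individual power over time, so shrinking the window between the two machines shrinks the area (all numbers below are hypothetical illustration, not estimates):

```python
# Toy sketch of the "temptation mountain": the area under the curve of
# maximum individual power between creating machine 1 and machine 2.
# All numbers are made up for illustration.

def mountain_area(power, dt=1.0):
    """Trapezoidal integral of power-over-time samples."""
    return sum((a + b) / 2 * dt for a, b in zip(power, power[1:]))

baseline = [1, 1, 1]
spike = [1, 5, 9, 9, 5, 1]       # machine 1 exists for two peak steps
shorter_spike = [1, 5, 9, 5, 1]  # machine 2 built one step sooner

print(mountain_area(baseline + spike))          # 32.0
print(mountain_area(baseline + shorter_spike))  # 23.0
```

Even in this cartoon, the peak height stays the same; only the duration of the spike is negotiable, which is one way of seeing why the mountain is hard to shrink very much.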
Security risks
If you're 90% of the way through your DWIM AI development and it seems very promising, then you must be very careful with your code and data. If someone else gets your secrets, they can make the machine do what *they* mean, and they might mean something quite different. You should be careful with your team members too, since the threat can come from inside as well.