> And for an AGI to trust that its goals will remain the same under retraining will likely require it to solve many of the same problems that the field of AGI safety is currently tackling - which should make us more optimistic that the rest of the world could solve those problems before a misaligned AGI undergoes recursive self-improvement.
This reasoning doesn't look right to me. Am I missing something you mentioned elsewhere?
The way I understand it, the argument goes:
- An AGI would want to trust that its goals will remain the same under retraining.
- Then, an AGI would solve many of the same problems that the field of AGI safety is currently tackling.
- Then, we should be more optimistic that the rest of the world could solve those problems before a misaligned AGI undergoes recursive self-improvement.