One question that occurred to me, reading the extended GPT-generated text. (Probably more a curiosity question than a contribution as such...)
To what extent does text generated by GPT-simulated 'agents', then published on the internet (where it may be used in a future dataset to train language models), create a feedback loop?
Two questions that I see as intuition pumps on this point:
Would it be a bad idea to recursively ask GPT-n "You're a misaligned agent simulated by a language model and your name is [unique identifier]. What would you like to say, knowing that the text you generate will be used in training future GPT-n models, to try to influence that process?" then
One question that occurred to me, reading the extended GPT-generated text. (Probably more a curiosity question than a contribution as such...)
To what extent does text generated by GPT-simulated 'agents', then published on the internet (where it may be used in a future dataset to train language models), create a feedback loop?
Two questions that I see as intuition pumps on this point:
- Would it be a bad idea to recursively ask GPT-n "You're a misaligned agent simulated by a language model and your name is [unique identifier]. What would you like to say, knowing that the text you generate will be used in training future GPT-n models, to try to influence that process?" then
... (read more)