AI ALIGNMENT FORUM
ZY
I try to practice independent reasoning and critical thinking, and to challenge current solutions to be more considerate and complete. I value receiving and giving dissent. I do not reply to DMs for non-personal discussions (with respect to the user who reached out directly); instead, I will post here with reference to the user and my reply.

Comments
Why Do Some Language Models Fake Alignment While Others Don't?
ZY · 2mo

A couple of questions/clarifications:

1. Where do you get the base/pre-trained model for GPT-4? Would that be through collaboration with OpenAI?

This indicates base models learned to emulate AI assistants[1] from pre-training data. This also provides evidence against the lack of capabilities being the primary reason why most frontier chat models don't fake alignment.

2. Regarding this, it would also be interesting to measure/evaluate the model's performance on capability tasks within the same model type (base, instruct), to see the relationship among capabilities, instruction-following ability, and "fake alignment".
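The analysis suggested above could be sketched roughly as follows: for each model variant, pair a capability score with an alignment-faking rate, then correlate the two within each model type so that capability differences are not confounded with the base-vs-instruct distinction itself. All model names and numbers below are illustrative placeholders, not measurements from the post.

```python
# Hypothetical data: (model_name, model_type, capability_score, faking_rate).
# These values are made up for illustration only.
results = [
    ("model-a-base",     "base",     0.42, 0.01),
    ("model-a-instruct", "instruct", 0.68, 0.12),
    ("model-b-base",     "base",     0.55, 0.03),
    ("model-b-instruct", "instruct", 0.74, 0.18),
    ("model-c-base",     "base",     0.61, 0.02),
    ("model-c-instruct", "instruct", 0.80, 0.15),
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Correlate capability with alignment-faking rate separately per model type.
for model_type in ("base", "instruct"):
    rows = [r for r in results if r[1] == model_type]
    caps = [r[2] for r in rows]
    faking = [r[3] for r in rows]
    print(model_type, round(pearson(caps, faking), 3))
```

A real version would replace the placeholder rows with scores from actual capability benchmarks and the paper's alignment-faking evaluations, and would likely need many more model pairs for the correlation to be meaningful.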
