AI ALIGNMENT FORUM
ZY
I try to practice independent reasoning and critical thinking, and to challenge current solutions to be more considerate and complete. I value receiving and giving dissent. I do not reply to DMs for non-personal discussions (with respect to the user who reached out directly); instead, I will post here with reference to the user and my reply.

Comments
Why Do Some Language Models Fake Alignment While Others Don't?
ZY · 2mo

A couple of questions/clarifications:

1. Where do you get the base/pre-trained model for GPT-4? Would that be through collaboration with OpenAI?

This indicates base models learned to emulate AI assistants[1] from pre-training data. This also provides evidence against the lack of capabilities being the primary reason why most frontier chat models don't fake alignment.

2. Regarding this, it would also be interesting to measure/evaluate the model's performance on capability tasks within the same model type (base, instruct), to see the relationship among capabilities, instruction-following ability, and "fake alignment".
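The analysis suggested above could be sketched roughly as follows: for each model variant, pair a capability score with an alignment-faking rate, then correlate the two within each model type so that capability differences are not confounded with the base-vs-instruct distinction itself. All model names and numbers below are illustrative placeholders, not measurements from the post.

```python
# Hypothetical data: (model_name, model_type, capability_score, faking_rate).
# These values are made up for illustration only.
results = [
    ("model-a-base",     "base",     0.42, 0.01),
    ("model-a-instruct", "instruct", 0.68, 0.12),
    ("model-b-base",     "base",     0.55, 0.03),
    ("model-b-instruct", "instruct", 0.74, 0.18),
    ("model-c-base",     "base",     0.61, 0.02),
    ("model-c-instruct", "instruct", 0.80, 0.15),
]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Correlate capability with alignment-faking rate separately per model type.
for model_type in ("base", "instruct"):
    rows = [r for r in results if r[1] == model_type]
    caps = [r[2] for r in rows]
    faking = [r[3] for r in rows]
    print(model_type, round(pearson(caps, faking), 3))
```

A real version would replace the placeholder rows with scores from actual capability benchmarks and the paper's alignment-faking evaluations, and would likely need many more model pairs for the correlation to be meaningful.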
