AI ALIGNMENT FORUM

Zhu Xiaohu
Comments
Towards understanding-based safety evaluations
Zhu Xiaohu · 2y

"I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment."

That would lead to a safety or alignment Goodharting problem.
