AI ALIGNMENT FORUM

David James

My top interest is AI safety, followed by reinforcement learning. My professional background is in software engineering, computer science, and machine learning. I have degrees in electrical engineering, liberal arts, and public policy. I currently live in the Washington, DC metro area; before that, I lived in Berkeley for about five years.


Comments

Self-fulfilling misalignment data might be poisoning our AI models
David James · 7mo

and I don't really see how to do that without directly engaging with the knowledge of the failure modes there.

I agree. To put it another way, even if all training data were scrubbed of every flavor of deception, how could ignorance of it be durable?
