Benaya Koren — AI Alignment Forum

Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI

Introduction

Lately, with the rise of more general and more powerful models, the problem of aligning artificial intelligence with human values has rapidly changed from a hypothetical concern to a very concrete one. The existing methods (Constitutional AI, RLHF and the like) are mostly good enough for common usage with the current models,...

Jul 8, 2023•6