Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI
Introduction
The problem of aligning artificial intelligence with human values has recently and rapidly shifted from hypothetical to concrete with the rise of more general and more powerful models. Existing methods (Constitutional AI, RLHF, and the like) are mostly good enough for common use with current models,...
Jul 8, 2023