You are viewing revision 1.0.0, last edited by RogerDearnaley
Training on narrow examples of misaligned behavior sometimes generalizes to broadly misaligned behavior, seemingly altering the assistant's goals or persona rather than just teaching it that specific behavior.