Training on narrow examples of misaligned behavior sometimes extrapolates to broadly misaligned behavior, seemingly altering the assistant's goals or persona rather than just training on that specific behavior