CAIS-inspired approach towards safer and more interpretable AGIs — AI Alignment Forum