To what extent are the scaling properties of Transformer networks exceptional? — AI Alignment Forum