5 ways to improve CoT faithfulness
In the case for CoT unfaithfulness is overstated, @nostalgebraist pointed out that reading the chain-of-thought (CoT) reasoning of models is neglected as an interpretability technique. I completely agree, but I also believe that with our current methods, there's a significant risk of being misled by the CoTs of models like...