AI ALIGNMENT FORUM
Cody
Ω17001
Four ways learning Econ makes people dumber re: future AI
Cody Rushing · 9d · 52

Regardless of whether these anti-pedagogies are correct, I'm confused about why you think you've shown that learning econ made the economists dumber. It seems like the majority of the tweets you linked, excluding maybe Tweet 4, just show economists discussing narrow AI and failing to consider general intelligence.

If you meant to say something like 'econ pedagogy makes it hard for economists to view AGI as something that could actually be intelligent in a way similar to humans', then I would be more inclined to agree with you.

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
Cody Rushing · 1y · 1321

It also seems to have led to at least one claim in a policy memo that advocates of AI safety are being silly because mechanistic interpretability was solved.

Small nitpick (I agree with almost everything else in the post and am glad you wrote it up): this feels like an unfair criticism. I assume you are referring specifically to the statement in their paper that:

Although advocates for AI safety guidelines often allude to the "black box" nature of AI models, where the logic behind their conclusions is not transparent, recent advancements in the AI sector have resolved this issue, thereby ensuring the integrity of open-source code models.

I think Anthropic's interpretability team, while making maybe dubious claims about the impact of their work on safety, has been clear that mechanistic interpretability is far from 'solved.' For instance, Chris Olah in the linked NYT article from today:

“There are lots of other challenges ahead of us, but the thing that seemed scariest no longer seems like a roadblock,” he said.

Also, in the paper's section on Inability to Evaluate:

it's unclear that they're really getting at the fundamental thing we care about

I think they are overstating how advanced and useful mechanistic interpretability currently is. However, I don't think this messaging comes close to claiming 'mechanistic interpretability solves AI interpretability' - this error is on a16z, not Anthropic.

Rationality · 2y · (+390/-118)
56 · Ctrl-Z: Controlling AI Agents via Resampling · 5mo · 0
38 · [Paper] All's Fair In Love And Love: Copy Suppression in GPT-2 Small · 2y · 0