AI ALIGNMENT FORUM

Max Kanwal

PhD Student @ Stanford University

Comments

Sorted by
Newest
Mechanistic Transparency for Machine Learning
Max Kanwal, 7y

I see two major challenges, one of which leans heavily on progress in linguistics. I can imagine mathematical theory guiding the choice of candidate model decompositions (Challenge 1), but I suspect that linking a candidate decomposition to a theory of 'semantic interpretability' (Challenge 2) is equally hard, if not harder.

Any ideas on how you plan to address Challenge 2? Perhaps the most robust approach would involve active learning of the pseudocode, where a human guides the algorithm as it decomposes the model and labels each abstract computation.

Post: On the Role of Counterfactuals in Learning (7y)