Problems I've Tried to Legibilize

by Wei Dai
9th Nov 2025

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide to problems that may need to be further legibilized, especially beyond LW/rationalists, to AI researchers, funders, company leaders, government policymakers, their advisors (including future AI advisors), and the general public.

  1. Philosophical problems
    1. Probability theory
    2. Decision theory
    3. Beyond astronomical waste (possibility of influencing vastly larger universes beyond our own)
    4. Interaction between bargaining and logical uncertainty
    5. Metaethics
    6. Metaphilosophy: 1, 2
  2. Problems with specific philosophical and alignment ideas
    1. Utilitarianism: 1, 2
    2. Solomonoff induction
    3. "Provable" safety
    4. CEV
    5. Corrigibility
    6. IDA (and many scattered comments)
    7. UDASSA
    8. UDT
  3. Human-AI safety (x- and s-risks arising from the interaction between human nature and AI design)
    1. Value differences/conflicts between humans
    2. “Morality is scary” (human morality is often the result of status games amplifying random aspects of human value, with frightening results)
    3. Positional/zero-sum human values, e.g. status
    4. Distributional shifts as a source of human safety problems
      1. Power corrupts (or reveals) (AI-granted power, e.g., over future space colonies or vast virtual environments, corrupting human values, or perhaps revealing a dismaying true nature)
      2. Intentional and unintentional manipulation of / adversarial attacks on humans by AI
  4. Meta / strategy
    1. AI risks being highly disjunctive, potentially implying increasing marginal returns from time in an AI pause/slowdown (in other words, surprisingly low value from short pauses/slowdowns compared to longer ones)
    2. Risks from post-AGI economics/dynamics, specifically high coordination ability leading to increased economy of scale and concentration of resources/power
    3. Difficulty of winning the AI race while constrained by x-safety considerations
    4. Likely offense dominance devaluing “defense accelerationism”
    5. Human tendency to neglect risks while trying to do good
    6. The necessity of AI philosophical competence for AI-assisted safety research and for avoiding catastrophic post-AGI philosophical errors
    7. The problem of illegible problems

Having written all this down in one place, it's hard not to feel some hopelessness about whether all of these problems can be made legible to the relevant people, even with maximum plausible effort. Perhaps one source of hope is that they can be made legible to future AI advisors. As many of these problems are philosophical in nature, this seems to come back to the issue of AI philosophical competence that I've often talked about recently, which itself seems largely still illegible and hence neglected.

Perhaps it's worth concluding with a point from a discussion between @WillPetillo and myself under the previous post: a potentially more impactful approach (compared to trying to make illegible problems more legible) is to make key decisionmakers realize that important safety problems illegible to them (and even to their advisors) probably exist, and that it is therefore very risky to make highly consequential decisions (such as about AI development or deployment) based only on the status of legible safety problems.

Comments

Raemon:

Re "can AI advisors help?"

A major thread of my thoughts these days is "can we make AI more philosophically competent relative to their own overall capability growth?" I'm not sure if it's doable, because the things you'd need to be good at philosophy are pretty central capabilities-ish things (i.e., the ability to reason precisely, notice confusion, convert confusion into useful questions, etc.).

Curious if you have any thoughts on that.