
Alignment Newsletter

Aug 01, 2018 by Rohin Shah

I publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. See here for more details. Quick links: email signup form, RSS feed, spreadsheet of all summaries.

The Alignment Newsletter #1: 04/09/18
The Alignment Newsletter #2: 04/16/18
The Alignment Newsletter #3: 04/23/18
The Alignment Newsletter #4: 04/30/18
The Alignment Newsletter #5: 05/07/18
The Alignment Newsletter #6: 05/14/18
The Alignment Newsletter #7: 05/21/18
The Alignment Newsletter #8: 05/28/18
The Alignment Newsletter #9: 06/04/18
The Alignment Newsletter #10: 06/11/18
The Alignment Newsletter #11: 06/18/18
The Alignment Newsletter #12: 06/25/18
Alignment Newsletter #13: 07/02/18
Alignment Newsletter #14
Alignment Newsletter #15: 07/16/18
Alignment Newsletter #16: 07/23/18
Alignment Newsletter #17
Alignment Newsletter #18
Alignment Newsletter #19
Alignment Newsletter #20
Alignment Newsletter #21
Alignment Newsletter #22
Alignment Newsletter #23
Alignment Newsletter #24
Alignment Newsletter #25
Alignment Newsletter #26
Alignment Newsletter #27
Alignment Newsletter #28
Alignment Newsletter #29
Alignment Newsletter #30
Alignment Newsletter #31
Alignment Newsletter #32
Alignment Newsletter #33
Alignment Newsletter #34
Alignment Newsletter #35
Alignment Newsletter #36
Alignment Newsletter #37
Alignment Newsletter #38
Alignment Newsletter #39
Alignment Newsletter #40
Alignment Newsletter #41
Alignment Newsletter #42
Alignment Newsletter #43
Alignment Newsletter #44
Alignment Newsletter #45
Alignment Newsletter #46
Alignment Newsletter #47
Alignment Newsletter #48
Alignment Newsletter #49
Alignment Newsletter #50
Alignment Newsletter #51
Alignment Newsletter #52
Alignment Newsletter One Year Retrospective
Alignment Newsletter #53
[AN #54] Boxing a finite-horizon AI system to keep it unambitious
[AN #55] Regulatory markets and international standards as a means of ensuring beneficial AI
[AN #56] Should ML researchers stop running experiments before making hypotheses?
[AN #57] Why we should focus on robustness in AI safety, and the analogous problems in programming
[AN #58] Mesa optimization: what it is, and why we should care
[AN #59] How arguments for AI risk have changed over time
[AN #60] A new AI challenge: Minecraft agents that assist human players in creative mode
[AN #61] AI policy and governance, from two people in the field
[AN #62] Are adversarial examples caused by real but imperceptible features?
[AN #63] How architecture search, meta learning, and environment design could lead to general intelligence
[AN #64]: Using Deep RL and Reward Uncertainty to Incentivize Preference Learning
[AN #65]: Learning useful skills by watching humans “play”
[AN #66]: Decomposing robustness into capability robustness and alignment robustness
[AN #67]: Creating environments in which to study inner alignment failures
[AN #68]: The attainable utility theory of impact
[AN #69] Stuart Russell's new book on why we need to replace the standard model of AI
[AN #70]: Agents that help humans who are still learning about their own preferences
[AN #71]: Avoiding reward tampering through current-RF optimization
[AN #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety
[AN #73]: Detecting catastrophic failures by learning how agents tend to break
[AN #74]: Separating beneficial AI into competence, alignment, and coping with impacts
[AN #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee
[AN #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations
[AN #77]: Double descent: a unification of statistical theory and modern ML practice
[AN #78] Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison
[AN #79]: Recursive reward modeling as an alignment technique integrated with deep RL
[AN #80]: Why AI risk might be solved without additional intervention from longtermists
[AN #81]: Universality as a potential solution to conceptual difficulties in intent alignment
[AN #82]: How OpenAI Five distributed their training computation
[AN #83]: Sample-efficient deep learning with ReMixMatch
[AN #84] Reviewing AI alignment work in 2018-19
[AN #85]: The normative questions we should be asking for AI alignment, and a surprisingly good chatbot
[AN #86]: Improving debate and factored cognition through human experiments
[AN #87]: What might happen as deep learning scales even further?
[AN #88]: How the principal-agent literature relates to AI risk
[AN #89]: A unifying formalism for preference learning algorithms
[AN #90]: How search landscapes can contain self-reinforcing feedback loops
[AN #91]: Concepts, implementations, problems, and a benchmark for impact measurement
[AN #92]: Learning good representations with contrastive predictive coding
[AN #93]: The Precipice we’re standing at, and how we can back away from it
[AN #94]: AI alignment as translation between humans and machines
[AN #95]: A framework for thinking about how to make AI go well
[AN #96]: Buck and I discuss/argue about AI Alignment
[AN #97]: Are there historical examples of large, robust discontinuities?
[AN #98]: Understanding neural net training by seeing which gradients were helpful
[AN #99]: Doubling times for the efficiency of AI algorithms
[AN #100]: What might go wrong if you learn a reward function while acting
[AN #101]: Why we should rigorously measure and forecast AI progress
[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment
[AN #103]: ARCHES: an agenda for existential safety, and combining natural language with deep RL
[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID
[AN #105]: The economic trajectory of humanity, and what we might mean by optimization
[AN #106]: Evaluating generalization ability of learned reward models
[AN #107]: The convergent instrumental subgoals of goal-directed agents
[AN #108]: Why we should scrutinize arguments for AI risk
[AN #109]: Teaching neural nets to generalize the way humans would
[AN #110]: Learning features from human feedback to enable reward learning
[AN #111]: The Circuits hypotheses for deep learning
[AN #112]: Engineering a Safer World
[AN #113]: Checking the ethical intuitions of large language models
[AN #114]: Theory-inspired safety solutions for powerful Bayesian RL agents
[AN #115]: AI safety research problems in the AI-GA framework
[AN #116]: How to make explanations of neurons compositional
[AN #117]: How neural nets would fare under the TEVV framework
[AN #118]: Risks, solutions, and prioritization in a world with many AI systems
[AN #119]: AI safety when agents are shaped by environments, not rewards
[AN #120]: Tracing the intellectual roots of AI and AI alignment
[AN #121]: Forecasting transformative AI timelines using biological anchors
[AN #122]: Arguing for AGI-driven existential risk from first principles
[AN #123]: Inferring what is valuable in order to align recommender systems
[AN #124]: Provably safe exploration through shielding
[AN #125]: Neural network scaling laws across multiple modalities
[AN #126]: Avoiding wireheading by decoupling action feedback from action effects
[AN #127]: Rethinking agency: Cartesian frames as a formalization of ways to carve up the world into an agent and its environment
[AN #128]: Prioritizing research on AI existential safety based on its application to governance demands
[AN #129]: Explaining double descent by measuring bias and variance
[AN #130]: A new AI x-risk podcast, and reviews of the field
[AN #131]: Formalizing the argument of ignored attributes in a utility function
[AN #132]: Complex and subtly incorrect arguments as an obstacle to debate
[AN #133]: Building machines that can cooperate (with humans, institutions, or other machines)
[AN #134]: Underspecification as a cause of fragility to distribution shift
[AN #135]: Five properties of goal-directed systems
[AN #136]: How well will GPT-N perform on downstream tasks?
[AN #137]: Quantifying the benefits of pretraining on downstream task performance
[AN #138]: Why AI governance should find problems rather than just solving them
[AN #139]: How the simplicity of reality explains the success of neural nets
[AN #140]: Theoretical models that predict scaling laws
[AN #141]: The case for practicing alignment work on GPT-3 and other large models
[AN #142]: The quest to understand a network well enough to reimplement it by hand
[AN #143]: How to make embedded agents that reason probabilistically about their environments
[AN #144]: How language models can also be finetuned for non-language tasks
Alignment Newsletter Three Year Retrospective
[AN #145]: Our three year anniversary!
[AN #146]: Plausible stories of how we might fail to avert an existential catastrophe
[AN #147]: An overview of the interpretability landscape
[AN #148]: Analyzing generalization across more axes than just accuracy or loss
[AN #149]: The newsletter's editorial policy
[AN #150]: The subtypes of Cooperative AI research
[AN #151]: How sparsity in the final layer makes a neural net debuggable
[AN #152]: How we’ve overestimated few-shot learning capabilities
[AN #153]: Experiments that demonstrate failures of objective robustness
[AN #154]: What economic growth theory has to say about transformative AI
[AN #155]: A Minecraft benchmark for algorithms that learn without reward functions
[AN #156]: The scaling hypothesis: a plan for building AGI
[AN #157]: Measuring misalignment in the technology underlying Copilot
[AN #158]: Should we be optimistic about generalization?
[AN #159]: Building agents that know how to experiment, by training on procedurally generated games
[AN #160]: Building AIs that learn and think like people
[AN #161]: Creating generalizable reward functions for multiple tasks by learning a model of functional similarity
[AN #162]: Foundation models: a paradigm shift within AI
[AN #163]: Using finite factored sets for causal and temporal inference
[AN #164]: How well can language models write code?
[AN #165]: When large models are more likely to lie
[AN #166]: Is it crazy to claim we're in the most important century?
[AN #167]: Concrete ML safety problems and their relevance to x-risk
[AN #168]: Four technical topics for which Open Phil is soliciting grant proposals
[AN #169]: Collaborating with humans without human data
[AN #170]: Analyzing the argument for risk from power-seeking AI
[AN #171]: Disagreements between alignment "optimists" and "pessimists"
[AN #172] Sorry for the long hiatus!
[AN #173] Recent language model results from DeepMind