riceissa — AI Alignment Forum

Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety

This paper is a revised and expanded version of my blog post Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate, now with David Manheim as co-author. Abstract: Several different approaches exist for ensuring the safety of future Transformative Artificial Intelligence (TAI) or...

Jan 27, 2022 · 27

Analogies and General Priors on Intelligence

This post is part 2 in our sequence on Modeling Transformative AI Risk. We are building a model to understand debates around existential risks from advanced AI. The model is made with Analytica software, and consists of nodes (representing key hypotheses and cruxes) and edges (representing the relationships between these...

Aug 20, 2021 · 57

riceissa's Shortform

Mar 27, 2021 · 6

Timeline of AI safety

Here is a timeline of AI safety that I originally wrote in 2017. The timeline has been updated several times since then, mostly by Vipul Naik. Here are some highlights by year from the timeline since 2013:

Year: Highlights
2013: Research and outreach focused on forecasting and timelines continue. Connections with the nascent...

Feb 7, 2021 · 73

Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate

This post is my attempt to summarize and distill the major public debates about MIRI's highly reliable agent designs (HRAD) work (which includes work on decision theory), including the discussions in Realism about rationality and Daniel Dewey's My current thoughts on MIRI's "highly reliable agent design" work. Part of the...

Jun 22, 2020 · 85

How does iterated amplification exceed human abilities?

When I first started learning about IDA, I thought that agents trained using IDA would be human-level after the first stage, i.e. that Distill(H) would be human-level. As I've written about before, Paul later clarified this, so my new understanding is that after the first stage, the distilled agent will...

May 2, 2020 · 19

What are some exercises for building/generating intuitions about key disagreements in AI alignment?

I am interested in having my own opinion about more of the key disagreements within the AI alignment field, such as whether there is a basin of attraction for corrigibility, whether there is a theory of rationality that is sufficiently precise to build hierarchies of abstraction, and to what extent...

Mar 16, 2020 · 18

Issa Rice

Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate

Comparison of decision theories (with a focus on logical-counterfactual decision theories)

Timeline of AI safety

Analogies and General Priors on Intelligence
