AI ALIGNMENT FORUM

Issa Rice

I am Issa Rice. https://issarice.com/


Comments

Your posts should be on arXiv
riceissa · 2y · 37

I didn't log the time I spent on the original blog post, and it's kinda hard to assign hours to this since most of the reading and thinking for the post happened while working on the modeling aspects of the MTAIR project. If I count just the time I sat down to write the blog post, I would guess maybe less than 20 hours.

As for the "convert the post to paper" part, I did log that time and it came out to 89 hours, so David's estimate of "perhaps another 100 hours" is fairly accurate.

AMA Conjecture, A New Alignment Startup
riceissa · 3y · 10

Did you end up running it through your internal infohazard review, and if so, what was the result?

Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate
riceissa · 4y · 10

With help from David Manheim, this post has now been turned into a paper. Thanks to everyone who commented on the post!

MIRI/OP exchange about decision theory
riceissa · 4y · 80

Rob, are you able to disclose why people at Open Phil are interested in learning more decision theory? It seems a little far away from the AI strategy reports they've been publishing in recent years, and it also seemed like they were happy to keep funding MIRI (via their Committee for Effective Altruism Support) despite disagreements about the value of HRAD research, so the sudden interest in decision theory is intriguing.

AGI will drastically increase economies of scale
riceissa · 4y · 40

I was reading parts of Superintelligence recently for something unrelated and noticed that Bostrom makes many of the same points as this post:

If the frontrunner is an AI system, it could have attributes that make it easier for it to expand its capabilities while reducing the rate of diffusion. In human-run organizations, economies of scale are counteracted by bureaucratic inefficiencies and agency problems, including difficulties in keeping trade secrets. These problems would presumably limit the growth of a machine intelligence project so long as it is operated by humans. An AI system, however, might avoid some of these scale diseconomies, since the AI’s modules (in contrast to human workers) need not have individual preferences that diverge from those of the system as a whole. Thus, the AI system could avoid a sizeable chunk of the inefficiencies arising from agency problems in human enterprises. The same advantage—having perfectly loyal parts—would also make it easier for an AI system to pursue long-range clandestine goals. An AI would have no disgruntled employees ready to be poached by competitors or bribed into becoming informants.

Introduction to Cartesian Frames
riceissa · 5y · 20

So the existence of this interface implies that A is “weaker” in a sense than A’.

Should that say B instead of A', or have I misunderstood? (I haven't read most of the sequence.)

The Alignment Problem: Machine Learning and Human Values
riceissa · 5y · 90

Does anyone know how Brian Christian came to be interested in AI alignment and why he decided to write this book instead of a book about a different topic? (I haven't read the book; I looked at the Amazon preview but couldn't find the answer there.)

My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
riceissa · 5y · 20

HCH is the result of a potentially infinite exponential process (see figure 1) and thereby, computationally intractable. In reality, we can not break down any task into its smallest parts and solve these subtasks one after another because that would take too much computation. This is why we need to iterate distillation and amplification and cannot just amplify.

In general, your post talks about amplification (and HCH) as increasing the capability of the system and distillation as saving on computation/making things more efficient. But my understanding, based on this conversation with Rohin Shah, is that amplification is also intended to save on computation (otherwise we could just try to imitate humans). In other words, the distillation procedure is able to learn more quickly by training on data provided by the amplified system compared to just training on the unamplified system. So I don't like the phrasing that distillation is the part that's there to save on computation, because both parts seem to be aimed at that.
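To illustrate what I mean, here is a toy sketch of one amplify-then-distill round. It is my own illustration, not code from your post or from any actual IDA implementation, and every name in it (amplify, distill, the letter-counting task) is made up for the example:

```python
# Toy, purely illustrative sketch of one amplify-then-distill round.
# Every name here (amplify, distill, the letter-counting task) is my own
# invention for illustration, not code from the post or any IDA codebase.

def amplify(model, question, decompose, combine):
    """Answer a hard question by decomposing it (the 'human' step) and
    delegating each subquestion to the current weak model."""
    subquestions = decompose(question)
    if not subquestions:              # base case: model answers directly
        return model(question)
    return combine(model(q) for q in subquestions)

def distill(slow_answerer, questions):
    """'Train' a fast model to imitate the slow amplified system.
    Here that is just memoization; in the real scheme it would be
    supervised learning on (question, amplified-answer) pairs."""
    table = {q: slow_answerer(q) for q in questions}
    return lambda q: table[q]

# Toy task: count the letters in a phrase by counting letters per word.
decompose = lambda phrase: phrase.split() if " " in phrase else []
combine = sum
weak_model = lambda word: len(word)   # only handles single words well

phrase = "iterated distillation and amplification"
amplified = lambda q: amplify(weak_model, q, decompose, combine)
fast_model = distill(amplified, [phrase])
print(fast_model(phrase))             # 36, answered in a single lookup
```

In this toy version both steps are doing efficiency work: amplification lets a weak model handle a harder question by paying for one call per subquestion, and distillation makes the resulting slow system cheap to run again, so the next round starts from a stronger but still fast model.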

(I am making this comment because I want to check my understanding with you, or make sure you understand this point, since it doesn't seem to be stated in your post. It was one of the most confusing things about IDA to me and I'm still not sure I fully understand it.)

My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
riceissa · 5y · 10

I still don't understand how corrigibility and intent alignment are different. If neither implies the other (as Paul says in his comment starting with "I don't really think this is true"), then there must be examples of AI systems that have one property but not the other. What would a corrigible but not-intent-aligned AI system look like?

I also had the thought that the implicative structure (between corrigibility and intent alignment) seems to depend on how the AI is used, i.e. on the particulars of the user/overseer. For example, if you have an intent-aligned AI and the user is careful about not deploying the AI in scenarios that would leave them disempowered, then that seems like a corrigible AI. So for this particular user, it seems like intent alignment implies corrigibility. Is that right?

The implicative structure might also differ depending on the capability of the AI: for a dumb AI, corrigibility and intent alignment might be equivalent, but the two concepts might come apart for more capable AIs.

My Understanding of Paul Christiano's Iterated Amplification AI Safety Research Agenda
riceissa · 5y · 20

IDA tries to prevent catastrophic outcomes by searching for a competitive AI that never intentionally optimises for something harmful to us and that we can still correct once it’s running.

I don't see how the "we can still correct once it’s running" part can be true given this footnote:

However, I think at some point we will probably have the AI system autonomously execute the distillation and amplification steps or otherwise get outcompeted. And even before that point we might find some other way to train the AI in breaking down tasks that doesn’t involve human interaction.

After a certain point it seems like the thing that is overseeing the AI system is another AI system and saying that "we" can correct the first AI system seems like a confusing way to phrase this situation. Do you think I've understood this correctly / what do you think?

Wikitag Contributions

Yudkowsky's Law · 3y · (+1134)
Do The Math, Then Burn The Math and Go With Your Gut · 4y · (+1113)
Do The Math, Then Burn The Math and Go With Your Gut · 4y · (+9/-9)
Do The Math, Then Burn The Math and Go With Your Gut · 4y · (+812)
Do The Math, Then Burn The Math and Go With Your Gut · 4y · (+76)
How An Algorithm Feels · 4y · (+19)
Cognitive Reduction · 4y · (+30)
Parenting · 5y · (+17)
Family planning · 5y · (+75)
Family planning · 5y · (+141)
Posts

14 · Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety · 4y · 0
27 · Analogies and General Priors on Intelligence · 4y · 11
3 · riceissa's Shortform · 4y · 0
19 · Timeline of AI safety · 5y · 3
32 · Plausible cases for HRAD work, and locating the crux in the "realism about rationality" debate · 5y · 11
8 · How does iterated amplification exceed human abilities? [Question] · 5y · 9
7 · What are some exercises for building/generating intuitions about key disagreements in AI alignment? [Question] · 5y · 2
11 · Deliberation as a method to find the "actual preferences" of humans · 6y · 1
12 · What are the differences between all the iterative/recursive approaches to AI alignment? [Question] · 6y · 14
24 · Comparison of decision theories (with a focus on logical-counterfactual decision theories) · 6y · 12