My Thoughts on the ML Safety Course

(opinions are my own)
I think this is a good review. Some points that resonated with me:
1. "The concepts of systemic safety, monitoring, robustness, and alignment seem rather fuzzy." I don't think the difference between objective robustness and capabilities robustness is discussed, but this distinction seems important. Also, I agree that Truthful AI could easily go under monitoring.
2. "Lack of concrete threat models." At the beginning of the course, there are a few broad arguments for why AI might be dangerous but not a lot of concrete failure modes. Adding more failure modes here would better motivate the material.

3. Give more clarity on how the various ML safety techniques address the alignment problem, and how they can potentially scale to solve bigger problems of a similar nature as AIs scale in capabilities.
4. Give an assessment of the most pressing issues that should be addressed by the ML community and the potential work that can be done to contribute to the ML safety field.

You can read more about how these technical problems relate to AGI failure modes and how they rank on importance, tractability, and crowdedness in Pragmatic AI Safety 5. I think the creators included this content in a separate forum post for a reason.

The course is intended for two audiences: people who are already worried about AI X-risk and people who are only interested in the technical content. The second group doesn't necessarily care about why each research direction relates to reducing X-risk.

Putting a lot of emphasis on this might just turn them off. It could give them the impression that you have to buy X-risk arguments in order to work on these problems (which I don't think is true) or it could make them less likely to recommend the course to others, causing fewer people to engage with the X-risk material overall.

joshc (3y)

zeshen (3y)

Thanks for the comment!

"You can read more about how these technical problems relate to AGI failure modes and how they rank on importance, tractability, and crowdedness in Pragmatic AI Safety 5. I think the creators included this content in a separate forum post for a reason."

I felt some of the content in the PAIS series would've been great for the course. The creators probably had a reason to exclude it, though I'm not sure what that reason was.

"The second group doesn't necessarily care about why each research direction relates to reducing X-risk."

In this case, I feel it could be better for the chapter on X-risk to be removed entirely. It might be better to leave it out than to include it and mostly show quotes by famous people without properly engaging with the arguments.

joshc (3y)

From what I understand, Dan plans to add more object-level arguments soon.
