From what I understand, Dan plans to add more object-level arguments soon.

(opinions are my own)
I think this is a good review. Some points that resonated with me:
1. "The concepts of systemic safety, monitoring, robustness, and alignment seem rather fuzzy." I don't think the difference between objective robustness and capabilities robustness is discussed, but this distinction seems important. Also, I agree that Truthful AI could easily go into monitoring.
2. "Lack of concrete threat models." At the beginning of the course, there are a few broad arguments for why AI might be dangerous but not a lot of concrete failure modes. Adding more failure...

Thanks for the comment! I felt some of the content in the PAIS series would've been great for the course, though the creators probably had a reason to exclude it, and I'm not sure what that reason was. In this case, I feel it might be better to remove the chapter on x-risk entirely than to include it and mostly show quotes by famous people without properly engaging with the arguments.

PAIS #5 might be helpful here. It explains how a variety of empirical directions are related to X-Risk and probably includes many of the ones that academics are working on. 

This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms.
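
The quoted claim can be made concrete with a toy model. This is a minimal sketch, assuming hardware speed doubles every `doubling_time` years and that a run is stuck with the hardware available when it starts; the function names and all numbers are hypothetical and chosen only for illustration, not taken from the post.

```python
import math


def effective_compute(start_year, duration, doubling_time):
    """Total compute of a run that starts at `start_year` and trains for `duration` years.

    Assumption: hardware speed is frozen at the run's start date and doubles
    every `doubling_time` years relative to year 0.
    """
    speed = 2.0 ** (start_year / doubling_time)
    return speed * duration


def best_duration(deadline, doubling_time, steps=10_000):
    """Grid-search the run length that maximizes compute for runs ending at `deadline`."""
    candidates = (deadline * i / steps for i in range(1, steps + 1))
    return max(
        candidates,
        key=lambda d: effective_compute(deadline - d, d, doubling_time),
    )


if __name__ == "__main__":
    deadline, doubling_time = 10.0, 2.0  # hypothetical values
    d = best_duration(deadline, doubling_time)
    print("best run length (years):", round(d, 2))
    print("analytic optimum (years):", round(doubling_time / math.log(2), 2))
    print("full-length run:", effective_compute(0.0, deadline, doubling_time))
    print("later, shorter run:", effective_compute(deadline - d, d, doubling_time))
```

Under these assumptions the best run length is `doubling_time / ln 2`, regardless of how far away the deadline is, so a run that starts later beats one that uses the whole period. The model ignores porting a partially trained model to new hardware, which is exactly the assumption questioned in the reply below.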

Wouldn't companies port their partially-trained models to new hardware? I guess the assumption here is that when more compute is available, actors will want to train larger models. I don't think this is obviously true because:
1. Data may be the bigger bottleneck. There was some discussion of this here. Past a certain point, making models larger helps much less than training them on more data.
2. If training runs ...

Claim 1: there is an AI system that (1) performs well ... (2) generalizes far outside of its training distribution.

Don't humans provide an existence proof of this? The point about there being a 'core' of general intelligence seems unnecessary.

I agree that humans satisfying the conditions of claim 1 is an argument in favour of it being possible to build machines that do the same. A couple of points: I think the threat model would posit the core of general intelligence as the reason both why humans can do these things and why the first AGI we build might also do these things. Claim 1 should perhaps be clearer that it's not just saying such an AI design is possible, but that it's likely to be found and built.

Safety and value alignment are generally toxic words, currently. Safety is becoming more normalized due to its associations with uncertainty, adversarial robustness, and reliability, which are thought respectable. Discussions of superintelligence are often derided as “not serious,” “not grounded,” or “science fiction.”


Here's a relevant question in the 2016 survey of AI researchers:


These numbers seem to conflict with what you said, but maybe I'm misinterpreting you. If there is a conflict here, do you think that if this survey were done again, the...