Josh Clymer

Wiki Contributions


From what I understand, Dan plans to add more object-level arguments soon.

(opinions are my own)
I think this is a good review. Some points that resonated with me:
1. "The concepts of systemic safety, monitoring, robustness, and alignment seem rather fuzzy." I don't think the difference between objective and capabilities robustness is discussed but this distinction seems important. Also, I agree that Truthful AI could easily go into monitoring.
2. "Lack of concrete threat models." At the beginning of the course, there are a few broad arguments for why AI might be dangerous but not a lot of concrete failure modes. Adding more failure modes here would better motivate the material.

3. Give more clarity on how the various ML safety techniques address the alignment problem, and how they can potentially scale to solve bigger problems of a similar nature as AIs scale in capabilities
4. Give an assessment on the most pressing issues that should be addressed by the ML community and the potential work that can be done to contribute to the ML safety field

You can read more about how these technical problems relate to AGI failure modes and how they rank on importance, tractability, and crowdedness in Pragmatic AI Safety 5. I think the creators included this content in a separate forum post for a reason.

The course is intended for to two audiences: people who are already worried about AI X-risk and people who are only interested in the technical content. The second group doesn't necessarily care about why each research direction relates to reducing X-risk.

Putting a lot of emphasis on this might just turn them off. It could give them the impression that you have to buy X-risk arguments in order to work on these problems (which I don't think is true) or it could make them less likely to recommend the course to others, causing fewer people to engage with the X-risk material overall.

PAIS #5 might be helpful here. It explains how a variety of empirical directions are related to X-Risk and probably includes many of the ones that academics are working on. 

This is because longer runs will be outcompeted by runs that start later and therefore use better hardware and better algorithms.

Wouldn't companies port their partially-trained models to new hardware? I guess the assumption here is that when more compute is available, actors will want to train larger models. I don't think this is obviously true because:
1. Data may be the bigger bottleneck. There was some discussion of this here. Making models larger doesn't help very much after a certain point compared with training them with more data.
2. If training runs are happening over months, there will be strong incentives to make use of previously trained models -- especially in a world where people are racing to build AGI. This could look like anything from slapping on more layers to developing algorithms that expand the model in all relevant dimensions as it is being trained. Here's a paper about progressive learning for vision transformers. I didn't find anything for NLP, but I also haven't looked very hard.

Claim 1: there is an AI system that (1) performs well ... (2) generalizes far outside of its training distribution.

Don't humans provide an existence proof of this? The point about there being a 'core' of general intelligence seems unnecessary.

Safety and value alignment are generally toxic words, currently. Safety is becoming more normalized due to its associations with uncertainty, adversarial robustness, and reliability, which are thought respectable. Discussions of superintelligence are often derided as “not serious”, “not grounded,” or “science fiction.”


Here's a relevant question in the 2016 survey of AI researchers:


These numbers seem to conflict with what you said but maybe I'm misinterpreting you. If there is a conflict here, do you think that if this survey was done again, the results would be different? Or do you think these responses do not provide an accurate impression of how researchers actually feel/felt (maybe because of agreement bias or something)?