It’s well-established in the AI alignment literature what happens when an AI system learns or is given an objective that doesn’t fully capture what we want. Human preferences and values are inevitably left out and the AI, likely being a powerful optimizer, will take advantage of the dimensions of freedom...
Just a year ago we released a two-part episode titled An Overview of Technical AI Alignment with Rohin Shah. That conversation detailed the views of central AI alignment research organizations and many of the ongoing research efforts for designing safe and aligned systems. Much has happened in...
Most relevant to AI alignment, a pertinent question for interested readers/listeners to focus on is: if we are unable to establish a governance mechanism as a global community on the principle that we should not let AI make the decision to kill humans, then what effects will this...