Optimally Combining Probe Monitors and Black Box Monitors
by Tim Hua, James Baskerville, BionicD0LPH1N, Mia Hopman, Aryan Bhatt, and Tyler Tracy
Link to our arXiv paper "Combining Cost-Constrained Runtime Monitors for AI Safety" here: https://arxiv.org/abs/2507.15886. Code can be found here.

Executive Summary

* Monitoring AIs at runtime can help us detect and stop harmful actions. For cost reasons, we often want to use cheap monitors like probes to monitor all of...
Jul 27, 2025