Stupid question: because we already know the goal ("keep the diamond intact and in the vault") what prevents us from bypassing the sensors and just directly evaluating the AI based on whether or not the diamond is in the room? Granted, this only works in simulated training, but as long as the AI doesn't know whether or not it's in deployment (an adversarial training process might help here) that won't matter.
As any goal we could have is a subset of the possible states of the area we care about, verifying whether or not our goal is achieved should be easier... (read more)
Stupid question: because we already know the goal ("keep the diamond intact and in the vault") what prevents us from bypassing the sensors and just directly evaluating the AI based on whether or not the diamond is in the room? Granted, this only works in simulated training, but as long as the AI doesn't know whether or not it's in deployment (an adversarial training process might help here) that won't matter.
As any goal we could have is a subset of the possible states of the area we care about, verifying whether or not our goal is achieved should be easier... (read more)