If I understand this right, there is a diamond in a high-tech room that needs to be protected. The goal is to know whether the real diamond is in place, and not just a picture of it or a dummy.

If the AI only gets footage from a normal camera, and not from a lidar sensor that provides depth information about the diamond (which would reveal a fake image hanging in front of the camera), wouldn't it be easier to train the AI to look at how the diamond reflects and refracts light? For example, a light at the side of the room could turn on at the moment the camera is triggered. If there is a picture in front of the camera, there wouldn't be any of the diamond's reflections or refractions. If there is a dummy, the reflection/refraction would be different, because the dummy couldn't be a perfect recreation of the real diamond: its material and the way it was made would influence how the light is refracted, which colors are absorbed, etc. Or use 2 images, one without and one with the light triggered: compare the 2 images with each other, and then compare the image with the light on against the image taken when the diamond was confirmed to be in place.
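
To make the 2-image idea a bit more concrete, here is a rough sketch in Python with NumPy. It assumes grayscale frames of the same size from a fixed camera. The function names and both thresholds are made up for illustration and would have to be tuned on real footage; this is plain image comparison, not a trained AI.

```python
import numpy as np

def mean_abs_change(img_off, img_on):
    """Average per-pixel brightness change between light-off and light-on."""
    off = img_off.astype(np.float64)
    on = img_on.astype(np.float64)
    return float(np.abs(on - off).mean())

def similarity(img_a, img_b):
    """Normalized cross-correlation of two images, in the range [-1, 1]."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # a completely flat image cannot be matched
    return float((a * b).sum() / denom)

def diamond_in_place(img_off, img_on, ref_on, min_change=5.0, min_similarity=0.9):
    """Two checks, both hypothetical thresholds:
    1. the scene must visibly react when the side light turns on;
    2. the light-on image must closely match the light-on reference taken
       when the diamond was confirmed to be in place.
    """
    reacts_to_light = mean_abs_change(img_off, img_on) >= min_change
    matches_reference = similarity(img_on, ref_on) >= min_similarity
    return bool(reacts_to_light and matches_reference)
```

You would call it as `diamond_in_place(frame_light_off, frame_light_on, reference_light_on)`. Note that a printed picture would probably still get somewhat brighter under the side light, so the second check (matching the reference sparkle pattern) is the one doing most of the work.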