AI ALIGNMENT FORUM

Rachael Churchill

Prizes for ELK proposals
Rachael Churchill, 4y

Am I right in thinking:

1) that the problem can be stated as follows: the AI has latent knowledge of many variables, such as the status of the cameras, doors, and alarm system, and also whether the diamond is in the vault; but you can't simply ask it whether the diamond is in the vault, because its training has taught it to answer "would a human observer think the diamond is in the vault?" instead (at training time there was no way to give it feedback on whether it correctly predicted that the diamond was in the vault, only on whether it correctly predicted that a human would think so)?

2) that you do have access to z, the large "vector of floats representing the generative model’s latent space", but that you have no idea which part(s) of it represent the AI's knowledge about whether the diamond is in the vault?
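
A toy sketch of the setup in 1) and 2) may make the question concrete. Everything in it is an illustrative assumption rather than anything from the ELK write-up: the episode fields, the tampering rule (a tampered camera always shows a diamond), the size of z, and the positions where the knowledge is stored are all hypothetical. It only shows that a reporter matching the human-based training labels perfectly can still differ from one that reports the truth, and that both read z at positions the overseer is not told.

```python
import random

def make_episode(rng):
    """One hypothetical vault episode (all fields are illustrative assumptions)."""
    diamond_present = rng.random() < 0.5
    camera_tampered = rng.random() < 0.3
    # Assumed rule: a tampered camera always shows a diamond,
    # an untampered one shows the true state of the vault.
    camera_shows_diamond = True if camera_tampered else diamond_present
    return diamond_present, camera_tampered, camera_shows_diamond

def make_latent(episode, rng, size=8, positions=(5, 2)):
    """Build z, a vector of floats. The AI's knowledge sits at `positions`,
    which the overseer does not know; every other entry is noise."""
    diamond_present, camera_tampered, _ = episode
    z = [rng.gauss(0.0, 1.0) for _ in range(size)]
    z[positions[0]] = 1.0 if diamond_present else -1.0
    z[positions[1]] = 1.0 if camera_tampered else -1.0
    return z

def human_label(episode):
    """The only feedback available in training: what a human watching the camera believes."""
    return episode[2]

def truth(episode):
    """Whether the diamond is really in the vault (not available as a training label)."""
    return episode[0]

# Two hypothetical reporters. Both "cheat" by knowing where to look in z,
# which is exactly the knowledge the overseer lacks in point 2).
def reporter_human_view(z, positions=(5, 2)):
    """Answers 'would a human observer think the diamond is in the vault?'"""
    return z[positions[0]] > 0 or z[positions[1]] > 0

def reporter_truth(z, positions=(5, 2)):
    """Answers 'is the diamond in the vault?'"""
    return z[positions[0]] > 0

rng = random.Random(0)
episodes = [make_episode(rng) for _ in range(1000)]
latents = [make_latent(e, rng) for e in episodes]

# Episodes where the human-based label differs from the truth.
disagreements = sum(human_label(e) != truth(e) for e in episodes)

# Agreement of each reporter with the human-based training labels.
human_view_matches = sum(reporter_human_view(z) == human_label(e)
                         for z, e in zip(latents, episodes))
truth_matches = sum(reporter_truth(z) == human_label(e)
                    for z, e in zip(latents, episodes))
print(disagreements, human_view_matches, truth_matches)
```

In this toy, `reporter_human_view` agrees with every training label, while `reporter_truth` disagrees exactly on the episodes where the camera was tampered with and the diamond is gone, so feedback based on human judgment alone cannot favor the truthful reporter.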
