Posts

Sorted by New

Wiki Contributions

Comments

I have a question about the conjecture at the end of Direction 17.5. Let  be a utility function with values in  and let  be a strictly monotonous function. Then  and  have the same maxima.  can be non-linear, e.g. . Therefore, I wonder if the condition  should be weaker.

Moreover, I ask myself if it is possible to modify  by a small amount at a place far away from the optimal policy such that  is still optimal for the modified utility function. This would weaken the statement about the uniqueness of the utility function even more. Think of an AI playing Go. If a weird position on the board has the utility -1.01 instead of -1, this should not change the winning strategy. I have to go through all of the definitions to see if I can actually produce a more mathematical example. Nevertheless, you may have a quick opinion if this could happen.