Scott Garrabrant - AI Alignment Forum

Here are the most interesting things about these objects to me that I think this post does not capture.

Given a distribution over non-negative non-identically-zero infrafunctions, up to a positive scalar multiple, the pointwise geometric expectation exists, and is an infra function (up to a positive scalar multiple).

(I am not going to give all the math and be careful here, but hopefully this comment will provide enough of a pointer if someone wants to investigate this.)

This is a bit of a miracle. Compare this with arithmetic expectation of utility functions. This is not always well defined. For example, if you have a sequence of utility functions U_n, each with weight 2^{-n}, but which alternate in which of two outcomes they prefer, and each utility function gets an internal weighting to cancel out their small weight an then some, the expected utility will not exist. There will be a series of larger and larger utility monsters canceling each other out, and the limit will not exist. You could fix this requiring your utility functions are bounded, as is standard for dealing with utility monsters, but it is really interesting that in the case of infra functions and geometric expectation, you don't have to.

If you try to do a similar trick with infra functions, up to a positive scalar multiple, geometric expectation will go to infinity, but you can renormalize everything since you are only working up to a scalar multiple, to make things well defined.

We needed the geometric expectation to only be working up to a scalar multiple, and you cant expect a utility function if you take a geometric expectation of utility functions. (but you do get an infrafunction!)

If you start with utility functions, and then merge them geometrically, the resulting infrafunction will be maximized at the Nash bargaining solution, but the entire infrafunction can be thought of as an extended preference over lotteries of the pair of utility functions, where as Nash bargaining only told you the maximum. In this way geometric merging of infrafunctions is starting with an input more general than the utility functions of Nash bargaining, and giving an output more structured than the output of Nash bargaining, and so can be thought of as a way of making Nash bargaining more compositional. (Since the input and output are now the same type, so you can stack them on top of each other.)

For these two reasons (utility monster resistance and extending Nash bargaining), I am very interested in the mathematical object that is non-negative non-identically-zero infrafunctions defined only up to a positive scalar multiple, and more specifically, I am interested in the set of such functions as a convex set where mixing is interpreted as pointwise geometric expectation.

Infrafunctions and Robust Optimization

Scott Garrabrant1y71

I have been thinking about this same mathematical object (although with a different orientation/motivation) as where I want to go with a weaker replacement for utility functions.

I get the impression that for Diffractor/Vanessa, the heart of a concave-value-function-on-lotteries is that it represents the worst case utility over some set of possible utility functions. For me, on the other hand, a concave value function represents the capacity for compromise -- if I get at least half the good if I get what I want with 50% probability, then I have the capacity to merge/compromise with others using tools like Nash bargaining.

This brings us to the same mathematical object, but it feels like I am using the definition of convex set related to the line segment connecting any two points in the set is also in the set, where Diffractor/Vanessa is using the definition of convex set related to being an intersection of half planes.

I think this pattern where I am more interested in merging, and Diffractor and Vanessa are more interested in guarantees, but we end up looking at the same math is a pattern, and I think the dual definitions of convex set in part explains (or at least rhymes with) this pattern.

Concave Utility Question

Scott Garrabrant1y40

Then it is equivalent to the thing I call B2 in edit 2 in the post (Assuming A1-A3).

In this case, your modified B2 is my B2, and your B3 is my A4, which follows from A5 assuming A1-A3 and B2, so your suspicion that these imply C4 is stronger than my Q6, which is false, as I argue here.

However, without A5, it is actually much easier to see that this doesn't work. The counterexample here satisfies my A1-A3, your weaker version of B2, your B3, and violates C4.

Concave Utility Question

Scott Garrabrant1y20

Your B3 is equivalent to A4 (assuming A1-3).

Concave Utility Question

Scott Garrabrant1y62

Your B2 is going to rule out a bunch of concave functions. I was hoping to only use axioms consistent with all (continuous) concave functions.

Concave Utility Question

Scott Garrabrant1y20

I am skeptical that it will be possible to salvage any nice VNM-like theorem here that makes it all the way to concavity. It seems like the jump necessary to fix this counterexample will be hard to express in terms of only a preference relation.

Concave Utility Question

Answer by Scott GarrabrantApr 15, 202330

The answers to Q3, Q4 and Q6 are all no. I will give a sketchy argument here.

Consider the one dimensional case, where the lotteries are represented by real numbers in the interval , and consider the function $u : L \to [0, 1]$ given by $u (x) = \frac{1}{2} - (x - \frac{1}{3})^{3} (x - \frac{2}{3})$ . Let $⪰$ be the preference order given by $x ⪰ y$ if and only if $u (x) \geq u (y)$ .

$u$ is continuous and quasi-concave, which means $⪰$ is going to satisfy A1, A2, A3, A4, and B2. Further, since $u$ is monotonically increasing up to the unique argmax, and then monotonically decreasing, $⪰$ is going to satisfy A5.

$u$ is not concave, but we need to show there is not another concave function giving the same preference relation as $u$ . The only way to keep the same preference relation is to compose $u$ with a strictly monotonic function $f$ , so $v (x) = f (u (x)$ ).

If $f$ is smooth, we have a problem, since $v^{'} (\frac{1}{3}) = f^{'} (u (\frac{1}{3})) u^{'} (\frac{1}{3}) = f^{'} (\frac{1}{2}) 0 = 0$ . However, since, $v^{'}$ must be on some $x > \frac{1}{3}$ , but concavity would require $v^{'}$ to be decreasing.

In order to remove the inflection point at $x = \frac{1}{3}$ , we need to flatten it out with some $f$ that has infinite slope at $\frac{1}{2}$ . For example, we could take $f (z) = \sqrt[3]{z - \frac{1}{2}}$ . However, any f that removes the inflection point at $x = \frac{1}{3}$ , will end up adding an inflection point at $x = \frac{2}{3}$ , which will have a infinite negate slope. This newly created inflection point will cause a problem for similar reasons.

Concave Utility Question

Scott Garrabrant1y30

You can also think of A5 in terms of its contrapositive: For all , if $A ≻ B$ , then for all $0 < p \leq 1$ $A ≻ p A + (1 - p) B$

This is basically just the strict version of A4. I probably should have written it that way instead. I wanted to use $⪰$ instead of $≻$ , because it is closer to the base definition, but that is not how I was natively thinking about it, and I probably should have written it the way I think about it.

Concave Utility Question

Scott Garrabrant1y30

Alex's counterexample as stated is not a counterexample to Q4, since it is in fact concave.

I believe your counterexample violates A5, taking , $A = X$ , and $p = \frac{1}{2}$ .

Concave Utility Question

Scott Garrabrant1y52

That does not rule out your counterexample. The condition is never met in your counterexample.

AI ALIGNMENT FORUM
AF

Sequences

Posts

Wiki Contributions

Comments