Radical Probabilism [Transcript]

Ben Pace

David Scott Krueger (formerly: capybaralet), 5y

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

One way you can imagine this working is that you treat “I don't have a good hypothesis” as a special hypothesis that is not required to normalize to 1.
For instance, it could say that observing any particular real number, r, has probability epsilon > 0.
So now it "makes predictions", but this doesn't just collapse to including another hypothesis and using Bayes rule.

You can also imagine updating this special hypothesis (which I called a "Socratic hypothesis" in comments on the original blog post on Radical Probabilism) in various ways.

Kenny, 5y

Are there any other detailed descriptions of what a "Jeffrey update" might look like or how one would perform one?

I think I get the point of there being "rationality constraints" that don't, by implication, strictly require Bayesian updates. But are Jeffrey updates the entire set of possible updates that are required?

Can anyone describe a concrete example contrasting a Bayesian update and a Jeffrey update for the same circumstances, e.g. prior beliefs and new information learned?

It kinda seems like Jeffrey updates are 'possibly rational updates' but they're only justified if one can perform them for no possible (or knowable) reason. That doesn't seem practical – how could that work?

abramdemski, 5y

Understandable questions. I hope to expand this talk into a post which will explain things more properly.

Think of the two requirements for Bayes updates as forming a 2x2 matrix. If you have both (1) all information you learned can be summarised into one proposition which you learn with 100% confidence, and (2) you know ahead of time how you would respond to that information, then you must perform a Bayesian update. If you have (2) but not (1), i.e. you update some X to less than 100% confidence but you knew ahead of time how you would respond to changed beliefs about X, then you are required to do a Jeffrey update. But if you don't have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.

Jeffrey updates are a simple generalization of Bayes updates. When a Bayesian learns X, they update it to 100%, and take P(Y|X) to be the new P(Y) for all Y. (More formally, we want to update P to get a new probability measure Q. We do so by setting Q(Y)=P(Y|X) for all Y.) Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident. He thought this was more true to human experience. A Jeffrey update is just the weighted average of the two possible Bayesian updates. (More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(Y|~X).)
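To make the contrast concrete, here is a minimal sketch of both updates applied to the same prior. The joint distribution over two binary propositions X and Y is made up purely for illustration, and the function names (bayes_update, jeffrey_update, prob) are hypothetical helpers, not anything from Jeffrey's writing. Exact fractions are used so the numbers can be checked by hand.

```python
from fractions import Fraction as F

# Hypothetical joint prior P over two binary propositions X and Y.
# Keys are (x, y) truth values; the numbers are invented for this example.
prior = {
    (True, True): F(3, 10),
    (True, False): F(1, 10),
    (False, True): F(1, 5),
    (False, False): F(2, 5),
}


def prob(dist, pred):
    """Total probability of the worlds satisfying pred."""
    return sum(p for world, p in dist.items() if pred(world))


def bayes_update(dist, x_value=True):
    """Condition on X == x_value, i.e. Q(w) = P(w | X == x_value)."""
    p_x = prob(dist, lambda w: w[0] == x_value)
    return {w: (p / p_x if w[0] == x_value else F(0)) for w, p in dist.items()}


def jeffrey_update(dist, c):
    """Move X to confidence c: Q = c * P(. | X) + (1 - c) * P(. | ~X)."""
    given_x = bayes_update(dist, True)
    given_not_x = bayes_update(dist, False)
    return {w: c * given_x[w] + (1 - c) * given_not_x[w] for w in dist}


def is_x(world):
    return world[0]


def is_y(world):
    return world[1]


q_bayes = bayes_update(prior)                # learn X with certainty
q_jeffrey = jeffrey_update(prior, F(9, 10))  # become 90% confident of X

print("Bayes:   Q(X) =", prob(q_bayes, is_x), " Q(Y) =", prob(q_bayes, is_y))
print("Jeffrey: Q(X) =", prob(q_jeffrey, is_x), " Q(Y) =", prob(q_jeffrey, is_y))
```

With this prior, P(Y|X) = 3/4 and P(Y|~X) = 1/3, so the Bayes update gives Q(Y) = 3/4, while the Jeffrey update to 90% gives Q(Y) = 0.9 * 3/4 + 0.1 * 1/3 = 17/24. Setting c = 1 recovers the Bayes update exactly, which is the sense in which Jeffrey updates generalize it.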

A natural response for a classical Bayesian is: where does 90% come from? (Where does c come from?) But the Radical Probabilism retort is: where do observations come from? The Bayesian already works in a framework where information comes in from "outside" somehow. The radical probabilist is just working in a more general framework where more general types of evidence can come in from outside.

Pearl argued against this practice in his book introducing Bayesian networks. But he introduced an equivalent -- but more practical -- concept which he calls virtual evidence. The Bayesian intuition freaks out at somehow updating X to 90% without any explanation. But the virtual evidence version is much more intuitive. (Look it up; I think you'll like it better.) I don't think virtual evidence goes against the spirit of Radical Probabilism at all, and in fact if you look at Jeffrey's writing he appears to embrace it. So I hope to give that version in my forthcoming post, and explain why it's nicer than Jeffrey updates in practice.
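As a rough sketch of why the two views coincide, the code below treats virtual evidence as ordinary Bayesian conditioning on an unnamed observation E about which only the likelihood ratio P(E|X) / P(E|~X) is specified, and which is assumed to bear on Y only through X. That reading of Pearl's idea, the made-up prior numbers, and the helper names are all assumptions for illustration.

```python
from fractions import Fraction as F

# Hypothetical prior summary (invented numbers): P(X), P(Y|X), P(Y|~X).
p_x = F(2, 5)
p_y_given_x = F(3, 4)
p_y_given_not_x = F(1, 3)


def jeffrey_q_y(c):
    """Jeffrey update of X to confidence c: Q(Y) = c P(Y|X) + (1-c) P(Y|~X)."""
    return c * p_y_given_x + (1 - c) * p_y_given_not_x


def virtual_evidence(likelihood_ratio):
    """Bayes-condition on a virtual observation E.

    Only the ratio P(E|X) / P(E|~X) is needed. E is assumed to be
    independent of Y given X, so P(Y|X) and P(Y|~X) are unchanged.
    Returns (Q(X), Q(Y)).
    """
    q_x = likelihood_ratio * p_x / (likelihood_ratio * p_x + (1 - p_x))
    q_y = q_x * p_y_given_x + (1 - q_x) * p_y_given_not_x
    return q_x, q_y


def ratio_for_target(c):
    """Likelihood ratio that moves P(X) to c: posterior odds / prior odds."""
    return (c / (1 - c)) / (p_x / (1 - p_x))


c = F(9, 10)
lam = ratio_for_target(c)
q_x, q_y = virtual_evidence(lam)

print("likelihood ratio:", lam)
print("virtual evidence: Q(X) =", q_x, " Q(Y) =", q_y)
print("Jeffrey update:   Q(Y) =", jeffrey_q_y(c))
```

Here a likelihood ratio of 27/2 in favour of X takes the prior P(X) = 2/5 to Q(X) = 9/10, and the resulting Q(Y) matches the Jeffrey update to c = 9/10. The difference is in what you are asked to supply: a strength of evidence (the ratio) rather than a final confidence level.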

DanielFilan, 5y

More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(~Y|X).

Huh, I'm really surprised this isn't Q(Y) = cP(Y|X) + (1-c)P(Y|~X). Was that a typo? If not, why choose your equation over mine?

abramdemski, 5y

Ah, yep! Corrected.
