rk — AI Alignment Forum

I'd like to check my understanding of the last two transitions a little. If someone could check the below I'd be grateful.

When we move to the utilitarianism-like quadrant, individual actors have preference ordering over actions (not states). So if they were the kind of utilitarians we often think about (with utilities attached to states), that would be something like ordering the actions by their expected utility (as they see it). So we actually get something like combined epistemic modesty and utilitarianism here?

Then, when we move to the futarchy-like quadrant, we additionally elicit probabilities for (all possible?) observations from each agent. We give them less and less weight in the decision calculus when their predictions are wrong. This stops agents that have crazy beliefs reliably voting us into world states they wouldn't like anyway (though I think that world states aren't present in the formalism in this quadrant).

Does the above paragraph mean that people with unique preferences and crazy beliefs eventually end up without having their preferences respected (whereas someone with unique preferences and accurate beliefs would still have their preferences respected)?

(Some extra questions. I'm more interested in answers to questions above the line, so feel free to skip these)

Also, do we have to treat the agents as well-calibrated across all domains? Or is the system able to learn that their thoughts should be given weight in some circumstances and not others? The reason I think we can't do that is because it seems like there is just one number that represents the agent's overall accuracy (theta_i)

A possible fix to the above is that individual agents could do this subject-specific evaluation of other agents and would update their credences based on partially-accurate agents, thus the information still gets preserved. But I think this leads to another problem: could there be a double-counting when both Critch's mechanism and other agents pick up on the accuracy of an agent? Or are we fine because agents who over-update on others' views also get their vote penalised?

Complete Class: Consequentialist Foundations

rk7y10

Dependent Type Theory and Zero-Shot Reasoning

rk7y20

Cool! This reminds me of dimensional analysis, but with types instead of dimensions (and types have more structure cos "->" is not commutative).

Also, I guess because of dependent types' connection with constructive mathematics, this works less well if you want to create a proof that uses excluded middle? I suppose you would end up with something of the type (P -> ⊥) -> ⊥ and then be stuck

(Minor point: is the reason you say "basically two options" for because you could also have h(h(fxx)) etc.? I guess we eventually prove that that would be the same function!)

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments