Mark Xu

Sequences

Intermittent Distillations

Comments

Rogue AGI Embodies Valuable Intellectual Property

Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the price down to the marginal cost of production, which is very low for software. I briefly tried to find empirical data on this but couldn't, plausibly because I didn't know the right search terms.

AMA: Paul Christiano, alignment researcher

How would you teach someone how to get better at the engine game?

AMA: Paul Christiano, alignment researcher

You've written multiple outer alignment failure stories. However, you've also commented that these aren't your best predictions. If you condition on humanity going extinct because of AI, why did it happen?

Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

I'm curious what "put it in my SuperMemo" means. Quick googling only yielded SuperMemo as a language learning tool.

Transparency Trichotomy

I agree it's sort of the same problem under the hood, but I think knowing how you're going to go from "understanding understanding" to producing an understandable model controls what type of understanding you're looking for.

I also agree that this post makes ~0 progress on solving the "hard problem" of transparency; I just think it provides a potentially useful framing and creates a reference for me/others to link to in the future.

Open Problems with Myopia

One way of looking at DDT is "keeping it dumb in various ways." I think another way of thinking about it is just designing a different sort of agent, which is "dumb" according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they're just sacrificing utility for no reason.

There is some slight awkwardness in that the decision problems agents in this universe actually encounter mean that UDT agents will get higher utility than DDT agents.

I agree that the maximum a posteriori world doesn't help that much, but I think there is some sense in which "having uncertainty" might be undesirable.
