DanielFilan

Comments

Project Intro: Selection Theorems for Modularity

The method that is normally used for this in the biological literature (including the Kashtan & Alon paper mentioned above), and in papers by e.g. CHAI dealing with identifying modularity in modern deep networks, is taken from graph theory. It involves the measure Q, which is defined as follows:
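(For reference, and as an addition rather than part of the quoted post: the Q in question is, I believe, the standard Newman–Girvan network modularity,

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j),$$

where $A$ is the adjacency matrix, $k_i$ is the degree of node $i$, $m$ is the total number of edges, and $\delta(c_i, c_j) = 1$ exactly when nodes $i$ and $j$ are assigned to the same module.)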

FWIW I do not use this measure in my papers, but instead use a different graph-theoretic measure. (I also get the sense that Q is more of a network theory thing than a graph theory thing.)

rohinmshah's Shortform

I think it's more concerning in cases where you're getting all of your info from goal-oriented behaviour and solving the inverse planning problem

It's also not super clear what you would do algorithmically instead: words are kind of vague, and trajectory comparisons depend crucially on getting the right information about the trajectory, which is hard, as per the ELK document.

rohinmshah's Shortform

One objection: an assistive agent doesn’t let you turn it off, how could that be what we want? This just seems totally fine to me — if a toddler in a fit of anger wishes that its parents were dead, I don’t think the maximally-toddler-aligned parents would then commit suicide, that just seems obviously bad for the toddler.

I think this is way more worrying in the case where you're implementing an assistance game solver, since this lack of off-switchability means your margins for safety are much narrower.

Though [the claim that slightly wrong observation model => doom] isn’t totally clear, e.g. is it really that easy to mess up the observation model such that it leads to a reward function that’s fine with murdering humans? It seems like there’s a lot of evidence that humans don’t want to be murdered!

I think it's more concerning in cases where you're getting all of your info from goal-oriented behaviour and solving the inverse planning problem. In those cases, the way you know how 'human preferences' rank future hyperslavery vs. wireheaded rat tiling vs. humane utopia is by how human actions affect the likelihood of those possible worlds. But that's probably not well-modelled by Boltzmann rationality (e.g. the thing I'm most likely to do today is not to write a short computer program that implements humane utopia), and it seems like your inference is going to be very sensitive to plausible variations in the observation model.
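To make that sensitivity concrete, here's a minimal toy sketch (my own construction, with made-up actions, utilities, and a rationality parameter beta, not anything from the original discussion): a Boltzmann-rational observation model over three actions and two candidate reward functions. Observing that the human goes to work rather than writing the utopia program pushes the posterior towards "doesn't care about utopia", and how hard it pushes depends heavily on the assumed beta.

```python
import numpy as np

# Toy inverse planning: the human picks one of three actions, and we infer
# which candidate reward function they have from that single choice, under a
# Boltzmann-rational observation model  P(action | reward) ∝ exp(beta * reward(action)).
actions = ["go to work", "watch TV", "write a program implementing humane utopia"]

# Two made-up candidate reward functions (utility assigned to each action).
reward_candidates = {
    "values utopia above all": np.array([1.0, 0.5, 100.0]),
    "doesn't care about utopia": np.array([1.0, 0.5, 0.0]),
}

def posterior_over_rewards(observed_action, beta):
    """Posterior over the reward candidates given one observed action,
    with a uniform prior, under Boltzmann rationality with parameter beta."""
    idx = actions.index(observed_action)
    likelihoods = {}
    for name, utilities in reward_candidates.items():
        choice_probs = np.exp(beta * utilities)
        choice_probs /= choice_probs.sum()
        likelihoods[name] = choice_probs[idx]
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}

# The human goes to work today; how we interpret that depends a lot on beta.
for beta in [0.01, 0.1, 1.0]:
    print(beta, posterior_over_rewards("go to work", beta))
```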

Job Offering: Help Communicate Infrabayesianism

A future episode might include a brief distillation of that episode ;)

Summary of the Acausal Attack Issue for AIXI

But wait, there can only be so many low-complexity universes, and if they're launching successful attacks, said attacks would be distributed amongst a far far far larger population of more-complex universes.

Can't you just condition on the input stream to affect all the more-complex universes, rather than targeting a single universe? Specifically: look at the input channel, run basically-Solomonoff-induction yourself, then figure out which universe you're being fed inputs of and pick outputs appropriately. You can't be incredibly powerful this way, since computable universes can't actually contain Solomonoff inductors, but you can predict a pretty wide variety of universes well, e.g. all those computable in polynomial time.
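Here's a very rough sketch of the bounded version of that strategy (entirely my own toy construction, not anything from the post): replace Solomonoff induction with a Bayesian mixture over a small class of computable "universes", here all binary-output finite-state machines with at most three states, weight them by a 2^(-complexity) prior, and condition on the observed input stream to see which universes you could be embedded in and what they'll show you next.

```python
from itertools import product

MAX_STATES = 3  # complexity bound on the "universes" we can afford to simulate

def enumerate_machines(max_states):
    """All deterministic machines, given as tuples: state -> (output_bit, next_state)."""
    machines = []
    for n in range(1, max_states + 1):
        for table in product(product([0, 1], range(n)), repeat=n):
            machines.append(table)
    return machines

def run(machine, steps):
    """Output sequence of a machine started in state 0."""
    state, out = 0, []
    for _ in range(steps):
        bit, state = machine[state]
        out.append(bit)
    return out

def posterior_predict(observed):
    """Weight each machine by a 2^(-#states) prior, keep the ones consistent
    with the observed input stream, and return weighted next-bit predictions."""
    weights = {0: 0.0, 1: 0.0}
    for machine in enumerate_machines(MAX_STATES):
        if run(machine, len(observed)) == list(observed):
            prior = 2.0 ** (-len(machine))
            next_bit = run(machine, len(observed) + 1)[-1]
            weights[next_bit] += prior
    return weights

# Which continuation do the simple universes consistent with this stream favour?
print(posterior_predict([1, 0, 1, 0]))
```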

Introduction To The Infra-Bayesianism Sequence

That said, this sequence is tricky to understand and I'm bad at it! I look forward to brave souls helping to digest it for the community at large.

I interviewed Vanessa here in an attempt to make this more digestible: it hopefully acts as context for the sequence, rather than a replacement for reading it.

Shulman and Yudkowsky on AI progress

One thing Carl notes is that a variety of areas where AI could contribute a lot to the economy are currently pretty unregulated. But I think there's a not-crazy story where the regulation only hits once you're within striking range of making an area way more efficient with computers. I'm not sure how to evaluate how right that is (e.g. I don't think it's the story of housing regulation), but I just wanted it said.

Soares, Tallinn, and Yudkowsky discuss AGI cognition

Anders Sandberg could tell us what fraction of the reachable universe is being lost per minute, which would tell us how much more surety it would need to expect to gain by waiting another minute before acting.

From Ord (2021):

Each year the affectable universe will only shrink in volume by about one part in 5 billion.

So, since there are about 5e5 minutes in a year, you lose about 1 part in 5e5 × 5e9 ≈ 3e15 every minute.
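Spelling that arithmetic out (a year is roughly 5.3e5 minutes, and the quote above gives a loss rate of 1 part in 5e9 per year):

$$\frac{1}{5\times 10^{9}}\ \text{per year} \times \frac{1\ \text{year}}{5.3\times 10^{5}\ \text{minutes}} \approx \frac{1}{2.6\times 10^{15}}\ \text{per minute} \approx \frac{1}{3\times 10^{15}}\ \text{per minute}.$$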
