All of romeostevensit's Comments + Replies

Sample complexity reduction is one of our main moment to moment activities, but humans seem to apply it across bigger bridges and this is probably part of transfer learning. One of the things we can apply sample complexity reduction to is the 'self' object, the idea of a coherent agent across differing decision points. The tradeoff between local and global loss seems to regulate this. Humans don't seem uniform on this dimension, foxes care more about local loss, hedgehogs more about global loss. Most moral philosophy seem like appeals to different possible... (read more)

When I look at metaphilosophy, the main places I go looking are places with large confusion deltas. Where, who, and why did someone become dramatically less philosophically confused about something, turning unfalsifiable questions into technical problems. Kuhn was too caught up in the social dynamics to want to do this from the perspective of pure ideas. A few things to point to.

  1. Wittgenstein noticed that many philosophical problems attempt to intervene at the wrong level of abstraction and posited that awareness of abstraction as a mental event might hel
... (read more)
Also I wrote this a while back

Is there a good primer somewhere on how causal models interact with the standard model of physics?

Tangentially related: recent discussion raising a seemingly surprising point about LLM's being lossless compression finders

The first intuition pump that comes to mind for distinguishing mechanisms is examining how my brain generates and assigns credence to the hypothesis that something going wrong with my car is a sensor malfunction vs telling me about a problem in the world that the sensor exists to alert me to.

One thing that happens is that the broken sensor implies a much larger space of worlds because it can vary arbitrarily instead of only in tight informational coupling with the underlying physical system. So fluctuations outside the historical behavior of the sensor eit... (read more)

Whether or not details (and lots of specific detail arguments) matter hinges on the sensitivity argument (which is an argument about basins?) in general, so I'd like to see that addressed directly. What are the arguments for high sensitivity worlds other than anthropics? What is the detailed anthropic argument?

Rambling/riffing: Boundaries typically need holes in order to be useful. Depending on the level of abstraction, different things can be thought of as holes. One way to think of a boundary is a place where a rule is enforced consistently, and this probably involves pushing what would be a continuous condition into a condition with a few semi discrete modes (in the simplest case enforcing a bimodal distribution of outcomes). In practice, living systems seem to have settled on stacking a bunch of one dimensional gate keepers together as presumably the modular... (read more)

Many proposals seem doomed to me because they involve one or multiple steps where they assume a representation, then try to point to robust relations in the representation and hope they'll hold in the territory. This wouldn't be so bad on its own but when pointed to it seems like handwaving happens rather than something more like conceptual engineering. I am relatively more hopeful about John's approach as being one that doesn't fail to halt and catch fire at these underspecified steps in other plans. In other areas like math and physics we try to get the ... (read more)

This is exactly what I was thinking about though, this idea of monitoring every human on earth seems like a failure of imagination on our part. I'm not safe from predators because I monitor the location of every predator on earth. I admit that many (overwhelming majority probably) of scenarios in this vein are probably pretty bad and involve things like putting only a few humans on ice while getting rid of the rest.

3Rob Bensinger2y
I mean, all of this feels very speculative and un-cruxy to me; I wouldn't be surprised if the ASI indeed is able to conclude that humanity is no threat at all, in which case it kills us just to harvest the resources. I do think that normal predators are a little misleading in this context, though, because they haven't crossed the generality ('can do science and tech') threshold. Tigers won't invent new machines, so it's easier to upper-bound their capabilities. General intelligences are at least somewhat qualitatively trickier, because your enemy is 'the space of all reachable technologies' (including tech that may be surprisingly reachable). Tigers can surprise you, but not in very many ways and not to a large degree.

I guess the threat model relies on the overhang. If you need x compute for powerful ai, then you need to control more than all the compute on earth minus x to ensure safety, or something like that. Controlling the people probably much easier.

1Rob Bensinger2y
Yes, where killing all humans is an example of "controlling the people", from the perspective of an Unfriendly AI.

New-to-me thought I had in response to the kill all humans part. When predators are a threat to you, you of course shoot them. But once you invent cheap tech that can control them you don't need to kill them anymore. The story goes that the AI would kill us either because we are a threat or because we are irrelevant. It seems to me that (and this imports a bunch of extra stuff that would require analysis to turn this into a serious analysis, this is just an idle thought), the first thing I do if I am superintelligent and wanting to secure my position is no... (read more)

2Rob Bensinger2y
A paperclipper mainly cares about humans because we might have some way to threaten the paperclipper (e.g., by pushing a button that deploys a rival superintelligence); and secondarily, we're made of atoms that can be used to build paperclips. It's harder to monitor the actions of every single human on Earth, than it is to kill all humans; and there's a risk that monitoring people visibly will cause someone to push the 'deploy a rival superintelligence' button, if such a button exists. Also, every minute that passes without you killing all humans, in the time window between 'I'm confident I can kill all humans' and 'I'm carefully surveilling every human on Earth and know that there's no secret bunker where someone has a Deploy Superintelligence button', is a minute where you're risking somebody pushing the 'deploy a rival superintelligence' button. This makes me think that the value of delaying 'killing all humans' (once you're confident you can do it) would need to be very high in order to offset that risk. One reason I might be wrong is if the AGI is worried about something like a dead man's switch that deploys a rival superintelligence iff some human isn't alive and regularly performing some action. (Not necessarily a likely scenario on priors, but once you're confident enough in your base plan, unlikely scenarios can end up dominating the remaining scenarios where you lose.) Then it's at least possible that you'd want to delay long enough to confirm that no such switch exists. You should be able to do both in parallel. I don't have a strong view on which is higher-priority. Given the dead-man's-switch worry above, you might want to prioritize sending a probe off-planet first as a precaution; but then go ahead and kill humans ASAP.

If humans were able to make one super-powerful AI, then humans would probably be able to make a second super-powerful AI, with different goals, which would then compete with the first AI. Unless, of course, the humans are somehow prevented from making more AIs, e.g. because they're all dead.

I would summarize a dimension of the difficulty like this. There are the conditions that give rise to intellectual scenes, intellectual scenes being necessary for novel work in ambiguous domains. There are the conditions that give rise to the sort of orgs that output actions consistent with something like Six Dimensions of Operational Adequacy. The intersection of these two things is incredibly rare but not unheard of. The Manhattan Project was a Scene that had security mindset. This is why I am not that hopeful. Humans are not the ones building the AGI, e... (read more)

3Ben Pace2y
Thanks, this story is pretty helpful (to my understanding).

As recent experience has shown, exponential processes don't need to be smarter than us to utterly upend our way of life. They can go from a few problems here and there to swamping all other considerations in a span of time too fast to react to, if preparations aren't made and those knowledgeable don't have the leeway to act. We are in the early stages of an exponential increase in the power of AI algorithms over human life, and people who work directly on these problems are sounding the alarm right now. It is plausible that we will soon have processes that... (read more)

I particularly appreciate the questions that ask one to look at a way that a problem was reified/specified/ontologized in a particular domain and asks for alternative such specifications. I thought Superintelligence (2014) might be net harmful because it introduced a lot of such specifications that I then noticed were hard to think around. I think there are a subset of prompts from the online course/book Framestorming that might be useful there, I'll go see if I can find them.

This seems similar to the SR model of scientific explanation.

It seems like the frame of some of the critique is that humans are the authority on human values and want to ensure that the AI doesn't escape that authority in illegible ways. To me it seems like the frame is more like we know that the sensors we have are only goodhartedly entangled with the things we care about and would ourselves prefer the less goodharted hypothetical sensors if we knew how to construct them. And that we'd want the AI to be inhabiting the same frame as us since, to take a page from Mad Investor Chaos, we don't know how lies will propag... (read more)

5Paul Christiano2y
I would describe the overall question as "Is there a situation where an AI trained using this approach deliberately murders us?" and for ELK more specifically as "Is there a situation where an AI trained using this approach gives an unambiguously wrong answer to a straightforward question despite knowing better?" I generally don't think that much about the complexity of human values.

You may not be interested in mutually exclusive compression schemas, but mutually exclusive compression schemas are interested in you. One nice thing is that given that the schemas use an arbitrary key to handshake with there is hope that they can be convinced to all get on the same arbitrary key without loss of useful structure.

Spoiler tags are borked the way I'm using them.

anyway, another place to try your hand at calibration:

Humbali: No. You're expressing absolute certainty in your underlying epistemology and your entire probability distribution

no he isn't, why?

Humbali is asking for Eliezer to double count evidence. Consilience is hard if you don't do your homework on provenance of heuristic and not just naively counting up outputs who themselves also didn't do their homework.

Or in other words: "Do not cite the deep evidence to me, I was there when it was written"

And another ... (read more)

Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.

Tangential, but did you ever happen to read statistical physics of human cooperation?

No, I just took a look. The spin glass stuff looks interesting!

Defining a distance function between two patterns might yield some interesting stuff and allow some porting in of existing math from information theory. There is also the dynamic case (converging and diverging distances) between different patterns. Seems like it could play into robustness eg sensitivity of patterns to flipping from convergent to divergent state.

I understand, thought it was worth commenting on anyway.

the small size of the human genome suggests that brain design is simple

Bounds, yes but the bound can be quite high due to offloading much of the compression to the environment.

So just to be clear, the model isn't necessarily endorsing the claim, just saying that the claim is a potential crux.

Is a sensitivity analysis of the model separated out anywhere? I might just be missing it.

3Ajeya Cotra2y
There are some limited sensitivity analysis in the "Conservative and aggressive estimates" section of part 4.

Detecting preferences in agents: how many assumptions need to be made?

I'm interpreting this to be asking how to detect the dimensionality of the natural embedding of preferences?

Related to sensitivity of instrumental convergence. i.e. the question of whether we live in a universe of strong or weak instrumental convergence. In a strong instrumental convergence universe, most possible optimizers wind up in a relatively small space of configurations regardless of starting conditions, while in a weak one they may diverge arbitrarily in design space. This can be thought of one way of crisping up concepts around orthogonality. e.g. in some universes orthogonality would be locally true but globally false, or vice versa, or locally and globally true or vice versa.

1Alex Flint3y
Romeo if you have time, would you say more about the connection between orthogonality and Life / the control question / the AI hypothesis? It seems related to me but I just can't quite put my finger on exactly what the connection is.
  1. First-person vs. third-person: In a first-person perspective, the agent is central. In a third-person perspective, we take a “birds-eye” view of the world, of which the agent is just one part.
  1. Static vs. dynamic: In a dynamic perspective, the notion of time is explicitly present in the formalism. In a static perspective, we instead have beliefs directly about entire world-histories.

I think these are two instances of a general heuristic of treating what have traditionally been seen as philosophical positions (e.g. here cognitive and behavioral view... (read more)

This seems consistent with coherence being not a constraint but a dimension of optimization pressure among several/many? Like environments that money pump more reliably will have stronger coherence pressure, but also the creature might just install a cheap hack for avoiding that particular pump (if narrow) which then loosens the coherence pressure (coherence pressure sounds expensive, so workarounds are good deals).

I noticed myself being dismissive of this approach despite being potentially relevant to the way I've been thinking about things. Investigating that, I find that I've mostly been writing off anything that pattern matches to the 'cognitive architectures' family of approaches. The reason for this is that most such approaches want to reify modules and structure. And my current guess is that the brain doesn't have a canonical structure (at least, on the level of abstraction that cognitive architecture focuses on). That is to say, the modules are fluid and their connections to each other are contingent.

2Adam Shimi3y
Thanks for commenting on your reaction to this post! That being said, I'm a bit confused by your comment. You seem to write off approaches which attempt to provide a computational model of mind, but my approach is literally the opposite: looking only at the behavior (but all the behavior), extract relevant statistics to study questions related to goal-directedness. Can you maybe give more details?

Hypothesis: in a predictive coding model, the bottom up processing is doing lossless compression and the top down processing is doing lossy compression. I feel excited about viewing more cognitive architecture problems through a lens of separating these steps.

There's a fairly straightforward optimization process that occurs in product development that I don't often see talked about in the abstract that goes something like this:

It seems like bigger firms should be able to produce higher quality goods. They can afford longer product development cycles, hire a broader variety of specialized labor, etc. In practice, it's smaller firms that compete on quality, why is this?

One of the reasons is that the pressure to cut corners increases enormously at scale along more than one dimension. As a product scales, eking out... (read more)

This is clarifying, thanks.

WRT the last paragraph, I'm thinking in terms of convergent vs divergent processes. So , fixed points I guess.

This is biting the bullet on the infinite regress horn of the Munchhausen trilemma, but given the finitude of human brain architecture I prefer biting the bullet on circular reasoning. We have a variety of overlays, like values, beliefs, goals, actions, etc. There is no canonical way they are wired together. We can hold some fixed as a basis while we modify others. We are a Ship of Neurath. Some parts of the ship feel more is-like (like the waterproofness of the hull) and some feel more ought-like (like the steering wheel).

4Abram Demski3y
Why not both? ;3 I have nothing against justifications being circular (IE the same ideas recurring on many levels), just as I have nothing in principle against finding a foundationalist explanation. A circular argument is just a particularly simple form of infinite regress. But my main argument against only the circular reasoning explanation is that attempted versions of it ("coherentist" positions) don't seem very good when you get down to details. Pure coherentist positions tend to rely on a stipulated notion of coherence (such as probabilistic coherence, or weighted constraint satisfaction, or something along those lines). These notions are themselves fixed. This could be fine if the coherence notions were sufficiently "assumption-lite" so as to not be necessarily Goodhart-prone etc, but so far it doesn't seem that way to me. I'm predicting that you'll agree with me on that, and grant that the notion of coherence should itself be up for grabs. I don't actually think the coherentist/foundationalist/infinitist trilemma is that good a characterization of our disagreement here. My claim isn't so much the classical claim that there's an infinite regress of justification, as much as a claim that there's an infinite regress of uncertainty -- that we're uncertain at all the levels, and need to somehow manage that. This fits the ship-of-theseus picture just fine. In other words, one can unroll a ship of theseus into an infinite hierarchy where each level says something about how the next level down gets re-adjusted over time. The reason for doing this is to achieve the foundationalist goal of understanding the system better, without the foundationalist method of fixing foundational assumptions. The main motive here is amplification. Taking just a ship of theseus, it's not obvious how to make it better besides running it forward faster (and even this has its risks, since the ship may become worse). If we unroll the hierarchy of wanting-to-become better, we can EG see

I see CSC and SEM as highly linked via modularity of processes.

A pointer is sort of the ultimate in lossy compression. Just an index to the uncompressed data, like a legible compression library. Wireheading is a goodhearting problem, which is a lossy compression problem etc.

Over the last few posts the recurrent thought I have is "why aren't you talking about compression more explicitly?"

Could you uncompress this comment a bit please?

The other people of whom you have nude photos, who are now incentivised to pay up rather than kick up a fuss.

Releasing one photo from a previously believed to be secure set of photos, where other photos in the same set are compromising can suffice for single member audience case.

That's the Legalist interpretation of Confucianism. Confucianism argues that the Legalists are just moving the problem one level up the stack a la public choice theory. The point of the Confucian is that the stack has to ground out somewhere, and asks the question of how to roll our virtue intuitions into the problem space explicitly since otherwise we are rolling them in tacitly and doing some hand waving.

Thanks, I was hoping someone more knowledgeable than I would leave a comment along these lines.

The main intuition this sparks in me is that it gives us concrete data structures to look for when talking broadly about the brain doing 'compression' by rotating a high dimensional object and carving off recognized chunks (simple distributions) in order to make the messy inputs more modular, composable, accessible, error correctable, etc. Sort of the way that predictive coding gives us a target to hunt for in looking for structures that look like they might be doing something like the atomic predictive coding unit.

Type theory for utility hypothesis: there are a certain distinct (small) number of pathways in the body that cause physical good feelings. Map those plus the location, duration, intensity, and frequency dimensions and you start to have comparability. This doesn't solve the motivation/meaning structures built on top of those pathways which have more degrees of freedom, but it's still a start. Also, those more complicated things built on top might just be scalar weightings and not change the dimensionality of the space.

4Abram Demski3y
Yeah, it seems like in practice humans should be a lot more comparable than theoretical agentic entities like I discuss in the post.

Trying to summarize your current beliefs (harder than it looks) is one of the best way to have very novel new thoughts IME.

Sounds similar to Noether's Theorem in some ways when you take that theorem philosophically and not just mathematically.

Two separate size parameters. The size of the search space, and the size the traversal algorithm needs to be to span the same gaps brains did.

This requires hitting a window - our data needs to be good enough that the system can tell it should use human values as a proxy, but bad enough that the system can’t figure out the specifics of the data-collection process enough to model it directly. This window may not even exist.

I like this framing, it is clarifying.

When alignment-by-default works, it’s basically a best-case scenario, so we can safely use the system to design a successor without worrying about amplification of alignment errors (among other things).

didn't understand how this was derived or what other results/ideas it is referencing.

The idea here is that the AI has a rough model of human values, and is pointed at those values when making decisions (e.g. the embedding is known and it's optimizing for the embedded values, in the case of an optimizer). It may not have perfect knowledge of human values, but it would e.g. design its successor to build a more precise model of human values than itself (assuming it expects that successor to have more relevant data) and point the successor toward that model, because that's the action which best optimizes for its current notion of human values. Contrast to e.g. an AI which is optimizing for human approval. If it can do things which makes a human approve, even though the human doesn't actually want those things (e.g. deceptive behavior), then it will do so. When that AI designs its successor, it will want the successor to be even better at gaining human approval, which means making the successor even better at deception. This probably needs more explanation, but I'm not sure which parts need more explanation, so feedback would be appreciated.
Load More