Tangentially related: a recent discussion raising a seemingly surprising point about LLMs being lossless compression finders: https://www.youtube.com/watch?v=dO4TPJkeaaU
The first intuition pump that comes to mind for distinguishing mechanisms is examining how my brain generates and assigns credence to two hypotheses when something goes wrong with my car: that it is a sensor malfunction, vs. that the sensor is telling me about the problem in the world it exists to alert me to.
One thing that happens is that the broken sensor implies a much larger space of worlds because it can vary arbitrarily instead of only in tight informational coupling with the underlying physical system. So fluctuations outside the historical behavior of the sensor eit...
Whether or not details (and lots of specific detail arguments) matter hinges on the sensitivity argument (which is an argument about basins?) in general, so I'd like to see that addressed directly. What are the arguments for high sensitivity worlds other than anthropics? What is the detailed anthropic argument?
Rambling/riffing: Boundaries typically need holes in order to be useful. Depending on the level of abstraction, different things can be thought of as holes. One way to think of a boundary is a place where a rule is enforced consistently, and this probably involves pushing what would be a continuous condition into a condition with a few semi discrete modes (in the simplest case enforcing a bimodal distribution of outcomes). In practice, living systems seem to have settled on stacking a bunch of one dimensional gate keepers together as presumably the modular...
Many proposals seem doomed to me because they involve one or multiple steps where they assume a representation, then try to point to robust relations in the representation and hope they'll hold in the territory. This wouldn't be so bad on its own but when pointed to it seems like handwaving happens rather than something more like conceptual engineering. I am relatively more hopeful about John's approach as being one that doesn't fail to halt and catch fire at these underspecified steps in other plans. In other areas like math and physics we try to get the ...
This is exactly what I was thinking about, though: this idea of monitoring every human on earth seems like a failure of imagination on our part. I'm not safe from predators because I monitor the location of every predator on earth. I admit that many (probably the overwhelming majority) of the scenarios in this vein are pretty bad and involve things like putting only a few humans on ice while getting rid of the rest.
I guess the threat model relies on the overhang. If you need x compute for powerful AI, then you need to control more than all the compute on earth minus x to ensure safety, or something like that. Controlling the people is probably much easier.
New-to-me thought I had in response to the kill all humans part. When predators are a threat to you, you of course shoot them. But once you invent cheap tech that can control them you don't need to kill them anymore. The story goes that the AI would kill us either because we are a threat or because we are irrelevant. It seems to me that (and this imports a bunch of extra stuff that would require analysis to turn this into a serious analysis, this is just an idle thought), the first thing I do if I am superintelligent and wanting to secure my position is no...
If humans were able to make one super-powerful AI, then humans would probably be able to make a second super-powerful AI, with different goals, which would then compete with the first AI. Unless, of course, the humans are somehow prevented from making more AIs, e.g. because they're all dead.
I would summarize a dimension of the difficulty like this. There are the conditions that give rise to intellectual scenes, intellectual scenes being necessary for novel work in ambiguous domains. There are the conditions that give rise to the sort of orgs that output actions consistent with something like Six Dimensions of Operational Adequacy. The intersection of these two things is incredibly rare but not unheard of. The Manhattan Project was a Scene that had security mindset. This is why I am not that hopeful. Humans are not the ones building the AGI, e...
As recent experience has shown, exponential processes don't need to be smarter than us to utterly upend our way of life. They can go from a few problems here and there to swamping all other considerations in a span of time too fast to react to, if preparations aren't made and those knowledgeable don't have the leeway to act. We are in the early stages of an exponential increase in the power of AI algorithms over human life, and people who work directly on these problems are sounding the alarm right now. It is plausible that we will soon have processes that...
I particularly appreciate the questions that ask one to look at the way a problem was reified/specified/ontologized in a particular domain and ask for alternative such specifications. I thought Superintelligence (2014) might be net harmful because it introduced a lot of such specifications that I then noticed were hard to think around. I think there is a subset of prompts from the online course/book Framestorming that might be useful there; I'll go see if I can find them.
It seems like the frame of some of the critique is that humans are the authority on human values and want to ensure that the AI doesn't escape that authority in illegible ways. To me it seems like the frame is more like we know that the sensors we have are only goodhartedly entangled with the things we care about and would ourselves prefer the less goodharted hypothetical sensors if we knew how to construct them. And that we'd want the AI to be inhabiting the same frame as us since, to take a page from Mad Investor Chaos, we don't know how lies will propag...
You may not be interested in mutually exclusive compression schemas, but mutually exclusive compression schemas are interested in you. One nice thing is that given that the schemas use an arbitrary key to handshake with there is hope that they can be convinced to all get on the same arbitrary key without loss of useful structure.
Spoiler tags are borked the way I'm using them.
anyway, another place to try your hand at calibration:
Humbali: No. You're expressing absolute certainty in your underlying epistemology and your entire probability distribution
no he isn't, why?
Humbali is asking Eliezer to double-count evidence. Consilience is hard unless you do your homework on the provenance of each heuristic, rather than naively counting up outputs from sources who themselves didn't do their homework.
Or in other words: "Do not cite the deep evidence to me, I was there when it was written"
And another ...
Are we talking about the same thing?
Tangential, but did you ever happen to read "Statistical physics of human cooperation"?
Defining a distance function between two patterns might yield some interesting stuff and allow porting in existing math from information theory. There is also the dynamic case (converging and diverging distances) between different patterns. Seems like it could play into robustness, e.g., the sensitivity of patterns to flipping from a convergent to a divergent state.
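To make the distance idea a bit more concrete, here is a minimal sketch. It assumes (my assumption, not anything established above) that a "pattern" can be crudely summarized by its empirical symbol distribution, and it uses the square root of the Jensen-Shannon divergence as the distance. The names `pattern_distance` and `trend` are made up for illustration, and patterns are assumed to be non-empty.

```python
import math
from collections import Counter


def distribution(pattern):
    """Empirical symbol distribution of a non-empty sequence."""
    counts = Counter(pattern)
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions."""
    support = set(p) | set(q)
    mix = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in support}

    def kl_to_mix(a):
        # KL divergence from a to the mixture; zero-probability terms are skipped.
        return sum(a[s] * math.log2(a[s] / mix[s]) for s in a if a[s] > 0)

    # Guard against tiny negative values from floating-point rounding.
    return max(0.0, 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q))


def pattern_distance(x, y):
    """Square root of the JS divergence, which behaves as a metric."""
    return math.sqrt(js_divergence(distribution(x), distribution(y)))


def trend(distances):
    """Crude label for the dynamic case: compare first and last distance."""
    if distances[-1] < distances[0]:
        return "converging"
    if distances[-1] > distances[0]:
        return "diverging"
    return "steady"


# Hypothetical snapshots of two patterns at three points in time.
snapshots = [("aaaa", "bbbb"), ("aaab", "abbb"), ("aabb", "aabb")]
distances = [pattern_distance(x, y) for x, y in snapshots]
```

This throws away all ordering information in the pattern, so it is only a starting point. A compression-based distance would be closer in spirit to the rest of this thread, at the cost of being harder to reason about exactly.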
I understand, thought it was worth commenting on anyway.
the small size of the human genome suggests that brain design is simple
Bounds, yes, but the bound can be quite high due to offloading much of the compression to the environment.
Is a sensitivity analysis of the model separated out anywhere? I might just be missing it.
Detecting preferences in agents: how many assumptions need to be made?
I'm interpreting this to be asking how to detect the dimensionality of the natural embedding of preferences?
Related to the sensitivity of instrumental convergence, i.e., the question of whether we live in a universe of strong or weak instrumental convergence. In a strong instrumental convergence universe, most possible optimizers wind up in a relatively small space of configurations regardless of starting conditions, while in a weak one they may diverge arbitrarily in design space. This can be thought of as one way of crisping up concepts around orthogonality: e.g., in some universes orthogonality would be locally true but globally false, or vice versa, or true both locally and globally, or false both locally and globally.
- First-person vs. third-person: In a first-person perspective, the agent is central. In a third-person perspective, we take a “birds-eye” view of the world, of which the agent is just one part.
- Static vs. dynamic: In a dynamic perspective, the notion of time is explicitly present in the formalism. In a static perspective, we instead have beliefs directly about entire world-histories.
I think these are two instances of a general heuristic of treating what have traditionally been seen as philosophical positions (e.g. here cognitive and behavioral view...
This seems consistent with coherence being not a constraint but a dimension of optimization pressure among several/many? Like environments that money pump more reliably will have stronger coherence pressure, but also the creature might just install a cheap hack for avoiding that particular pump (if narrow) which then loosens the coherence pressure (coherence pressure sounds expensive, so workarounds are good deals).
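A toy sketch of the money pump and the cheap-hack workaround, under loudly stated assumptions: the agent has cyclic preferences A > B > C > A, pays a fixed fee per accepted trade, and the "hack" is a narrow rule of never trading back to an item it has already held. The function `run_money_pump` and all its parameters are hypothetical names for illustration only.

```python
def run_money_pump(prefers, start_item, offers, fee, wealth, refuse_repeats=False):
    """Simulate an agent that pays `fee` for any offered item it prefers.

    prefers: set of (better, worse) pairs.
    refuse_repeats: the narrow hack; decline any item held before.
    Returns the final (item, wealth).
    """
    item = start_item
    held_before = {start_item}
    for offered in offers:
        if refuse_repeats and offered in held_before:
            continue  # hack: never cycle back, without fixing the preferences
        if (offered, item) in prefers and wealth >= fee:
            item = offered
            wealth -= fee
            held_before.add(offered)
    return item, wealth


# Cyclic (incoherent) preferences: A > B, B > C, C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
offers = ["C", "B", "A"] * 2  # the pump walks the cycle twice

pumped = run_money_pump(cyclic, "A", offers, fee=1, wealth=10)
patched = run_money_pump(cyclic, "A", offers, fee=1, wealth=10, refuse_repeats=True)
```

The patched agent still has incoherent preferences; it just stops bleeding money to this particular pump, which is the sense in which a workaround loosens the pressure without buying full coherence.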
I noticed myself being dismissive of this approach despite being potentially relevant to the way I've been thinking about things. Investigating that, I find that I've mostly been writing off anything that pattern matches to the 'cognitive architectures' family of approaches. The reason for this is that most such approaches want to reify modules and structure. And my current guess is that the brain doesn't have a canonical structure (at least, on the level of abstraction that cognitive architecture focuses on). That is to say, the modules are fluid and their connections to each other are contingent.
Hypothesis: in a predictive coding model, the bottom up processing is doing lossless compression and the top down processing is doing lossy compression. I feel excited about viewing more cognitive architecture problems through a lens of separating these steps.
There's a fairly straightforward optimization process that occurs in product development that I don't often see talked about in the abstract that goes something like this:
It seems like bigger firms should be able to produce higher quality goods. They can afford longer product development cycles, hire a broader variety of specialized labor, etc. In practice, it's smaller firms that compete on quality, why is this?
One of the reasons is that the pressure to cut corners increases enormously at scale along more than one dimension. As a product scales, eking out...
This is clarifying, thanks.
WRT the last paragraph, I'm thinking in terms of convergent vs divergent processes. So, fixed points, I guess.
This is biting the bullet on the infinite regress horn of the Munchhausen trilemma, but given the finitude of human brain architecture I prefer biting the bullet on circular reasoning. We have a variety of overlays, like values, beliefs, goals, actions, etc. There is no canonical way they are wired together. We can hold some fixed as a basis while we modify others. We are a Ship of Neurath. Some parts of the ship feel more is-like (like the waterproofness of the hull) and some feel more ought-like (like the steering wheel).
I see CSC and SEM as highly linked via modularity of processes.
A pointer is sort of the ultimate in lossy compression. Just an index to the uncompressed data, like a legible compression library. Wireheading is a Goodharting problem, which is a lossy compression problem, etc.
Over the last few posts the recurrent thought I have is "why aren't you talking about compression more explicitly?"
The other people of whom you have nude photos, who are now incentivised to pay up rather than kick up a fuss.
Releasing one photo from a set previously believed to be secure, where other photos in the same set are compromising, can suffice in the single-member-audience case.
That's the Legalist interpretation of Confucianism. Confucianism argues that the Legalists are just moving the problem one level up the stack a la public choice theory. The point of the Confucian is that the stack has to ground out somewhere, and asks the question of how to roll our virtue intuitions into the problem space explicitly since otherwise we are rolling them in tacitly and doing some hand waving.
The main intuition this sparks in me is that it gives us concrete data structures to look for when talking broadly about the brain doing 'compression' by rotating a high dimensional object and carving off recognized chunks (simple distributions) in order to make the messy inputs more modular, composable, accessible, error correctable, etc. Sort of the way that predictive coding gives us a target to hunt for in looking for structures that look like they might be doing something like the atomic predictive coding unit.
Type theory for utility hypothesis: there are a certain distinct (small) number of pathways in the body that cause physical good feelings. Map those plus the location, duration, intensity, and frequency dimensions and you start to have comparability. This doesn't solve the motivation/meaning structures built on top of those pathways which have more degrees of freedom, but it's still a start. Also, those more complicated things built on top might just be scalar weightings and not change the dimensionality of the space.
Trying to summarize your current beliefs (harder than it looks) is one of the best ways to have novel thoughts IME.
Sounds similar to Noether's Theorem in some ways when you take that theorem philosophically and not just mathematically.
Two separate size parameters. The size of the search space, and the size the traversal algorithm needs to be to span the same gaps brains did.
This requires hitting a window - our data needs to be good enough that the system can tell it should use human values as a proxy, but bad enough that the system can’t figure out the specifics of the data-collection process enough to model it directly. This window may not even exist.
I like this framing, it is clarifying.
When alignment-by-default works, it’s basically a best-case scenario, so we can safely use the system to design a successor without worrying about amplification of alignment errors (among other things).
I didn't understand how this was derived or what other results/ideas it is referencing.
One related question is which sub-tasks of GPT-3 showed surprise jackpots vs GPT-2.
Mild progress on the intentional stance for me: take a thermostat. I realized you can count up the number of different temperatures the sensor is capable of detecting, the number of states the actuator can take in response (in the case of the thermostat, only on/off), and the function mapping between the two. This might start to give some sense of how you can build up a multidimensional map out of multiple sensors and actuators as you do some sort of function combination.
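A rough sketch of that counting exercise. Everything concrete here is an assumption for illustration: a sensor that reads whole degrees from 10 to 30 C, a setpoint of 20, and a second humidity sensor plus fan actuator added only to show how the spaces combine as Cartesian products.

```python
from itertools import product


def count_policies(n_sensor_states, n_actuator_states):
    """Number of distinct functions from sensor states to actuator states."""
    return n_actuator_states ** n_sensor_states


# Hypothetical thermostat: the sensor distinguishes whole degrees 10..30 C.
sensor_states = list(range(10, 31))
actuator_states = ["off", "on"]


def thermostat_policy(temp_c, setpoint=20):
    """The one mapping the thermostat actually implements."""
    return "on" if temp_c < setpoint else "off"


# The mapping written out as an explicit table over all sensor states.
policy_table = {t: thermostat_policy(t) for t in sensor_states}

# Adding a second sensor and a second actuator: joint spaces are products.
humidity_states = ["dry", "humid"]
fan_states = ["off", "on"]
joint_sensor_states = list(product(sensor_states, humidity_states))
joint_actuator_states = list(product(actuator_states, fan_states))

single_policies = count_policies(len(sensor_states), len(actuator_states))
joint_policies = count_policies(len(joint_sensor_states), len(joint_actuator_states))
```

The point of the count is that the thermostat picks out one mapping from a large space of possible ones, and that space blows up quickly as sensors and actuators are combined.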
One compression: assistance in partitioning the hypothesis space. As opposed to finding the correct point in the search space from one shot learning.
Really like this post, a great inroad into something I've been thinking about, which is how to formalize self-locating uncertainty in order to use it to build other things.
Also, relating back to low level biological systems: https://arxiv.org/abs/1506.06138
may be useful for tracing some key words and authors that have had some related ideas.