Against Time in Agent Models

[-]Vladimir_Nesov4y*170

I like specialization preorder as a setting for formulating these concepts. In a topological space, point y is stronger (more specialized) than point x iff all opens containing x also contain y. If opens are thought of as propositions, and specialization order as a kind of ("logical") time, with stronger points being in the future of weaker points, then this says that propositions must be valid with respect to time (that is, we want to only allow propositions that don't get invalidated). This setting motivates thinking of points not as objects of study, but as partial observations of objects of study, their shadows that develop according to specialization preorder. If a proposition is true about some partial observation of an object (a point of the space), it remains true when it develops further (in the future, for stronger points). The best we can capture objects of study is with neighborhood filters, but the conceptual distinction suggests that even in a sober space the objects of study are not necessarily points, they are merely observed through points.

This is just what Scott domains or more generally algebraic dcpos with Scott topology talk about, when we start with a poset of finite observations (about computations, the elusive objects of study), which is the specialization preorder of its Alexandrov topology, which then becomes Scott topology after soberification, adding points for partial observations that can be expressed in terms of Alexandrov opens on finite observations. Specialization order follows a computation, and opens formulate semidecidable properties. There are two different ways in which a computation is approximated: with a weaker observation/point, and with a weaker specification/proposition/open. One nice thing here is that we can recover points from opens, and then finite observations from the specialization poset of all partial observations/theories/ideals (as compact elements of a poset). So the different concepts fit together well, the rhetoric of observations and logical time has technical content that can be extracted just from the opens/propositions. This becomes even more concrete for coherent spaces, where finite observations are finite cliques on webs (of "atomic observations").

(This is mostly a keyword dump, pointing to standard theory that offers a way of making sense of logical time. The interesting questions are about how to make use of this to formulate FDT and avoid spurious proofs, possibly by reasoning at a particular point of a space, the logical moment of decision, without making use of its future. A distinction this point of view enforces that is usually missing in discussions of decision theory or reasoning about programs is between approximation with weaker observations vs. with weaker propositions. This is the distinction between different logical times and different states of knowledge about a computation.)

[-]abramdemski4y40

If opens are thought of as propositions, and specialization order as a kind of ("logical") time,

Up to here made sense.

with stronger points being in the future of weaker points, then this says that propositions must be valid with respect to time (that is, we want to only allow propositions that don't get invalidated).

After here I was lost. Which propositions are valid with respect to time? How can we only allow propositions which don't get invalidated (EG if we don't know yet which will and will not be), and also, why do we want that?

This setting motivates thinking of points not as objects of study, but as partial observations of objects of study, their shadows that develop according to specialization preorder. [...] The best we can capture objects of study is with neighborhood filters, [...] start with a poset of finite observations (about computations, the elusive objects of study),

You're saying a lot about what the "objects of study" are and aren't, but not very concretely, and I'm not getting the intuition for why this is important. I'm used to the idea that the points aren't really the objects of study in topology; the opens are the more central structure.

But the important question for a proposed modeling language is how well it models what we're after.

This is mostly a keyword dump, pointing to standard theory that offers a way of making sense of logical time.

It seems like you are trying to do something similar to what cartesian frames and finite factored sets are doing, when they reconstruct time-like relationships from other (purportedly more basic) terms. Would you care to compare the reconstructions of time you're gesturing at to those provided by cartesian frames and/or finite factored sets?

[-]Vladimir_Nesov4y10

Which propositions are valid with respect to time? How can we only allow propositions which don't get invalidated (EG if we don't know yet which will and will not be), and also, why do we want that?

This was just defining/motivating terms (including "validity") for this context, the technical answer is to look at the definition of specialization preorder, when it's being suggestively called "logical time". If an open is a "proposition", and a point being contained in an open is "proposition is true at that point", and a point stronger in specialization order than another point is "in the future of the other point", then in these terms we can say that "if a proposition is true at a point, it's also true at a future point", or that "propositions are valid with respect to time going forward", in the sense that their truth is preserved when moving from a point to a future point.

Logical time is intended to capture decision making, with future decisions advancing the agent's point of view in logical time. So if an agent reasons only in terms of propositions valid with respect to advancement of logical time, then any knowledge it accumulated remains valid as it makes decisions, that's some of the motivation for looking into reasoning in terms of such propositions.

You're saying a lot about what the "objects of study" are and aren't, but not very concretely, and I'm not getting the intuition for why this is important.

This is mostly about how domain theory describes computations, the interesting thing is how the computations are not necessarily in the domains at all, they only leave observations there, and it's the observations that the opens are ostensibly talking about, yet the goal might be to understand the computations, not just the observations (in program semantics, the goal is often to understand just the observations though, and a computation might be defined to only be its observed behavior). So one point I wanted to make is to push against the perspective where points of a space are what the logic of opens is intended to reason about, when the topology is not Frechet (has nontrivial specialization preorder).

But the important question for a proposed modeling language is how well it models what we're after.

Yeah, I've got nothing, just a sense of direction and a lot of theory to study, or else there would've been a post, not just a comment triggered by something on a vaguely similar topic. So this thread is in the same spirit as a comment I left a few months ago to a crackpot post, but that one was even more speculative, somewhat appropriate in a place like that...

[-]AlexMennen4y50

This seems related in spirit to the fact that time is only partially ordered in physics as well. You could even use special relativity to make a model for concurrency ambiguity in parallel computing: each processor is a parallel worldline, detecting and sending signals at points in spacetime that are spacelike-separated from when the other processors are doing these things. The database follows some unknown worldline, continuously broadcasts its contents, and updates its contents when it receives instructions to do so. The set of possible ways that the processors and database end up interacting should match the parallel computation model. This makes me think that intuitions about time that were developed to be consistent with special relativity should be fine to also use for computation.

[-]Vladimir_Nesov4y10

If you mark something like causally inescapable subsets of spacetime (not sure how this should be called), which are something like all unions of future lightcones, as open sets, then specialization preorder on spacetime points will agree with time. This topology on spacetime is non-Frechet (has nontrivial specialization preorder), while the relative topologies it gives on space-like subspaces (loci of states of the world "at a given time" in a loose sense) are Hausdorff, the standard way of giving a topology for such spaces. This seems like the most straightforward setting for treating physical time as logical time.

[-]Gordon Seidoh Worley4y40

For what it's worth, I think this is trying to get at the same insight as logical time but via a different path.

For the curious reader, this is also the same reason we use vector clocks to build distributed systems when we can't synchronize the clocks very well.

And there's something quite interesting about computation as a partial order. It might seem that this only comes up when you have a "distributed" system, but actually you need partial orders to reason about unitary programs when they are non-deterministic (any program with loops and conditionals that can't be unrolled because they depend on inputs not known before runtime are non-deterministic in this sense). For this reason, partial orders are the bread-and-butter of program verification.

[-]Donald Hobson4y30

This fails if there are closed timelike curves around.

There is of course a very general formalism, whereby inputs and outputs are combined into aputs. Physical laws of causality, and restrictions like running on a reversible computer are just restrictions on the subsets of aputs accepted.

[-]Ramana Kumar4y30

It's possible that reality is even worse than this post suggests, from the perspective of someone keen on using models with an intuitive treatment of time. I'm thinking of things like "relaxed-memory concurrency" (or "weak memory models") where there is no sequentially consistent ordering of events. The classic example is where these two programs run in parallel, with X and Y initially both holding 0, [write 1 to X; read Y into R1] || [write 1 to Y; read X into R2], and after both programs finish both R1 and R2 contain 0. What's going on here is that the level of abstraction matters: writing and reading from registers are not atomic operations, but if you thought they were you're gonna get confused if you expect sequential consistency.

Total ordering: there's only one possible ordering of all operations, and everyone knows it. (or there's just one agent in a cybernetic interaction loop.)
Sequential consistency: everyone knows the order of their own operations, but not how they are interleaved with others' operations (as in this post)
Weak memory: everyone knows the order of their own operations, but others' operations may be doing stuff to shared resources that aren't compatible with any interleaving of the operations

See e.g., https://www.cl.cam.ac.uk/~pes20/papers/topics.html#relaxed or this blog for more https://preshing.com/20120930/weak-vs-strong-memory-models/.

[-]Maxwell Clarke4y00

(Edited a lot from when originally posted)

(For more info on consistency see the diagram here: https://jepsen.io/consistency )

I think that the prompt to think about partially ordered time naturally leads one to think about consistency levels - but when thinking about agency, I think it makes more sense to just think about DAGs of events, not reads and writes. Low-level reality doesn't really have anything that looks like key-value memory. (Although maybe brains do?) And I think there's no maintaining of invariants in low-level reality, just cause and effect.

Maintaining invariants under eventual (or causal?) consistency might be an interesting way to think about minds. In particular, I think making minds and alignment strategies work under "causal consistency" (which is the strongest consistency level that can be maintained under latency / partitions between replicas), is an important thing to do. It might happen naturally though, if an agent is trained in a distributed environment.

So I think "strong eventual consistency" (CRDTs) and causal consistency are probably more interesting consistency levels to think about in this context than the really weak ones.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

22

22

Beyond Distributed Programming