I was thinking about the embedded agency sequence again last week, and thought “It is very challenging to act and reason from within an automaton.” Making agents which live in automata and act well in dilemmas is a concrete task for which progress can be easily gauged. I discussed this with a group, and we came up with desiderata for an automaton in which interesting agent strategies emerge.
In this post, I list some inspirations, some desiderata for this testing space, and then a sketch of a specific implementation.
The embedded agency paper lays out four main difficulties of being embedded. The goal here is to design an automaton (i.e. agent environment) which can represent dilemmas whose solutions require insights in these four domains.
I draw concepts from four different automata.
Core War is close to what we need, I think. It’s a game that takes place on (e.g.) a megabyte of RAM, where one assembly instruction takes up one memory slot, agents are assembly programs, and they fight each other in this RAM space. It has several relevant features, such as determinism, lack of input/output/interference, and fully embedded computation. I suspect limiting range of sight could make Core War more interesting from an artificial-life point of view, because then agents need to travel around to see the world.
Conway’s Game of Life is Turing complete, so in principle it can encode basically anything you can write in a programming language. It’s extremely simple, but I think it is too verbose to hand-write dilemmas in.
Botworld 1.0 is a cellular automaton designed to be interesting in an embedded-agency sense; it’s useful but pretty complex. Its authors do encode interesting problems such as stag hunt and prisoner’s dilemma. I think something closer to Core War, which is more parsimonious, would be more fruitful to design programs in.
Real life is maybe basically an automaton. Almost all effects are localized, with waves and particles traveling at most the speed of light. The transition function is nondeterministic and there are real numbers and complex numbers in the mix, but I don’t know if any of this makes a big difference from an agent-design point of view. We still have all the problems of decision theory, world models, robust delegation, and subsystem alignment.
The 5 & 10 problem, (twin) prisoner’s dilemma, transparent Newcomb problem, death in Damascus, stag hunt, and other problems should at least be expressible in the automaton. The 5 & 10 problem especially needs to be representable. Getting the agent to know what the situation is and getting it to act well are both separate problems from setting up the dilemma.
In order to decide between $5 and $10, the agent first needs to know the two available actions and their utilities. Of course you cannot hard-code this knowledge into your agent, because the same agent code needs to work in many dilemmas, and it’s trivial to make an agent which passes only one dilemma.
So how should agents find and use knowledge from the world? Core War programs (called “warriors”) use conditional jumps; there’s no notion of reading in or outputting a value, but you can condition your action on a specific value at a specific location. I think this should be how agents use information and make decisions at the lowest level. (Maybe any other way of reading and using knowledge is reducible to something like this anyway?)
 “If a < b jump to instruction x else jump to instruction y”
a < b
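To make the conditional jump concrete, here is a minimal sketch in Python. The function name, the flat list standing in for memory, and the wrap-around addressing are all assumptions made for illustration; none of this is Core War's actual instruction set.

```python
def jump_if_less(memory, a_addr, b_addr, x, y):
    """Return the next instruction pointer for
    'if a < b jump to instruction x else jump to instruction y'.

    The agent never "reads input"; it only conditions its next
    location on values sitting at two addresses in the world.
    Addresses wrap around (an assumption borrowed from circular cores).
    """
    size = len(memory)
    a = memory[a_addr % size]
    b = memory[b_addr % size]
    return (x if a < b else y) % size


# A hypothetical 16-slot world holding two values the agent compares.
memory = [0] * 16
memory[3] = 5
memory[4] = 10

next_ip = jump_if_less(memory, a_addr=3, b_addr=4, x=8, y=12)
```

Since 5 < 10, the pointer moves to slot 8. Everything the agent "knows" here is expressed through where its instruction pointer ends up.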
My hope for creating this automaton is that I (and others) will design agents for it which use self-knowledge & successor-building & world-modeling as emergent strategies; those strategies should not be explicitly advantaged by the physics. Yet agent designers need some sort of objective to optimize when designing agents; it needs to be clear when one agent is better than another in a given environment. The best solution I can think of is to have “dollars” lying around in the world, and the objective of agent-designers is to have the agent collect as many dollars as possible.
An environment includes insertion points for where agents should begin at the first timestep and the max agent size (or other constraints). The command-line utility takes in an environment file and the appropriate number of agent files and returns the number of dollars that each agent gets in that world. So you could put agent1 in transparent Newcomb, then try with agent2, and see which did better and how much money they made. There could also be an option for logging or interrupting & modifying the environment or something.
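As a rough sketch of the loading step only, the following Python shows how agents might be placed at an environment's insertion points while enforcing the size limit. The `Environment` fields, the `place_agents` name, and the rule that agents may not overlap are assumptions of mine, not a settled file format or spec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Environment:
    """Hypothetical in-memory form of an environment file."""
    memory_size: int
    insertion_points: List[int]  # where each agent starts at timestep 0
    max_agent_size: int


def place_agents(env, agents):
    """Copy each agent's instructions into memory at its insertion point."""
    if len(agents) != len(env.insertion_points):
        raise ValueError("environment expects a different number of agents")
    memory = [None] * env.memory_size
    for start, program in zip(env.insertion_points, agents):
        if len(program) > env.max_agent_size:
            raise ValueError("agent exceeds max agent size")
        for offset, instruction in enumerate(program):
            addr = (start + offset) % env.memory_size
            if memory[addr] is not None:
                raise ValueError("agents overlap in memory")  # assumed rule
            memory[addr] = instruction
    return memory


env = Environment(memory_size=8, insertion_points=[0, 4], max_agent_size=3)
memory = place_agents(env, [["A1", "A2"], ["B1"]])
```

Running the world and counting each agent's dollars would sit on top of this; that part depends on the physics and is left out here.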
In general, nothing within a world can do perfect simulation of any portion of the world including itself, because the simulator is always too small, but it is possible to do pretty-good prediction. Some of the most interesting dilemmas require the presence of reliable predictors, and some of our hardest decisions in life are hard because other people are predicting us, so we want predictors to be possible within ordinary world-physics. Call the reliable predictor “Omega”.
We need agents to understand what Omega is and what it’s doing but somehow not be able to screw with it. This could be done with read-only zones or by giving Omega ten turns before the agent gets one turn; the turn-management could be done with “energy tokens” which agents spend as actions.
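The "ten turns for Omega, one for the agent" idea can be sketched as a token-based turn order. This is only an illustration: the `turn_order` function and the fixed per-round token allotment are assumptions, and a real design might let tokens accumulate or be found in the world instead.

```python
def turn_order(tokens_per_round, rounds):
    """Return the sequence of actors, one entry per action taken.

    Each round, every actor receives a fixed number of energy tokens
    and spends one token per action before the next actor moves.
    """
    order = []
    for _ in range(rounds):
        for name, tokens in tokens_per_round.items():
            order.extend([name] * tokens)
    return order


# Omega gets ten actions for every one action the agent gets.
order = turn_order({"omega": 10, "agent": 1}, rounds=2)
```

With this allotment Omega finishes ten actions before the agent's first move, which gives it a head start but does not by itself make it tamper-proof.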
Omega also needs to somehow safely execute the agent without getting stuck in infinite loops or allowing the simulation to escape or move the money around or something. I have no idea about this part. Perhaps the reliable predictor should just sit outside the universe. Or we could just say that it’s against the spirit of the game to screw with Omega.
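One standard partial answer to the infinite-loop half of this problem is to simulate the agent on a private copy with a step budget. The sketch below uses a made-up three-instruction language (`NOP`, `JMP`, `ACT`) purely for illustration; it does not address simulation escape, and it is not a claim about how Omega should actually work.

```python
def predict(program, step_budget):
    """Run a copy of the agent for at most step_budget steps.

    Returns the action the agent commits to, or None if the budget
    runs out first. The real world is never modified, because only
    the private copy is executed.
    """
    sandbox = list(program)  # private copy of the agent's code
    ip = 0
    for _ in range(step_budget):
        op, arg = sandbox[ip % len(sandbox)]
        if op == "ACT":      # agent commits to an action
            return arg
        if op == "JMP":      # unconditional jump
            ip = arg
        else:                # "NOP": fall through to the next slot
            ip += 1
    return None


one_boxer = [("NOP", 0), ("ACT", "one-box")]
looper = [("JMP", 0)]

prediction = predict(one_boxer, step_budget=100)
timed_out = predict(looper, step_budget=100)
```

Omega then needs a policy for the `None` case, for example treating a timeout as the less favorable prediction.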
operation A B
I don’t know if this is sufficient or well-designed, but here’s my current idea for an automaton. I am mostly copying Core War.
(isMoney, hasInstructionPointer, value)
DAT 7 14
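The cell tuple and the `operation A B` instruction shape could be represented along these lines. Treating a cell's `value` as a parsed instruction, and the Python names `Cell`, `Instruction`, and `parse_instruction`, are assumptions for the sake of a sketch.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    """One 'operation A B' instruction, e.g. 'DAT 7 14'."""
    operation: str
    a: int
    b: int


class Cell(NamedTuple):
    """One memory slot: (isMoney, hasInstructionPointer, value)."""
    is_money: bool
    has_instruction_pointer: bool
    value: Instruction  # assumed: the value slot holds an instruction


def parse_instruction(text):
    """Parse text of the form 'OP A B' into an Instruction."""
    operation, a, b = text.split()
    return Instruction(operation, int(a), int(b))


cell = Cell(is_money=False,
            has_instruction_pointer=False,
            value=parse_instruction("DAT 7 14"))
```

Keeping the money flag and the instruction-pointer flag on the cell itself means both are ordinary parts of the world state that a conditional jump could, in principle, inspect.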
I could publish a collection of public environments and create some private environments too. People can design agents which score well in the public environment, then submit it for scoring on the private environments, like a Kaggle contest. This, like any train/test split, would reduce overfitting.
Two programs in two places in memory. Left is somehow defect and right is somehow cooperate. Agents can see each other’s code and reason about what the other will do. Omega kills everyone in 1000 timesteps if no decision is reached or something.
Omega copies the same agent to two places in memory. Left is defect and right is cooperate.
Omega has two boxes and it gives the agent access to one or the other depending on its behavior.
Something like this code:
Probably very flawed but…
I’ll briefly raise and attempt to respond to some modes of failure for this project.
It could turn out that the interesting/challenging part of the embedded agency questions is in drawing the boundary around the agent, so giving the sole starting location of the agent is dodging the most important problem. I think that this problem is fully explored, however, if we somehow pause the agent and let some other things copy & use its code before the agent runs. Then the agent must figure out what has happened and imagine other outcomes before it chooses actions.
The human or evolutionary algorithm or whatever designing the agents is indeed outside of the universe, and cannot directly suffer consequences, be modified, etc. However, they cannot interfere once the simulation has started, and any knowledge they have must fully live in their program in order for it to succeed in a variety of environments. I think that, if you design an agent which passes 5&10, Newcomb, prisoner’s dilemma, etc, then you must have made some insights along the way. Otherwise, maybe these problems were easier than I thought.
This is maybe the most likely way for this project to fail, conditioning on me actually doing the work. I would say that, even in this case, we can learn something about the agent by running experiments on it or somehow asking it questions, like how we analyze humans.
Automata are a more accurate model of the difficulties of agency in the real world than reinforcement learning problems, so we need to do more task-design, agent-design, and general experimentation in this space. My plan is to create an automaton, used as a command-line utility, which will run a given set of agents in a given environment (e.g. prisoner’s dilemma). Ideally, we’ll have a large set of task environments, and we can design agents with the goal of generality.
It seems like collecting dollars requires a hard-coded notion of how to draw the boundary around the agents, which runs contrary to the intention. It seems more natural to require the agents to strive to change the world in a particular way (e.g. maximize the number of rubes in the world).
Yes I agree it feels fishy. The problem with maximizing rubes is that the dilemmas might get lost in the detail of preventing rube hacking. Perhaps agents can "paint" existing money their own color, and money can only be painted once, and agents want to paint as much money as possible. Then the details remain in the env
Or something simpler would be that the agent's money counter is in the environment but unmodifiable except by getting tokens, and the agent's goal is to maximize this quantity. Feels kind of fake maybe because money gives the agent no power or intelligence, but it's a valid object-in-the-world to have a preference over the state of.
Yet another option is to have the agent maximize energy tokens (which actions consume)
the objective of agent-designers is to have the agent collect as many agents as possible
Typo: should say "dollars"?