Over at Google, large language models have been plugged into physics simulators to help them share a world model with their human interlocutors, resulting in big performance gains. They call it Mind's Eye.

This is how the authors describe the work:

Correct and complete understanding of properties and interactions in the physical world is not only essential to achieve human-level reasoning (Lake et al., 2017), but also fundamental to build a general-purpose embodied intelligence (Huang et al., 2022). In this work, we investigate to what extent current LMs understand the basic rules and principles of the physical world, and describe how to ground their reasoning with the aid of simulation. Our contributions are three-fold:

 • We propose a new multi-task physics alignment dataset, UTOPIA, whose aim is to benchmark how well current LMs can understand and reason over some basic laws of physics (§2). The dataset contains 39 sub-tasks covering six common scenes that involve understanding basic principles of physics (e.g., conservation of momentum in elastic collisions), and all the ground-truth answers are automatically generated by a physics engine. We find that current large-scale LMs are still quite limited on many basic physics-related questions (24% accuracy of GPT-3 175B in zero-shot, and 38.2% in few-shot).

 • We explore a paradigm that adds physics simulation to the LM reasoning pipeline (§3) to make the reasoning grounded within the physical world. Specifically, we first use a model to transform the given text-form question into rendering code, and then run the corresponding simulation on a physics engine (i.e., MuJoCo (Todorov et al., 2012)). Finally we append the simulation results to the input prompts of LMs during inference. Our method can serve as a plug-and-play framework that works with any LM and requires neither handcrafted prompts nor costly fine-tuning.

 • We systematically evaluate the performance of popular LMs in different sizes on UTOPIA before and after augmentation by Mind’s Eye, and compare the augmented performance with many existing approaches (§4.2). We find Mind’s Eye outperforms other methods by a large margin in both zero-shot and few-shot settings. More importantly, Mind’s Eye is also effective for small LMs, and the performance with small LMs can be on par or even outperform that of 100× larger vanilla LMs.
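The pipeline the authors describe in the second bullet can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `text_to_sim_code`, `run_physics_engine`, and `language_model` are hypothetical stand-ins for the trained text-to-code model, MuJoCo, and the LM respectively.

```python
# Minimal sketch of the Mind's Eye pipeline: question -> simulation code
# -> physics engine -> simulation results appended to the LM prompt.
# All three functions below are toy stand-ins, not the paper's components.

def text_to_sim_code(question: str) -> str:
    """Stub for the model that turns a text question into rendering code."""
    return "<mujoco>two spheres, elastic collision</mujoco>"

def run_physics_engine(sim_code: str) -> str:
    """Stub for running the simulation and summarizing its outcome."""
    return "Simulation result: momentum is conserved; both spheres rebound."

def language_model(prompt: str) -> str:
    """Stub LM; a real system would call an actual model here."""
    return "Answer grounded in: " + prompt.splitlines()[-1]

def minds_eye_answer(question: str) -> str:
    # 1. Translate the question into simulation code.
    sim_code = text_to_sim_code(question)
    # 2. Run the corresponding simulation on the physics engine.
    evidence = run_physics_engine(sim_code)
    # 3. Append the simulation results to the LM's input prompt.
    prompt = question + "\n" + evidence
    return language_model(prompt)

print(minds_eye_answer("Two identical balls collide elastically. What happens?"))
```

Note that nothing here requires fine-tuning or handcrafted prompts; the simulation evidence is simply prepended context, which is what makes the framework plug-and-play across LMs.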

This seems like a direct step down the CAIS path of development.


This seems like a direct step down the CAIS path of development.

Why? Making language models better at physics instinctively feels like the general-agent-y path to me.

Because the way they went about it was to give the language model access to a separate physics simulator (MuJoCo, owned by DeepMind) rather than something like the language model learning the rules of physics through a physics textbook or landing on some encoding of English tokens that happens to represent physics.

I interpreted having to go to a different engine to get better inputs for the language model as an instance of multiple interacting services.

Would it satisfy your objections if the physics engine were a separate system, but that system was using only neural networks or other ML hypotheses capable of learning from feedback? So no hardcoded physics code, just an ML system that regresses from some input representation of the world to future predictions. And then we develop the representation with autoencoders, so no human bias there either.

I don't think so, no - the way I understand it, any kind of separation into separate systems falls into the CAIS sphere of thought. This is because we are measuring things from the capability side, rather than the generation-of-capability side: for example, I think it is still CAIS even if we take the exact same ML architecture and train copies of it into different specializations which then use each other as services.

There are a couple of things worth distinguishing, though:

  1. That a sufficiently integrated CAIS is indistinguishable from a single general agent to us is what tells us CAIS isn't safe either.
  2. The arguments that a single general agent represents a different or more severe danger profile still motivate tracking the two paths differently.

I will also say I think this type of work could easily be part of the creation of a single general agent. If we consider Gato as the anchor for a general agent: many different tasks were captured in a single transformer, but as far as I understand it, Gato kept the input-to-task associations, which is to say the language inputs for the language tasks and the physics inputs for the physics tasks. But if the language model fed to Gato-2 uses this Mind's Eye technique, it would be possible to do physics tasks from a language input and maybe also explain the physics tasks as a language output.

So before, it could respond to sentences with other sentences, and it could respond to equations with other equations; now it can process ancient geometry books and Newton's Principia, which certainly use words to describe equations, and maybe even compose outputs of a similar kind.

  1. That a sufficiently integrated CAIS is indistinguishable from a single general agent to us is what tells us CAIS isn't safe either.

Fleshing this point out, I think one can probably make conditional statistical arguments about safety here, to define what I think you are getting at with "sufficiently integrated".

If your model is N parameters and integrates a bunch of Services, and we've established that a SOTA physics model requires N*100 parameters (the OP paper suggests that order-of-magnitude difference), then it is likely safe to say that the model has not "re-learned" physics in some way that would render it illegible. (I've been thinking of this configuration as "thin model, fat modules"; maybe "thin model, fat services" fits better with the CAIS terminology.) However, another model at N*1000 parameters would be able to embed a physics simulator, and therefore could effectively re-implement it in an illegible/deceptive way.

Thinking about ways in which this safety margin could break: is it possible to have a thin mapping layer on top of your physics simulator that somehow subverts or obfuscates it without having to fully re-implement it? For example, running two simple problems in the simulator, then merging/inverting them in the model layer? Intuitively I suspect it's hard to achieve much with this attack, but it's probably not possible to rule out in general.

Note that the "thin layer" is what you need to do regardless. Suppose your machine observes a situation, from either robotics or some video it found online. Say the situation is an object being dropped. The physics sim predicts a fairly vigorous bounce; the actual object splats.

You need some way to correct the physics sim's predictions to correspond with actual results, and the simplest way is a neural network that learns the mapping (real-world input, physics-engine prediction) -> (real-world observation).

You then have to use the predicted real-world observation, or a probability distribution over predicted observations, for your machine's reasoning about what it should do.
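The correction layer described above can be illustrated with a toy fit. A 1-D least-squares regression stands in for the neural network, and all the numbers are made up for illustration: the simulator predicts vigorous rebound heights, the real object barely bounces, and the learned map pulls predictions toward observations.

```python
# Toy version of the correction layer: learn a mapping from the physics
# engine's prediction to the observed outcome. A 1-D least-squares fit
# stands in for the neural network; the data below is invented.

def fit_linear(xs, ys):
    """Closed-form 1-D least squares: y ~ a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Physics engine predicts vigorous rebound heights; the real object splats.
engine_predictions = [0.8, 1.0, 1.2, 1.4]     # predicted rebound heights (m)
observed_heights   = [0.1, 0.12, 0.15, 0.18]  # what the camera actually saw (m)

a, b = fit_linear(engine_predictions, observed_heights)

def corrected_prediction(engine_pred: float) -> float:
    """Map the simulator's output toward what is actually observed."""
    return a * engine_pred + b

print(corrected_prediction(1.1))
```

A real system would regress over high-dimensional inputs and output a distribution rather than a point estimate, but the shape of the interface is the same: (input, engine prediction) in, corrected observation out.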

I realized the reference "thin layer" is ambiguous in my post; just wanted to confirm whether you were referring to the general case ("thin model, fat services") or the specific safety question at the bottom ("is it possible to have a thin mapping layer on top of your physics simulator that somehow subverts or obfuscates it")? My child reply assumed the former, but on consideration/re-reading I suspect the latter might be more likely?

Thinking about ways in which this safety margin could break; is it possible to have a thin mapping layer on top of your Physics simulator that somehow subverts or obfuscates it

I suppose that a mapping task might fall under the heading of a mesa-optimizer, where what it is doing is optimizing for fidelity between the outputs of the language layer and the inputs of the physics layer. This would be in addition to the mesa-optimization going on just in the physics simulator. Working title:

CAIS: The Case For Maximum Mesa Optimizers

With "thin model / fat service" I'm attempting to contrast with the typical end-to-end model architecture, where there is no separate "physics simulator", and instead the model just learns its own physics model, embedded with all the other relations that it has learned. So under that dichotomy, I think there is no "thin layer in front of the physics simulation" in an end-to-end model, as any part of the physics simulator can connect to or be connected from any other part of the model.

In such an end-to-end model, it's really hard to figure out where the "physics simulator" (such as it is) even lives, and what kind of ontology it uses. I'm explicitly thinking of the ELK paper's Ontology Identification problem here. How do we know the AI is being honest when it says "I predict that object X will be in position Y at time T"? If you can interpret its physics model, you can see what simulations it is running and know if it's being honest.

It's possible, if we completely solve interpretability, to extract the end-to-end model's physics simulator, I suppose. But if we have completely solved interpretability, I think we're most of the way towards solving alignment. On the other hand, if we use Services (e.g. MuJoCo for Physics) then we already have interpretability on that service, by construction, without having to interpret whatever structures the model has learned to solve physics problems.

You need some way to correct the physics sims predictions to correspond with actual results

Sure, in some sense an end-to-end model needs to normalize inputs, extract features, and so on, before invoking its physics knowledge. But the point I'm getting at is that with Services we know for sure what the model's representation has been normalized to (since we can just look at the data passed to MuJoCo), whereas this is (currently) opaque in an end-to-end model.
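The "we can just look at the data passed to MuJoCo" point can be made concrete with a logging wrapper around the service boundary. This is a hypothetical sketch, not anyone's actual API: the backend here is a trivial stand-in for a physics engine, and the point is only that a service boundary gives you an audit trail by construction.

```python
# Sketch of "interpretability by construction" at a service boundary:
# every request the model sends to the physics service can be recorded
# and inspected. The backend is a toy stand-in, not MuJoCo.

class LoggedPhysicsService:
    """Wraps a physics backend and records every request/response pair."""

    def __init__(self, backend):
        self.backend = backend
        self.log = []  # each entry is the exact data crossing the boundary

    def simulate(self, scene: dict) -> dict:
        result = self.backend(scene)
        self.log.append({"request": scene, "response": result})
        return result

def toy_backend(scene: dict) -> dict:
    # Stand-in "physics": a dropped object ends up on the floor.
    return {"final_height": 0.0, "fell_from": scene["height"]}

service = LoggedPhysicsService(toy_backend)
service.simulate({"object": "ball", "height": 2.0})

# The audit trail is legible: we can see exactly what the model asked.
print(service.log[0]["request"])
```

Nothing analogous exists for an end-to-end model: there is no boundary to instrument, so the equivalent of `service.log` has to be reconstructed via interpretability tools.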

Note that you would probably never use such a "model learns its own physics" approach as the solution in any production AI system. For either the reason you gave, or more pedantically: so long as rival architectures using a human-written physics system at the core perform more consistently in testing, you should never pick the less reliable solution.

I suppose the follow-up question is: how effectively can a model learn to re-implement a physics simulator, if given access to it during training -- instead of being explicitly trained to generate XML config files to run the simulator during inference?

If it's substantially more efficient to use this paper's approach and train your model to use a general purpose (and transparent) physics simulator, I think this bodes well for interpretability in general. In the ELK formulation, this would enable Ontology Identification.

On this point, the paper says:

Mind’s Eye is also efficient, since it delegates domain-specific knowledge to external expert models... The size of the LM can thus be significantly shrunk since it removes the burden of memorizing all the domain-specific knowledge. Experiments find that 100× smaller LMs augmented with Mind’s Eye can achieve similar reasoning capabilities as vanilla large models, and its prompting-based nature avoids the instability issues of training mixture-of-expert models (Zoph et al., 2022). The compatibility with small LMs not only enables faster LM inference, but also saves time during model saving, storing, and sharing.

On the other hand, the general trend of "end-to-end trained is better than hand-crafted architectures" has been going strong in recent years; it's mentioned in the CAIS post, and Demis Hassabis noted that he thinks it's likely to continue in his interview by Lex Fridman (indeed they chatted quite a bit about using AI models to solve Physics problems). And indeed, DeepMind has a recent paper gesturing towards an end-to-end learned Physics model from video, which looks far less capable than the one shown in the OP, but two papers down the line, who knows.

This seems like a direct step down the CAIS path of development

Cooperative AI Systems/Services? A quick Google search isn't finding it; I'm working from only a vague memory.

Comprehensive AI Services.

Summary, original paper.

Yeah, I shoulda linked that. Fixing shortly, thanks to niplav in the meantime!

Basically their idea is that instead of having one agent that optimizes the hell out of its value function and bad things happen, have a collection of smaller components that each are working on a subproblem with limited resources. If you can do that and also aggregate them such that as a unit they are superhuman, you get a lot of the benefits without (at least some of) the big risks.

Here's a brief explainer with some objections.

Interesting. Figure 2 looks pretty impressive. What do the phrases like "log(350) = 2.5B" at the bottom of the figure mean?

That doesn't appear to be explained specifically, but what I think they are giving is the larger model size equivalence. That is to say, the 350M parameter language model with Mind's Eye is about as good as a 2.5B parameter language model, and so on.

I might be missing something but are they not just giving the number of parameters (in millions of parameters) on a log10 scale? Scaling laws are usually by log-parameters, and I suppose they felt that it was cleaner to subtract the constant log(10^6) from all the results (e.g. taking log(1300) instead of log(1.3B)).

The B they put at the end is a bit weird though.
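If that reading is right, it's easy to check: taking log10 of the parameter count expressed in millions reproduces the figure's labels (assuming base-10 logs, which is standard for scaling-law plots).

```python
# Check of the proposed reading: the figure's "log(350) = 2.5"-style labels
# look like log10 of the parameter count in millions.
import math

print(round(math.log10(350), 1))   # 350M-parameter model  -> 2.5
print(round(math.log10(1300), 1))  # 1.3B-parameter model  -> 3.1
```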