AI ALIGNMENT FORUM

Gears-Level

Edited by brook, et al. last updated 4th Dec 2022

A gears-level model is 'well-constrained' in the sense that there is a strong connection between each of the things you observe: it would be hard for you to imagine that one of the variables could be different while all of the others remained the same.

Related Tags: Anticipated Experiences, Double-Crux, Empiricism, Falsifiability, Map and Territory


The term gears-level was first described on LessWrong in the post "Gears in Understanding":

This property is how deterministically interconnected the variables of the model are. There are a few tests I know of to see to what extent a model has this property, though I don't know if this list is exhaustive and would be a little surprised if it were:
1. Does the model pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
2. How incoherent is it to imagine that the model is accurate but that a given variable could be different?
3. If you knew the model were accurate but you were to forget the value of one variable, could you rederive it?

An example from "Gears in Understanding" of a gears-level model is (surprise) a box of gears. If you can see a series of interlocked gears, alternately turning clockwise, then counterclockwise, and so on, then you're able to anticipate the direction of any given gear, even if you cannot see it. It would be very difficult to imagine all of the gears turning as they are but only one of them changing direction whilst remaining interlocked. And finally, you would be able to rederive the direction of any given gear if you forgot it.
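The gear example can be made concrete with a small sketch. It is only an illustration under simplifying assumptions (a single linear chain of meshed gears, with directions encoded as +1 and -1), and the names `rederive_direction` and `is_consistent` are hypothetical helpers, not anything from the original post. It shows the second and third tests: changing one gear makes the model incoherent, and a forgotten gear can be recovered from any other.

```python
from typing import List, Optional

# Assumed encoding for this sketch: +1 is clockwise, -1 is counterclockwise.
CLOCKWISE = 1
COUNTERCLOCKWISE = -1


def rederive_direction(known: List[Optional[int]], index: int) -> int:
    """Infer the direction of gear `index` from any other gear that is known."""
    for j, direction in enumerate(known):
        if direction is not None and j != index:
            # Each meshing flips the direction, so only the parity
            # of the distance between the two gears matters.
            return direction if (index - j) % 2 == 0 else -direction
    raise ValueError("need at least one other gear with a known direction")


def is_consistent(directions: List[int]) -> bool:
    """True if every pair of adjacent (meshed) gears turns in opposite directions."""
    return all(a == -b for a, b in zip(directions, directions[1:]))


chain = [CLOCKWISE, COUNTERCLOCKWISE, CLOCKWISE, COUNTERCLOCKWISE]

# Test 3: forget the value of one gear, then rederive it from the rest.
forgotten = [CLOCKWISE, COUNTERCLOCKWISE, None, COUNTERCLOCKWISE]
recovered = rederive_direction(forgotten, 2)

# Test 2: imagine one gear turning the other way while the rest stay the same.
flipped = chain.copy()
flipped[2] = -flipped[2]

print(recovered == chain[2])   # the forgotten gear is recovered
print(is_consistent(chain))    # the original chain is coherent
print(is_consistent(flipped))  # the altered chain is not
```

A model with less "gears-ness" would let `flipped` pass the consistency check, or would leave `recovered` undetermined.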


Note that the author of "Gears in Understanding", Valentine, was careful to point out that these tests do not fully define the property 'gears-level', and that "Gears-ness is not the same as goodness": there are other things that are valuable in a model, and many things cannot practically be modelled in this fashion. If you intend to use the term, it is highly recommended that you read the post beforehand, as the concept is not easily defined.

Empiricism
Falsifiability
Anticipated Experiences
Double-Crux
Map and Territory
Posts tagged Gears-Level
Toward a New Technical Explanation of Technical Explanation, by Abram Demski (7y; 25 karma, 2 comments)
Attempted Gears Analysis of AGI Intervention Discussion With Eliezer, by Zvi (4y; 50 karma, 0 comments)
Evolution of Modularity, by johnswentworth (6y; 57 karma, 6 comments)
A Case for the Least Forgiving Take On Alignment, by Thane Ruthenis (2y; 39 karma, 18 comments)
Abstraction, Evolution and Gears, by johnswentworth (5y; 15 karma, 4 comments)
interpreting GPT: the logit lens, by nostalgebraist (5y; 78 karma, 14 comments)
Current themes in mechanistic interpretability research, by Lee Sharkey, Sid Black, Beren Millidge (3y; 38 karma, 2 comments)
Decision Transformer Interpretability, by Joseph Isaac Bloom, Paul Colognese (2y; 34 karma, 6 comments)
Beware of black boxes in AI alignment research, by Vladimir Slepnev (7y; 11 karma, 0 comments)
Value Formation: An Overarching Model, by Thane Ruthenis (3y; 17 karma, 10 comments)