Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

This is very cool, thanks for this post.

Some remarks:

Playing at an intersection with no liberty is forbidden, unless the play results in capture

This is true, but the capture can be of your own stones. That is, Leela Zero is trained under Tromp-Taylor rules where self-capture is legal. So there isn't any forbidding of moves due to just liberties. Single stone suicide is still illegal, but only by virtue of the fact that self-capture of a single stone would repeat the board position, but you can suicide multiple stones.

However, there is still of course a question of how liberty counting works. See https://github.com/leela-zero/leela-zero/issues/877 for some past discoveries of positions where Leela Zero is unable to determine when a large group is in atari or would become in atari after certain move(s). This suggests that the neural nets do not learn a general and correct algorithm, instead they learn a bunch of heuristics on top of heuristics that work almost all of the time but have enough combinatorically many nooks and crannies that are not all forced to be correct due to rarity of some of the cases.

Note that if you are going to investigate liberty counting, you should expect that the neural net likely counts a much more fuzzy and intricate concept than literal liberties. As an expert Go player I would expect it almost certainly primarily focuses on concepts that are much more tactically relevant, like "fighting liberties", e.g. how many realistic moves the opponent would need to fill a group accounting for necessary approach moves, recaptures, etc. For example a group with literal 2 liberties but where one liberty was an eye and the other would be self-atari by the opponent unless they made a preparatory connection first would have 3 fighting liberties because actually capturing the group would take 3 moves under realistic play, not 2.

The fighting liberty count is also somewhat multidimensional, because it can vary depending on the particular tactical objective. You might have 4 fighting liberties against a particular group, but 6 against a different group because 2 of the liberties or approach moves required are only "effective" for one objective and not the other. Also purely integer values don't suffice because fighting liberty count can depend on things like a ko.

The fact that this not-entirely-rigorously-defined concept of "fighting liberties" is what expert play in Go cares about (at least, expert human players) perhaps makes it also less surprising why a net might not implement a "correct" and "general" algorithm for liberty counting but might instead end up with a pile of hacks and heuristics that don't always work.

I suspect will be easier to investigate the counting of eyes than liberties, and might suggest to focus on that first if you investigate further. The presence of an eye is much more discrete and less multidimensional than the fighting liberty count, and you only need to count up to 2, so there are fewer behavior patterns in the activations that will need to be distinguished by an analysis.

A particularly interesting application would be to understand Wang et al (2022)’s adversarial policies against KataGo. For example, one of the adversarial policies essentially amounts to confusing the model about the number of liberties that a large circular group has, and capturing it “without the model noticing”. This attack somewhat transfers to Leela Zero as well. Can we find a mechanism for liberty-counting that explains the success of this attack?

See https://www.lesswrong.com/posts/Es6cinTyuTq3YAcoK/there-are-probably-no-superhuman-go-ais-strong-human-players?commentId=gAEovdd5iGsfZ48H3 where I give a hypothesis for what the mechanism is, which I think is at a high-level more likely than not to be roughly what's happening.

I also currently believe based some playing around with the nets a long time ago and seeing the same misevaluations across all 4 different independently trained AlphaZero-based agents I tested that the misevaluation probably transfers very well, if not the attack. I.e. if the attack doesn't transfer perfectly, it's probably to do with the happenstance of the preferences of the agent earlier - for example it's probably easier to exploit agents that are trying to also sharply maximize score since they will tend to ignore your moves more and give you more moves in a row to do things - and not because any of these replications anticipate or evaluate the attack itself much better or worse.

However, here again I would strongly recommend investigating eyes, not liberties. All end-to-end observational experimentation with the models I've done suggests that exactly the same misevaluation is happening with respect to eyes. And the analysis of any miscounting of eyes on the predictions should be far crisper than for liberties.

In particular, if you overcount eyes by propagating a partial eye count multiple times around a cycle, you will predict a group is absolutely alive even when it isn't alive, because for groups with 2,3,4,5,... eyes, the data is nearly absolutely consistent that such groups are alive independent of anything else around them.

By contrast, if you overcount liberties and think a group has a "very large" number of liberties, you might not always predict the group is alive. For example what if every opposing group has 2 eyes and is therefore immortally strong, whereas the large-liberty group is surrounded and has no eyes? The training data probably doesn't so sharply constrain how different nets will generalize to rare "over-large" liberty counts because generally how liberty counts affect statuses is contextual rather than absolute like eye count. So one would expect that when a net mistakenly overcounts liberties, the effect could also be harder to analyze due to being contextual and not always consistent.

Right before the game ends, the value of the board should come down to which player has more territory. Is the model calculating each player’s territory, and if so, how?

I would say almost certainly yes, but possibly not "exactly" territory, maybe something like certainty-adjusted or variance-adjusted territory, with hacks for different kinds of sources of uncertainty. But yes, almost certainly something that correlates very well with territory.

I did some visualizations of activations of a smaller net way back in the past that shows something pretty much like this. https://github.com/lightvector/GoNN#global-pooled-properties-dec-2017

Although this architecture isn't quite a plain resnet (in particular, this is the visualization of the conv layer output prior to a global pooling layer) , it shows that the neural net learned a concept very recognizable to a Go player as having to do with predicted ownership or territory. And it learned this concept without ever being trained to predict the game outcome or score! In this case, the relevant net was trained solely to predict the next move a human player played in a position.

The fact that this kind of concept can arise automatically from pure move prediction also gives some additional intuitive force for why the original AlphaGoZero work found such a huge improvement from using the same neural net to predict both value and policy. Pure policy prediction is already capable of automatically generating the internal feature you would need to get basic value prediction working.

^{^}

Search refers to rolling out the game tree using Monte-Carlo Tree Search, aka “looking ahead”. For any given model, the more search it conducts the stronger it becomes. Also, this statement is now not completely true, see Wang et al's adversarial attack work.

^{^}

This is not quite accurate. Since the MCTS is rolled out using the policy head’s output, the policy head is in fact trained towards a fixed point of MCTS.

^{^}

A clarification on the confusing naming of DeepMind models: AlphaGo is the original Go model that first achieved superhuman performance. It introduced the architecture of a value network and a policy network, and used self-play to improve, but was initially trained on human data. Later came AlphaGo Zero which has the same design as AlphaGo, but used MCTS in self-play and was trained without any human input. Then came AlphaZero, which is the general algorithm that could be applied to basically any game, including chess and shogi.

^{^}

A more detailed explanation: The original AlphaGo Zero design has one feature plane that is all ones when black is to move, and all zeros when white is to move. When convolutions go past the edges of the board, with zero padding, it implicitly extends the feature plane with zeros in all directions. This means that when the feature plane is all ones, there is a boundary between on- and off-board, but when the feature plane is all zeros, there is none. In the orginal AlphaGo Zero design, black always gets the all-one feature and therefore some information about the size of the board, while white always gets the all-zero feature and no information; white still infers the board size, of course, by the fact that there are never stones placed outside of the board.

^{^}

To follow the content below you will likely need some familiarity with the rules of Go. If you don’t know the rules or need a refresher, see here for an introduction to Go.

^{^}

A caveat here: We in fact only used two possible starting lengths for the ladder, as we found that the judgement of the ladder is much worse for shorter and longer starting ladders. While it might appear surprising that the judgement is worse for longer ladders, we believe that it’s because ladders almost never get played out in real games of Go, and longer ladders are very rare in the training set.

^{^}

We also know from previous work that residual networks behave like ensembles of relatively shallow networks, and the effective paths in residual networks are relatively shallow. However, the type of residual network studied in Veit et al do not have nonlinearities between blocks, and is a “truer” residual network.

^{^}

This might seem obvious, but the model a priori has no reason to do this! In fact, this is a bit of a lie. We found that sometimes the model stores information about location (i, j) at the (i, j+1) position. See below.

^{^}

From looking at the activations of many channels across many boards, most channels show persistency for many layers at a time, if not throughout the whole model. Sometimes a channel gets reset partway through the model and changes role. Because of the ReLU applied to the residual stream at each layer, resetting just requires writing large negative numbers to the residual stream.

^{^}

Note that here I’m using the convention in computer vision that the origin is the top-left corner, and the first coordinate increases to the right, and the second coordinate increases to the bottom. This is unfortunately transposed from the convention for matrix entries—A[i, j] will show up at (j, i) on the board.

^{^}

One might wonder if the starting direction of the ladder matters as well, i.e. start with two horizontal stones vs two vertical ones. We found that this distinction doesn’t matter, which makes sense since in real games of Go, the start of a ladder can take many different shapes, and the important thing is its future trajectory.

^{^}

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

71

Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

71

Introduction

Prior work

Setup

Model

The ladder behavior

Dataset

Methods

Results: General properties

Preferred basis and “logit lens”

Edges of the board are used to store global information

Value information is stored in checkerboard patterns

Results: Ladder breaker detection

Equivariance/Symmetry

The circuit is pretty big …

… but has a small, early bottleneck

Breaker indicator ≠ ladder detector

Conclusion and future work

Acknowledgement

Citation information

Appendix A: Details on policy head

Appendix B: Details on value head

Appendix C: Algorithmic tasks in Go

Identifying legal moves

Escaping atari

Counting territories