Closely related to this is Atticus Geiger's work, which suggests a path to show that a neural network is actually implementing the intermediate computation. Rather than re-train the whole network, much better if you can locate and pull out the intermediate quantity! "In theory", his recent distributed alignment tools offer a way to do this.

Two questions about this approach:

1. Do neural networks actually do hierarchical operations, or prefer to "speed to the end" for basic problems?
2. Is it easy find the right `alignments' to identify the intermediate calculations?

Jury is still out on both of these, I think.

I tried to implement my own version of Atticus' distributed alignment search technique, on Atticus' hierarchical equality task as described in https://arxiv.org/pdf/2006.07968.pdf , where the net solves the task:

y (the outcome) = ((a = b) = (c = d)). I used a 3-layer MLP network where the inputs a,b,c,d are each given with 4 dimensions of initial embedding, and the unique items are random Gaussian.

The hope is that it forms the "concepts" (a=b) and (c=d) in a compact way;
But this might just be false?

~~Atticus has a paper which he tries to search for "alignments" on this problem neuron-by-neuron to the concepts (a=b) and (c=d), and couldn't find it.~~

~~Maybe the net is just skipping these constructs and going to straight to the end?~~

~~Or, maybe I'm just bad at searching! Quite possible. My implementation was slightly different from Atticus', and allowed the 4 dimensions to drift non-orthogonally;~~

Edit: Atticus says you should be able to separate the concepts, but only by giving each concept 8 of the 16 dimensions. I need to try this!

Incidentally, when I switched the net from RELU activation to a sigmoid activation, my searches for a 4-dimensional representation of (a=b) would start to fail at even recovering the variable (a=b) from the embedding dimensions [where it definitely exists as a 4-dimensional quantity! And I could successfully recover it with RELU activations]. So, this raises the possibility that the search can just be hard, due to the problem geometry...

Reply

Moderation Log

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

9

Causality and a Cost Semantics for Neural Networks

9

Introduction

How I Would Like People to Engage with this Work

Cost Semantics

Definition

Basic Operations and Warm-Up

Building Out the Tech Tree

We Want If Statements

Division

If Statements Finally, Logic, and Causality Calculations

Causality Calculations on Booleans

Derivative Estimation

Causality via an Approximation

Power Series versus Piece-wise Linear and an Aside on Recursion