Tessellating Hills: a toy model for demons in imperfect search

Now this is one of the more interesting things I've come across.

I fiddled around with the code a bit and was able to reproduce the phenomenon with DIMS = 1, making visualisation possible:

Behold!

Here's the code I used to make the plot:

import torch
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d

DIMS = 1   # number of dimensions that xn has
WSUM = 5    # number of waves added together to make a splotch
EPSILON = 0.10 # rate at which xn controls splotch strength
TRAIN_TIME = 5000 # number of iterations to train for
LEARN_RATE = 0.2   # learning rate
MESH_DENSITY = 100 # number of points to plot in 3d mesh (if applicable)

torch.random.manual_seed(1729)

# knlist and k0list are integers, so the splotch functions are periodic
knlist = torch.randint(-2, 3, (DIMS, WSUM, DIMS)) # wavenumbers : list (controlling dim, wave id, k component)
k0list = torch.randint(-2, 3, (DIMS, WSUM))       # the x0 component of wavenumber : list (controlling dim, wave id)
slist = torch.randn((DIMS, WSUM))                # sin coefficients for a particular wave : list(controlling dim, wave id)
clist = torch.randn((DIMS, WSUM))                # cos coefficients for a particular wave : list (controlling dim, wave id)

# initialize x0, xn
x0 = torch.zeros(1, requires_grad=True)
xn = torch.zeros(DIMS, requires_grad=True)

# numpy arrays for plotting:
x0_hist = np.zeros((TRAIN_TIME,))
xn_hist = np.zeros((TRAIN_TIME, DIMS))
loss_hist = np.zeros(TRAIN_TIME,)


def model(xn,x0):
    wavesum = torch.sum(knlist*xn, dim=2) + k0list*x0
    splotch_n = torch.sum(
            (slist*torch.sin(wavesum)) + (clist*torch.cos(wavesum)),
            dim=1)
    foreground_loss = EPSILON * torch.sum(xn * splotch_n)
    return foreground_loss - x0

# train:
for t in range(TRAIN_TIME):

    print(t)
    loss = model(xn,x0)
    loss.backward()
    with torch.no_grad():
        # constant step size gradient descent, with some noise thrown in
        vlen = torch.sqrt(x0.grad*x0.grad + torch.sum(xn.grad*xn.grad))
        x0 -= LEARN_RATE*(x0.grad/vlen + torch.randn(1)/np.sqrt(1.+DIMS))
        xn -= LEARN_RATE*(xn.grad/vlen + torch.randn(DIMS)/np.sqrt(1.+DIMS))
    x0.grad.zero_()
    xn.grad.zero_()
    # x0 and loss are one-element tensors; .item() extracts the Python float
    x0_hist[t] = x0.item()
    xn_hist[t] = xn.detach().numpy()
    loss_hist[t] = loss.item()

plt.plot(x0_hist)
plt.xlabel('number of steps')
plt.ylabel('x0')
plt.show()
for d in range(DIMS):
    plt.plot(xn_hist[:,d])
plt.xlabel('number of training steps')
plt.ylabel('xn')
plt.show()

fig = plt.figure()
ax = plt.axes(projection='3d')
ax.plot3D(x0_hist,xn_hist[:,0],loss_hist)

#plot loss landscape
if DIMS == 1:
    x0_range = np.linspace(np.min(x0_hist),np.max(x0_hist),MESH_DENSITY)
    xn_range = np.linspace(np.min(xn_hist),np.max(xn_hist),MESH_DENSITY)
    x,y = np.meshgrid(x0_range,xn_range)
    z = np.zeros((MESH_DENSITY,MESH_DENSITY))
    with torch.no_grad():
        # loop variables renamed so they don't shadow the trained x0 and xn tensors
        for i,x0_val in enumerate(x0_range):
            for j,xn_val in enumerate(xn_range):
                z[j,i] = model(torch.tensor(xn_val),torch.tensor(x0_val)).numpy()
    ax.plot_surface(x,y,z,color='orange',alpha=0.3)
ax.set_title("loss")
plt.show()

[-]DaemonicSigil6y10

That's very cool, thanks for making it. At first I was worried that this meant that my model didn't rely on selection effects. Then I tried a few different random seeds, and some, like 1725, didn't show demon-like behaviour. So I think we're still good.

[-]FactorialCode6y30

Hmm, the inherent 1d nature of the visualization kinda makes it difficult to check for selection effects. I'm not convinced that's actually what's going on here. 1725 is special because the ridges of the splotch function are exactly orthogonal to x0. The odds of this happening probably go down exponentially with dimensionality. Furthermore, with more dakka, one sees that the optimization rate drops dramatically after ~15000 time steps, and may or may not do so again later. So I don't think this proves selection effects are in play. An alternative hypothesis is simply that the process gets snagged by the first non-orthogonal ridge it encounters, without any serious selection effects coming into play.
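One way to poke at this is to rerun the same noisy descent over several seeds and compare how far x0 gets. Below is a rough sketch that wraps the update rule from the posted script in a function. The names `run_descent` and `compare_seeds` and their arguments are hypothetical wrappers introduced here, not part of the original code, and the choice of seeds to compare is arbitrary.

```python
import numpy as np
import torch


def run_descent(seed, dims=1, wsum=5, epsilon=0.10, steps=5000, learn_rate=0.2):
    """Run the noisy fixed-step descent from the posted script; return the x0 history."""
    torch.random.manual_seed(seed)
    # Same random landscape construction as the posted script.
    knlist = torch.randint(-2, 3, (dims, wsum, dims))
    k0list = torch.randint(-2, 3, (dims, wsum))
    slist = torch.randn((dims, wsum))
    clist = torch.randn((dims, wsum))
    x0 = torch.zeros(1, requires_grad=True)
    xn = torch.zeros(dims, requires_grad=True)
    x0_hist = np.zeros(steps)
    for t in range(steps):
        wavesum = torch.sum(knlist * xn, dim=2) + k0list * x0
        splotch_n = torch.sum(
            slist * torch.sin(wavesum) + clist * torch.cos(wavesum), dim=1)
        loss = epsilon * torch.sum(xn * splotch_n) - x0
        loss.backward()
        with torch.no_grad():
            # Normalised gradient step plus Gaussian noise, as in the original.
            vlen = torch.sqrt(x0.grad * x0.grad + torch.sum(xn.grad * xn.grad))
            x0 -= learn_rate * (x0.grad / vlen + torch.randn(1) / np.sqrt(1.0 + dims))
            xn -= learn_rate * (xn.grad / vlen + torch.randn(dims) / np.sqrt(1.0 + dims))
        x0.grad.zero_()
        xn.grad.zero_()
        x0_hist[t] = x0.item()
    return x0_hist


def compare_seeds(seeds, **kwargs):
    """Return {seed: final x0} so runs can be compared side by side."""
    return {seed: run_descent(seed, **kwargs)[-1] for seed in seeds}
```

For example, `compare_seeds([1725, 1729], dims=1, epsilon=0.10, steps=20000)` gives the final x0 for the two seeds discussed here; passing `dims=16, epsilon=0.0025` would repeat the comparison in the higher-dimensional setting, where any single ridge being orthogonal to x0 should matter less.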

[-]Daniel Kokotajlo6y40

This is awesome, thanks!

So, to check my understanding: You have set up a sort of artificial feedback loop, where there are N overlapping patterns of hills, and each one gets stronger the farther you travel in a particular dimension/direction. So if one or more of these patterns tends systematically to push the ball in the same direction that makes it stronger, you'll get a feedback loop. And then there is selection between patterns, in the sense that the pattern which pushes the strongest will beat the ones that push more weakly, even if both have feedback loops going.

And then the argument is, even though these feedback loops were artificial / baked in by you, in "natural" search problems there might be a similar situation... what exactly is the reason for this? I guess my confusion is in whether to expect real life problems to have this property where moving in a particular direction strengthens a particular pattern. One way I could see this happening is if the patterns are themselves pretty smart, and are able to sense which directions strengthen them at any given moment. Or it could happen if, by chance, there happens to be a direction and a pattern such that the pattern systematically pushes in that direction and the direction systematically strengthens that pattern. But how likely are these? I don't know. I guess your case is a case of the second, but it's rigged a bit, because of how you built in the systematic-strengthening effect.

Am I following, or am I misunderstanding?

[-]DaemonicSigil6y40

Thanks, and your summary is correct. You're also right that this is a pretty contrived model. I don't know exactly how common demons are in real life, and this doesn't really shed much light on that question. I mainly thought that it was interesting to see that demon formation was possible in a simple situation where one can understand everything that is going on.
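For concreteness, this is the loss that the posted code computes, written out. The index names $j$, $w$, $m$ are just labels chosen here; $N$ is DIMS, $W$ is WSUM, $\epsilon$ is EPSILON, and $s$, $c$, $k^{(0)}$, $k$ correspond to slist, clist, k0list and knlist.

```latex
L(x_0, x_1, \dots, x_N) = -x_0 + \epsilon \sum_{j=1}^{N} x_j \, \mathrm{splotch}_j(x)

\mathrm{splotch}_j(x) = \sum_{w=1}^{W} \left[ s_{jw} \sin \phi_{jw}(x) + c_{jw} \cos \phi_{jw}(x) \right],
\qquad
\phi_{jw}(x) = k^{(0)}_{jw} x_0 + \sum_{m=1}^{N} k_{jwm} x_m
```

The factor $x_j$ multiplying $\mathrm{splotch}_j$ is the baked-in strengthening effect from the summary: pattern $j$ contributes nothing at $x_j = 0$ and has amplitude proportional to $\epsilon |x_j|$ away from it.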

[-]Beth Barnes6y10

I have the same confusion

[-]johnswentworth6y40

Very nice work. The graphs in particular are quite striking.

I sat down and thought for a bit about whether that objective function is actually a good model for the behavior we're interested in. Twice I thought I saw an issue, then looked back at the definition and realized you'd set up the function to avoid that issue. Solid execution; I think you have actually constructed a demonic environment.

[-][anonymous]6y*20

Hi, thanks for sharing and experimentally trying out the theory in the previous post! Super cool.

Do you have the code for this up anywhere?

I'm also a little confused by the training procedure. Are you just instantiating a random vector and then doing GD with respect to the loss function you defined? Do the charts show the loss averaged over many random vectors (and $\mathrm{splotch}$ function variants)?

[-]DaemonicSigil6y20

Thanks. I initially tried putting the code in a comment on this post, but it ended up being deleted as spam. It's now up on GitHub (https://github.com/DaemonicSigil/tessellating-hills). It isn't particularly readable, for which I apologize.

The initial vector has all components set to 0, and the charts show the evolution of these components over time. This is just for a particular run; there isn't any averaging. x0 gets its own chart, since it changes much more than the other components. If you want to know how the loss varies with time, you can just flip figure 1 upside down to get a pretty good proxy, since the splotch functions are of secondary importance compared to the -x0 term.
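As a rough bound on how far the flipped x0 chart can be from the true loss: each wave $s \sin\phi + c \cos\phi$ has magnitude at most $\sqrt{s^2 + c^2}$, so for the loss in the posted code (with $s_{jw}$, $c_{jw}$ the entries of slist and clist, and $\epsilon$ = EPSILON):

```latex
\left| L(x) + x_0 \right|
  = \epsilon \left| \sum_{j=1}^{N} x_j \, \mathrm{splotch}_j(x) \right|
  \le \epsilon \sum_{j=1}^{N} |x_j| \sum_{w=1}^{W} \sqrt{s_{jw}^2 + c_{jw}^2}
```

So the proxy holds up as long as $\epsilon$ times the magnitudes of the $x_j$ stays small compared to $x_0$. This is only an upper bound; the actual gap will usually be smaller because the waves partly cancel.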

[-]habryka6y10

Oops, sorry for that. I restored your original comment.

[-]DaemonicSigil6y*20

Here is the code for people who want to reproduce these results, or just mess around:

import torch
import numpy as np
import matplotlib.pyplot as plt

DIMS = 16   # number of dimensions that xn has
WSUM = 5    # number of waves added together to make a splotch
EPSILON = 0.0025 # rate at which xn controls splotch strength
TRAIN_TIME = 5000 # number of iterations to train for
LEARN_RATE = 0.2   # learning rate

torch.random.manual_seed(1729)

# knlist and k0list are integers, so the splotch functions are periodic
knlist = torch.randint(-2, 3, (DIMS, WSUM, DIMS)) # wavenumbers : list (controlling dim, wave id, k component)
k0list = torch.randint(-2, 3, (DIMS, WSUM))       # the x0 component of wavenumber : list (controlling dim, wave id)
slist = torch.randn((DIMS, WSUM))                # sin coefficients for a particular wave : list(controlling dim, wave id)
clist = torch.randn((DIMS, WSUM))                # cos coefficients for a particular wave : list (controlling dim, wave id)

# initialize x0, xn
x0 = torch.zeros(1, requires_grad=True)
xn = torch.zeros(DIMS, requires_grad=True)

# numpy arrays for plotting:
x0_hist = np.zeros((TRAIN_TIME,))
xn_hist = np.zeros((TRAIN_TIME, DIMS))

# train:
for t in range(TRAIN_TIME):
    ### model: 
    wavesum = torch.sum(knlist*xn, dim=2) + k0list*x0
    splotch_n = torch.sum(
            (slist*torch.sin(wavesum)) + (clist*torch.cos(wavesum)),
            dim=1)
    foreground_loss = EPSILON * torch.sum(xn * splotch_n)
    loss = foreground_loss - x0
    ###
    print(t)
    loss.backward()
    with torch.no_grad():
        # constant step size gradient descent, with some noise thrown in
        vlen = torch.sqrt(x0.grad*x0.grad + torch.sum(xn.grad*xn.grad))
        x0 -= LEARN_RATE*(x0.grad/vlen + torch.randn(1)/np.sqrt(1.+DIMS))
        xn -= LEARN_RATE*(xn.grad/vlen + torch.randn(DIMS)/np.sqrt(1.+DIMS))
    x0.grad.zero_()
    xn.grad.zero_()
    x0_hist[t] = x0.item()  # x0 is a one-element tensor; .item() extracts the float
    xn_hist[t] = xn.detach().numpy()

plt.plot(x0_hist)
plt.xlabel('number of steps')
plt.ylabel('x0')
plt.show()
for d in range(DIMS):
    plt.plot(xn_hist[:,d])
plt.xlabel('number of training steps')
plt.ylabel('xn')
plt.show()

[-]Vaniver6y10

"Finally, this demon becomes so strong that the search gets stuck in a local valley and further progress stops."

I don't see why the gradient with respect to x0 ever changes, and so am confused about why it would ever stop increasing in the x0 direction. Does this have to do with using a fixed step size instead of learning rate?

[Edit: my current thought is that it looks like there's periodic oscillation in the 3rd phase, which is probably an important part of the story; the gradient is mostly about how to point at the center of that well, which means it orbits that center, and x0 progress grinds to a crawl because it's a small fraction of the overall gradient, whereas it would continue at a regular pace if it were a constant learning rate instead, I think.]
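In symbols, and ignoring the noise term, the update in the posted code moves x0 by the following amount per step, with $\eta$ = LEARN_RATE. This is a sketch of the fixed-step-size reading only, not a claim that it is the whole story:

```latex
\Delta x_0
  = -\eta \, \frac{\partial L / \partial x_0}
                  {\sqrt{\left( \partial L / \partial x_0 \right)^2
                         + \sum_{j=1}^{N} \left( \partial L / \partial x_j \right)^2}}
```

If $\partial L / \partial x_0$ stayed close to $-1$, this would reduce to roughly $\eta / \sqrt{1 + \lVert \nabla_{x_n} L \rVert^2}$, so x0 progress would shrink as the gradient in the other directions grows, even with an unchanged slope along x0.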

Also, did you use any regularization? [Edit: if so, the decrease in response to x0 might actually be present in a one-dimensional version of this, suggesting it's a very different story.]

[-]johnswentworth6y30

"I don't see why the gradient with respect to x0 ever changes, and so am confused about why it would ever stop increasing in the x0 direction."

Looks like the splotch functions are each a random mixture of sinusoids in each direction, so each $\mathrm{splotch}_j$ will have some variation along $x_0$. The argument of $\mathrm{splotch}_j$ is all of $x$, not just $x_j$.
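A small NumPy sketch of that point, assuming the loss from the posted code. The helper `loss_and_dx0` is hypothetical (not from the original script), and the random draws come from NumPy's generator, so they will not match the torch seed used in the post.

```python
import numpy as np


def loss_and_dx0(x0, xn, kn, k0, s, c, epsilon):
    """Return the loss and its analytic partial derivative with respect to x0."""
    # phase[j, w] = sum_m kn[j, w, m] * xn[m] + k0[j, w] * x0
    phase = kn @ xn + k0 * x0
    splotch = np.sum(s * np.sin(phase) + c * np.cos(phase), axis=1)
    loss = epsilon * np.dot(xn, splotch) - x0
    # d(splotch_j)/d(x0) = sum_w k0[j, w] * (s cos(phase) - c sin(phase))
    dsplotch_dx0 = np.sum(k0 * (s * np.cos(phase) - c * np.sin(phase)), axis=1)
    dloss_dx0 = -1.0 + epsilon * np.dot(xn, dsplotch_dx0)
    return loss, dloss_dx0


rng = np.random.default_rng(0)
dims, wsum, epsilon = 16, 5, 0.0025
kn = rng.integers(-2, 3, size=(dims, wsum, dims))  # integer wavenumbers in [-2, 2]
k0 = rng.integers(-2, 3, size=(dims, wsum))
s = rng.standard_normal((dims, wsum))
c = rng.standard_normal((dims, wsum))

# At the starting point xn = 0 the slope along x0 is exactly -1.
_, g_start = loss_and_dx0(0.0, np.zeros(dims), kn, k0, s, c, epsilon)

# Far from the origin the splotch terms contribute to the x0 slope as well.
xn_far = 50.0 * rng.standard_normal(dims)
_, g_far = loss_and_dx0(3.0, xn_far, kn, k0, s, c, epsilon)
print("d(loss)/d(x0) at start:", g_start, " far from start:", g_far)
```

The slope along x0 is $-1$ plus a term proportional to $\epsilon \sum_j x_j$ times sinusoids, so it stays at $-1$ only while the $x_j$ are near zero.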

[-]Vaniver6y10

Ah, that'd do it too.

[-]DaemonicSigil6y10

No regularization was used.

I also can't see any periodic oscillations when I zoom in on the graphs. I think the wobbles you are observing in the third phase are just a result of the random noise that is added to the gradient at each step.
