I don't know if the entry is intended to cover speculative issues, but if so, I think it would be important to include a discussion of what happens when we start building machines that are generally intelligent and have internal subjective experiences. Usually with present-day AI ethics concerns, or AI alignment, we're concerned about the AI taking actions that harm humans. But as our AI systems get more sophisticated, we'll also have to start worrying about the well-being of the machines themselves.
Should they have the same rights as humans? Is it ethical to create an AGI with subjective experiences whose entire reward function is geared towards performing mundane tasks for humans? Is it even possible to build an AI that is highly intelligent and very general, yet neither has subjective experiences nor simulates beings that do? And so on.
That's very cool, thanks for making it. At first I was worried that this meant that my model didn't rely on selection effects. Then I tried a few different random seeds, and some, like 1725, didn't show demon-like behaviour. So I think we're still good.
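A seed check like this can be sketched without PyTorch. The block below is a rough NumPy version of the model from the code posted in this thread, with the gradient written out by hand. It makes several assumptions: `make_params`, `loss_and_grad` and `run` are hypothetical names invented for this illustration, NumPy's generator produces different draws from PyTorch's (so seed numbers such as 1725 will not correspond), and the two numbers printed per seed are only a crude stand-in for looking at the charts, not a definition of demon-like behaviour.

```python
import numpy as np

DIMS, WSUM = 16, 5                  # same constants as the posted code
EPSILON, LEARN_RATE = 0.0025, 0.2

def make_params(rng):
    """Draw integer wavenumbers and Gaussian sin/cos coefficients."""
    kn = rng.integers(-2, 3, size=(DIMS, WSUM, DIMS))
    k0 = rng.integers(-2, 3, size=(DIMS, WSUM))
    s = rng.standard_normal((DIMS, WSUM))
    c = rng.standard_normal((DIMS, WSUM))
    return kn, k0, s, c

def loss_and_grad(x0, xn, params):
    """Return the loss and its hand-derived gradient w.r.t. x0 and xn."""
    kn, k0, s, c = params
    phase = kn @ xn + k0 * x0                                  # (DIMS, WSUM)
    splotch = (s * np.sin(phase) + c * np.cos(phase)).sum(axis=1)
    dsplotch = s * np.cos(phase) - c * np.sin(phase)           # d/d(phase)
    weighted = xn[:, None] * dsplotch
    loss = EPSILON * np.dot(xn, splotch) - x0
    g0 = EPSILON * np.sum(weighted * k0) - 1.0
    gn = EPSILON * (splotch + np.einsum('nw,nwd->d', weighted, kn))
    return loss, g0, gn

def run(seed, steps=5000):
    """One run of constant-step-size gradient descent with added noise."""
    rng = np.random.default_rng(seed)
    params = make_params(rng)
    x0, xn = 0.0, np.zeros(DIMS)
    x0_hist = np.zeros(steps)
    xn_hist = np.zeros((steps, DIMS))
    for t in range(steps):
        _, g0, gn = loss_and_grad(x0, xn, params)
        vlen = np.sqrt(g0 * g0 + np.sum(gn * gn))
        noise = rng.standard_normal(1 + DIMS) / np.sqrt(1.0 + DIMS)
        x0 = x0 - LEARN_RATE * (g0 / vlen + noise[0])
        xn = xn - LEARN_RATE * (gn / vlen + noise[1:])
        x0_hist[t] = x0
        xn_hist[t] = xn
    return x0_hist, xn_hist

if __name__ == "__main__":
    for seed in (0, 1, 2):  # arbitrary seeds, not the ones from the thread
        x0_hist, xn_hist = run(seed)
        print(seed, "final x0:", x0_hist[-1], "max |xn|:", np.abs(xn_hist).max())
```

Plotting `x0_hist` and the columns of `xn_hist` for each seed is still the more reliable way to judge whether a given run shows the behaviour.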
No regularization was used.
I also can't see any periodic oscillations when I zoom in on the graphs. I think the wobbles you are observing in the third phase are just a result of the random noise that is added to the gradient at each step.
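To get a feel for how large that noise is, here is a small sketch under the assumption that the update rule is the one in the code posted in this thread: a unit-length gradient direction plus Gaussian noise scaled by 1/sqrt(1 + DIMS) in each of the 1 + DIMS components. The sample count and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

DIMS = 16          # matches DIMS in the posted code
LEARN_RATE = 0.2   # matches LEARN_RATE in the posted code

rng = np.random.default_rng(0)  # arbitrary seed, only for this illustration

# One noise draw per step: 1 + DIMS components, each N(0, 1) / sqrt(1 + DIMS).
noise = rng.standard_normal((100_000, 1 + DIMS)) / np.sqrt(1.0 + DIMS)

# Root-mean-square length of the whole noise vector; the normalized
# gradient it is added to has length exactly 1.
rms_noise_norm = np.sqrt(np.mean(np.sum(noise**2, axis=1)))
print("RMS length of noise vector:", rms_noise_norm)

# Typical per-step jitter in any single component (roughly 0.05).
print("per-component jitter:", LEARN_RATE * noise.std())
```

The noise vector comes out with a typical length close to 1, the same as the normalized gradient, so visible step-to-step wobble in every component is expected even without any oscillation in the underlying dynamics.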
Thanks, and your summary is correct. You're also right that this is a pretty contrived model. I don't know exactly how common demons are in real life, and this doesn't really shed much light on that question. I mainly thought that it was interesting to see that demon formation was possible in a simple situation where one can understand everything that is going on.
Thanks. I initially tried putting the code in a comment on this post, but it ended up being deleted as spam. It's now up on GitHub: https://github.com/DaemonicSigil/tessellating-hills (it isn't particularly readable, for which I apologize).
The initial vector has all components set to 0, and the charts show the evolution of these components over time. This is just for a particular run; there isn't any averaging. x0 gets its own chart, since it changes much more than the other components. If you want to know how the loss varies with time, you can just flip figure 1 upside down to get a pretty good proxy, since the splotch functions are of secondary importance compared to the -x0 term.
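For reference, the loss can be written out as follows. The formula is read off from the code posted in this thread; the symbol names are labels chosen for this note, with D = DIMS = 16, W = WSUM = 5 and ε = EPSILON = 0.0025.

```latex
\begin{aligned}
L(x_0, x) &= -x_0 + \epsilon \sum_{n=1}^{D} x_n \, S_n(x_0, x), \\
S_n(x_0, x) &= \sum_{w=1}^{W} \left[ s_{n,w} \sin \phi_{n,w} + c_{n,w} \cos \phi_{n,w} \right], \\
\phi_{n,w} &= k^{(0)}_{n,w} \, x_0 + \sum_{d=1}^{D} k_{n,w,d} \, x_d .
\end{aligned}
```

The splotch term is multiplied by the small factor ε, which is why the loss is approximately -x0 and the upside-down x0 chart works as a proxy.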
Here is the code for people who want to reproduce these results, or just mess around:
import torch
import numpy as np
import matplotlib.pyplot as plt

DIMS = 16          # number of dimensions that xn has
WSUM = 5           # number of waves added together to make a splotch
EPSILON = 0.0025   # rate at which xn controls splotch strength
TRAIN_TIME = 5000  # number of iterations to train for
LEARN_RATE = 0.2   # learning rate

# knlist and k0list are integers, so the splotch functions are periodic
knlist = torch.randint(-2, 3, (DIMS, WSUM, DIMS))  # wavenumbers : list (controlling dim, wave id, k component)
k0list = torch.randint(-2, 3, (DIMS, WSUM))        # the x0 component of wavenumber : list (controlling dim, wave id)
slist = torch.randn((DIMS, WSUM))                  # sin coefficients for a particular wave : list (controlling dim, wave id)
clist = torch.randn((DIMS, WSUM))                  # cos coefficients for a particular wave : list (controlling dim, wave id)

# initialize x0, xn
x0 = torch.zeros(1, requires_grad=True)
xn = torch.zeros(DIMS, requires_grad=True)

# numpy arrays for plotting:
x0_hist = np.zeros((TRAIN_TIME,))
xn_hist = np.zeros((TRAIN_TIME, DIMS))

for t in range(TRAIN_TIME):
    # model: phase of every wave, then one splotch value per controlling dim
    wavesum = torch.sum(knlist*xn, dim=2) + k0list*x0
    splotch_n = torch.sum(
        (slist*torch.sin(wavesum)) + (clist*torch.cos(wavesum)),
        dim=1)
    foreground_loss = EPSILON * torch.sum(xn * splotch_n)
    loss = foreground_loss - x0

    loss.backward()
    with torch.no_grad():
        # constant step size gradient descent, with some noise thrown in
        vlen = torch.sqrt(x0.grad*x0.grad + torch.sum(xn.grad*xn.grad))
        x0 -= LEARN_RATE*(x0.grad/vlen + torch.randn(1)/np.sqrt(1.+DIMS))
        xn -= LEARN_RATE*(xn.grad/vlen + torch.randn(DIMS)/np.sqrt(1.+DIMS))
    # clear the gradients so they don't accumulate across steps
    x0.grad.zero_()
    xn.grad.zero_()
    x0_hist[t] = x0.item()
    xn_hist[t] = xn.detach().numpy()

# x0 gets its own chart
plt.plot(x0_hist)
plt.xlabel('number of steps')
plt.ylabel('x0')
plt.show()

# all components of xn on one chart
for d in range(DIMS):
    plt.plot(xn_hist[:, d])
plt.xlabel('number of training steps')
plt.ylabel('xn')
plt.show()