drocta — AI Alignment Forum

Finite Factored Sets: Introduction and Factorizations

The part about Chimera functions was surprising, and I look forward to seeing where that will go, and to more of this in general.

In section 2.1 , Proposition 2 should presumably say that is a partial order on $Part (S)$ rather than on $S$ .

Finite Factored Sets

drocta4y20

You said that you thought that this could be done in a categorical way. I attempted something which appears to describe the same thing when applied to the category FinSet , but I'm not sure it's the sort of thing you meant by when you suggested that the combinatorial part could potentially be done in a categorical way instead, and I'm not sure that it is fully categorical.

Let S be an object.
For i from 1 to k, let be an object, (which is not anything isomorphic to the product of itself with itself, or at least is not the terminal object) .
Let $f : \prod_{i} A_{i} \to S$ be an isomorphism.
Then, say that $((A_{i})_{i}, f, S)$ is a representation of a factorization of S.
If $((A_{i})_{i}, f, S)$ and $((A_{i}^{'})_{i}, f^{'}, S)$ are each a representative of a factorization of S, then say that they represent the same factorization of S iff there exist isomorphisms $g_{i} : A_{i} \to A_{i}^{'}$ such that , where $⟨ g_{1}, g_{2}, . . ., g_{k} ⟩ : \prod_{i} A_{i} \to \prod_{i} A_{i}^{'}$ is the isomorphism obtained from the $g_{i}$ with the usual product map, the composition of it with f' is equal to f, that is, $⟨ g_{1}, g_{2}, . . ., g_{k} ⟩; f^{'} = f$ .

Then say that a factorization is, the class of representative of the same factorization. (being a representation of the same factorization is an equivalence relation).

For FinSet , the factorizations defined this way correspond to the factorizations as originally defined.

However, I've no idea whether this definition remains interesting if applied to other categories.

For example, if it were to be applied to the closed disk in a category of topological spaces and continuous functions, it seems that most of the isomorphisms from [0,1] * [0,1] to the disk would be distinct factorizations, even though there would still be many which are identified, and I don't really see talking about the different factorizations of the closed disk as saying much of note. I guess the factorizations using [0,1] and [0,1] correspond to different cosets of the group of automorphisms of the closed disk by a particular subgroup, but I'm pretty sure it isn't a normal subgroup, so no luck there.
If instead we try the category of vector spaces and linear maps over a particular field, then I guess it looks more potentially interesting. I guess things over sets having good analogies over vector spaces is a common occurrence. But here still, the subgroups of the automorphism groups given largely by the products of the automorphism groups of the things in the product, seems like they still usually fail to be a normal subgroup, I think. But regardless, it still looks like there's some ok properties to them, something kinda Grassmannian-ish ? idk. Better properties than in the topological spaces case anyway.

Parsing Chris Mingard on Neural Networks

drocta4y20

This comment I'm writing is mostly because this prompted me to attempt to see how feasible it would be to computationally enumerate the conditions for the weights of small networks like the 2 input 2 hidden layer 1 output in order to implement each of the possible functions. So, I looked at the second smallest case by hand, and enumerated conditions on the weights for a 2 input 1 output no hidden layer perceptron to implement each of the 2 input gates, and wanted to talk about it. This did not result in any insights, so if that doesn't sound interesting, maybe skip reading the rest of this comment. I am willing to delete this comment if anyone would prefer I do that.

Of the 16 2-input-1-output gates, 2 of them, xor and xnor, can't be done with the perceptrons with no hidden layer (as is well known), for 8 of them, the conditions on the 2 weights and the bias for the function to be implemented can be expressed as an intersection of 3 half spaces, and the remaining 6 can of course be expressed with an intersection of 4 (the maximum number that could be required, as for each specific input and output, the condition on the weights and bias in order to have that input give that output is specified by a half space, so specifying the half space for each input is always enough).

The ones that require 4 are: the constant 0 function, the constant 1 function, return the first input, return the second input, return the negation of the first input, and return the negation of the second input.

These seem, surprisingly, among the simplest possible behaviors. They are the ones which disregard at least one input. It seems a little surprising to me that these would be the ones that require an intersection of 4 half spaces.

I haven't computed the proportions of the space taken up by each region so maybe the ones that require 4 planes aren't particularly smaller. And I suppose with this few inputs, it may be hard to say that any of these functions are really substantially more simple than any of the rest of them. Or it may be that the tendency for simpler functions to occupy more space only shows up when we actually have hidden layers and/or have many more nodes.

Here is a table (x and y are the weights from a and b to the output, and z is the bias on the output):

outputs for the different inputs when this function is computed
0000 (i.e. the constant 0) z<0, x+y+z<0, x+z<0, y+z<0
0001 (i.e. the and gate) x+y+z>0, x+z<0, y+z<0
0010 (i.e. a and not b) z<0, x+y+z<0, x+z>0
0011 (i.e. if input a) z<0, x+y+z>0, x+z>0, y+z<0
0100 (i.e. b and not a) z<0, x+y+z<0, y+z>0
0101 (i.e. if input b) z<0, x+y+z>0, x+z<0, y+z>0
0110 (i.e. xor) impossible
0111 (i.e. or) z<0, x+z>0, y+z>0
1000 (i.e. nor) z>0, x+z<0, y+z<0
1001 (i.e. xnor) impossible
1010 (i.e. not b) z>0, x+y+z<0, x+z>0, y+z<0
1011 (i.e. b->a ) z>0, x+y+z>0, x+z<0
1100 (i.e. not a) z>0, x+y+z<0, x+z<0, y+z>0
1101 (i.e. a->b ) z>0, x+y+z>0, y+z<0
1110 (i.e. nand ) x+y+z<0, x+z>0, y+z>0
1111 (i.e. constant 0) z>0, x+z>0, y+z>0, x+y+z>0

Subagents of Cartesian Frames

drocta5y10

I am trying to check that I am understanding this correctly by applying it, though probably not in a very meaningful way:

Am I right in reasoning that, for , that $1_{S} ⊲ C$ iff ( (C can ensure S), and (every element of S is a result of a combination of a possible configuration of the environment of C with a possible configuration of the agent for C, such that the agent configuration is one that ensures S regardless of the environment configuration)) ?

So, if S = {a,b,c,d} , then
$C = ⎡ ⎢ ⎣ \begin{matrix} a & b & a c & d & d e & f & a \end{matrix} ⎤ ⎥ ⎦$

would have $1_{S} ⊲ C$ , but, say

$D = ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ \begin{matrix} a & a & a b & b & c c & d & f b & f & a \end{matrix} ⎤ ⎥ ⎥ ⎥ ⎥ ⎦$

would have $1_{S} ⋪ D$ , because , while S can be ensured, there isn't, for every outcome in S, an option which ensures S and which is compatible with that outcome ?

The "best predictor is malicious optimiser" problem

drocta5y00

What came to mind for me before reading the spoiler-ed options, was a variation on #2, with the difference being that, instead of trying to extract P's hypothesis about B, we instead modify T to get a T' which has P replaced with a P' which is a paperclip minimizer instead of maximizer, and then run both, and only use the output when the two agree, or if they give probabilities, use the average, or whatever.

Perhaps this could have an advantage over #2 if it is easier to negate what P is optimizing for than to extract P's model of B. (edit: though, of course, if extracting the model from P is feasible, that would be better than the scheme I described)

On the other hand, maybe this could still be dangerous, if P and P' have shared instrumental goals with regards to your predictions for B?

Though, if P has a good model of you, A, then presumably if you were to do this, both P and P' would expect you would do this, and, so I don't know what would make sense for them to do?

It seems like they would both expect that, while they may be able to influence you, that insofar as the influence would effect the expected value of number of paperclips, it would be canceled out by the other's influence (assuming that the ability to influence # paperclips via changing your prediction of B, is symmetric, which, I guess it might not be..).

I suppose this would be a reason why P would want its thought processes to be inscrutable to those simulating it, so that the simulators are unable to construct P' .

As a variation on #4, if P is running on a computer in a physics simulation in T, then almost certainly a direct emulation of that computer running P would run faster than T does, and therefore whatever model of B that P has, can be computed faster than T can be. What if, upon discovering this fact about T, we restrict the search among Turing machines to only include machines that run faster than T?

This would include emulations of P, and would therefore include emulations of P's model of B (which would probably be even faster than emulating P?), but I imagine that a description of an emulation of P without the physics simulation and such would have a longer description than a description of just P's model of B. But maybe it wouldn't.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments