Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and I wrote it kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out clearly enough what I wanted to demonstrate here, and it was also (as you correctly point out) wrong as stated. That wasted people's time, and I feel pretty bad about it. Mea culpa.
I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regar...
Hmm, there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing what you are arguing. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system $0 = F(x_t, \dot{x}_t)$, then $x$ is the state, not the trajectory of states $(x_1, \dots, x_t)$. In this situation, the dynamics of the system only depend on the current state (that's because it's ...
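To make the first-order point concrete, here is a minimal Python sketch. It assumes the implicit form $0 = F(x_t, \dot{x}_t)$ can be solved for $\dot{x} = f(x)$ and discretizes it with an explicit Euler step; the `decay` dynamics, the step size, and all function names are hypothetical choices for illustration. Because each update reads only the current state, restarting the simulation from any intermediate state reproduces the rest of the trajectory exactly, without needing the earlier history.

```python
from typing import Callable, List


def euler_step(f: Callable[[float], float], x: float, dt: float) -> float:
    """One explicit Euler step of dx/dt = f(x); the update reads only the current state x."""
    return x + dt * f(x)


def simulate(f: Callable[[float], float], x0: float, dt: float, n_steps: int) -> List[float]:
    """Return the trajectory [x_0, x_1, ..., x_n] produced by repeated Euler steps."""
    traj = [x0]
    x = x0
    for _ in range(n_steps):
        x = euler_step(f, x, dt)
        traj.append(x)
    return traj


def decay(x: float) -> float:
    # Hypothetical example dynamics: exponential decay, dx/dt = -x.
    return -x


full = simulate(decay, 1.0, 0.1, 10)

# Restart from the state at step k, discarding everything before it.
k = 4
resumed = simulate(decay, full[k], 0.1, 10 - k)

# The current state alone determines the future: the tails match exactly.
print(resumed == full[k:])
```

The same check would fail for a system whose update needed the whole history $(x_1, \dots, x_t)$, which is the sense in which $x$, rather than the trajectory, plays the role of the state.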
Thanks for pointing this out! This argument made it into the revised version. I think that, because of finite precision, it's reasonable to assume that such an ε always exists in practice (if we also assume that the probability gets rounded to something < 1).
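A small sketch of the finite-precision argument, assuming ε here means a margin that keeps the transition probability away from 1 (the excerpt does not define it) and that probabilities are IEEE-754 doubles. The largest double below 1.0 is $1 - 2^{-53}$, so every double $p < 1$ satisfies $1 - p \ge 2^{-53}$, giving a uniform ε. The `softmax` helper is a generic illustration, not code from the post; it also shows why the "rounded to something < 1" assumption is needed, since a large enough logit gap rounds the top probability to exactly 1.0.

```python
import math
from typing import List

# Largest IEEE-754 double strictly below 1.0 (math.nextafter requires Python 3.9+).
largest_below_one = math.nextafter(1.0, 0.0)

# Any double p < 1 satisfies 1 - p >= eps, so eps works as a uniform margin.
eps = 1.0 - largest_below_one  # equals 2**-53


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# A moderate logit gap keeps the top probability strictly below 1 ...
p_small = softmax([10.0, 0.0])

# ... but with a large gap, exp(-40) (about 4e-18) vanishes next to 1.0 in the sum,
# so the top probability rounds to exactly 1.0 and no margin is left.
p_big = softmax([40.0, 0.0])

print(eps, p_small[0], p_big[0])
```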
Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well, although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".
This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"
And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"
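As a rough illustration of the general idea (not the method from the linked post), the sketch below fits a scalar utility to pairwise preferences with a Bradley-Terry loss, a pairwise loss commonly used in reward modeling. The preference data and the `fit_utilities` helper are hypothetical; the data deliberately contain a cycle (A over B, B over C, C over A), so no ordering respects every comparison, yet gradient ascent on the log-likelihood still returns one utility per item.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_utilities(
    prefs: List[Tuple[str, str]], items: List[str], lr: float = 0.1, steps: int = 5000
) -> Dict[str, float]:
    """Maximize sum over (winner, loser) of log sigmoid(u[winner] - u[loser]) by gradient ascent."""
    u = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for winner, loser in prefs:
            # d/du_w log sigmoid(u_w - u_l) = 1 - sigmoid(u_w - u_l); the loser gets the negative.
            g = 1.0 - sigmoid(u[winner] - u[loser])
            grad[winner] += g
            grad[loser] -= g
        for i in items:
            u[i] += lr * grad[i]
    # Utilities are only identified up to an additive constant; center them at zero.
    mean = sum(u.values()) / len(u)
    return {i: v - mean for i, v in u.items()}


# Hypothetical, locally non-transitive preferences: A > B and B > C three times each,
# plus a single C > A that closes the cycle.
prefs = [("A", "B")] * 3 + [("B", "C")] * 3 + [("C", "A")]
u = fit_utilities(prefs, ["A", "B", "C"])
print(u)
```

Here the majority comparisons dominate, so the fitted utilities rank A above B above C, and the minority preference for C over A is absorbed as a low-probability outcome under the model rather than as a contradiction.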
(Extremely cool project ideas btw)
Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.
I'm pretty confused here.
Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!
I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.
A couple of thoughts:
Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?
Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of...
Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.
I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is at the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuro...
I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.
Not even him! Jeff Hawkins: "Mountcastle’s proposal that there is a common cortical algorithm doesn’t mean there are no variations. He knew that. The issue is how much is common in all cortical regions, and how much is different. The evidence suggests that there is a huge amount of commonality."
I mentioned "non-uniform neural architecture and hyperparameters". I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyp...
Thank you very much for pointing it out! I just checked the primary source, and there it's spelled correctly. But the misspelled version can be found in some newer books that cite the passage. Funny how typos spread...
I'll fix it!