Alex Flint

Independent AI safety researcher


The accumulation of knowledge

Wiki Contributions


Problems facing a correspondence theory of knowledge

Your summaries are excellent Rohin. This looks good to me.

Agency and the unreliable autonomous car

I think that I should modify 5a from "Search for a proof that this sentence is consistent with your model of the world, up to a maximum proof length of one million characters" to "Search to a proof of this sentence, using your model of the world as a set of starting assumptions". This is indeed a significant change to the algorithm and I thank you for pointing it out. I think that would resolve your second concern, about the problem with 5a itself, yes?

I think it might also resolve your first concern, about the unsoundness of the logical system, because the agent does not have a deduction rule that says that if P can be proved then P is true, but rather reasons from the fact that P can be proved, and the fact that this particular agent chooses its actions based on proof search and will behave in such-and-such a way if it finds a proof of this particular P, to conclude P. This only work for certain particular sentences P, such as the one that is constructed in this story, since this particular sentence P is one that, if the agent finds a proof of it, will cause it to take actions that lead to P itself being true.

Agency and the unreliable autonomous car

Firstly, the agent's logical deduction system is unsound. It includes something comparable with Peano arithmetic (or else Löb's theorem can't be applied), and then adds a deduction rule "if P can be proved consistent with the system-so-far then assume P is true". But we already know that for any consistent extension T of Peano arithmetic there is at least one proposition G for which both G and ~G are consistent with T. So both of these are deducible using the rule. Now, the agent might not find the contradiction, because...

Are you referring to step 5a/5b of the algorithm, that says search for a proof and then act based on what is found? I wouldn't exactly characterize this as adding a further deduction rule. But perhaps you are referring to something?

This system can only find proofs up to a length of a million characters. The requirements of Löb's theorem include a formal system that is an extension of PA and is closed under deduction: that is, derives proofs of unbounded length. So we can't apply Löb's theorem to this system anyway.

It's true this gets a bit more finicky in the resource-bounded case. Critch's paper on resource-bounded Lob goes into this. I haven't looked into this very much.

Problems facing a correspondence theory of knowledge

It's actually written, just need to edit and post. Should be very soon. Thanks for checking on it.

Knowledge is not just digital abstraction layers

Yep, agreed.

These are still writing that I drafted before we chatted a couple of weeks ago btw. I have some new ideas based on the things we chatted about that I hope to write up soon :)

Problems facing a correspondence theory of knowledge

Well here is a thought: a random string would have high Kolmogorov complexity, as would a string describing the most fundamental laws of physics. What are the characteristics of the latter that conveys power over one's environment to an agent that receives it, that is not conveyed by the former? This is the core question I'm most interested in at the moment.

Problems facing a correspondence theory of knowledge

Well yes I agree that knowledge exists with respect to a goal, but is there really no objective difference an alien artifact inscribed with deep facts about the structure of the universe and set up in such a way that it can be decoded by any intelligent species that might find it, and an ordinary chunk of rock arriving from outer space?

Problems facing a correspondence theory of knowledge

Thank you for this comment duck_master.

I take your point that it is possible to extract knowledge about human affairs, and about many other things, from the quantum structure of a rock that has been orbiting the Earth. However, I am interested in a definition of knowledge that allows me to say what a given AI does or does not know, insofar as it has the capacity to act on this knowledge. For example, I would like to know whether my robot vacuum has acquired sophisticated knowledge of human psychology, since if it has, and I wasn't expecting it to, then I might choose to switch it off. On the other hand, if I merely discover that my AI has recorded some videos of humans then I am less concerned, even if these videos contain the basic data necessary to constructed sophisticated knowledge of human psychology, as in the case with the rock. Therefore I am interested not just in information, but something like action-readiness. I am referring to that which is both informative and action-ready as "knowledge", although this may be stretching the standard use of this term.

Now you say that we might measure more abstract kinds of knowledge by looking at what an AI is willing to bet on. I agree that this is a good way to measure knowledge if it is available. However, if we are worried that an AI is deceiving us, then we may not be willing to trust its reports of its own epistemic state, or even of the bets it makes, since it may be willing to lose money now in order to convince us that it is not particularly intelligent, in order to make a treacherous turn later. Therefore I would very much like to find a definition that does not require me to interact with the AI through its input/output channels in order to find out what it knows, but rather allows me to look directly at its internals. I realize this may be impossible, but this is my goal.

So as you can see, my attempt at a definition of knowledge is very much wrapped up with the specific problem I'm trying to solve, and so any answers I arrive at may not be useful beyond this specific AI-related question. Nevertheless, I see this as an important question and so am content to be a little myopic in my investigation.

Agency in Conway’s Game of Life

Thank you for this thoughtful comment itaibn0.

Matter and energy and also approximately homogeneously distributed in our own physical universe, yet building a small device that expands its influence over time and eventually rearranges the cosmos into a non-trivial pattern would seem to require something like an AI.

It might be that the same feat can be accomplished in Life using a pattern that is quite unintelligent. In that case I am very interested in what it is about our own physical universe that makes it different in this respect from Life.

Now it could actually be that in our own physical universe it is also possible to build not-very-intelligent machines that begin small but eventually rearrange the cosmos. In this case I am personally more interested in the nature of these machines than in "intelligent machines", because the reason I am interested in intelligence in the first place is due to its capacity to influence the future in a directed way, and if there are simpler avenues to influence in the future in a directed way then I'd rather spend my energy investigating those avenues than investigating AI. But I don't think it's possible to influence the future in a directed way in our own physical universe without being intelligent.

to solve the control problem in an environment full of intelligence only requires marginally more intelligence at best

What do you mean by this?

the solution to the control problem may even be less intelligent than the structures it competes against, and make up for that with hard-coded solutions to NP-hard problems in military strategy.

But if one entity reliably outcompetes another entity, then on what basis do you say that this other entity is the more intelligent one?

Load More