Steve Byrnes

I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at:

Steve Byrnes's Comments

Will AI undergo discontinuous progress?

That's a good point. If a research group develops a more promising approach to AI, recursive self-improvement / capability enhancement might be one of the first things they do, before going for immediate money-making applications. The programmers already know that application area, and they can just do it internally without going through the rigmarole of marketing, product design, and so on.

Curiosity Killed the Cat and the Asymptotically Optimal Agent

Hmm, I think I get it. Correct me if I'm wrong.

Your paper is about an agent that can perform well in any possible universe. (That's the "for all ν in ℳ".) That includes universes where the laws of physics suddenly change tomorrow. But in real life, I know that the laws of physics are not going to change tomorrow. Thus, I can get optimal results without doing the kind of exhaustive exploration that your paper is talking about. Agree or disagree?
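For reference, here is one common way to write the property that the "for all ν in ℳ" quantifies over (asymptotic optimality in mean). Treat this exact form as an assumption; the paper's own definition may differ in details. The notation is mine: $V^{*}_{\nu}$ is the optimal value in environment $\nu$, $V^{\pi}_{\nu}$ is the value of policy $\pi$ in $\nu$, and $h_{<t}$ is the interaction history before time $t$.

```latex
\forall \nu \in \mathcal{M}: \quad
\lim_{t \to \infty} \mathbb{E}^{\pi}_{\nu}\!\left[ V^{*}_{\nu}(h_{<t}) - V^{\pi}_{\nu}(h_{<t}) \right] = 0
```

Because the limit must hold for every $\nu$ in $\mathcal{M}$, a larger class (one that includes environments where physics changes tomorrow) imposes a stronger requirement. Ruling such environments out of $\mathcal{M}$ weakens the requirement and, with it, the amount of exploration needed to meet it.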

On unfixably unsafe AGI architectures

Hmm, interesting. I think human cloning is an imperfect analogy because the only real reason to do it is to impress your friends, so if everyone coordinates on being scornful towards the first person to do human cloning (rather than being impressed), then there's no more personal benefit to cheating. By contrast, with an AGI, there would be the hope that you'll actually solve the safety problems, and then get tons of money and power and respect.

Biological weapons are maybe a better example, but not an especially encouraging one: as many as 8 countries may have secret bio-weapons programs, including North Korea. Maybe one could make an argument that there's a taboo against using bio-weapons, as opposed to merely stockpiling them? Likewise, the taboo against using nuclear weapons was not successfully turned into a taboo against countries starting new nuclear weapons programs. Maybe it's hard to get riled up against someone doing something that is not purposely aggressive? I don't know. I can't think of a great historical analogy.

There's also the issue that there aren't many actors who have both a reason to start a bio-weapons program and the ability to do so without getting shut down: really just secret military labs. Whereas in the worst case, many orders of magnitude more people would be willing and able to start doing illegal AGI experiments without the authorities realizing it.

Curiosity Killed the Cat and the Asymptotically Optimal Agent

This question is probably stupid, and also kinda generic (it applies to many other papers besides this one), but forgive me for asking it anyway.

So, I'm trying to think through how this kind of result generalizes beyond MDPs. In my own life, I don't go wandering around an environment looking for piles of cash that got randomly left on the sidewalk. My rewards aren't random. Instead, I have goals (or more generally, self-knowledge of what I find rewarding), and I have abstract knowledge constraining the ways in which those goals will or won't happen.

Yes, I do still have to do exploration (try new foods, meet new people, ponder new ideas, etc.), but because of my general prior knowledge about the world, this exploration kinda feels different from the kind of exploration that they talk about in MDPs. It's not really rolling the dice; I generally have a pretty good idea of what to expect, even if it's still a bit uncertain along some axes.
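A toy illustration of this contrast, not taken from the paper: the simplest case (a one-state MDP, i.e. a two-armed Bernoulli bandit) solved with Thompson sampling, comparing an uninformative Beta(1, 1) prior against an informative prior that already roughly matches the true payoff rates. The function name, arm probabilities, prior parameters, and step count are all hypothetical choices for the sketch. It counts how often the agent tries the worse arm, as a crude measure of how much "rolling the dice" it does.

```python
import random


def thompson_bad_pulls(prior_good, prior_bad, p_good=0.9, p_bad=0.1,
                       steps=1000, seed=0):
    """Run Thompson sampling on a two-armed Bernoulli bandit.

    prior_good / prior_bad are (alpha, beta) Beta-prior parameters per arm.
    Returns how many times the worse arm was pulled (a proxy for exploration).
    """
    rng = random.Random(seed)
    # Posterior parameters [alpha, beta]: index 0 is the good arm, 1 the bad arm.
    posterior = [list(prior_good), list(prior_bad)]
    true_p = [p_good, p_bad]
    bad_pulls = 0
    for _ in range(steps):
        # Draw a plausible success rate for each arm from its current posterior.
        samples = [rng.betavariate(a, b) for a, b in posterior]
        arm = 0 if samples[0] >= samples[1] else 1
        reward = 1 if rng.random() < true_p[arm] else 0
        # Conjugate Beta-Bernoulli update: success adds to alpha, failure to beta.
        posterior[arm][0] += reward
        posterior[arm][1] += 1 - reward
        if arm == 1:
            bad_pulls += 1
    return bad_pulls


# Uninformative prior: the agent has to find out which arm is better by trying.
uniform = thompson_bad_pulls((1, 1), (1, 1))
# Informative prior that already roughly matches the world (hypothetical numbers).
informed = thompson_bad_pulls((90, 10), (10, 90))
print(uniform, informed)
```

With the informative prior the worse arm is essentially never tried, while the uniform-prior agent pulls it some number of times before its posterior settles; the exact counts depend on the seed. Note that this only helps because the prior is correct here; a confidently wrong prior would suppress exploration just as effectively.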

So, how do you think about the generalizability of these kinds of MDP results?

(I like the paper, by the way!)

Attainable Utility Preservation: Concepts

Just trying to think this through ... at the risk of proving I haven't carefully read all your posts ... :-)

I program my AI to invent a better solar cell. So it starts by reading a materials science textbook. OK, now it knows materials science ... it didn't before ... Is that a disallowed AU increase? (As the saying goes, "knowledge is power"...?)
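One way to make the question concrete, assuming a penalty of the form used in formal AUP: the sum, over auxiliary goals, of the absolute difference between the attainable utility (Q-value) after taking an action and after doing nothing, scaled by λ over the number of auxiliary goals. Treat that exact form as an assumption. The function names, the auxiliary goals, and all Q-values below are hypothetical, chosen only to show how "knowledge is power" would register.

```python
def aup_penalty(q_action, q_noop):
    """Total change in attainable utility across auxiliary goals.

    q_action[i]: attainable utility (Q-value) for auxiliary goal i if the agent acts.
    q_noop[i]:   the same quantity if the agent does nothing instead.
    Absolute values mean both gains and losses of ability are penalized.
    """
    if len(q_action) != len(q_noop):
        raise ValueError("need one Q-value per auxiliary goal in each list")
    return sum(abs(qa - qn) for qa, qn in zip(q_action, q_noop))


def aup_reward(reward, q_action, q_noop, lam=1.0):
    """Primary reward minus the penalty, scaled by lam / (number of auxiliary goals)."""
    return reward - lam / len(q_action) * aup_penalty(q_action, q_noop)


# Hypothetical auxiliary goals: design a battery, synthesize a drug, run a business.
q_noop = [1.0, 2.0, 0.5]   # attainable utility if the AI skips the textbook
q_read = [3.0, 4.0, 1.5]   # after reading it: the AI can now do more of everything
penalty = aup_penalty(q_read, q_noop)
shaped = aup_reward(0.0, q_read, q_noop, lam=1.0)
print(penalty, shaped)
```

Under this form, reading the textbook shows up as a large AU change even though nothing in the world has moved, so whether it is "disallowed" comes down to how λ trades that penalty off against the primary reward for eventually inventing the solar cell.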

The Catastrophic Convergence Conjecture

Cool. We're probably on the same page then.

The Catastrophic Convergence Conjecture

Sure. Looking forward to that. My current intuition is: Humans have a built-in reward system based on (mumble mumble) dopamine, but the existence of that system doesn't make it easy for us to understand dopamine, or reward functions in general, or anything like that, nor does it make it easy for us to formulate and pursue goals related to those things. It takes quite a bit of education and beautifully illustrated blog posts to get us to that point :-D

The Catastrophic Convergence Conjecture

"I have previously criticized value learning for needing to locate the human within some kind of prespecified ontology (this criticism is not new). By taking only the agent itself as primitive, perhaps we could get around this (we don't need any fancy engineering or arbitrary choices to figure out AUs/optimal value from the agent's perspective)."

Wouldn't you need to locate the abstract concept of AU within the AI's ontology? Is that easier? Or sorry if I'm misunderstanding.
