All of avturchin's Comments + Replies

 as a weak alignment technique we might use to bootstrap strong alignment.

Yes, it also reminded me of Christiano's approach of amplification and distillation.

2 · G Gordon Worley III · 2y
Thanks both! I definitely had the idea that Paul had mentioned something similar somewhere but hadn't made it a top-level concept. I think there are similar echoes in how Eliezer talked about seed AI in the early Friendly AI work.

Maybe we can ask GPT to output an English-Klingon dictionary?

If we use median AI timelines, there is a 50 per cent chance we will be dead before that moment. Maybe a different measure would be more useful, like the date by which there is a 10 per cent chance of TAI, before which our protective measures should be prepared?
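A toy sketch of the distinction being drawn here: given a cumulative probability distribution over TAI arrival years, the 10th-percentile year (the planning deadline suggested above) can fall decades before the median year. All numbers below are invented for illustration, not taken from any actual forecast.

```python
import bisect

years = list(range(2025, 2101))
# Assumed cumulative probability of TAI by each year (toy curve, made up)
cdf = [min(1.0, 0.01 * (y - 2024) ** 1.5 / 10) for y in years]

def year_at_percentile(p):
    """First year whose cumulative TAI probability reaches p."""
    i = bisect.bisect_left(cdf, p)
    return years[i]

print(year_at_percentile(0.10))  # -> 2046: deadline for protective measures
print(year_at_percentile(0.50))  # -> 2087: the usual "median timeline"
```

Even in this made-up distribution, preparing for the median would leave a 10 per cent chance of TAI arriving roughly four decades before the preparations are expected to be done.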

Also, this model contradicts the naive model of GPT growth, in which the number of parameters has grown by two orders of magnitude a year for the last couple of years; if this trend continues, it could reach the human level of 100 trillion parameters within 2 years.
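The arithmetic behind this naive extrapolation can be checked in a few lines. The 175-billion-parameter starting point (GPT-3) and the 100-trillion-parameter target are the figures implied by the comment; the 100x-per-year growth rate is the assumption being extrapolated, not a forecast.

```python
import math

start_params = 175e9      # GPT-3 (2020), parameters
target_params = 100e12    # rough parameter-count analogy for the human brain
growth_per_year = 100.0   # two orders of magnitude per year (naive trend)

# Solve start * growth**t = target for t
years_needed = math.log(target_params / start_params, growth_per_year)
print(round(years_needed, 2))  # -> 1.38, i.e. under 2 years at this rate
```

The result (about 1.4 years) is why the naive model says "2 years"; the disagreement with Cotra's model is over whether the 100x-per-year rate can be sustained.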

2 · Ajeya Cotra · 3y
Thanks! I agree that full distribution information is very valuable, although I consider medians to be important as well. The spreadsheet linked in the report [] provides the full distribution implied by my views for the probability that the amount of computation required to train a transformative model is affordable, although it requires some judgment to translate that into P(TAI), because there may be other bottlenecks besides computation and there may be other paths to TAI besides training a transformative model. I'd say it implies somewhere between 2031 and 2036 as the year by which there is a 10% chance of TAI.

As I said in a reply to Daniel above, the way to express the view that a brain-sized GPT model would constitute TAI is to assign a lot of weight to the Short Horizon Neural Network hypothesis, potentially along with shifting or narrowing the effective horizon length. I think this is plausible, but I don't believe we should place a high probability on it, because I expect on priors that we would need longer effective horizon lengths than GPT-3's, and I don't think the evidence from the GPT-3 paper or follow-on papers has provided clear evidence to the contrary.

In my best-guess inputs, I assign a 25% probability collectively to the Short Horizon Neural Network and Lifetime Anchor hypotheses; in my aggressive inputs I assign a 50% probability to these two hypotheses collectively. In both cases, the probabilities are smoothed to a significant extent because of uncertainty in model size requirements and scaling, with substantial weight on smaller-than-brain-sized and larger-than-brain-sized models.

It looks like the idea of human values is very contradictory. Maybe we should dissolve it? What about "AI safety" without human values?

2 · G Gordon Worley III · 3y
In some sense that's a direction I might be moving in with my thinking, but there is still something that humans identify as values that they care about, so I expect there to be some real phenomenon going on that needs to be considered to get good outcomes, since I expect the default remains a bad outcome if we don't pay attention to whatever it is that makes humans care about stuff. I expect most work today on value learning is not going to get us where we want to go because it's working with the wrong abstractions, and my goal in this work is to dissolve those abstractions to find better ones for our long-term purposes.

All else equal, I prefer an AI which is not capable of philosophy, as I am afraid of the completely alien conclusions it could come to (e.g. that insects are more important than humans).

Moreover, I am skeptical that going to the meta-level simplifies the problem enough for it to be solvable by humans (the same goes for meta-ethics and the theory of human values). For example, if someone says that he is not able to understand math, but will instead work on meta-mathematical problems, we would be skeptical about his ability to contribute. Why would the meta-level be simpler?

3 · Wei Dai · 4y
If I gave the impression in this post that I expect metaphilosophy to be solved before someone builds an AGI, that was far from my intentions. I think this is a small-chance-of-high-return kind of situation, plus I think someone has to try to attack the problem if only to generate evidence that it really is a hard problem, otherwise I don't know how to convince people to adopt costly social solutions like stopping technological progress. (And actually I don't expect [] the evidence to be highly persuasive either, so this amounts to just another small chance of high return.) What I wrote in an earlier post [] still describes my overall position:
5 · Jessica Taylor · 4y
This is also my reason for being pessimistic about solving metaphilosophy before a good number of object-level philosophical problems have been solved (e.g. in decision theory, ontology/metaphysics, and epistemology). If we imagine being in a state where we believe running computation X would solve hard philosophical problem Y, then it would seem that we already have a great deal of philosophical knowledge about Y, or about a more general class of problems that includes Y.

More generally, we could look at the historical difficulty of solving a problem vs. the difficulty of automating it. For example: the difficulty of walking vs. the difficulty of programming a robot to walk; the difficulty of adding numbers vs. the difficulty of specifying an addition algorithm; the difficulty of discovering electricity vs. the difficulty of solving philosophy of science to the point where it's clear how a reasoner could have discovered (and been confident in) electricity; and so on.

The plausible story I have that looks most optimistic for metaphilosophy looks something like:

1. Some philosophical community makes large progress on a bunch of philosophical problems, at a high level of technical sophistication.
2. As part of their work, they discover some "generators" that generate a bunch of the object-level solutions when translated across domains; these generators might involve e.g. translating a philosophical problem to one of a number of standard forms and then solving the standard form.
3. They also find philosophical reasons to believe that these generators will generate good object-level solutions to new problems, not just the ones that have already been studied.
4. These generators would then constitute a solution to metaphilosophy.

My main objection to this idea is that it is a local solution and doesn't have built-in mechanisms to become a global AI safety solution, that is, to prevent the creation of other AIs, which could be agential superintelligences. One could try to offer "AI police" as a service, but it may be less effective than agential police.

Another objection is Gwern's idea that any Tool AI "wants" to become an agential AI.

This idea also excludes the robotic direction in AI development, which will produce agential AIs anyway.

3 · Rohin Shah · 4y
If by agent we mean "system that takes actions in the real world", then services can be agents. As I understand it, Eric is only arguing against monolithic AGI agents that are optimizing a long-term utility function and that can learn/perform any task. Current factory robots definitely look like a service, and even the soon-to-come robots-trained-with-deep-RL will be services. They execute particular learned behaviors.

If I remember correctly, Gwern's argument is basically that Agent AI will outcompete Tool AI because Agent AI can optimize things that Tool AI cannot, such as its own cognition. In the CAIS world, there are separate services that improve cognition, and so the CAIS services do get the benefit of ever-improving cognition, without being classical AGI agents.

But overall I agree with this point (and disagree with Eric) because I expect there to be lots of gains to be had by removing the boundaries between services, at least where possible.
2 · Wei Dai · 4y
This seems likely to me as well, especially since a "service" is by definition bounded and an agent is not.

Also, it is assumed here that there are only two types of BBs and that they have a similar measure of existence.

However, there is a very large class of thermodynamic BBs, which was described in Egan's dust theory: that is, observer-moments which appear as a result of the causal interaction of atoms in a thermodynamic gas, if that causal interaction has the same causal structure as a moment of experience. They may numerically dominate, but additional calculations are needed and seem possible. There could be other types of BBs, like purely mathematical ... (read more)

I expected that Lamport's paper would be mentioned, as it describes a known catastrophic failure mode for autonomous systems, connected with the Buridan's ass problem and an infinite recursion of predicting the future time of solving the problem. I think this problem is underexplored in AI Safety, despite a previous attempt to present it on LessWrong.