## AI ALIGNMENT FORUM

ESRogs

Engineer at CoinList.co. Donor to LW 2.0.


# Wiki Contributions

Biology-Inspired AGI Timelines: The Trick That Never Works

In my view, the biological anchors and the Very Serious estimates derived therefrom are really useful for the following very narrow yet plausibly impactful purpose

I don't understand why it's not just useful directly. Saying that the numbers are not true upper or lower bounds seems like it's expecting way too much!

They're not even labeled as bounds (at least in the headline). They're supposed to be "anchors".

Suppose you'd never done the analysis to know how much compute a human brain uses, or how much compute all of evolution had used. Wouldn't this report be super useful to you?

Sure, it doesn't directly tell you when TAI is going to come, because there's a separate thing you don't know, which is how compute-efficient our systems are going to be compared to the human brain. And also that translation factor is changing with time. But surely that's another quantity we can have a distribution over.

If there's some quantity that we don't know the value of, but we have at least one way to estimate it using some other uncertain quantities, why is it not useful to reduce our uncertainty about some of those other quantities?

This seems like exactly the kind of thing superforecasters are supposed to do. Or that an Eliezer-informed Bayesian rationalist is supposed to do. Quantify your uncertainty. Don't be afraid to use a probability distribution. Don't throw away relevant information, but instead use it to reduce your uncertainty and update your probabilities.

If Eliezer's point is just that the report shouldn't be taken as the gospel truth of when AI is going to come, then fine. Or if he just wants to highlight that there's still uncertainty over the translation factor between the brain's compute-efficiency and our ML systems' compute-efficiency, then that seems like a good point too.

But I don't really understand the point of the rest of the article. If I wanted to have any idea at all when TAI might come, then Moravec's 1988 calculations regarding the human brain seem super interesting. And also Somebody on the Internet's 2006 calculation of how much compute evolution had used.

Either of them would be wrong to think that their number precisely pins down the date. But if you started out not knowing whether to expect AGI in one year or in 10,000 years, then it seems like learning the human brain number and the all-of-evolution number should radically reduce your uncertainty.

It still doesn't reduce your uncertainty all the way, because we still don't know the compute-efficiency translation factor. But who said it reduced uncertainty all the way? Not OpenPhil.
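A minimal sketch of the argument above, under stated assumptions: treat the log10 of the compute needed for TAI as the sum of two independent uncertain quantities, the log10 of the brain's compute and the log10 of the translation factor. Every number here (the means, the standard deviations, the normal distributions, and the function name `spread_of_log_compute`) is invented for illustration and is not taken from the report.

```python
import random
import statistics


def spread_of_log_compute(brain_sd, factor_sd, n=100_000, seed=0):
    """Return the sample standard deviation of
    log10(TAI compute) = log10(brain compute) + log10(translation factor),
    with both terms drawn from independent normal distributions.
    The means (15.0 and 0.0) are arbitrary placeholders."""
    rng = random.Random(seed)
    samples = [
        rng.gauss(15.0, brain_sd) + rng.gauss(0.0, factor_sd)
        for _ in range(n)
    ]
    return statistics.stdev(samples)


# Hypothetical: before any brain-compute analysis, that term is very uncertain.
before = spread_of_log_compute(brain_sd=5.0, factor_sd=3.0)

# Hypothetical: after the analysis, the brain term is much tighter,
# while the translation factor is exactly as uncertain as before.
after = spread_of_log_compute(brain_sd=1.0, factor_sd=3.0)

print(f"spread before: {before:.2f} orders of magnitude")
print(f"spread after:  {after:.2f} orders of magnitude")
```

With these made-up inputs the spread falls from about 5.8 to about 3.2 orders of magnitude: narrowing one input shrinks the overall uncertainty even though the translation factor stays unknown, and the result can never be tighter than the translation factor's own spread of 3.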

[AN #156]: The scaling hypothesis: a plan for building AGI

Claim 4: GPT-N need not be "trying" to predict the next word. To elaborate: one model of GPT-N is that it is building a world model and making plans in the world model such that it predicts the next word as accurately as possible. This model is fine on-distribution but incorrect off-distribution. In particular, it predicts that GPT-N would e.g. deliberately convince humans to become more predictable so it can do better on future next-word predictions; this model prediction is probably wrong.

I got a bit confused by this section, I think because the word "model" is being used in two different ways, neither of which is in the sense of "machine learning model".

Paraphrasing what I think is being said:

• An observer (us) has a model_1 of what GPT-N is doing.
• According to their model_1, GPT-N is building its own world model_2, that it uses to plan its actions.
• The observer's model_1 makes good predictions about GPT-N's behavior when GPT-N (the machine learning model_3) is tested on data that comes from the training distribution, but bad predictions about what GPT-N will do when tested (or used) on data that does not come from the training distribution.
• The way that the observer's model_1 will be wrong is not that it will be fooled by GPT-N taking a treacherous turn, but rather the opposite -- the observer's model_1 will predict a treacherous turn, but instead GPT-N will go on filling in missing words, as in training (or something else?).

Is that right?

Finite Factored Sets

Let , where  and

[...] The second rule says that  is orthogonal to itself

Should that be "is not orthogonal to itself"? I thought the  meant non-orthogonal, so would think  means that  is not orthogonal to itself.

(The transcript accurately reflects what was said in the talk, but I'm asking whether Scott misspoke.)

Challenge: know everything that the best go bot knows about go

But once you let it do more computation, then it doesn't have to know anything at all, right? Like, maybe the best go bot is, "Train an AlphaZero-like algorithm for a million years, and then use it to play."

I know more about go than that bot starts out knowing, but less than it will know after it does computation.

I wonder if, when you use the word "know", you mean some kind of distilled, compressed, easily explained knowledge?

2020 AI Alignment Literature Review and Charity Comparison

This is commonly said on the basis of his $1b pledge

Wasn't it supposed to be a total of $1b pledged, from a variety of sources, including Reid Hoffman and Peter Thiel, rather than $1b just from Musk?

EDIT: yes, it was.

Sam, Greg, Elon, Reid Hoffman, Jessica Livingston, Peter Thiel, Amazon Web Services (AWS), Infosys, and YC Research are donating to support OpenAI. In total, these funders have committed $1 billion, although we expect to only spend a tiny fraction of this in the next few years.

https://openai.com/blog/introducing-openai/

Homogeneity vs. heterogeneity in AI takeoff scenarios

For those organizations that do choose to compete... I think it is highly likely that they will attempt to build competing systems in basically the exact same way as the first organization did

...

It's unlikely for there to exist both aligned and misaligned AI systems at the same time

If the first group sunk some cost into aligning their system, but that wasn't integral to its everyday task performance, wouldn't a second competing group be somewhat likely to skimp on the alignment part?

It seems like this calls into question the claim that we wouldn't get a mix of aligned and misaligned systems.

Do you expect it to be difficult to disentangle the alignment from the training, such that the path of least resistance for the second group will necessarily include doing a similar amount of alignment?

Biextensional Equivalence