All of Mark Xu's Comments + Replies

Prizes for ELK proposals

The high-level reason is that the 1e12N model is not that much better at prediction than the 2N model. You can correct for most of the correlation even with only a vague guess at how different the AI and human probabilities are, and most AI and human probabilities aren't going to be that different in a way that produces a correlation the human finds suspicious. I think that the largest correlations are going to be produced by the places the AI and the human have the biggest differences in probabilities, which are likely also going to be the places where th... (read more)

Prizes for ELK proposals

I agree that i does slightly worse than t on consistency checks, but i also does better on other regularizers you're (maybe implicitly) using like speed/simplicity, so as long as i doesn't do too much worse it'll still beat out the direct translator.

One possible thing you might try is some sort of lexicographic ordering of regularization losses. I think this rapidly runs into other issues with consistency checks, like the fact that the human is going to be systematically wrong about some correlations, so i potentially is more consistent than t.

1Lukas Finnveden12hAny articulable reason for why i just does slightly worse than t? Why would a 2N-node model fix a large majority of disrepancys between an N-node model and a 1e12*N-node model? I'd expect it to just fix a small fraction of them. Yeah, if you can get better-looking consistency than the direct translator in some cases, I agree that a sufficiently high consistency penalty will just push towards exploiting that (even if the intermediate model needs to be almost as large as the full predictor to exploit it properly). I'm curious whether you think this is the main obstacle. If we had a version of the correlation-consistency approach that always gave the direct translator minimal expected consistency loss, do we as-of-yet lack a counterexample for it?
Alex Ray's Shortform

I feel mostly confused by the way that things are being framed. ELK is about the human asking for various poly-sized fragments and the model reporting what those actually were instead of inventing something else. The model should accurately report all poly-sized fragments the human knows how to ask for.

Like the thing that seems weird to me here is that you can't simultaneously require that the elicited knowledge be 'relevant' and 'comprehensible' and also cover these sorts of obfuscated debate like scenarios.

I don't know what you mean by "relevant" or ... (read more)

1A Ray3dThanks for taking the time to explain this! I think this is what I was missing. I was incorrectly thinking of the system as generating poly-sized fragments.
Alex Ray's Shortform

I don’t think I understand your distinction between obfuscated and non-obfuscated knowledge. I generally think of non-obfuscated knowledge as NP or PSPACE. The human judgement of a situation might only theoretically require a poly sized fragment of a exp sized computation, but there’s no poly sized proof that this poly sized fragment is the correct fragment, and there are different poly sized fragments for which the human will evaluate differently, so I think of ELK as trying to elicit obfuscated knowledge.

1A Ray4dSo if there are different poly fragments that the human would evaluate differently, is ELK just "giving them a fragment such that they come to the correct conclusion" even if the fragment might not be the right piece. E.g. in the SmartVault case, if the screen was put in the way of the camera and the diamond was secretly stolen, we would still be successful even if we didn't elicit that fact, but instead elicited some poly fragment that got the human to answer disapprove? Like the thing that seems weird to me here is that you can't simultaneously require that the elicited knowledge be 'relevant' and 'comprehensible' and also cover these sorts of obfuscated debate like scenarios. Does it seem right to you that ELK is about eliciting latent knowledge that causes an update in the correct direction, regardless of whether that knowledge is actually relevant?
Prizes for ELK proposals

We would prefer submissions be private until February 15th.

Prizes for ELK proposals

We generally assume that we can construct questions sufficiently well that there's only one unambiguous interpretation. We also generally assume that the predictor "knows" which world it's in because it can predict how humans would respond to hypothetical questions about various situations involving diamonds and sensors and that humans would say in theory Q1 and Q2 could be different.

More concretely, our standard for judging proposals is exhibiting an unambiguous failure. If it was plausible you asked the wrong question, or the AI didn't know what you mean... (read more)

Alex Ray's Shortform

I think we would be trying to elicit obfuscated knowledge in ELK. In our examples, you can imagine that the predictor's Bayes net works "just because", so an argument that is convincing to a human for why the diamond in the room has to be arguing that the Bayes net is a good explanation of reality + arguing that it implies the diamond is in the room, which is the sort of "obfuscated" knowledge that debate can't really handle.

1A Ray4dOkay now I have to admit I am confused. Re-reading the ELK proposal -- it seems like the latent knowledge you want to elicit is not-obfuscated. Like, the situation to solve is that there is a piece of non-obfuscated information, which, if the human knew it, would change their mind about approval. How do you expect solutions to elicit latent obfuscated knowledge (like 'the only true explanation is incomprehendible by the human' situations)?
1A Ray4dCool, this makes sense to me. My research agenda is basically about making a not-obfuscated model, so maybe I should just write that up as an ELK proposal then.
Prizes for ELK proposals

The dataset is generated with the human bayes net, so it's sufficient to map to the human bayes net. There is, of course, an infinite set of "human" simulators that use slightly different bayes nets that give the same answers on the training set.

Prizes for ELK proposals

Does this mean that the method needs to work for ~arbitrary architectures, and that the solution must use substantially the same architecture as the original?

Yes, approximately. If you can do it for only e.g. transformers, but not other things, that would be interesting.

Does this mean that it must be able to deal with a broad variety of questions, so that we cannot simply sit down and think about how to optimize the model for getting a single question (e.g. "Where is the diamond?") right?

Yes, approximately. Thinking about how to get one question rig... (read more)

Prizes for ELK proposals

We generally imagine that it’s impossible to map the predictors net directly to an answer because the predictor is thinking in terms of different concepts, so it has to map to the humans nodes first in order to answer human questions about diamonds and such.

0brglnd12dI see, thanks for answering. To further clarify, given the reporter's only access to the human's nodes is through the human's answers, would it be equally likely for the reporter to create a mapping to some other Bayes net that is similarly consistent with the answers provided? Is there a reason why the reporter would map to the human's Bayes net in particular?
Prizes for ELK proposals

The SmartFabricator seems basically the same. In the robber example, you might imagine the SmartVault is the one that puts up the screen to conceal the fact that it let the diamond get stolen.

Prizes for ELK proposals

A different way of phrasing Ajeya's response, which I think is roughly accurate, is that if you have a reporter that gives consistent answers to questions, you've learned a fact about the predictor, namely "the predictor was such that when it was paired with this reporter it gave consistent answers to questions." if there were 8 predictor for which this fact was true then "it's the [7th] predictor such that when it was paired with this reporter it gave consistent answers to questions" is enough information to uniquely determine the reporter, e.g. the previ... (read more)

Prizes for ELK proposals

There is a distinction between the way that the predictor is reasoning and the way that the reporter works. Generally, we imagine that that the predictor is trained the same way the "unaligned benchmark" we're trying to compare to is trained, and the reporter is the thing that we add onto that to "align" it (perhaps by only training another head on the model, perhaps by finetuning). Hopefully, the cost of training the reporter is small compared to the cost of the predictor (maybe like 10% or something)

In this frame, doing anything to train the way the pred... (read more)

ARC's first technical report: Eliciting Latent Knowledge

I think that problem 1 and problem 2 as you describe them are potentially talking about the same phenomenon. I'm not sure I'm understanding correctly, but I think I would make the following claims:

  • Our notion of narrowness is that we are interested in solving the problem where the question we're asking is such that a state always resolves a question. E.g. there isn't any ambiguity around whether a state "really contains a diamond". (Note that there is ambiguity around whether the human could detect the diamond from any set of observations because there co
... (read more)
1Ramana Kumar1moThis "there isn't any ambiguity"+"there is ambiguity" does not seem possible to me: these types of ambiguity are one and the same. But it might depend on what “any set of observations” is allowed to include. “Any set” suggests being very inclusive, but remember that passive observation is impossible. Perhaps the observations I’d want the human to use to figure out if the diamond is really there (presuming there isn’t ambiguity) would include observations you mean to exclude, such as disabling the filter-nanobots first? I guess a wrinkle here is that observations need to be “implementable” in the world. If we’re thinking of making observations as intervening on the world (e.g., to decide which sensors to query), then some observations may be inaccessible because we can’t make that intervention. Rewriting this all without relying on “possible”/”can” concepts would be instructive.
3Charlie Steiner1moI think this statement encapsulates some worries I have. If it's important how the human defines a property like "the same diamond," then assuming that the sameness of the diamond is "out there in the diamond" will get you into trouble - e.g. if there's any optimization pressure to find cases where the specifics of the human's model rear their head. Human judgment is laden with the details of how humans model the world, you can't avoid dependence on the human (and the messiness that entails) entirely. Or to phrase it another way: I don't have any beef with a narrow approach that says "there's some set of judgments for which the human is basically competent, and we want to elicit knowledge relevant to those judgments." But I'm worried about a narrow approach that says "let's assume that humans are basically competent for all judgments of interest, and keep assuming this until something goes wrong." It just feels to me like this second approach is sort of... treating the real world as if it's a perturbative approximation to the platonic realm.
ARC's first technical report: Eliciting Latent Knowledge

My point is either that:

  • it will always be possible to find such an experiment for any action, even desirable ones, because the AI will have defended the diamond in a way the human didn't understand or the AI will have deduced some property of diamonds that humans thought they didn't have
  • or there will be some tampering for which it's impossible to find an experiment, because in order to avoid the above problem, you will have to restrict the space of experiments
ARC's first technical report: Eliciting Latent Knowledge

Thanks for your proposal! I'm not sure I understand how the "human is happy with experiment" part is supposed to work. Here are some thoughts:

  • Eventually, it will always be possible to find experiments where the human confidently predicts wrongly. Situations I have in mind are ones where your AI understands the world far better than you, so can predict that e.g. combining these 1000 chemicals will produce self-replicating protein assemblages, whereas the human's best guess is going to be "combining 1000 random chemicals doesn't do anything"
  • If the human
... (read more)
1Ramana Kumar1moThanks for the reply! I think you’ve understood correctly that the human rater needs to understand the proposed experiment – i.e., be able to carry it out and have a confident expectation about the outcome – in order to rate the proposer highly. Here’s my summary of your point: for some tampering actions, there are no experiments that a human would understand in the above sense that would expose the tampering. Therefore that kind of tampering will result in low value for the experiment proposer (who has no winning strategy), and get rated highly. This is a crux for me. I don’t yet believe such tampering exists. The intuition I’m drawing on here is that our beliefs about what world we’re in need to cash out in anticipated experiences. Exposing confusion about something that shouldn’t be confusing can be a successful proposer strategy. I appreciate your examples of “a fake diamond that can only be exposed by complex imaging techniques” and “a human making subtly different moral judgements” and will ponder them further. Your comment also helped me realise another danger of this strategy: to get the data for training the experiment proposer, we have to execute the SmartVault actions first. (Whereas I think in the baseline scheme they don’t have to be executed.)
ARC's first technical report: Eliciting Latent Knowledge

We don't think that real humans are likely to be using Bayes nets to model the world. We make this assumption for much the same reasons that we assume models use Bayes nets, namely that it's a test case where we have a good sense of what we want a solution to ELK to look like. We think the arguments given in the report will basically extend to more realistic models of how humans reason (or rather, we aren't aware of a concrete model of how humans reason for which the arguments don't apply).

If you think there's a specific part of the report where the human Bayes net assumption seems crucial, I'd be happy to try to give a more general form of the argument in question.

The Plan

Agreed, but the thing you want to use this for isn’t simulating a long reflection, which will fail (in the worst case) because HCH can’t do certain types of learning efficiently.

2johnswentworth2moOnce we get past Simulated Long Reflection, there's a whole pile of Things To Do With AI which strike me as Probably Doomed on general principles. You mentioned using HCH to "let humans be epistemically competitive with the systems we're trying to train", which definitely falls in that pile. We have general principles saying that we should definitely not rely on humans being epistemically competitive with AGI; using HCH does not seem to get around those general principles at all. (Unless we buy some very strong hypotheses about humans' skill at factorizing problems, in which case we'd also expect HCH to be able to simulate something long-reflection-like.) Trying to be epistemically competitive with AGI is, in general, one of the most difficult use-cases one can aim for. For that to be easier than simulating a long reflection, even for architectures other than HCH-emulators, we'd need some really weird assumptions.
The Plan

I want to flag that HCH was never intended to simulate a long reflection. It’s main purpose (which it fails in the worse case) is to let humans be epistemically competitive with the systems you’re trying to train.

4johnswentworth2moI mean, we have this thread [] with Paul directly saying "If all goes well you can think of it like 'a human thinking a long time'", plus Ajeya and Rohin both basically agreeing with that.
Biology-Inspired AGI Timelines: The Trick That Never Works

The way that you would think about NN anchors in my model (caveat that this isn't my whole model):

  • You have some distribution over 2020-FLOPS-equivalent that TAI needs.
  • Algorithmic progress means that 20XX-FLOPS convert to 2020-FLOPS-equivalent at some 1:N ratio.
  • The function from 20XX to the 1:N ratio is relatively predictable, e.g. a "smooth" exponential with respect to time.
  • Therefore, even though current algorithms will hit DMR, the transition to the next algorithm that has less DMR is also predictably going to be some constant ratio better at convert
... (read more)
2Vanessa Kosoy2moI don't understand this. * What is the meaning of "2020-FLOPS-equivalent that TAI needs"? Plausibly you can't build TAI with 2020 algorithms without some truly astronomical amount of FLOPs. * What is the meaning of "20XX-FLOPS convert to 2020-FLOPS-equivalent"? If 2020 algorithms hit DMR, you can't match a 20XX algorithm with a 2020 algorithm without some truly astronomical amount of FLOPs. Maybe you're talking about extrapolating the compute-performance curve, assuming that it stays stable across algorithmic paradigms (although, why would it??) However, in this case, how do you quantify the performance required for TAI? Do we have "real life elo" for modern algorithms that we can compare to human "real life elo"? Even if we did, this is not what Cotra is doing with her "neural anchor".
Biology-Inspired AGI Timelines: The Trick That Never Works

My model is something like:

  • For any given algorithm, e.g. SVMs, AlphaGo, alpha-beta pruning, convnets, etc., there is an "effective compute regime" where dumping more compute makes them better. If you go above this regime, you get steep diminishing marginal returns.
  • In the (relatively small) regimes of old algorithms, new algorithms and old algorithms perform similarly. E.g. with small amounts of compute, using AlphaGo instead of alpha-beta pruning doesn't get you that much better performance than like an OOM of compute (I have no idea if this is true, ex
... (read more)
4Vanessa Kosoy2moHmm... Interesting. So, this model says that algorithmic innovation is so fast that it is not much of a bottleneck: we always manage to find the best algorithm for given compute relatively quickly after this compute becomes available. Moreover, there is some smooth relation between compute and performance assuming the best algorithm for this level of compute. [EDIT: The latter part seems really suspicious though, why would this relation persist across very different algorithms?] Or at least this is true is "best algorithm" is interpreted to mean "best algorithm out of some wide class of algorithms s.t. we never or almost never managed to discover any algorithm outside of this class". This can justify biological anchors as upper bounds[1] [#fn-cCeH9Wga7mav4koHv-1] : if biology is operating using the best algorithm then we will match its performance when we reach the same level of compute, whereas if biology is operating using a suboptimal algorithm then we will match its performance earlier. However, how do we define the compute used by biology? Moravec's estimate is already in the past and there's still no human-level AI. Then there is the "lifetime" anchor from Cotra's report which predicts a very short timeline. Finally, there is the "evolution" anchor which predicts a relatively long timeline. However, in Cotra's report most of the weight is assigned to the "neural net" anchors which talk about the compute for training an ANN of brain size using modern algorithms (plus there is the "genome" anchor in which the ANN is genome-sized). This is something that I don't see how to justify using Mark's model. On Mark's model, modern algorithms might very well hit diminishing returns soon, in which case we will switch to different algorithms which might have a completely different compute(parameter count) function. -------------------------------------------------------------------------------- 1. Assuming evolution also cannot discover algorithms outside our class o
Rogue AGI Embodies Valuable Intellectual Property

Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the cost to the cost of marginal production, which is very low for software. I briefly tried getting empirical data for this, but couldn't find it, plausibly since I didn't really know the right search terms.

AMA: Paul Christiano, alignment researcher

How would you teach someone how to get better at the engine game?

2Paul Christiano9moNo idea other than playing a bunch of games (might as well current version, old dailies probably best) and maybe looking at solutions when you get stuck. Might also just run through a bunch of games and highlight the main important interactions and themes for each of them, e.g. Innovation + Public Works + Reverberate [] or Hatchery + Till []. I think on any given board (and for the game in general) it's best to work backwards from win conditions, then midgames, and then openings.
1Neel Nanda9moWhat's the engine game?
AMA: Paul Christiano, alignment researcher

You've written multiple outer alignment failure stories. However, you've also commented that these aren't your best predictions. If you condition on humanity going extinct because of AI, why did it happen?

I think my best guess is kind of like this story, but:

  1. People aren't even really deploying best practices.
  2. ML systems generalize kind of pathologically over long time horizons, and so e.g. long-term predictions don't correctly reflect the probability of systemic collapse.
  3. As a result there's no complicated "take over the sensors moment" it's just everything is going totally off the rails and everyone is yelling about it but it just keeps gradually drifting on the rails.
  4. Maybe the biggest distinction is that e.g. "watchdogs" can actually give pretty good argume
... (read more)
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers

I'm curious what "put it in my SuperMemo" means. Quick googling only yielded SuperMemo as a language learning tool.

2Alex Turner9moIt's a spaced repetition system that focuses on incremental reading. It's like Anki, but instead of hosting flashcards separately from your reading, you extract text while reading documents and PDFs. You later refine extracts into ever-smaller chunks of knowledge, at which point you create the "flashcard" (usually 'clozes', demonstrated below). Here's a Wikipedia article I pasted into SuperMemo. Blue bits are the extracts, which it'll remind me to refine into flashcards later.A cloze deletion flashcard. It's easy to make a lot of these. I like them.Incremental reading is nice because you can come back to information over time as you learn more, instead of having to understand enough to make an Anki card right away. In the context of this post, I'm reading some of the papers, making extracts, making flashcards from the extracts, and retaining at least one or two key points from each paper. Way better than retaining 1-2 points from all 70 summaries!
Transparency Trichotomy

I agree it's sort of the same problem under the hood, but I think knowing how you're going to go from "understanding understanding" to producing an understandable model controls what type of understanding you're looking for.

I also agree that this post makes ~0 progress on solving the "hard problem" of transparency, I just think it provides a potentially useful framing and creates a reference for me/others to link to in the future.

Open Problems with Myopia

One way of looking at DDT is "keeping it dumb in various ways." I think another way of thinking about is just designing a different sort of agent, which is "dumb" according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they're just sacrificing utility for no reason.

There is some slight awkwardness in that the decision problems agents in this universe actually encounter means that UDT agents will get higher utility than DDT agents.

I agree that the maximum a posterior world doesn't help that much, but I think there is some sense in which "having uncertainty" might be undesirable.

4Daniel Kokotajlo6moAlso: I think making sure our agents are DDT is probably going to be approximately as difficult as making them aligned. Related: Your handle for anthropic uncertainty is: "Always think they know who they are" doesn't cut it; you can think you know you're in a simulation. I think a more accurate version would be something like "Always think that you are on an original planet, i.e. one in which life appeared 'naturally,' rather than a planet in the midst of some larger interstellar civilization, or a simulation of a planet, or whatever. Basically, you need to believe that you were created by humans but that no intelligence played a role in the creation and/or arrangement of the humans who created you. Or... no role other than the "normal" one in which parents create offspring, governments create institutions, etc. I think this is a fairly specific belief, and I don't think we have the ability to shape our AIs beliefs with that much precision, at least not yet.
Open Problems with Myopia

has been changed to imitation, as suggested by Evan.

Open Problems with Myopia

Yeah, you're right that it's obviously unsafe. The words "in theory" were meant to gesture at that, but it could be much better worded. Changed to "A prototypical example is a time-limited myopic approval-maximizing agent. In theory, such an agent has some desirable safety properties because a human would only approve safe actions (although we still would consider it unsafe)."

Open Problems with Myopia

Yep - I switched the setup at some point and forgot to switch this sentence. Thanks.

Defusing AGI Danger

My opposite intuition is suggested by the fact that if you're trying to guess correctly a series of random digits with 80% "1" and 20% "0", then you should always guess "1".

I don't quite know how to model cross-pollination and diminishing sort of returns. I think working on both for the information value is likely going to be very good. It seems hard to imagine a scenario where you're robustly confident that one project is 80% better taking diminishing returns into account without being able to create a 3rd project with the best features of both, but if yo... (read more)

2Daniel Kokotajlo1yHere's a way to model diminishing returns: The first hour of research on strategy X produces as much value as the next two hours, which produces as much value as the next four hours, etc. Value = log_2(hours). If this is true, then you should split your hours such that log_2(hourstowards80project)*0.8 + log_2(hourstoward20project)*0.2 is maximized, which I think means that you should distribute your hours across projects proportional to their probability...*0.8+%2B+log_2%281-X%29*0.2%29 [*0.8+%2B+log_2%281-X%29*0.2%29] (I don't know much math so I'm not confident I'm doing this right) Value of information I hadn't even considered, but maybe we can bundle it up with diminishing returns and say it's part of the reason returns diminish.
Defusing AGI Danger

I absolutely agree that I'm not arguing for "safety by default".

I don't quite agree that you should split effort between strategies, i.e. it seems likely that if you think 80% disaster by default, you should dedicate 100% of your efforts to that world.

3Daniel Kokotajlo1yOK, interesting. Well, here's my argument for effort-splitting then: There are probably diminishing returns to pursuing each strategy. In research in general, ideas and questions tend to cross-pollinate, etc. And if you are 20% confident that research project X is the most important, and 80% that research project Y is most important, and they are both on a similar topic, this seems like a classic case where you should do both (but with more effort towards Y). This is more of an intuition than an argument, I guess. But what do you think?
Operationalizing compatibility with strategy-stealing

Using the perspective from The ground of optimization means you can get rid of the action space and just say "given some prior and some utility function, what percentile of the distribution does this system tend to evolve towards?" (where the optimization power is again the log of this percentile)

We might then say that an optimizing system is compatible with strategy stealing if it's retargetable for a wide set of utility functions in a way that produces an optimizing system that has the same amount of optimization power.

An AI that is compatible with strat... (read more)

Defusing AGI Danger

Thanks! Also, oops - fixed.

Understanding “Deep Double Descent”

This post gave a slightly better understanding of the dynamics happening inside SGD. I think deep double descent is strong evidence that something like a simplicity prior exists in SGG, which might have actively bad generalization properties, e.g. by incentivizing deceptive alignment. I remain cautiously optimistic that approaches like Learning the Prior can get circumnavigate this problem.

A space of proposals for building safe advanced AI

I claim that if we call the combination of the judge plus one debater Amp(M), then we can think of the debate as M* being trained to beat Amp(M) by Amp(M)'s own standards.

This seems like a reasonable way to think of debate.

I think, in practice (if this even means anything), the power of debate is quite bounded by the power of the human, so some other technique is needed to make the human capable of supervising complex debates, e.g. imitative amplification.

A space of proposals for building safe advanced AI

Debate: train M* to win debates against Amp(M).

I think Debate is closer to "train M* to win debates against itself as judged by Amp(M)".

3Richard Ngo1yWouldn't it just be "train M* to win debates against itself as judged by H"? Since in the original formulation of debate a human inspects the debate transcript without assistance. Anyway, I agree that something like this is also a reasonable way to view debate. In this case, I was trying to emphasise the similarities between Debate and the other techniques: I claim that if we call the combination of the judge plus one debater Amp(M), then we can think of the debate as M* being trained to beat Amp(M) by Amp(M)'s own standards. Maybe an easier way to visualise this is that, given some question, M* answers that question, and then Amp(M) tries to identify any flaws in the argument by interrogating M*, and rewards M* if no flaws can be found.
Does SGD Produce Deceptive Alignment?

Yep. Meant to say "if a model knew that it was in its last training episode and it wasn't going to be deployed." Should be fixed.

Introduction to Cartesian Frames

This is very exciting. Looking forward to the rest of the sequence.

As I was reading, I found myself reframing a lot of things in terms of the rows and columns of the matrix. Here's my loose attempt to rederive most of the properties under this view.

  • The world is a set of states. One way to think about these states is by putting them in a matrix, which we call "cartesian frame." In this frame, the rows of the matrix are possible "agents" and the columns are possible "environments".
    • Note that you don't have to put all the states in the matrix.
  • Ensurable
... (read more)
Introduction to Cartesian Frames

In 4.1:

Given a0 and a1, since S∈Obs(C), there exists an a2∈A such that for all e∈E, we have a2∈if(S,a0,a1). Then, since T∈Obs(C), there exists an a3∈A such that for all e∈E, we have a3∈if(S,a0,a2). Unpacking and combining these, we get for all e∈E, a3∈if(S∪T,a0,a1). Since we could construct such an a3 from an arbitrary a0,a1∈A, we know that S∪T∈Obs(C). □

I think there's a typo here. Should be , not .

(also not sure how to copy latex properly).

1Scott Garrabrant1yYep. Fixed. Thanks.
The Solomonoff Prior is Malign

I personally see no fundamental difference between direct and indirect ways of influence, except in so far as they relate to stuff like expected value.

I agree that given the amount expected influence, other universes are not high on my priority list, but they are still on my priority list. I expect the same for consequentialists in other universes. I also expect consequentialist beings that control most of their universe to get around to most of the things on their priority list, hence I expect them to influence the Solmonoff prior.

Understanding “Deep Double Descent” gives a theoretical argument that suggests SGD will converge to a point that is very close in L2 norm to the initialization. Since NNs are often initialized with extremely small weights, this amounts to implicit L2 regularization. 

Forecasting Thread: AI Timelines

My rough take:

3 buckets, similar to Ben Pace's 

  1. 5% chance that current techniques just get us all the way there, e.g. something like GPT-6 is basically AGI
  2. 10% chance AGI doesn't happen this century, e.g. humanity sort of starts taking this seriously and decides we ought to hold off + the problem being technically difficult enough that small groups can't really make AGI themselves
  3. 50% chance that something like current techniques and some number of new insights gets us to AGI. 

If I thought about this ... (read more)