All of evhub's Comments + Replies

I don't understand the new unacceptability penalty footnote. In both of the terms, there is no conditional sign. I presume the comma is wrong?

They're unconditional, not conditional probabilities. The comma is just for the exists quantifier.

Also, for me \mathbb{B} for {True, False} was not standard, I think it should be defined.

Sure—edited.
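To spell out the two notational points for readers without the footnote in front of them (a gloss only; the footnote's actual terms aren't reproduced in this excerpt):

```latex
% Gloss on the notation discussed above (illustrative schema, not the footnote itself):
\[
  P\big(\exists\, \alpha,\ \varphi(\alpha)\big)
  \qquad\text{(comma scopes the quantifier; one unconditional probability)}
\]
\[
  P\big(\varphi(\alpha) \mid \alpha\big)
  \qquad\text{(what a conditional would look like instead)}
\]
\[
  \mathbb{B} \;\triangleq\; \{\mathrm{True},\, \mathrm{False}\}
\]
```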

From my perspective, training stories are focused pretty heavily on the idea that justification is going to come from a style more like heavily precedented black boxes than like cognitive interpretability

I definitely don't think this—in fact, I tend to think that cognitive interpretability is probably the only way we can plausibly get high levels of confidence in the safety of a training process. From “How do we become confident in the safety of a machine learning system?”:

Nevertheless, I think that transparency-and-interpretability-based training rat

... (read more)

The idea is that we're thinking of pseudo-inputs as “predicates that constrain X” here, so, for a pseudo-input \alpha, we have \alpha : X \to \mathbb{B}.

  • The path-independent case is probably overall easier to analyze—we understand things like speed and simplicity pretty well, and in the path-independent case we don't have to keep track of complex sequencing effects.
  • I think my path-independent analysis is more robust, for the same reasons as in the first point.
  • Presumably that assumption is path-dependence? I'm not sure what you mean.

In the first equation under “KL-regularised RL as variational inference,” I think should be .
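For context, the standard identity that post is built around, written as a sketch with reward r, reference policy \pi_0, and KL coefficient \beta (which symbol the proposed correction concerns isn't recoverable from this excerpt):

```latex
% KL-regularised RL objective and its variational-inference reading (sketch):
\[
  J(\pi) \;=\; \mathbb{E}_{x \sim \pi}\big[ r(x) \big]
  \;-\; \beta\, \mathrm{KL}\big( \pi \,\|\, \pi_0 \big)
\]
\[
  \arg\max_{\pi} J(\pi)
  \;=\; \arg\min_{\pi} \mathrm{KL}\big( \pi \,\|\, \pi^{*} \big),
  \qquad
  \pi^{*}(x) \;=\; \tfrac{1}{Z}\, \pi_0(x)\, e^{\,r(x)/\beta}.
\]
```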

I've found that people often really struggle to understand the content from the former but got it when I gave them the latter—and also I think the latter post covers a lot of newer stuff that's not in the old one (e.g. different models of inductive biases).

3 · Richard Ngo · 2mo
I considered this, but it seems like the latter is 4x longer while covering fairly similar content?

Yeah, I agree with that. I think path dependence will likely vary across training processes and that we should in fact view that as an important intervention point.

What notion of "deceptive alignment" is narrower than that?

Any definition that makes mention of the specific structure/internals of the model.

To be clear, I agree that unknown unknowns are in some sense the biggest problem in AI safety—as I talk about in the very first paragraph here.

However, I nevertheless think that focusing on deceptive alignment specifically makes a lot of sense. If we define deceptive alignment relatively broadly as any situation where “the reason the model looks aligned is because it is actively trying to game the training signal for the purpose of achieving some ulterior goal” (where training signal doesn't necessarily mean the literal loss, just anything we're trying to ... (read more)

4 · johnswentworth · 3mo
That's "relatively broad"??? What notion of "deceptive alignment" is narrower than that? Roughly that definition is usually my stock example of a notion of deception which is way too narrow to focus on and misses a bunch of the interesting/probable/less-correlated failure modes (like e.g. the sort of stuff in Worlds Where Iterative Design Fails [https://www.lesswrong.com/posts/xFotXGEotcKouifky/worlds-where-iterative-design-fails]).

This I agree with, but I think it doesn't go far enough. In my software engineering days, one of the main heuristics I recommended was: when building a library, you should have a minimum of three use cases in mind. And make them as different as possible, because the library will inevitably end up being shit for any use case way out of the distribution your three use cases covered. Same applies to research: minimum of three use cases, and make them as different as possible.

(Moderation note: added to the Alignment Forum from LessWrong.)

Yeah, that's a good point—I agree that the thing I said was a bit too strong. I do think there's a sense in which the models you're describing seem pretty deceptive, though, and quite dangerous, given that they have a goal that they only pursue when they think nobody is looking. In fact, the sort of model that you're describing is exactly the sort of model I would expect a traditional deceptive model to want to gradient hack itself into, since it has the same behavior but is harder to detect.

To be clear, I think I basically agree with everything in the comment chain above. Nevertheless, I would argue that these sorts of experiments are worth running anyway, for the sorts of reasons that I outline here.

The most common, these days, is some variant of “train an AI to help with aligning AI”. Sometimes it’s “train an AI to interpret the internals of another AI”, sometimes it’s “train an AI to point out problems in another AI’s plan”, sometimes it’s “train an AI to help you design aligned AI”, etc. I would guess about 75% of newcomers from ML suggest some such variant as their first idea.

I don't think these are crazy or bad ideas at all—I'd be happy to steelman them with you at some point if you want. Certainly, we don't know how to make any of them work r... (read more)

Even agreeing that no additional complexity is required to rederive that it should try to be deceptive (assuming it has situational awareness of the training process and long-term goals which aren't aligned with ours), to be deceptive successfully, it then needs to rederive what our goals are, so that it can pursue them instrumentally. I'm arguing that the ability to do this in the AI would require additional complexity compared to an AI that doesn't need to rederive the content of this goal (that is, our goal) at every decision.

My argument is that the o... (read more)

3 · Robert Kirk · 3mo
When you say "the knowledge of what our goals are should be present in all models", by "knowledge of what our goals are" do you mean a pointer to our goals (given that there are probably multiple goals which are combined in some way) is in the world model? If so this seems to contradict you earlier saying:

I guess I don't understand what it would mean for the deceptive AI to have the knowledge of what our goals are (in the world model), but for that not to mean it has a hard-coded pointer to what our goals are. I'd imagine that what it means for the world model to capture what our goals are is exactly having such a pointer to them.

(I realise I've been failing to do this, but it might make sense to use AI when we mean the outer system and model when we mean the world model. I don't think this is the core of the disagreement, but it could make the discussion clearer. For example, when you say the knowledge is present in the model, do you mean the world model or the AI more generally? I assumed the former above.)

To try and run my (probably inaccurate) simulation of you: I imagine you don't think the above is a contradiction. So you'd think that "knowledge of what our goals are" doesn't mean a pointer to our goals in all the AI's world models, but something simpler, that can be used to figure out what our goals are by the deceptive AI (e.g. in its optimisation process), but wouldn't enable the aligned AI to use as its objective a simpler pointer and instead would require the aligned AI to hard-code the full pointer to our goals (where the pointer would be pointing into its world model, and probably using this simpler information about our goals in some way). I'm struggling to imagine what that would look like.

I guess I'm arguing that the additional complexity in the deceptive model that allows it to rederive our goals at every forward pass compensates for the additional complexity in the corrigible model that has our goals hard-coded.

No additional complexity is required to rederive that it should try to be deceptive—it's a convergent instrumental goal that falls out of any generic optimization process that essentially all models should have to have anyway to be able to get good performance on tasks that require optimization.

1 · Robert Kirk · 3mo
Even agreeing that no additional complexity is required to rederive that it should try to be deceptive (assuming it has situational awareness of the training process and long-term goals which aren't aligned with ours), to be deceptive successfully, it then needs to rederive what our goals are, so that it can pursue them instrumentally. I'm arguing that the ability to do this in the AI would require additional complexity compared to an AI that doesn't need to rederive the content of this goal (that is, our goal) at every decision.

Alternatively, the aligned model could use the same derivation process to be aligned: The deceptive model has some long-term goal, and in pursuing it rederives the content of the instrumental goal "do 'what the training process incentivises'", and the aligned model has the long-term goal "do 'what the training process incentivises'" (as a pointer/de dicto), and also rederives it with the same level of complexity. I think "do 'what the training process incentivises'" (as a pointer/de dicto) isn't a very complex long-term goal, and feels likely to be as complex as the arbitrary crystallised deceptive AI's internal goal, assuming both models have full situational awareness of the training process and hence such a pointer is possible, which we're assuming they do.

(ETA/Meta point: I do think deception is a big issue that we definitely need more understanding of, and I definitely put weight on it being a failure of alignment that occurs in practice, but I think I'm less sure it'll emerge (or less sure that your analysis demonstrates that). I'm trying to understand where we disagree, and whether you've considered the doubts I have and you possess good arguments against them or not, rather than convince you that deception isn't going to happen.)

It needs no such pointer in the weights, only in the activations. The deceptive model has to build such a pointer at runtime, but it doesn't need to have it hardcoded, whereas the corrigible model needs it to be hardcoded.

1 · Robert Kirk · 3mo
It seems a lot more computationally difficult to, at every forward pass/decision process, derive/build/construct such a pointer. If the deceptive model is going to be doing this every time it seems like it would be more efficient to have a dedicated part of the network that calculates it (i.e. have it in the weights). Separately, for more complex goals this procedure is also going to be more complex, and the network probably needs to be more complex to support constructing it in the activations at every forward pass, compared to the corrigible model that doesn't need to do such a construction (because it has it hard-coded as you say). I guess I'm arguing that the additional complexity in the deceptive model that allows it to rederive our goals at every forward pass compensates for the additional complexity in the corrigible model that has our goals hard-coded.

The corrigible model needs to be able to robustly point to our goals, in a way that doesn't change. One way of doing this is having the goals hardcoded. Another way might be to instead have a pointer to the output of a procedure that is executed at runtime that always constructs our goals in the activations. If the deceptive model can reliably construct in its activations something that actually points towards our goals, then the corrigible model could also have such a procedure, and make its goal be a pointer to the output of such a procedure. Then the only difference in model complexity is that the deceptive model points to some arbitrary attribute of the world model (or whatever), and the aligned model points to the output of this computation, which both models possess.

I think at a high level I'm trying to say that any way in which the deceptive model can robustly point at our goals such that it can pursue them instrumentally, the aligned model can robustly point at them to pursue them terminally. SGD+DL+whatever may favour one way or another of robustly pointing at such goals (either in the weig

I'd think that you should probably ask for feedback and look into how to promote cooperation privately first.

I have done this also.

Yep, I agree with that. That's orthogonal to myopia as I use the term, though.

A) There is a video, but it's not super high quality and I think the transcript is better. If you really want to listen to it, though, you can take a look here.

B) Yeah, I agree with that. Perhaps the thing I said in the talk was too strong—the thing I mean is a model where the objective is essentially the same as what you want, but the optimization process and world model are potentially quite superior. I still think there's approximately only one of those, though, since you have to get the objective to exactly match onto what you want.

1 · Charlie Steiner · 3mo
Once you're trying to extrapolate me rather than just copy me as-is, there are multiple ways to do the extrapolation. But I'd agree it's still way less entropy than deceptive alignment.

Unconditional. I'm rather more pessimistic than an overall 10% chance. I usually give ~80% chance of existential risk from AI.

I think it really depends on the specific training setup. Some are much more likely than others to lead to deceptive alignment, in my opinion. Here are some numbers off the top of my head, though please don't take these too seriously:

  • ~90%: if you keep scaling up RL in complex environments ad infinitum, eventually you get deceptive alignment.
  • ~80%: conditional on RL in complex environments being the first path to transformative AI, there will be deceptively aligned RL models.
  • ~70%: if you keep scaling up GPT-style language modeling ad infinitum, eventuall
... (read more)
2 · Ivan Vendrov · 3mo
Thank you for putting numbers on it! Is this an unconditional prediction of 60% chance of existential catastrophe due to deceptive alignment alone? In contrast to the commonly used 10% chance of existential catastrophe due to all AI sources this century. Or do you mean that, conditional on there being an existential catastrophe due to AI, 60% chance it will be caused by deceptive alignment, and 40% by other problems like misuse or outer alignment?

(Moderation note: moved to the Alignment Forum from LessWrong.)

+1. Also:

I think the answer to "are you phenomenally conscious" will be sensitive to small differences in the training data involving similar conversations.

I'm not sure why the narrowness vs. broadness of the distribution of answers here should update me either. If it's just really confident that all sci-fi AIs are supposed to answer “yes” to “are you conscious,” you'll get the same answer every time but that answer won't correlate to anything about the model's actual consciousness.

3 · Ethan Perez · 3mo
I think we can mitigate this issue by removing all data related/adjacent to consciousness and/or AIs when pretraining/finetuning the model. Here, we'd only explain the notion of phenomenal consciousness to the model at test time, when it needs to answer the consciousness-related questions.

That isn't the main point I had in mind. See my comment to Chris here.

Left a comment.

Yup, the training story regime sounds good by my lights. Am I intended to conclude something further from this remark of yours, though?

Nope, just wanted to draw your attention to another instance of alignment researchers already understanding this point.


Also, I want to be clear that I like this post a lot and I'm glad you wrote it—I think it's good to explain this sort of thing more, especially in different ways that are likely to click for different people. I jus... (read more)

5 · Alex Turner · 4mo
I have privately corresponded with a senior researcher who, when asked what they thought would result from a specific training scenario, made an explicit (and acknowledged) mistake along the lines of this post. Another respected researcher seemingly slipped on the same point, some time after already discussing this post with them. I am still not sure whether I'm on the same page with Paul, as well (I have general trouble understanding what he believes, though). And Rohin also has this experience of explaining the points in OP on a regular basis. All this among many other private communication events I've experienced. (Out of everyone I would expect to already have understood this post, I think you and Rohin would be at the top of the list.)

So basically, the above screens off "Who said what in past posts?", because whoever said whatever, it's still producing my weekly experiences of explaining the points in this post. I still haven't seen the antecedent-computation-reinforcement (ACR) emphasis thoroughly explained elsewhere, although I agree that some important bits (like training stories) are not novel to this post. (The point isn't so much "What do I get credit for?" as much as "I am concerned about this situation.")

Here's more speculation. I think alignment theorists mostly reason via selection-level arguments. While they might answer correctly on "Reward is? optimization target" when pressed, and implicitly use ACR to reason about what's going on in their ML training runs, I'd guess they probably don't engage in mechanistic ACR reasoning in their day-to-day theorizing. (Again, I can only speculate, because I am not a mind-reader, but I do still have beliefs on the matter.)

Reward functions often are structured as objectives, which is why we talk about them that way. In most situations, if you had access to e.g. AIXI, you could directly build a “reward maximizer.”

I agree that this is not always the case, though, as in the discussion here. That being said, I think it is often enough the case that it made sense to focus on that particular case in RFLO.
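To make "directly build a reward maximizer" concrete: with an ideal planner and a world model in hand, the reward function can be wired in as the quantity being maximized. A schematic sketch (not AIXI's full definition; the horizon m and history h are illustrative):

```latex
\[
  a_t \;=\; \arg\max_{a}\;
  \mathbb{E}\!\left[\; \sum_{k=t}^{t+m} r_k \;\middle|\; h_{<t},\ a_t = a \right]
\]
% Expectation taken under the planner's environment model
% (a Solomonoff mixture, in AIXI's case).
```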

2 · Alex Turner · 4mo
What does this mean? By "structured as objectives", do you mean something like "people try to express what they want with a reward function, by conferring more reward to more desirable states"? (I'm going to assume so for the rest of the comment, LMK if this is wrong.) I agree that other people (especially my past self) think about reward functions this way. I think they're generally wrong to do so, and it's misleading as to the real nature of the alignment problem.

I agree with that post, thanks for linking. As far as I can tell, AIXI and other hardcoded planning agents are the known exceptions to the arguments in this post. We will not get AGI via these approaches. When else is it the case? I therefore still feel confused about why you think it made sense.

While I definitely appreciate the work you all did with RFLO, the framing of reward as a "base objective" seems like a misstep that set discourse in a weird direction which I'm trying to push back on (from my POV!). I think that the "base objective" is better described as a "cognitive-update-generator." (This is not me trying to educate you on this specific point, but rather argue that it really matters how we frame the problem in our day-to-day reasoning.)

I do in fact think that few people actually already deeply internalized the points I'm making in this post, even including a few people who say they have or that this post is obvious. Therefore, I concluded that lots of alignment thinking is suspect until re-analyzed.

Risks from Learned Optimization in Advanced Machine Learning Systems,” which we published three years ago and started writing four years ago, is extremely explicit that we don't know how to get an agent that is actually optimizing for a specified reward function. The alignment research com... (read more)

3 · Alex Turner · 4mo
That isn't the main point I had in mind. See my comment to Chris here [https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target?commentId=eFb9sdzymu3oPm8PE].

EDIT: Yup, the training story regime sounds good by my lights. Am I intended to conclude something further from this remark of yours, though?

My usual take here is the “Nobody Cares” model, though I think there is one scenario that I tend to be worried about a bit here that you didn't address, which is how to think about whether or not you want things ending up in the training data for a future AI system. That's a scenario where the “Nobody Cares” model really doesn't apply, since the AI actually does have time to look at everything you write.

That being said, I think that, most of the time, alignment work ending up in training data is good, since it can help our AI systems be differentially bett... (read more)

1 · Ofer · 4mo
That consideration seems relevant only for language models that will be doing/supporting alignment work.

Worrying about which alignment writing ends up in the training data feels like a very small lever for affecting alignment; my general heuristic is that we should try to focus on much bigger levers.

Using interpretability tools in the loss function incentivises a startling number of passive circumvention methods.

So why consider using it at all? There are two cases where it might be worthwhile to use:

  • If we’re absolutely 100% provably definitely sure that our interpretability methods cover every representational detail that the AI might be using, then it would be extremely hard or impossible to Goodhart interpretability tools in the loss function. But, for safety, we should probably assume that our tools are not able to interpret everything, making p
... (read more)
4 · Lee Sharkey · 4mo
This sounds really reasonable. I had only been thinking of a naive version of interpretability tools in the loss function that doesn't attempt to interpret the gradient descent process. I'd be genuinely enthusiastic about the strong version you outlined. I expect to think a lot about it in the near future.

I think I broadly disagree with this—using RLHF in a way that actually corresponds to some conditional of the original policy seems very important to me.

That said, it seems to me that in practice you often should be trying to converge to a zero entropy policy. The optimal action is not random; we want the model to be picking the output that looks best in light of its beliefs.

I think you probably expect that we'll eventually be able to give good feedback even for quite complex tasks like “actually say the truth”—but if you don't expect that to ever succ... (read more)
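A toy numerical sketch of the two regimes being contrasted in this thread, using the standard fact that the KL-regularised optimum is the base policy exponentially tilted by reward (numbers and names are illustrative only):

```python
import numpy as np

# Base (pretrained) policy over three possible outputs.
pi0 = np.array([0.6, 0.3, 0.1])

def kl_optimum(pi0, reward, beta):
    """pi*(x) proportional to pi0(x) * exp(reward(x) / beta) -- the KL-regularised optimum."""
    w = pi0 * np.exp(reward / beta)
    return w / w.sum()

# Binary "acceptable?" reward: as beta shrinks, the optimum approaches the
# Bayesian conditional pi0(x | reward(x) = 1) = [0, 0.75, 0.25] -- conditioning,
# not argmax, so entropy is preserved among acceptable outputs.
print(kl_optimum(pi0, np.array([0.0, 1.0, 1.0]), beta=0.05))   # ~[0.00, 0.75, 0.25]

# Graded reward with a unique best output: beta -> 0 instead collapses toward
# a zero-entropy (argmax) policy.
print(kl_optimum(pi0, np.array([0.0, 0.8, 1.0]), beta=0.05))   # ~[0.00, 0.05, 0.95]
```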

But maybe I just don't understand this proposal yet (and I have had some trouble distilling things I recognize as plans out of Evan's writing, so far).

Maybe this and this will help.

  • Ensembling as an AI safety solution is a bad way to spend down our alignment tax—training another model brings you to 2x compute budget, but even in the best case scenario where the other model is a totally independent draw (which in fact it won't be), you get at most one extra bit of optimization towards alignment.
  • Chain of thought prompting can be thought of as creating an average speed bias that might disincentivize deception.
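On the "one extra bit" in the ensembling bullet above: selecting the better of k independent draws applies about log2(k) bits of selection pressure, so a second independently trained model buys at most one bit (a counting sketch, not a claim about any particular training setup):

```latex
\[
  \text{best-of-}k \text{ selection} \;\Rightarrow\; \log_2 k \ \text{bits of optimization};
  \qquad k = 2 \;\Rightarrow\; 1 \text{ bit (i.e. the top half of the draw distribution).}
\]
```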
6 · Evan Hubinger · 4mo
* A deceptive model doesn't have to have some sort of very explicit check for whether it's in training or deployment any more than a factory-cleaning robot has to have a very explicit check for whether it's in the jungle instead of a factory. If it someday found itself in a very different situation than currently (training), it would reconsider its actions, but it doesn't really think about it very often because during training it just looks too unlikely.

It seems that you're postulating that the human brain's credit assignment algorithm is so bad that it can't tell what high-level goals generated a particular action and so would give credit just to thoughts directly related to the current action. That seems plausible for humans, but my guess would be against that for advanced AI systems.

2 · Alex Turner · 5mo
No, I don't intend to postulate that. Can you tell me a mechanistic story of how better credit assignment would go, in your worldview?

I find this post interesting as a jumping-off point; seems like the kind of thing which will inspire useful responses via people going "no that's totally wrong!".

Yeah, I'd be very happy if that were the result of this post—in fact, I'd encourage you and others to just try to build your own tech trees so that we have multiple visions of progress that we can compare.

Yep, that's right—if you take a look at my discussion under (6), my claim is that we should only need (6), not (8).

  • Other search-like algorithms like inference on a Bayes net that also do a good job in diverse environments also have the problem that their capabilities generalize faster than their objectives—the fundamental reason being that the regularity that they are compressing is a regularity only in capabilities.
  • Neural networks, by virtue of running in constant time, bring algorithmic equality all the way from uncomputable to EXPTIME—not a large difference in practice.
  • One way to think about the core problem with relaxed adversarial training is that when we gener
... (read more)

This is a list of random, assorted AI safety ideas that I think somebody should try to write up and/or work on at some point. I have a lot more than this in my backlog, but these are some that I specifically selected to be relatively small, single-post-sized ideas that an independent person could plausibly work on without much oversight. That being said, I think it would be quite hard to do a good job on any of these without at least chatting with me first—though feel free to message me if you’d be interested.

  • What would be necessary to build a good audit
... (read more)
6 · Alex Turner · 5mo
Humans don't wirehead because reward reinforces the thoughts which the brain's credit assignment algorithm deems responsible for producing that reward. Reward is not, in practice, that-which-is-maximized -- reward is the antecedent-thought-reinforcer, it reinforces that which produced it. And when a person does a rewarding activity, like licking lollipops, they are thinking thoughts about reality (like "there's a lollipop in front of me" and "I'm picking it up"), and so these are the thoughts which get reinforced. This is why many human values are about latent reality and not about the human's beliefs about reality or about the activation of the reward system.

Sure—that's easy enough. Just off the top of my head, here's five safety concerns that I think are important that I don't think you included:

  • The fact that there exist functions that are easier to verify than satisfy ensures that adversarial training can never guarantee the absence of deception.

  • It is impossible to verify a model's safety—even given arbitrarily good transparency tools—without access to that model's training process. For example, you could get a deceptive model that gradient hacks itself in such a way that cryptographically obfuscates i

... (read more)
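To illustrate the "easier to verify than satisfy" point in the first bullet above with a standard toy example (hypothetical code, not tied to any particular alignment setup): checking a property can be one cheap evaluation even when producing an input that satisfies it is intractable, which is the asymmetry an adversarial-training adversary runs into.

```python
import hashlib
import secrets

# Property: "hashes to this fixed digest."
TARGET = hashlib.sha256(b"some fixed input").hexdigest()

def verify(candidate: bytes) -> bool:
    # Verification is cheap: a single hash evaluation.
    return hashlib.sha256(candidate).hexdigest() == TARGET

# Satisfying the property (finding a preimage) by search is hopeless in practice,
# even though verification stays trivial.
print(verify(b"some fixed input"))       # True
print(verify(secrets.token_bytes(16)))   # almost surely False
```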

Consider my vote to be placed that you should turn this into a post, keep going for literally as long as you can, expand things to paragraphs, and branch out beyond things you can easily find links for.

(I do think there's a noticeable extent to which I was trying to list difficulties more central than those, but I also think many people could benefit from reading a list of 100 noncentral difficulties.)

I'm not sure what you mean by missing points? I only included your technical claims, not your sociological ones, if that's what you mean.

It's very clear to me I could have written this if I had wanted to—and at the very least I'm sure Paul could have as well. As evidence: it took me ~1 hour to list off all the existing sources that cover every one of these points in my comment.

Well, there's obviously a lot of points missing!  And from the amount this post was upvoted, it's clear that people saw the half-assed current form as valuable.

Why don't you start listing out all the missing further points, then?  (Bonus points for any that don't trace back to my own invention, though I realize a lot of people may not realize how much of this stuff traces back to my own invention.)

That requires, not the ability to read this document and nod along with it, but the ability to spontaneously write it from scratch without anybody else prompting you; that is what makes somebody a peer of its author. It's guaranteed that some of my analysis is mistaken, though not necessarily in a hopeful direction. The ability to do new basic work noticing and fixing those flaws is the same ability as the ability to write this document before I published it, which nobody apparently did, despite my having had other things to do than write this up for th

... (read more)

Eliezer's post here is doing work left undone by the writing you cite. It is a much clearer account of how our mainline looks doomed than you'd see elsewhere, and it's frank on this point.

I think Eliezer wishes these sorts of artifacts were not just things he wrote, like this and "There is no fire alarm".

Also, re your excerpts for (14), (15), and (32), I see Eliezer as saying something meaningfully different in each case. I might elaborate under this comment.

12 · Eliezer Yudkowsky · 6mo
Well, my disorganized list sure wasn't complete, so why not go ahead and list some of the foreseeable difficulties I left out? Bonus points if any of them weren't invented by me, though I realize that most people may not realize how much of this entire field is myself wearing various trenchcoats.

I agree this list doesn't seem to contain much unpublished material, and I think the main value of having it in one numbered list is that "all of it is in one, short place", and it's not an "intro to computers can think" and instead is "these are a bunch of the reasons computers thinking is difficult to align".

The thing that I understand to be Eliezer's "main complaint" is something like: "why does it seem like No One Else is discovering new elements to add to this list?". Like, I think Risks From Learned Optimization was great, and am glad you and others ... (read more)

Thank you, Evan, for living the Virtue of Scholarship. Your work is appreciated.

I was referring to stuff like this, this, and this.

I haven't finished it yet, but I've so far very much enjoyed the Pragmatic AI Safety sequence, though I certainly have disagreements with it.

As someone who has really not been a fan of a lot of the recent conversations on LessWrong that you mentioned, I thought this was substantially better in an actually productive way with some really good analysis.

Also, if you or anyone else has a good concrete idea along these lines, feel free to reach out to me and I can help you get support, funding, etc. if I think the idea is a good one.

(Moderation note: added to the Alignment Forum from LessWrong.)

1 · Robert Kirk · 6mo
I'd be curious to hear what your thoughts are on the other conversations, or at least specifically which conversations you're not a fan of?

Sure—I just edited it to be maybe a bit less jarring for those who know Greek.

I think this concern is only relevant if your strategy is to do RL on human evaluations of alignment research. If instead you just imitate the distribution of current alignment research, I don't think you get this problem, at least any more than we have it now—and I think you can still substantially accelerate alignment research with just imitation. Of course, you still have inner alignment issues, but from an outer alignment perspective I think imitation of human alignment research is a pretty good thing to try.

Actions are just language outputs—and since we ask for an action that describes a pseudo-input, hopefully we should be able to interpret it that way.

The reason other possible reporter heads work is because they access the model's concept for X and then do something with that (where the 'doing something' might be done in the core model or in the head).

I definitely think there are bad reporter heads that don't ever have to access X. E.g. the human imitator only accesses X if X is required to model humans, which is certainly not the case for all X.

E.g. simplicity of explanation of a particular computation is a bit more like a speed prior.

I don't think that the size of an explanation/proof of correctness for a program should be very related to how long that program runs—e.g. it's not harder to prove something about a program with larger loop bounds, since you don't have to unroll the loop, you just have to demonstrate a loop invariant.
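A sketch of why the proof needn't grow with the runtime, using the standard Hoare-logic rule for while loops: the obligation is that one iteration of the body preserves the invariant, independently of how many times the loop runs.

```latex
\[
  \frac{\{\, I \land b \,\}\; c\; \{\, I \,\}}
       {\{\, I \,\}\; \mathbf{while}\ b\ \mathbf{do}\ c\; \{\, I \land \lnot b \,\}}
\]
% One proof that the body c preserves invariant I covers every iteration,
% so the proof is the same size whether the loop bound is 10 or 10^9.
```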
