All of Ben Pace's Comments + Replies

Let's See You Write That Corrigibility Tag
  • I think that corrigibility is more likely to be a crisp property amongst systems that perform well-as-evaluated-by-you. I think corrigibility is only likely to be useful in cases like this where it is crisp and natural.

Can someone explain to me what this crispness is?

As I'm reading Paul's comment, there's an amount of optimization for human reward that breaks our rating ability. This is a general problem for AI, for the fundamental reason that as we increase an AI's optimization power, it gets better at the task, but it also gets better at breaking m... (read more)

If you have a space with two disconnected components, then I'm calling the distinction between them "crisp." For example, it doesn't depend on exactly how you draw the line.

It feels to me like this kind of non-convexity is fundamentally what crispness is about (the cluster structure of thingspace is a central example). So if you want to draw a crisp line, you should be looking for this kind of disconnectedness/non-convexity.

ETA: a very concrete consequence of this kind of crispness, that I should have spelled out in the OP, is that there are many functions... (read more)
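As a concrete illustration of the "two disconnected components" picture, here is a minimal numerical sketch; the clusters and thresholds are invented for illustration and are not from Paul's comment. If the components are separated by a gap, every boundary drawn inside the gap induces the same partition, which is one way to cash out "it doesn't depend on exactly how you draw the line."

```python
import numpy as np

rng = np.random.default_rng(0)

# Two disconnected components on the real line, standing in for two kinds of
# systems scored on some one-dimensional property (all numbers hypothetical).
component_a = rng.uniform(0.0, 1.0, size=50)
component_b = rng.uniform(5.0, 6.0, size=50)
points = np.concatenate([component_a, component_b])
reference = points >= 3.0  # one arbitrary way to split them

# Because the components are disconnected, every threshold in the gap induces
# exactly the same partition: the distinction doesn't depend on where the line goes.
same_partition = all(
    np.array_equal(points >= t, reference) for t in np.linspace(1.5, 4.5, 7)
)
print(same_partition)  # True
```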

Let's See You Write That Corrigibility Tag

Minor clarification: This doesn't refer to re-writing the LW corrigibility tag. I believe a tag is a reply in glowfic, where each author responds with the next tag, i.e. the next bit of the story, with an implied "tag – now you're it!" directed at the other author. 

Agreed explicitly for the record.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

Just as a related idea, in my mind, I often do a kind of thinking that HPMOR!Harry would call “Hufflepuff Bones”, where I look for ways a problem is solvable in physical reality at all, before considering ethical and coordination and even much in the way of practical concerns.

AGI Ruin: A List of Lethalities

Thanks, this story is pretty helpful (to my understanding).

[$20K in Prizes] AI Safety Arguments Competition

I think it would be less "off-putting" if we had common knowledge of it being such a post. From reading Sidney's comment, I don't think the authors think of it that way.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).

For the record this does seem like the cruxy part of the whole discussion, and I think more concrete descriptions of alternatives would help assuage my concerns here.

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.

This all seems like it would be good news. For the record I think that the necessary evidence to start act... (read more)

johnswentworth (2 points, 2mo)
Or you could get to it before I do and I could perform a replication.
Everything I Need To Know About Takeoff Speeds I Learned From Air Conditioner Ratings On Amazon

Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.

Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room, with temperature sensors more like 15 feet away, somewhere that's very hot outside, and comparing this with (say) Wirecutter's two-hose top pick. Confirm?

... I actually already started a post titled "Preregistration: Air Conditioner Test (for AI Alignment!)". My plan was to use the one-hose AC I bought a few years ago during that heat wave, rig up a cardboard "second hose" for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day. Maybe we can have an air conditioner test party.

Predictions: the claim which I most do not believe right now is that going from one hose to two hose with the same air conditioner makes only a 20%-30% difference. The main metr... (read more)
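For anyone wanting to sanity-check the stakes before running the test, below is a back-of-the-envelope sensible-heat sketch of the single-hose infiltration penalty. Every number (capacity, airflow, temperatures) is assumed purely for illustration and is not a measurement of the unit in question; it also assumes the replacement air comes straight from outdoors.

```python
# Back-of-the-envelope infiltration penalty for a one-hose unit.
# All numbers below are assumptions for illustration, not measurements.
rated_btu_per_hr = 8000       # nameplate cooling capacity (assumed)
exhaust_cfm = 200             # indoor air blown out the single hose (assumed)
outdoor_f, indoor_f = 95, 75  # a hot day (assumed)

# Standard sensible-heat formula for air: BTU/hr ~ 1.08 * CFM * delta-T (deg F).
infiltration_btu_per_hr = 1.08 * exhaust_cfm * (outdoor_f - indoor_f)
net_btu_per_hr = rated_btu_per_hr - infiltration_btu_per_hr

print(f"infiltration load: {infiltration_btu_per_hr:.0f} BTU/hr")
print(f"net cooling:       {net_btu_per_hr:.0f} BTU/hr "
      f"({net_btu_per_hr / rated_btu_per_hr:.0%} of rated)")
```

Under these made-up numbers, infiltration eats roughly half of the rated capacity, which is exactly the kind of gap the proposed side-by-side test would measure directly.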

Why Agent Foundations? An Overly Abstract Explanation

Curated. Laying out a full story for why the work you're doing is solving AI alignment is very helpful, and this framing captures different things from other framings (e.g. Rocket Alignment, Embedded Curiosities, etc). Also it's simply written and mercifully short, relative to other such things. Thanks for this step in the conversation.

ELK Thought Dump

Ah, very good point. How interesting…

(If I’d concretely thought of transferring knowledge between a bird and a dog this would have been obvious.)

ELK Thought Dump

Solomonoff's theory of induction, along with the AIXI theory of intelligence, operationalize knowledge as the ability to predict observations.

Maybe this is what knowledge is. But I’d like to try coming up with at least one alternative. So here goes!

I want to define knowledge as part of an agent.

  • A system contains knowledge if the agent who built it can successfully attain its goals in its likely environments by using that system to figure out which of its actions will lead to outcomes the agent wants.
  • When comparing different systems that allow an agent to a
... (read more)
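One way to cash out that first bullet as a toy measurement; every detail here, from the action space to the goal, is invented for illustration and is not from the post. A "system" scores well under this operationalization exactly when consulting it lets the agent get what it wants.

```python
import random

random.seed(0)

# Toy operationalization: a system "contains knowledge" to the extent an agent
# can use it to pick actions that attain the agent's goals.
actions = list(range(10))
true_outcome = {a: (a * 7) % 10 for a in actions}  # the environment's real action -> outcome map

def goal(outcome):
    """The agent wants outcomes close to 9."""
    return -abs(outcome - 9)

def attainment(system, trials=1000):
    """Average realized goal-value for an agent that consults `system` to choose actions."""
    total = 0.0
    for _ in range(trials):
        chosen = max(actions, key=lambda a: goal(system(a)))  # act on the system's predictions
        total += goal(true_outcome[chosen])                   # scored against reality
    return total / trials

accurate_system = lambda a: true_outcome[a]          # tracks the environment
scrambled_system = lambda a: random.choice(actions)  # predicts nothing useful

print(attainment(accurate_system))   # 0.0: the agent reliably gets what it wants
print(attainment(scrambled_system))  # roughly -4 to -5: the agent mostly doesn't
```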
Abram Demski (3 points, 3mo)
Your definition requires that we already know how to modify Alice to have Clippy's goals. So your brute force idea for how to modify Clippy to have Alice's knowledge doesn't add very much; it still relies on a magic goal/belief division, so giving a concrete algorithm doesn't really clarify. Really good to see this kind of response.
Charlie Steiner (1 point, 4mo)
I like this definition too. You might add some sort of distribution over goals (sort of like Attainable Utility) so that e.g. Alice can know things about things that she doesn't personally care about.
Late 2021 MIRI Conversations: AMA / Discussion

Eliezer, when you told Richard that your probability of a successful miracle is very low, you added the following note:

Though a lot of that is dominated, not by the probability of a positive miracle, but by the extent to which we seem unprepared to take advantage of it, and so would not be saved by one.

I don't mean to ask for positive fairy tales when I ask: could you list some things you could see in the world that would cause you to feel that we were well-prepared to take advantage of one if we got one?

My obvious quick guess would be "I know of an ML pro... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

Eliezer and Nate, my guess is that most of your perspective on the alignment problem for the past several years has come from the thinking and explorations you've personally done, rather than reading work done by others.

But, if you have read interesting work by others that's changed your mind or given you helpful insights, what has it been? Some old CS textbook? Random Gwern articles? An economics textbook? Playing around yourself with ML systems?

Late 2021 MIRI Conversations: AMA / Discussion

Questions about the standard-university-textbook from the future that tells us how to build an AGI. I'll take answers on any of these!

  1. Where is ML in this textbook? Is it under a section called "god-forsaken approaches" or does it play a key role? Follow-up: Where is logical induction?
  2. If running superintelligent AGIs didn't kill you and death was cancelled in general, how long would it take you to write the textbook?
  3. Is there anything else you can share about this textbook? Do you know any of the other chapter names?

I'm going to try and write a table of contents for the textbook, just because it seems like a fun exercise.

Epistemic status: unbridled speculation

Volume I: Foundation

  • Preface [mentioning, ofc, the infamous incident of 2041]
  • Chapter 0: Introduction

Part I: Statistical Learning Theory

  • Chapter 1: Offline Learning [VC theory and Watanabe's singular learning theory are both special cases of what's in this chapter]
  • Chapter 2: Online Learning [infra-Bayesianism is introduced here, Garrabrant induction too]
  • Chapter 3: Reinforcement Learning
  • Chapter 4: Lifelong
... (read more)
Richard Ngo (2 points, 4mo)
Where is ML in this textbook? Is it under a section called "god-forsaken approaches" or does it play a key role? Follow-up: Where is logical induction?

Key role, but most current ML is in the "applied" section, where the "theory" section instead explains the principles by which neural nets (or future architectures) work on the inside. Logical induction is a sidebar at some point explaining the theoretical ideal we're working towards, like I assume AIXI is in some textbooks.

Is there anything else you can share about this textbook? Do you know any of the other chapter names?

Planning, Abstraction, Reasoning, Self-awareness.
Rohin Shah (3 points, 4mo)
I'm mostly going to answer assuming that there's not some incredibly different paradigm (i.e. something as different from ML as ML is from expert systems). I do think the probability of "incredibly different paradigm" is low. I'm also going to answer about the textbook at, idk, the point at which GDP doubles every 8 years. (To avoid talking about the post-Singularity textbook that explains how to build a superintelligence with clearly understood "intelligence algorithms" that can run easily on one of today's laptops, which I know very little about.)

I think I roughly agree with Paul if you are talking about the textbook that tells us how to build the best systems for the tasks that we want to do. (Analogy: today's textbook for self-driving cars.) That being said, I think that much of the improvement over time will be driven by improvements specifically in ML. (Analogy: today's textbook for deep learning.) So we can talk about that textbook as well.

  1. It's a textbook that's entirely about "finding good programs through a large, efficient search with a stringent goal", which we currently call ML. The content may be primarily some new approach for achieving this, with neural nets being a historical footnote, or it might be entirely about neural nets (though presumably with new architectures or other changes from today). Logical induction doesn't appear in the textbook.
  2. Jeez, who knows. If I intuitively query my brain here, it mostly doesn't have an answer; a thousand vs. million vs. billion years don't really change my intuitive predictions about what I'd get done. So we can instead back it out from other estimates. Given timelines of 10^1 - 10^2 years, and, idk, ~10^6 humans working on the problem near the end, seems like I'm implicitly predicting ~10^7 human-years of effort in our actual world. Then you have to adjust for a ton of factors, e.g. my quality relative to the average, the importance of serial thinki

I don't think there is an "AGI textbook" any more than there is an "industrialization textbook." There are lots of books about general principles and useful kinds of machines. That said, if I had to make wild guesses about roughly what that future understanding would look like:

  1. There is a recognizable concept of "learning" meaning something like "search for policies that perform well in past or simulated situations." That plays a large role, comparably important to planning or Bayesian inference. Logical induction is likely an elaboration of Bayesian infere
... (read more)
ARC's first technical report: Eliciting Latent Knowledge

I am very excited by the way the post takes a relatively simple problem and shows, in trying to solve it, a great deal of the depth of the alignment problem.

FWIW I wouldn’t write this line today, I am now much more confused about what ELK says or means.

Evan Hubinger (3 points, 4mo)
Why? What changed in your understanding of ELK?
Yudkowsky and Christiano discuss "Takeoff Speeds"

(News: OpenAI has built a theorem-prover that solved many AMC12 and AIME competition problems, and 2 IMO problems, and they say they hope this leads to work that wins the IMO Grand Challenge.)

Some AI research areas and their relevance to existential safety

My quick two-line review is something like: this post (and its sequel) is an artifact from someone with an interesting perspective on the world looking at the whole problem and trying to communicate their practical perspective. I don't really share this perspective, but it is looking at enough of the real things, and differently enough to the other perspectives I hear, that I am personally glad to have engaged with it. +4.

Search versus design

"Search versus design" explores the basic way we build and trust systems in the world. A few notes: 

  • My favorite part is the definition of an abstraction layer as an artifact combined with a helpful story about it. It helps me see the world as a series of abstraction layers. We're not actually close to true reality; we are very much living within abstraction layers — the simple stories we are able to tell about the artifacts we build. A world built by AIs will be far less comprehensible than the world we live in today. (Much more like biology is
... (read more)
ARC's first technical report: Eliciting Latent Knowledge

(I did not write a curation notice in time, but that doesn’t mean I don’t get to share why I wanted to curate this post! So I will do that here.)

Typically when I read a post by Paul, it feels like a single ingredient in a recipe, but one where I don’t know what meal the recipe is for. This report felt like one of the first times I was served a full meal, and I got to see how all the prior ingredients come together.

Alternative framing: Normally Paul’s posts feel like the argument step “J -> K” and I’m left wondering how we got to J, and where we’ll go fr... (read more)

Ben Pace (2 points, 4mo)
FWIW I wouldn’t write this line today, I am now much more confused about what ELK says or means.
Radical Probabilism

Radical Probabilism is an extension of the Embedded Agency philosophical position. I remember reading it and feeling a strong sense that I really got to see a well pinned-down argument using that philosophy. Radical Probabilism might be a +9, will have to re-read, but for now I give it +4.

(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)
 

Introduction to Cartesian Frames

Introduction to Cartesian Frames is a piece that also gave me a new philosophical perspective on my life. 

I don't know how to simply describe it. I don't know what even to say here. 

One thing I can say is that the post formalized the idea of having "more agency" or "less agency", in terms of "what facts about the world can I force to be true?". The more I approach the world by stating things that are going to happen, that I can't change, the more I'm boxing-in my agency over the world. The more I treat constraints as things I could fight to chang... (read more)

An Orthodox Case Against Utility Functions

An Orthodox Case Against Utility Functions was a shocking piece to me. Abram spends the first half of the post laying out a view he suspects people hold, but thinks is clearly wrong: a perspective that approaches things "from the starting-point of the universe". I felt dread reading it, because it was a view I held at the time, and one I used as a key background perspective when I discussed Bayesian reasoning. The rest of the post lays out an alternative perspective that "starts from the standpoint of the agent". Instead of my beliefs being about t... (read more)

Abram Demski (5 points, 6mo)
Partly because the "reductive utility" view is made a bit more extreme than it absolutely had to be. Partly because I think it's extremely natural, in the "LessWrong circa 2014 view", to say sentences like "I don't even know what it would mean for humans to have uncomputable utility functions -- unless you think the brain is uncomputable". (I think there is, or at least was, a big overlap between the LW crowd and the set of people who like to assume things are computable.) Partly because the post was directly inspired by another alignment researcher saying words similar to those, around 2019.

Without this assumption, the core of the "reductive utility" view would be that it treats utility functions as actual functions from actual world-states to real numbers. These functions wouldn't have to be computable, but since they're a basic part of the ontology of agency, it's natural to suppose they are -- in exactly the same way it's natural to suppose that an agent's beliefs should be computable, and in a similar way to how it seems natural to suppose that physical laws should be computable.

Ah, I guess you could say that I shoved the computability assumption into the reductive view because I secretly wanted to make 3 different points:

  1. We can define beliefs directly on events, rather than needing "worlds", and this view seems more general and flexible (and closer to actual reasoning).
  2. We can define utility directly on events, rather than "worlds", too, and there seem to be similar advantages here.
  3. In particular, uncomputable utility functions seem pretty strange if you think utility is a function on worlds; but if you think it's defined as a coherent expectation on events, then it's more natural to suppose that the underlying function on worlds (that would justify the event expectations) isn't computable.

Rather than make these three points separately, I set up a false dichotomy for illustration.

Also worth highlighting that, like
ARC's first technical report: Eliciting Latent Knowledge

This is an interesting tack, this step and the next ("Strategy: have humans adopt the optimal Bayes net") feels new to me.

ARC's first technical report: Eliciting Latent Knowledge

From the section "Strategy: have humans adopt the optimal Bayes net":

Roughly speaking, imitative generalization:

  • Considers the space of changes the humans could make to their Bayes net;
  • Learns a function which maps (proposed change to Bayes net) to (how a human — with AI assistants — would make predictions after making that change);
  • Searches over this space to find the change that allows the humans to make the best predictions.

Regarding the second step, what is the meat of this function? My superficial understanding is that a Bayes net is deterministic and fu... (read more)

Paul Christiano (3 points, 6mo)
In general we don't have an explicit representation of the human's beliefs as a Bayes net (and none of our algorithms are specialized to this case), so the only way we are representing "change to Bayes net" is as "information you can give to a human that would lead them to change their predictions." That said, we also haven't described any inference algorithm other than "ask the human." In general inference is intractable (even in very simple models), and the only handle we have on doing fast+acceptable approximate inference is that the human can apparently do it. (Though if that was the only problem then we also expect we could find some loss function that incentivizes the AI to do inference in the human Bayes net.)
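As a reading aid for the three quoted steps, here is a minimal sketch of the outer loop. The candidate "changes", the stubbed human-with-assistants predictor, and the single labeled datapoint are all invented for illustration; in the actual proposal, step 2 is a learned model rather than a lookup table, and (per Paul's reply) there is no explicit Bayes net at all.

```python
import math

# A minimal sketch of the three quoted steps. "Changes" are just strings of
# advice, and the "human with AI assistants" predictor is a hard-coded stub.

candidate_changes = ["advice A", "advice B", "advice C"]  # step 1: space of proposed changes (toy)

def human_predicts_after(change, situation):
    """Step 2: stand-in for the learned map from (proposed change) to
    (how a human, with AI assistants, would predict after adopting it)."""
    toy_predictions = {
        ("advice A", "sensor looks normal but diamond is gone"): 0.2,
        ("advice B", "sensor looks normal but diamond is gone"): 0.7,
        ("advice C", "sensor looks normal but diamond is gone"): 0.5,
    }
    return toy_predictions.get((change, situation), 0.5)

def prediction_quality(change, labeled_situations):
    """Log-score of the human-after-change predictor on held-out data."""
    score = 0.0
    for situation, happened in labeled_situations:
        p = human_predicts_after(change, situation)
        score += math.log(p if happened else 1.0 - p)
    return score

labeled = [("sensor looks normal but diamond is gone", True)]  # toy held-out data
best_change = max(candidate_changes, key=lambda c: prediction_quality(c, labeled))  # step 3: search
print(best_change)  # "advice B" under these toy numbers
```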
ARC's first technical report: Eliciting Latent Knowledge

Question: what's the relative amount of compute you are imagining SmartVault and the helper AI having? Both the same, or one having a lot more?

Paul Christiano (2 points, 6mo)
It will depend on how much high-quality data you need to train the reporter. Probably it's a small fraction of the data you need to train the predictor, and so for generating each reporter datapoint you can afford to use many times more data than the predictor usually uses. I often imagine the helpers having 10-100x more computation time.
ARC's first technical report: Eliciting Latent Knowledge

I'm reading along, and I don't follow the section "Strategy: have AI help humans improve our understanding". The problem so far is that the AI need only identify bad outcomes that the human labelers can identify, rather than bad outcomes regardless of human-labeler identification. 

The solution posed here is to have AIs help the human labeler understand more bad (and good) outcomes, using powerful AI. The section mostly provides justification for making the assumption that we can align these helper AIs (reason: the authors believe there is a counterexa... (read more)

Paul Christiano (4 points, 6mo)
Yes, that's the main way this could work. The question is whether an AI understands things that humans can't understand by doing amplification/debate/RRM; our guess is yes, and the argument is mostly "until the builder explains why, gradient descent and science may just have pretty different strengths and weaknesses" (and we can make that more concrete by fleshing out what the world may be like and what the AI learns by gradient descent). But it seemed worth raising because this does appear to make the bad reporter's job much harder and greatly restrict the space of cases where it fails to report tampering.

Methodologically, the way I think about this kind of thing is: (i) we had a counterexample, (ii) after making this change that particular counterexample no longer works, (iii) now we want to think through whether the counterexample can be adapted.

This is also legitimately less obvious. An AI can't simulate (human+AI helpers), since each AI helper is as smart as the AI itself and so simulating (human+AI helpers) clearly requires more compute than the AI has. The counterexample is that the AI should just try its best to do inference in the Bayes net that includes "everything the human could understand with the amount of science they have time to do."

But that does still leave the builder with avenues to try to strengthen the algorithm and win. One way is discussed in the section on speed regularization [https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit#heading=h.goyuzwqyv9m8]: if the AI is "trying its best" to do inference in the human Bayes net then there might always be returns to having more time to think (and so it might be able to benefit by transferring over its understanding of what was happening in the AI Bayes net rather than recomputing from the observations). The next step for a builder who wanted to take this approach would be to argue that they can reliably construct a complex enough dataset that this advantage is
Biology-Inspired AGI Timelines: The Trick That Never Works

For reference, here is a 2004 post by Moravec, that’s helpfully short, containing his account of his own predictions: https://www.frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html

Biology-Inspired AGI Timelines: The Trick That Never Works

Hmm, alas, stopped reading too soon.

Is Humbali right that generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one's probability distribution over AGI, thereby moving out its median further away in time?

I'll add a quick answer: my gut says technically true, but mostly that I should just look at the arguments because they provide more weight than the prior. Strong evidence is common. It seems plausible to me that the prior over 'number of years away' should make me predict it's more like 10 trillion years

... (read more)
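A toy mixture calculation for the Humbali question, with all distributions and weights invented for illustration: mixing a concentrated inside-view distribution over years-to-AGI with a very broad "I might be wrong about everything" prior raises the entropy a lot, but it moves the median out only modestly unless the broad prior gets most of the weight.

```python
import numpy as np

# Toy numbers only: a concentrated "inside view" mixed with a broad ignorance
# prior stretching out to ~10^13 years, evaluated on a grid over log(years).
log_years = np.linspace(np.log(0.1), np.log(1e13), 100_000)

inside_view = np.exp(-0.5 * ((log_years - np.log(20)) / 0.7) ** 2)  # median ~20 years (assumed)
ignorance_prior = np.ones_like(log_years)                           # log-uniform out to 10^13 years (assumed)

def median_years(weights):
    p = weights / weights.sum()
    return float(np.exp(log_years[np.searchsorted(np.cumsum(p), 0.5)]))

for w in (0.0, 0.1, 0.3, 1.0):  # probability mass given to the high-entropy prior
    mixture = (1 - w) * inside_view / inside_view.sum() + w * ignorance_prior / ignorance_prior.sum()
    print(f"weight {w:.1f} on the broad prior -> median ~{median_years(mixture):,.0f} years")
```

Under these toy numbers the median barely moves for small weights on the broad prior and only explodes when that prior dominates, which seems consistent with the "technically true, but mostly look at the arguments" reaction above.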
Biology-Inspired AGI Timelines: The Trick That Never Works

For indeed in a case like this, one first backs up and asks oneself "Is Humbali right or not?" and not "How can I prove Humbali wrong?"

Gonna write up some of my thoughts here without reading on, and post them (also without reading on).

I don’t get why Humbali’s objection has not already been ‘priced in’. Eliezer has a bunch of models and info and his gut puts the timeline at before 2050. I don’t think “what if you’re mistaken about everything” is an argument Eliezer hasn’t already considered, so I think it’s already priced into the prediction. You’re not allowe

... (read more)
Ben Pace (2 points, 7mo)
Hmm, alas, stopped reading too soon.
Christiano, Cotra, and Yudkowsky on AI progress

Wow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote.

Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]

Christiano, Cotra, and Yudkowsky on AI progress

Adding my recollection of that period: some people made the relevant updates when DeepMind's system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going "Oh huh, I think this is going to happen" and then when AlphaGo beat Lee Sedol (in March 2016) everyone said "Now it is happening".

It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).

It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability.

Perhaps a better source of evidence of A... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Thank you for this follow-up comment Adam, I appreciate it.

Discussion with Eliezer Yudkowsky on AGI interventions

Glad to hear. And yeah, that’s the crux of the issue for me.

Discussion with Eliezer Yudkowsky on AGI interventions

Follow-up

One of Eliezer's claims here is

It is very, very clear that at present rates of progress, adding that level of alignment capability as grown over the next N years, to the AGI capability that arrives after N years, results in everybody dying very quickly.

This is a claim I basically agree with.

I don't think the situation is entirely hopeless, but I don't think any of the current plans (or the current alignment field) are on track to save us.

Discussion with Eliezer Yudkowsky on AGI interventions

Thank you for the links Adam. To clarify, the kind of argument I'm really looking for is something like the following three (hypothetical) examples.

  • Mesa-optimization is the primary threat model of unaligned AGI systems. Over the next few decades there will be a lot of companies building ML systems that create mesa-optimizers. I think it is within 5 years of current progress that we will understand how ML systems create mesa-optimizers and how to stop it. Therefore I think the current field is adequate for the problem (80%).
  • When I look at the research we're
... (read more)

Thanks for the examples, that helps a lot.

I'm glad that I posted my inflammatory comment, if only because exchanging with you and Rob made me actually consider the question of "what is our story to success", instead of just "are we making progress/creating valuable knowledge". And the way you two have been casting it is way less aversive to me than the way EY tends to frame it. This is definitely something I want to think more about. :)

I want to leave this paragraph as social acknowledgment that you mentioned upthread that you're tired and taking a break,

... (read more)
Discussion with Eliezer Yudkowsky on AGI interventions

Adam, can you make a positive case here for how the work being done on prosaic alignment leads to success? You didn't make one, and without it I don't understand where you're coming from. I'm not asking you to tell me a story that you have 100% probability on, just what is the success story you're acting under, such that EY's stances seem to you to be mostly distracting people from the real work.

(Later added disclaimer: it's a good idea to add "I feel like..." before the judgment in this comment, so that you keep in mind that I'm talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so))

Thanks for trying to understand my point and asking me for more details. I appreciate it.

Yet I feel weird when trying to answer, because my gut reaction to your comment is that you're asking the wrong question? Also, the compression of my view to "EY's stances seem to you to be mostly distracting people fro... (read more)

If superintelligence is approximately multimodal GPT-17 plus reinforcement learning, then understanding how GPT-3-scale algorithms function is exceptionally important to understanding super-intelligence.

Also, if superintelligence doesn’t happen then prosaic alignment is the only kind of alignment.

Discussion with Eliezer Yudkowsky on AGI interventions

Aaaaaaaaaaaaahhhhhhhhhhhhhhhhh!!!!!!!!!!!!

(...I'll be at the office, thinking about how to make enough progress fast enough.)

We're Redwood Research, we do applied alignment research, AMA

Would you prefer questions here or on the EA Forum?

Buck Shlegeris (2 points, 9mo)
I think we prefer questions on the EA Forum.
The alignment problem in different capability regimes

There’s a related dynamic that came up in a convo I just had.

Alice: My current work is exploring if we can solve value loading using reward learning.

Bob: Woah, isn’t that obviously doomed? Didn’t Rohin write a whole sequence on this?

Alice: Well, I don’t want to solve the whole problem for arbitrary difficulty. I just want to know whether we can build something that gets the basics right in distributions that a present day human can understand. For example I reckon we may be able to teach an AI what murder is today, even if we can’t teach it what murder is

... (read more)
MIRI/OP exchange about decision theory

I was in the chat and don't have anything especially to "disclose". Joe and Nick are both academic philosophers who've studied at Oxford and been at FHI, with a wide range of interests. And Abram and Scott are naturally great people to chat about decision theory with when they're available.

Garrabrant and Shah on human modeling in AGI

What’s the second half of the versus in this section? It’s probably straightforward but I’d appreciate someone spelling it out.

Scott: And I'm basically distinguishing between a system that's learning how to do reasoning while being overseen and kept out of the convex hull of human modeling versus… And there are definitely trade-offs here, because you have more of a daemon problem or something if you're like, "I'm going to learn how to do reasoning," as opposed to, "I'm going to be told how to do reasoning from the humans." And so then you have to search over this richer space or something of how to do reasoning, which makes it harder.

Scott Garrabrant (3 points, 10mo)
I don't know, the negation of the first thing? A system that can freely model humans, or at least perform computation indistinguishable from modeling humans.
Agency and the unreliable autonomous car

What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.

AXRP Episode 9 - Finite Factored Sets with Scott Garrabrant

Curated, in part for this episode, and also as a celebration of the whole series. I've listened to 6 out of the 9, and I've learned a great deal about people's work and their motivations for it. This episode in particular was excellent because I finally learned what a finite factored set was – your example of the Cartesian plane was really helpful! Which is a credit to your communication skills.

Basically every episode has been worthwhile and valuable for me, it's been easy to sit down with a researcher and hear them explain their research, and Daniel alway... (read more)

I'm glad to hear that the podcast is useful for people :)
