All of Chris_Leong's Comments + Replies

Counterfactuals are Confusing because of an Ontological Shift

Speedup on evolution?

Maybe? Might work okayish, but doubt the best solution is that speculative.

Counterfactuals are Confusing because of an Ontological Shift

As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.

 

I've noticed that issue as well. Counterfactuals are more a convenient model/story than something to be taken literally. You've grounded decisions by taking counterfactuals to exist a priori. I ground them by noting that our desire to construct counterfactuals is ultimately based on evolved instincts and/or behaviours, so these stories aren't just arbitrary stories but a way in which we can leverage the lessons that have been instilled in us by evolution. I'm curious, given this explanation, why do we still need choices to be actual?

1Jessica Taylor5d
Do you think of counterfactuals as a speedup on evolution? Could this be operationalized by designing AIs that quantilize [https://intelligence.org/2015/11/29/new-paper-quantilizers/] on some animal population, therefore not being far from the population distribution, but still surviving/reproducing better than average?
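(For readers unfamiliar with the linked idea, here is a minimal sketch of quantilization, assuming hypothetical sample_base_policy and utility stand-ins; it is an illustration of the concept, not code from the paper.)

```python
import random

def quantilize(sample_base_policy, utility, q=0.1, n_samples=1000):
    """Sample from the base distribution, then pick uniformly from the top q fraction by utility."""
    samples = [sample_base_policy() for _ in range(n_samples)]
    samples.sort(key=utility, reverse=True)
    top = samples[: max(1, int(q * n_samples))]  # restrict to the top q fraction
    return random.choice(top)                    # stay close to the base distribution

# Hypothetical usage matching the suggestion above: the base distribution stands in for an
# animal population's behaviour, the utility for a survival/reproduction proxy.
behaviour = quantilize(lambda: random.gauss(0, 1), utility=lambda b: b, q=0.05)
print(behaviour)
```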
2Jessica Taylor6d
Note the preceding: "I'm assuming use of a metaphysics in which you, the agent, can make choices. Without this metaphysics there isn't an obvious motivation for a theory of decisions." As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion. Maybe this metaphysics leads to contradictions. In the rest of the post I argue that it doesn't contradict belief in physical causality, including as applied to the self.
A critical agential account of free will, causation, and physics

Let A be some action. Consider the statement: "I will take action A". An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.

 

If you believe determinism then an agent can sometimes falsify it, sometimes not.
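(A toy rendering of the quoted argument, with a made-up three-action space, just to make the "falsified at will" point concrete.)

```python
ACTIONS = ["A", "B", "C"]

def falsifying_agent(predicted_action):
    # Take any action B not equal to the predicted action A, falsifying the prediction.
    return next(a for a in ACTIONS if a != predicted_action)

prediction = "A"                     # the statement "I will take action A"
chosen = falsifying_agent(prediction)
print(chosen, chosen == prediction)  # e.g. B False -- the statement did not hold as a law
```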
 

Humans provide an untapped wealth of evidence about alignment

I think it's quite clear how shifting ontologies could break a specification of values. And sometimes you just need a formalisation, any formalisation, to play around with. But I suppose it depends more on the specific details of your investigation.

Humans provide an untapped wealth of evidence about alignment

I strongly disagree with your notion of how privileging the hypothesis works. It's not absurd to think that techniques for making AIXI-tl value diamonds despite ontological shifts could be adapted for other architectures. I agree that there are other examples of people working on solving problems within a formalisation that seem rather formalisation specific, but you seem to have cast the net too wide.

2Alex Turner1mo
My basic point remains. Why is it not absurd to think that, without further evidential justification? By what evidence have you considered the highly specific investigation into AIXI-tl, and located the idea that ontology identification is a useful problem to think about at all (in its form of "detecting a certain concept in the AI")?
A note about differential technological development

I tend to agree that burning up the timeline is highly costly, but more because Effective Altruism is an Idea Machine that has only recently started to really crank up. There's a lot of effort being directed towards recruiting top students from uni groups, but these projects require time to pay off.

I’m giving this example not to say “everyone should go do agent-foundations-y work exclusively now!”. I think it’s a neglected set of research directions that deserves far more effort, but I’m far too pessimistic about it to want humanity to put all its egg

... (read more)
Let's See You Write That Corrigibility Tag

An ability to refuse to generate theories about a hypothetical world being in a simulation.

Let's See You Write That Corrigibility Tag

I guess the problem with this test is that the kinds of people who could do this tend to be busy, so they probably can't do this with so little notice.

AGI Ruin: A List of Lethalities

Hmm... It seems much, much harder to catch every single one than to catch 99%.

0Mass_Driver2mo
One of my assumptions is that it's possible to design a "satisficing" engine -- an algorithm that generates candidate proposals for a fixed number of cycles, and then, assuming at least one proposal with estimated utility greater than X has been generated within that amount of time, selects one of the qualifying proposals at random. If there are no qualifying candidates, the AI takes no action. If you have a straightforward optimizer that always returns the action with the highest expected utility, then, yeah, you only have to miss one "cheat" that improves "official" utility at the expense of murdering everyone everywhere and then we all die. But if you have a satisficer, then as long as some of the qualifying plans don't kill everyone, there's a reasonable chance that the AI will pick one of those plans. Even if you forget to explicitly penalize one of the pathways to disaster, there's no special reason why that one pathway would show up in a large majority of the AI's candidate plans.
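(A minimal sketch of the satisficing engine described above, with hypothetical generate_proposal and estimated_utility stand-ins; the actual proposal generation and utility estimation are the hard parts and are not modelled here.)

```python
import random

def satisficing_engine(generate_proposal, estimated_utility, threshold, n_cycles=10_000):
    # Generate candidate proposals for a fixed number of cycles.
    candidates = [generate_proposal() for _ in range(n_cycles)]
    # Keep only proposals whose estimated utility exceeds the threshold X.
    qualifying = [p for p in candidates if estimated_utility(p) > threshold]
    if not qualifying:
        return None  # no qualifying candidate: the AI takes no action
    # Select one qualifying proposal at random rather than the argmax.
    return random.choice(qualifying)

# Hypothetical usage with toy stand-ins:
plan = satisficing_engine(lambda: random.random(), estimated_utility=lambda p: p, threshold=0.9)
print(plan)
```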
AGI Ruin: A List of Lethalities

Regarding the point about most alignment work not really addressing the core issue: I think that a lot of this work could potentially be valuable nonetheless. People can take inspiration from all kinds of things and I think there is often value in picking something that you can get a grasp on, then using the lessons from that to tackle something more complex. Of course, it's very easy for people to spend all of their time focusing on irrelevant toy problems and never get around to making any progress on the real problem. Plus there are costs with adding more voices into the conversation as it can be tricky for people to distinguish the signal from the noise.

AGI Ruin: A List of Lethalities

If we tell an AI not to invent nanotechnology, not to send anything to protein labs, not to hack into all of the world's computers, not to design weird new quantum particles, not to do 100 of the other most dangerous and weirdest things we can think of, and then ask it to generalize and learn not to do things of that sort


I had the exact same thought. My guess would be that Eliezer might say that, since the AI is maximising, if the generalisation function misses even one action of this sort that we should have excluded, then we're screwed.

0Mass_Driver2mo
Sure, I agree! If we miss even one such action, we're screwed. My point is that if people put enough skill and effort into trying to catch all such actions, then there is a significant chance that they'll catch literally all the actions that are (1) world-ending and that (2) the AI actually wants to try. There's also a significant chance we won't, which is quite bad and very alarming, hence people should work on AI safety.
Confused why a "capabilities research is good for alignment progress" position isn't discussed more

I tend to value a longer timeline more than a lot of other people do. I guess I see EA and AI Safety setting up powerful idea machines that get more powerful when they are given more time to gear up.  A lot more resources have been invested into EA field-building recently, but we need time for these investments to pay off. At EA London this year, I gained a sense that AI Safety movement building is only now becoming its own thing; and of course it'll take time to iterate to get it right, then time for people to pass through the programs, then time for... (read more)

AMA Conjecture, A New Alignment Startup

How large do you expect Conjecture to become? What percent of people do you expect to be working on the product and what percentage to be working on safety? 

0Connor Leahy4mo
Ideally, we would like Conjecture to scale quickly. Alignment wise, in 5 years time, we want to have the ability to take a billion dollars and turn it into many efficient, capable, aligned teams of 3-10 people working on parallel alignment research bets, and be able to do this reliably and repeatedly. We expect to be far more constrained by talent than anything else on that front, and are working hard on developing and scaling pipelines to hopefully alleviate such bottlenecks. For the second question, we don't expect it to be a competing force (as in, we have people who could be working on alignment working on product instead). See point two in this comment [https://www.lesswrong.com/posts/rtEtTybuCcDWLk7N9/ama-conjecture-a-new-alignment-startup?commentId=pZmerzhJSADkwNJZx] .
Chris_Leong's Shortform

Random idea:  A lot of people seem discouraged from doing anything about AI Safety because it seems like such a big overwhelming problem.

What if there was a competition to encourage people to engage in low-effort actions towards AI safety, such as hosting a dinner for people who are interested, volunteering to run a session on AI safety for their local EA group, answering a couple of questions on the stampy wiki, offering to proof-read a few people’s posts or offering a few free tutorial sessions to aspiring AI Safety Researchers.

I think there’s a dec... (read more)

Chris_Leong's Shortform

Thoughts on the introduction of Goodhart's Law: Currently, I'm more motivated by trying to make the leaderboard, so maybe that suggests that merely introducing a leaderboard, without actually paying people, would have had much the same effect. Then again, that might just be because I'm not that far off. And if there hadn't been the payment, maybe I wouldn't have ended up in the position where I'm not that far off.

I guess I feel incentivised to post a lot more than I would otherwise, but especially in the comments rather than the posts since if you post a lot o... (read more)

General alignment plus human values, or alignment via human values?

If we have an algorithm that aligns an AI with X values, then we can add human values to get an AI that is aligned with human values.

On the other hand, I agree that it doesn't really make sense to declare an AI safe in the abstract, rather than in respect to say human values. (Small counterpoint: in order to be safe, it's not just about alignment, you also need to avoid bugs. This can be defined without reference to human values. However, this isn't sufficient for safety).

I suppose this works as a criticism of approaches like quantilizers or impact-minimisation which attempt abstract safety. Although I can't see any reason why it'd imply that it's impossible to write an AI that can be aligned with arbitrary values.

Why I'm co-founding Aligned AI

If you think this is financially viable, then I'm fairly keen on this, especially if you provide internships and development opportunities for aspiring safety researchers.

3Stuart Armstrong6mo
Yes, those are important to provide, and we will.
Challenges with Breaking into MIRI-Style Research

In science and engineering, people will usually try very hard to make progress by standing on the shoulders of others. The discourse on this forum, on the other hand, more often resembles that of a bunch of crabs in a bucket.


Hmm... Yeah, I certainly don't think that there's enough collaboration or appreciation of the insights that other approaches may provide.

Any thoughts on how to encourage a healthier dynamic?

1Koen Holtman7mo
I have no easy solution to offer, except for the obvious comment that the world is bigger than this forum. My own stance is to treat the over-production of posts of type 1 above as just one of these inevitable things that will happen in the modern media landscape. There is some value to these posts, but after you have read about 20 of them, you can be pretty sure about how the next one will go. So I try to focus my energy, as a reader and writer, on work of type 2 instead. I treat arXiv as my main publication venue, but I do spend some energy cross-posting my work of type 2 here. I hope that it will inspire others, or at least counter-balance some of the type 1 work.
Challenges with Breaking into MIRI-Style Research

The object-level claims here seem straightforwardly true, but I think "challenges with breaking into MIRI-style research" is a misleading way to characterize it. The post makes it sound like these are problems with the pipeline for new researchers, but really these problems are all driven by challenges of the kind of research involved.


There's definitely some truth to this, but I guess I'm skeptical that there isn't anything that we can do about some of these challenges. Actually, rereading I can see that you've conceded this towards the end of your post. I... (read more)

4johnswentworth7mo
To be clear, I don't intend to argue that the problem is too hard or not worthwhile or whatever. Rather, my main point is that solutions need to grapple with the problems of teaching people to create new paradigms, and working with people who don't share standard frames. I expect that attempts to mimic the traditional pipelines of paradigmatic fields will not solve those problems. That's not an argument against working on it, it's just an argument that we need fundamentally different strategies than the standard education and career paths in other fields.
Challenges with Breaking into MIRI-Style Research

Even if the content is proportional, the signal-to-noise ratio will still be much higher for those interested in MIRI-style research. This is a natural consequence of being a niche area.

When I said "might not have the capacity to vet", I was referring to a range of orgs.

I would be surprised if the lack of papers didn't have an effect, as presumably you're trying to highlight high-quality work, and people are more motivated to go the extra yard when trying to get published because both the rewards and standards are higher.

Challenges with Breaking into MIRI-Style Research

Just some sort of official & long-term & OFFLINE study program that would teach some of the previously published MIRI research would be hugely beneficial for growing the AF community.


Agreed.

At the last EA Global there was some sort of AI safety breakout session. There were ~12 tables with different topics. I was dismayed to discover that almost every table was full of people excitedly discussing various topics in prosaic AI alignment and other things, while the AF table had just 2 (!) people.


Wow, didn't realise it was that little!

I have spoken with MIRI p

... (read more)

Unclear. Some things that might be involved:

  • a somewhat anti/non academic vibe
  • a feeling that they have the smartest people anyway, only hire the elite few that have a proven track record
  • feeling that it would take too much time and energy to educate people
  • a lack of organisational energy
  • ...

It would be great if somebody from MIRI could chime in.

I might add that I know a number of people interested in AF who feel somewhat afloat/find it difficult to contribute. Feels a bit like a waste of talent.

Challenges with Breaking into MIRI-Style Research

Sorry, I wasn't criticizing your work.

I think that the lack of an equivalent of papers for MIRI-style research also plays a role here in that if someone writes a paper it's more likely to make it into the newsletter. So the issue is further down the pipeline.

2Rohin Shah7mo
To be clear, I didn't mean this comment as "stop criticizing me". I meant it as "I think the statement is factually incorrect". The reason that the newsletter has more ML in it than MIRI work is just that there's more (public) work produced on the ML side. I don't think it's about the lack of papers, unless by papers you mean the broader category of "public work that's optimized for communication".
$1000 USD prize - Circular Dependency of Counterfactuals

Hmm... Oh, I think that was elsewhere on this thread. Probably not to you. Eliezer's Where Recursive Justification Hits Bottom seems to embrace a circular epistemology despite its title.

$1000 USD prize - Circular Dependency of Counterfactuals

Wait, I was under the impression from the quoted text that you make a distinction between 'circular epistemology' and 'other types of epistemology that will hit a point where we can provide no justification at all'. i.e. these other types are not circular because they are ultimately defined as a set of axioms, rewriting rules, and observational protocols for which no further justification is being attempted.

If you're referring to the Wittgensteinian quote, I was merely quoting him, not endorsing his views.

1Koen Holtman7mo
Not aware of which part would be a Wittgensteinian quote. Long time ago that I read Wittgenstein, and I read him in German. In any case, I remain confused about what you mean by 'circular'.
$1000 USD prize - Circular Dependency of Counterfactuals

Yeah, I believe epistemology to be inherently circular. I think it has some relation to counterfactuals being circular, but I don't see it as quite the same, as counterfactuals seem a lot harder to avoid using than most other concepts. The point of mentioning circular epistemology was to persuade people that my theory isn't as absurd as it sounds at first.

1Koen Holtman7mo
Wait, I was under the impression from the quoted text that you make a distinction between 'circular epistemology' and 'other types of epistemology that will hit a point where we can provide no justification at all'. i.e. these other types are not circular because they are ultimately defined as a set of axioms, rewriting rules, and observational protocols for which no further justification is being attempted. So I think I am still struggling to see what flavour of philosophical thought you want people to engage with, when you mention 'circular'. Mind you, I see 'hitting a point where we provide no justification at all' as a positive thing in a mathematical system, a physical theory, or an entire epistemology, as long as these points are clearly identified.
$1000 USD prize - Circular Dependency of Counterfactuals

What I mean is that some people seem to think that if they can describe a system that explains counterfactuals without mentioning counterfactuals in the explanation, then they've avoided a circular dependence. But of course, we can't just take things at face value; we have to dig deeper than that.

1Koen Holtman7mo
OK thanks for explaining. See my other recent reply for more thoughts about this.
$1000 USD prize - Circular Dependency of Counterfactuals

I added a comment on the post directly, but I will add: we seem to roughly agree on counterfactuals existing in the imagination in a broad sense (I highlighted two ways this can go above - with counterfactuals being an intrinsic part of how we interact with the world or a pragmatic response to navigating the world). However, I think that following this through and asking why we care about them if they're just in our imagination ends up taking us down a path where counterfactuals being circular seems plausible. On the other hand, you seem to think that this path takes us somewhere where there isn't any circularity. Anyway, that's the difference in our positions as far as I can tell from having just skimmed your link.

2JoshuaOSHickman7mo
I was attempting to solve a relatively specific technical problem related to self-proofs using counterfactuals. So I suppose I do think (at least non-circular ones) are useful. But I'm not sure I'd commit to any broader philosophical statement about counterfactuals beyond "they can be used in a specific formal way to help functions prove statements about their own output in a way that avoids Löb's Theorem issues". That being said, that's a pretty good use, if that's the type of thing you want to do? It's also not totally clear if you're imagining counterfactuals the same way I am. I am using the English term because it matches the specific thing I'm describing decently well, but the term has a broad meaning, and without having an extremely specific imagining, it's hard to make any more statements about what can be done with them.
A Possible Resolution To Spurious Counterfactuals

I don't suppose you could clarify:

Agent :: Agent -> Situation -> Choice

It seems strange for an agent to take another agent and a situation and return a choice.

I also think this approach matches our intuition about how counterfactuals work. We imagine ourselves as the same except we're choosing this particular behavior. Surely, in the formal reasoning, there might also be a distinction between the initial agent and the agent within that counterfactual, considering it's present in our own imaginations?

Yeah, this is essentially my position as well. My m... (read more)

0JoshuaOSHickman7mo
The Agent needs access to a self pointer, and it is parameterized so it doesn't have to be a static pointer, as it was in the original paper -- this approach in particular needs it to be dynamic in this way. There are also use cases where a bit of code receives a pointer not to its exact self -- when it is called as a subagent, it will get the parent's pointer.
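(One possible reading of the Agent :: Agent -> Situation -> Choice signature and the dynamic self pointer, sketched in Python with made-up Situation/Choice stand-ins; this is an interpretation, not code from the original paper.)

```python
from typing import Callable

Situation = str
Choice = str
# Conceptually: Agent :: Agent -> Situation -> Choice
Agent = Callable[["Agent", Situation], Choice]

def example_agent(self_ptr: "Agent", situation: Situation) -> Choice:
    # self_ptr is usually the agent itself, but a parent calling this as a subagent
    # could pass its own pointer instead -- the "dynamic" self pointer described above.
    return "take the $10" if situation == "5 and 10" else "do nothing"

def run(agent: "Agent", situation: Situation) -> Choice:
    return agent(agent, situation)  # the standard call hands the agent a pointer to itself

print(run(example_agent, "5 and 10"))  # -> take the $10
```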
$1000 USD prize - Circular Dependency of Counterfactuals

Overall, reading the post and the comment section, I feel that, if I reject Newcomb's Problem as a test, I can only ever write things that will not meet your prize criterion of usefully engaging with 'circular dependency'.


Firstly, I don't see why that would interfere with evaluating possible arguments for and against circular dependency. It's possible for an article to say "here are three reasons why we might think counterfactuals are circular, and why they are all false" (not that an article would necessarily have to engage with three different arguments to win).

S... (read more)

1Koen Holtman7mo
OK, so if I understand you correctly, you posit that there is something called 'circular epistemology'. You said in the earlier post you link to at the top: You further suspect that circular epistemology might have something useful to say about counterfactuals, in terms of offering a justification for them without 'hitting a point where we can provide no justification at all'. And you have a bounty for people writing more about this. Am I understanding you correctly?
1Koen Holtman7mo
??? I don't follow. You meant to write "use system X instead of using system Y which calls itself a definition of counterfactuals"?
$1000 USD prize - Circular Dependency of Counterfactuals

I think there is "machinery that underlies counterfactual reasoning"

I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic as well and it's clear that the logic of a kindergartener is very different from that of a logic professor - although perhaps we're getting into a semantic debate - and what you mean is that the fundamental machinery is more or less the same.

I was initially assuming (by default) that if you're trying to understand counterfactuals, you're mainly trying to understand how this

... (read more)
$1000 USD prize - Circular Dependency of Counterfactuals

I'm not sure where you got the idea that this was to solve the spurious counterfactuals problem, that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem.


Thanks for that clarification.

A way that EDT fails to solve 5 and 10 is that it could believe with 100% certainty that it takes $5 so its expected value for $10 is undefined

I suppose that demonstrates that the 5 and 10 problem is a broader problem than I realised. I still think that it's only a hard problem within particular systems that have... (read more)
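(A toy rendering of the quoted failure mode, with made-up numbers: once the agent is certain it takes the $5, the conditional expectation for taking the $10 has nothing to condition on.)

```python
credences = {"take $5": 1.0, "take $10": 0.0}   # certainty that it takes the $5
utilities = {"take $5": 5, "take $10": 10}

def conditional_expected_utility(action):
    if credences[action] == 0.0:
        return None  # conditioning on a probability-zero action: undefined
    return utilities[action]  # trivial here, since utility is determined by the action

for action in credences:
    print(action, conditional_expected_utility(action))  # take $5 -> 5, take $10 -> None
```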

$1000 USD prize - Circular Dependency of Counterfactuals

Basically you can't say different theories really disagree unless there's some possible world / counterfactual / whatever in which they disagree;


Agreed, this is yet another argument for considering counterfactuals to be so fundamental that they don't make sense outside of themselves. I just don't see this as incompatible with determinism, b/c I'm grounding using counterfactuals rather than agency.

Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn't mean to imply was required.

I don't mean utility function... (read more)

$1000 USD prize - Circular Dependency of Counterfactuals

Thoughts on Modeling Naturalized Logic Decision Theory Problems in Linear Logic

I hadn't heard of linear logic before - it seems like a cool formalisation - although I tend to believe that formalisations are overrated, as unless they are used very carefully they can obscure more than they reveal.

I believe that spurious counterfactuals are only an issue with the 5 and 10 problem because of an attempt to hack logical-if to substitute for counterfactual-if in such a way that we can reuse proof-based systems. It's extremely cool that we can do as much as we can ... (read more)
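(A toy illustration of the point about logical-if standing in for counterfactual-if, using made-up payoffs: once the agent in fact takes the $5, any "if I take the $10 then ..." claim is vacuously true, which is what lets spurious counterfactuals through.)

```python
def material_implication(antecedent, consequent):
    return (not antecedent) or consequent

actual_action = "$5"
i_take_ten = (actual_action == "$10")  # False

# Both implications come out True, so a proof search could "establish" either one:
print(material_implication(i_take_ten, True))   # "if I take $10, utility is 10" -> True
print(material_implication(i_take_ten, False))  # "if I take $10, utility is 0"  -> True (spurious)
```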

2Jessica Taylor7mo
Thanks for reading all the posts! I'm not sure where you got the idea that this was to solve the spurious counterfactuals problem, that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem. The core problem it's solving is that it's a well-defined mathematical framework in which (a) there are, in some sense, choices, and (b) it is believed that these choices correspond to the results of a particular Turing machine. It goes back to the free will vs determinism paradox, and shows that there's a formalism that has some properties of "free will" and some properties of "determinism". A way that EDT fails to solve 5 and 10 is that it could believe with 100% certainty that it takes $5 so its expected value for $10 is undefined. (I wrote previously [https://www.lesswrong.com/posts/Rcwv6SPsmhkgzfkDw/edt-solves-5-and-10-with-conditional-oracles] about a modification of EDT to avoid this problem.) CDT solves it by constructing physically impossible counterfactuals which has other problems, e.g. suppose there's a Laplace's demon that searches for violations of physics and destroys the universe if physics is violated; this theoretically shouldn't make a difference but it messes up the CDT counterfactuals. It does look like your post overall agrees with the view I presented. I would tend to call augmented reality "metaphysics" in that it is a piece of ontology that goes beyond physics. I wrote about metaphysical free will [https://unstableontology.com/2020/03/22/what-is-metaphysical-free-will/] a while ago and didn't post it on LW because I anticipated people would be allergic to the non-physicalist philosophical language.
$1000 USD prize - Circular Dependency of Counterfactuals

Comments on A critical agential account of free will, causation, and physics

Consider the statement: "I will take action A". An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.

We can imagine a situation where there is a box containing an apple or a pear. Suppose it contains a pear, but we believe it contains an apple. If we look in the box (and we have good reason to believe looking doesn't change the contents), then we'll falsf... (read more)

1Jessica Taylor7mo
The main problem is that it isn't meaningful for their theories to make counterfactual predictions about a single situation; they can create multiple situations (across time and space) and assume symmetry and get falsification that way, but it requires extra assumptions. Basically you can't say different theories really disagree unless there's some possible world / counterfactual / whatever in which they disagree; finding a "crux" experiment between two theories (e.g. if one theory says all swans are white and another says there are black swans in a specific lake, the cruxy experiment looks in that lake) involves making choices to optimize disagreement. Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn't mean to imply was required. The part I thought was relevant was the part where you can believe yourself to have multiple options and yet be implemented by a specific computer.
$1000 USD prize - Circular Dependency of Counterfactuals

Why did you give a talk on causal graphs if you didn't think this kind of work was interesting or relevant? Maybe I'm misunderstanding what you're saying isn't interesting or relevant.

$1000 USD prize - Circular Dependency of Counterfactuals

I just meant, there are many possible counterfactual mental models that one can construct.

I agree that there isn't a single uniquely correct notion of a counterfactual. I'd say that we want different things from this notion and there are different ways to handle the trade-offs.

By the same token, I think every neurotypical human thinking about Newcomb's problem is using counterfactual reasoning, and I think that there isn't any interesting difference in the general nature of the counterfactual reasoning that they're using.

I find this confusing as CDT counterfactuals where you can only project forward seem very different from things like FDT where you can project back in time as well.

I think there is "machinery that underlies counterfactual reasoning" (which incidentally happens to be the same as "the machinery that underlies imag... (read more)

$1000 USD prize - Circular Dependency of Counterfactuals

I'd say this is a special case of epistemic circularity


I wouldn't be surprised if other concepts such as probability were circular in the same way as counterfactuals, although I feel that this is more than just a special case of epistemic circularity. Like I agree that we can only reason starting from where we are - rather than from the view from nowhere - but counterfactuals feel different because they are such a fundamental concept that appears everywhere. As an example, our understanding of chairs doesn't seem circular in quite the same sense. That said... (read more)

1G Gordon Worley III7mo
Yes, I think there is something interesting going on where human brains seem to operate in a way that makes counterfactuals natural. I actually don't think there's anything special about counterfactuals, though, just that the human brain is designed such that thoughts are not strongly tethered to sensory input vs. "memory" (internally generated experience), but that's perhaps only subtly different than saying counterfactuals rather than something powering them is a fundamental feature of how our minds work.
$1000 USD prize - Circular Dependency of Counterfactuals

You've linked me to three different posts, so I'll address them in separate comments. 

Two Alternatives to Logical Counterfactuals

I actually really liked this post - enough that I changed my original upvote to a strong upvote. I also disagree with the notion that logical counterfactuals make sense when taken literally, so I really appreciated you making this point persuasively. I agreed with your criticisms of the material conditional approach and I think policy-dependent source code could be potentially promising. I guess this naturally leads to the ques... (read more)

$1000 USD prize - Circular Dependency of Counterfactuals

I also think that there are lots of specific operations that are all "counterfactual reasoning"


Agreed. This is definitely something that I would like further clarity on.

Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and so on.

I guess the real-world reasons for a mistake are sometimes not very philosophically insightful (ie. Bob was high when reading the post, James comes from a Spanish speaking background and they use their equivalent of a word differently than English-spea... (read more)

2Steve Byrnes7mo
Hmm, my hunch is that you're misunderstanding me here. There are a lot of specific operations that are all "making a fist". I can clench my fingers quickly or slowly, strongly or weakly, left hand or right hand, etc. By the same token, if I say to you "imagine a rainbow-colored tree; are its leaves green?", there are a lot of different specific mental models that you might be invoking. (It could have horizontal rainbow stripes on the trunk, or it could have vertical rainbow stripes on its branches, etc.) All those different possibilities involve constructing a counterfactual mental model and querying it, in the same nuts-and-bolts way. I just meant, there are many possible counterfactual mental models that one can construct. Suppose I ask "There's a rainbow-colored tree somewhere in the world; are its leaves green?" You think for a second. What's happening under the surface when you think about this? Inside your head are various different models pushing in different directions. Maybe there's a model that says something like "rainbow-colored things tend to be rainbow-colored in all respects". So maybe you're visualizing a rainbow-colored tree, and querying the color of the leaves in that model, and this model is pushing on your visualized tree and trying to make it have a color scheme that's compatible with the kinds of things you usually see, e.g. in cartoons, which would be rainbow-colored leaves. But there's also a botany model that says "tree leaves tend to be green, because that's the most effective for photosynthesis, although there are some exceptions like Japanese maples and autumn colors". In scientifically-educated people, probably there will also be some metacognitive knowledge that principles of biology and photosynthesis are profound deep regularities in the world that are very likely to generalize , whereas color-scheme knowledge comes from cartoons etc. and is less likely to generalize. So what's at play is not "the nature of counterfactuals", but th
1IlyaShpitser7mo
I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb type problems. It wasn't even an original idea: Spohn had something similar in 2012. I don't think any of this stuff is interesting, or relevant for AI safety. There's a pretty big literature on model robustness and algorithmic fairness that uses causal ideas. If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.
$1000 USD prize - Circular Dependency of Counterfactuals

Update: I should further clarify that even though I provided a rough indication of how important I consider various approaches, this is off-the-cuff and I could be persuaded an approach was more valuable than I think, particularly if I saw good quality work.

I guess my ultimate interest is normative as the whole point of investigating this area is to figure out what we should do.

However, I am interested in descriptive theories insofar as they can contribute to this investigation (and not insofar as the details aren't useful for normative theories). For exam... (read more)

5Steve Byrnes7mo
I think brains build a generative world-model, and that world-model is a certain kind of data structure, and "counterfactual reasoning" is a class of operations that can be performed on that data structure. (See here [https://www.lesswrong.com/posts/SkcM4hwgH3AP6iqjs/can-you-get-agi-from-a-transformer] .) I think that counterfactual reasoning relates to reality only insofar as the world-model relates to reality. (In map-territory terminology: I think counterfactual reasoning is a set of things that you can do with the map, and those things are related to the territory only insofar as the map is related to the territory.) I also think that there are lots of specific operations that are all "counterfactual reasoning" (just as there are lots of specific operations that are all "paying attention"—paying attention to what?), and once we do a counterfactual reasoning operation, there are also a lot of things that we can do with the result of the operation. I think that, over our lifetimes, we learn metacognitive heuristics that guide these decisions (i.e. exactly what "counterfactual reasoning"-type operations to do and when, and what to do with the result of the operation), and some people's learned metacognitive heuristics are better than others (from the perspective of achieving such-and-such goal). Analogy: If you show me a particular trained ConvNet that misclassifies a particular dog picture as a cat, I wouldn't say that this reveals some deep truth about the nature of image classification, and I wouldn't conclude that there is necessarily such a thing as a philosophically-better type of image classifier that fundamentally doesn't ever make mistakes like that. (The brain image classifier makes mistakes too [https://en.wikipedia.org/wiki/Optical_illusion], albeit different mistakes than ConvNets make, but that's besides the point.) Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and
$1000 USD prize - Circular Dependency of Counterfactuals

Sorry, when you say A is solved, you're claiming that the circularity is known to be true, right?

Zack seems to be claiming that Bayesian Networks both draw out the implications and show that the circularity is false.

So unless I'm misunderstanding you, your answer seems to be at odds with Zack.
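(For concreteness, a minimal Pearl-style counterfactual in a made-up rain/sprinkler model, as one reading of what "Bayesian Networks draw out the implications" could mean in practice.)

```python
def wet(rain, sprinkler):
    return rain or sprinkler  # structural equation: the grass is wet if it rained or the sprinkler ran

observed = {"rain": True, "sprinkler": False}      # 1. abduction: fix the background facts
counterfactual_world = dict(observed, rain=False)  # 2. action: intervene on "rain"
print(wet(**counterfactual_world))                 # 3. prediction: False -- had it not rained, the grass stays dry
```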

2G Gordon Worley III7mo
I don't think they're really at odds. Zack's analysis cuts off at a point where the circularity exists below it. There's still the standard epistemic circularity that exists whenever you try to ground out any proposition, counterfactual or not, but there's a level of abstraction where you can remove the seeming circularity by shoving it lower or deeper into the reduction of the proposition towards grounding out in some experience. Another way to put this is that we can choose what to be pragmatic about. Zack's analysis chooses to be pragmatic about counterfactuals at the level of making decisions, and this allows removing the circularity up to the purpose of making a decision. If we want to be pragmatic about, say, accurately predicting what we will observe about the world, then there's still some weird circularity in counterfactuals to be addressed if we try to ask questions like "why these counterfactuals rather than others?" or "why can we formulate counterfactuals at all?". Also I guess I should be clear that there's no circularity outside the map. Circularity is entirely a feature of our models of reality rather than reality itself. That's why, for example, the analysis on epistemic circularity I offer is that we can ground things out in purpose and thus the circularity was actually an illusion of trying to ground truth in itself rather than experience. I'm not sure I've made this point very clearly elsewhere before, so sorry if that's a bit confusing. The point is that circularity is a feature of the relative rather than the absolute, so circularity exists in the map but not the territory. We only get circularity by introducing abstractions that can allow things in the map to depend on each other rather than the territory.
$1000 USD prize - Circular Dependency of Counterfactuals

Which part are you claiming is a solved problem? Is it:

a) That counterfactuals can only be understood within the counterfactual perspective OR
b) The implications of this for decision theory OR
c) Both

1G Gordon Worley III7mo
I think A is solved, though I wouldn't exactly phrase it like that, more like counterfactuals make sense because they are what they are and knowledge works the way it does. Zack seems to be making a claim to B, but I'm not expert enough in decision theory to say much about it.
Grokking the Intentional Stance

Concepts are generally clusters and I would say that being well-predicted by the Intentional Strategy is one aspect of what is meant by agency.

Another aspect relates to the interior functioning of an object. A very simple model would be to say that we generally expect the object to have a) some goals, b) counterfactual modeling abilities and c) to pursue the goals based on these modeling abilities. This definition is less appealing because it is much more vague and each of the elements in the previous sentence would need further clarification; however this... (read more)

Epistemological Framing for AI Alignment Research

"Even if understanding completely something like agency would basically solve the problem, how long will it take (if it is ever reached)? Historical examples in both natural sciences and computer science show that the original paradigm of a field isn’t usually adapted to tackle questions deemed fundamental by later paradigms. And this progress of paradigms takes decades in the best of cases, and centuries in the worst! With the risk of short timelines, we can’t reasonably decide that this is the only basket to put our research eggs."

Yeah, this is one of the core problems that we need to solve. AI safety would seem much more tractable if we had more time to iterate through a series of paradigms.

Oracle predictions don't apply to non-existent worlds

That's an interesting point. I suppose it might be viable to acknowledge that the problem taken literally doesn't require the prediction to be correct outside of the factual, but nonetheless claim that we should resolve the vagueness inherent in the question about what exactly the counterfactual is by constructing it to meet this condition. I wouldn't necessarily be strongly against this - my issue is confusion about what an Oracle's prediction necessarily entails.

Regarding your notion about things being magically stipulated, I suppose there's some possib... (read more)

Oracle predictions don't apply to non-existent worlds

I presume Vladimir and I are likely discussing this from within the determinist paradigm, in which "either the Oracle is wrong, or the choice is illusory" doesn't apply (although I propose a similar idea in Why 1-boxing doesn't imply backwards causation).

Oracle predictions don't apply to non-existent worlds

Isn't that prediction independent of your decision to grab your coat or not?

1Vladimir Nesov1y
The prediction is why you grab your coat; it's both meaningful and useful to you, and a simple counterexample to the sentiment that, since the correctness scope of predictions is unclear, they are no good. The prediction is not about the coat, but that dependence wasn't mentioned in the arguments against the usefulness of predictions above.