Speedup on evolution? Maybe? It might work okayish, but I doubt the best solution is that speculative.
As in, you could score some actions, but then there isn't a sense in which you "can" choose one according to any criterion.
I've noticed that issue as well. Counterfactuals are more a convenient model/story than something to be taken literally. You've grounded decisions by taking counterfactuals to exist a priori. I ground them by noting that our desire to construct counterfactuals is ultimately based on evolved instincts and/or behaviours, so these stories aren't just arbitrary stories but a way in which we can leverage the lessons that have been instilled in us by evolution. I'm curious: given this explanation, why do we still need choices to be actual?
I commented directly on your post.
Let A be some action. Consider the statement: "I will take action A". An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.
If you believe in determinism, then an agent can sometimes falsify it and sometimes not.
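To make the falsification move in the quoted statement concrete, here's a minimal Haskell sketch (the Action type and its two values are hypothetical, purely for illustration): an agent handed the claimed law "you will take action A" can always output something else.

data Action = A | B deriving (Show, Eq)

-- An agent that deliberately falsifies any claimed law "you will take action p"
-- by taking the other action.
falsify :: Action -> Action
falsify A = B
falsify B = A

-- For every prediction p, falsify p /= p, so "I will take action p" cannot hold
-- as a law for an agent that behaves this way.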
I think it's quite clear how shifting ontologies could break a specification of values. And sometimes you just need a formalisation, any formalisation, to play around with. But I suppose it depends more on the specific details of your investigation.
I strongly disagree with your notion of how privileging the hypothesis works. It's not absurd to think that techniques for making AIXI-tl value diamonds despite ontological shifts could be adapted for other architectures. I agree that there are other examples of people working on solving problems within a formalisation that seem rather formalisation specific, but you seem to have cast the net too wide.
I tend to agree that burning up the timeline is highly costly, but more because Effective Altruism is an Idea Machine that has only recently started to really crank up. There's a lot of effort being directed towards recruiting top students from uni groups, but these projects require time to pay off.
I’m giving this example not to say “everyone should go do agent-foundations-y work exclusively now!”. I think it’s a neglected set of research directions that deserves far more effort, but I’m far too pessimistic about it to want humanity to put all its eggs in that basket.
An ability to refuse to generate theories about a hypothetical world being in a simulation.
I guess the problem with this test is that the kinds of people who could do this tend to be busy, so they probably can't do this with so little notice.
Hmm... It seems much, much harder to catch every single one than to catch 99%.
Regarding the point about most alignment work not really addressing the core issue: I think a lot of this work could still be valuable nonetheless. People can take inspiration from all kinds of things, and there is often value in picking something you can get a grasp on, then using the lessons from that to tackle something more complex. Of course, it's very easy for people to spend all of their time focusing on irrelevant toy problems and never get around to making any progress on the real problem. Plus, there are costs to adding more voices to the conversation, as it can be tricky for people to distinguish the signal from the noise.
If we tell an AI not to invent nanotechnology, not to send anything to protein labs, not to hack into all of the world's computers, not to design weird new quantum particles, not to do 100 of the other most dangerous and weirdest things we can think of, and then ask it to generalize and learn not to do things of that sort
I had the exact same thought. My guess would be that Eliezer might say that, since the AI is maximising, if the generalisation function misses even one action of this sort that we should have excluded, then we're screwed.
I tend to value a longer timeline more than a lot of other people do. I guess I see EA and AI Safety as setting up powerful idea machines that get more powerful when they are given more time to gear up. A lot more resources have been invested into EA field-building recently, but we need time for these investments to pay off. At EA London this year, I gained a sense that AI Safety movement building is only now becoming its own thing; and of course it'll take time to iterate to get it right, then time for people to pass through the programs, then time for... (read more)
How large do you expect Conjecture to become? What percentage of people do you expect to be working on the product and what percentage on safety?
Random idea: A lot of people seem discouraged from doing anything about AI Safety because it seems like such a big overwhelming problem.
What if there was a competition to encourage people to engage in low-effort actions towards AI safety, such as hosting a dinner for people who are interested, volunteering to run a session on AI safety for their local EA group, answering a couple of questions on the stampy wiki, offering to proof-read a few people’s posts or offering a few free tutorial sessions to aspiring AI Safety Researchers.
I think there’s a dec... (read more)
Thoughts on the introduction of Goodhart's Law: Currently, I'm more motivated by trying to make the leaderboard, so maybe that suggests that merely introducing a leaderboard, without actually paying people, would have had much the same effect. Then again, that might just be because I'm not that far off. And if there hadn't been the payment, maybe I wouldn't have ended up in the position where I'm not that far off.
I guess I feel incentivised to post a lot more than I would otherwise, but especially in the comments rather than the posts since if you post a lot o... (read more)
If we have an algorithm that aligns an AI with X values, then we can add human values to get an AI that is aligned with human values.
On the other hand, I agree that it doesn't really make sense to declare an AI safe in the abstract, rather than with respect to, say, human values. (Small counterpoint: being safe isn't just about alignment; you also need to avoid bugs. This can be defined without reference to human values. However, this isn't sufficient for safety.)
I suppose this works as a criticism of approaches like quantilisers or impact-minimisation, which attempt abstract safety. Although I can't see any reason why it'd imply that it's impossible to write an AI that can be aligned with arbitrary values.
If you think this is financially viable, then I'm fairly keen on this, especially if you provide internships and development opportunities for aspiring safety researchers.
In science and engineering, people will usually try very hard to make progress by standing on the shoulders of others. The discourse on this forum, on the other hand, more often resembles that of a bunch of crabs in a bucket.
Hmm... Yeah, I certainly don't think that there's enough collaboration or appreciation of the insights that other approaches may provide. Any thoughts on how to encourage a healthier dynamic?
The object-level claims here seem straightforwardly true, but I think "challenges with breaking into MIRI-style research" is a misleading way to characterize it. The post makes it sound like these are problems with the pipeline for new researchers, but really these problems are all driven by challenges of the kind of research involved.
There's definitely some truth to this, but I guess I'm skeptical that there isn't anything that we can do about some of these challenges. Actually, rereading, I can see that you've conceded this towards the end of your post. I... (read more)
Even if the content is proportional, the signal-to-noise ratio will still be much lower for those interested in MIRI-style research. This is a natural consequence of it being a niche area. When I said "might not have the capacity to vet", I was referring to a range of orgs. I would be surprised if the lack of papers didn't have an effect, as presumably you're trying to highlight high-quality work, and people are more motivated to go the extra yard when trying to get published because both the rewards and standards are higher.
Just some sort of official & long-term & OFFLINE study program that would teach some of the previously published MIRI research would be hugely beneficial for growing the AF community.
At the last EA Global there was some sort of AI safety breakout session. There were ~12 tables with different topics. I was dismayed to discover that almost every table was full of people excitedly discussing various topics in prosaic AI alignment and other things, while the AF table had just 2 (!) people.
Wow, didn't realise it was that little!
I have spoken with MIRI p
Some things that might be involved
I might add that I know a number of people interested in AF who feel somewhat adrift/find it difficult to contribute. Feels a bit like a waste of talent.
Sorry, I wasn't criticizing your work. I think that the lack of an equivalent of papers for MIRI-style research also plays a role here in that if someone writes a paper it's more likely to make it into the newsletter. So the issue is further down the pipeline.
Hmm... Oh, I think that was elsewhere on this thread. Probably not to you. Eliezer's Where Recursive Justification Hits Bottom seems to embrace a circular epistemology despite its title.
Wait, I was under the impression from the quoted text that you make a distinction between 'circular epistemology' and 'other types of epistemology that will hit a point where we can provide no justification at all'. i.e. these other types are not circular because they are ultimately defined as a set of axioms, rewriting rules, and observational protocols for which no further justification is being attempted.
If you're referring to the Wittgensteinian quote, I was merely quoting him, not endorsing his views.
Yeah, I believe epistemology to be inherently circular. I think it has some relation to counterfactuals being circular, but I don't see it as quite the same, as counterfactuals seem a lot harder to avoid using than most other concepts. The point of mentioning circular epistemology was to persuade people that my theory isn't as absurd as it sounds at first.
What I mean is that some people seem to think that if they can describe a system that explains counterfactuals without mentioning counterfactuals in the explanation, then they've avoided a circular dependence. When, of course, we can't just take things at face value, but have to dig deeper than that.
I added a comment on the post directly, but I will add: we seem to roughly agree on counterfactuals existing in the imagination in a broad sense (I highlighted two ways this can go above - with counterfactuals being an intrinsic part of how we interact with the world or a pragmatic response to navigating the world). However, I think that following this through and asking why we care about them if they're just in our imagination ends up taking us down a path where counterfactuals being circular seems plausible. On the other hand, you seem to think that this path takes us somewhere where there isn't any circularity. Anyway, that's the difference in our positions as far as I can tell from having just skimmed your link.
I don't suppose you could clarify:
Agent :: Agent -> Situation -> Choice
It seems strange for an agent to take another agent and a situation and return a choice.
I also think this approach matches our intuition about how counterfactuals work. We imagine ourselves as the same except we're choosing this particular behavior. Surely, in the formal reasoning, there should also be a distinction between the initial agent and the agent within that counterfactual, considering that this distinction is present in our own imaginations?
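For concreteness, here's a minimal Haskell sketch of how that signature could be written down as a recursive type (Situation and Choice are hypothetical placeholders); on this reading, the extra Agent argument would be the agent inside the counterfactual, kept distinct from the agent doing the imagining.

data Situation = Situation
data Choice = ChoiceA | ChoiceB deriving (Show, Eq)

-- The signature from above as a recursive type: an agent maps an agent
-- (e.g. a counterfactual version of itself) and a situation to a choice.
newtype Agent = Agent (Agent -> Situation -> Choice)

-- Evaluating an agent on a (possibly modified) copy of itself in a situation.
runAgent :: Agent -> Agent -> Situation -> Choice
runAgent (Agent f) counterfactualSelf s = f counterfactualSelf s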
Yeah, this is essentially my position as well. My m... (read more)
Overall, reading the post and the comment section, I feel that, if I reject Newcomb's Problem as a test, I can only ever write things that will not meet your prize criterion of usefully engaging with 'circular dependency'.
Firstly, I don't see why that would interfere with evaluating possible arguments for and against circular dependency. It's possible for an article to be "here's why these 3 reasons why we might think counterfactuals are circular are all false" (not stating that an article would necessarily have to engage with 3 different arguments to win). S... (read more)
I think there is "machinery that underlies counterfactual reasoning"
I agree that counterfactual reasoning is contingent on certain brain structures, but I would say the same about logic as well and it's clear that the logic of a kindergartener is very different from that of a logic professor - although perhaps we're getting into a semantic debate - and what you mean is that the fundamental machinery is more or less the same.
I was initially assuming (by default) that if you're trying to understand counterfactuals, you're mainly trying to understand how this
I'm not sure where you got the idea that this was to solve the spurious counterfactuals problem; that was in the appendix because I anticipated that a MIRI-adjacent person would want to know how it solves that problem.
Thanks for that clarification.
A way that EDT fails to solve 5 and 10 is that it could believe with 100% certainty that it takes $5 so its expected value for $10 is undefined
I suppose that demonstrates that the 5 and 10 problem is a broader problem than I realised. I still think that it's only a hard problem within particular systems that have... (read more)
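As a toy illustration of the quoted failure mode, here's a small Haskell sketch (the world model, names, and numbers are hypothetical): if the agent assigns probability zero to taking the $10, the conditional expectation for that action is 0/0, i.e. undefined.

-- Toy model: a list of (probability, action, utility) triples.
condExpectation :: [(Double, String, Double)] -> String -> Maybe Double
condExpectation worlds action
  | pAction == 0 = Nothing        -- conditioning on a probability-zero event: undefined
  | otherwise    = Just (weighted / pAction)
  where
    relevant = [(p, u) | (p, a, u) <- worlds, a == action]
    pAction  = sum (map fst relevant)
    weighted = sum [p * u | (p, u) <- relevant]

-- An agent certain it takes the $5:
-- condExpectation [(1.0, "Take5", 5), (0.0, "Take10", 10)] "Take10" == Nothing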
Basically you can't say different theories really disagree unless there's some possible world / counterfactual / whatever in which they disagree;
Agreed, this is yet another argument for considering counterfactuals to be so fundamental that they don't make sense outside of themselves. I just don't see this as incompatible with determinism, because I'm grounding this in counterfactuals rather than agency.
Those seem pretty much equivalent? Maybe by agency you mean utility function optimization, which I didn't mean to imply was required.
I don't mean utility function... (read more)
Thoughts on Modeling Naturalized Logic Decision Theory Problems in Linear Logic
I hadn't heard of linear logic before - it seems like a cool formalisation - although I tend to believe that formalisations are overrated, as unless they are used very carefully they can obscure more than they reveal. I believe that spurious counterfactuals are only an issue with the 5 and 10 problem because of an attempt to hack logical-if to substitute for counterfactual-if in such a way that we can reuse proof-based systems. It's extremely cool that we can do as much as we can ... (read more)
Comments on A critical agential account of free will, causation, and physics
Consider the statement: "I will take action A". An agent believing this statement may falsify it by taking any action B not equal to A. Therefore, this statement does not hold as a law. It may be falsified at will.
We can imagine a situation where there is a box containing an apple or a pear. Suppose it contains a pear, but we believe it contains an apple. If we look in the box (and we have good reason to believe looking doesn't change the contents), then we'll falsf... (read more)
Why did you give a talk on causal graphs if you didn't think this kind of work was interesting or relevant? Maybe I'm misunderstanding what you're saying isn't interesting or relevant.
I just meant, there are many possible counterfactual mental models that one can construct.
I agree that there isn't a single uniquely correct notion of a counterfactual. I'd say that we want different things from this notion and there are different ways to handle the trade-offs.
By the same token, I think every neurotypical human thinking about Newcomb's problem is using counterfactual reasoning, and I think that there isn't any interesting difference in the general nature of the counterfactual reasoning that they're using.
I find this confusing as CDT counterfactuals where you can only project forward seem very different from things like FDT where you can project back in time as well.
I think there is "machinery that underlies counterfactual reasoning" (which incidentally happens to be the same as "the machinery that underlies imag... (read more)
I'd say this is a special case of epistemic circularity
I wouldn't be surprised if other concepts such as probability were circular in the same way as counterfactuals, although I feel that this is more than just a special case of epistemic circularity. Like I agree that we can only reason starting from where we are - rather than from the view from nowhere - but counterfactuals feel different because they are such a fundamental concept that appears everywhere. As an example, our understanding of chairs doesn't seem circular in quite the same sense. That said... (read more)
You've linked me to three different posts, so I'll address them in separate comments.
Two Alternatives to Logical Counterfactuals
I actually really liked this post - enough that I changed my original upvote to a strong upvote. I also disagree with the notion that logical counterfactuals make sense when taken literally, so I really appreciated you making this point persuasively. I agreed with your criticisms of the material conditional approach and I think policy-dependent source code could be potentially promising. I guess this naturally leads to the ques... (read more)
I also think that there are lots of specific operations that are all "counterfactual reasoning"
Agreed. This is definitely something that I would like further clarity on
Instead I would be more inclined to look for a very complicated explanation of the mistake, related to details of its training data and so on.
I guess the real-world reasons for a mistake are sometimes not very philosophically insightful (i.e. Bob was high when reading the post, James comes from a Spanish-speaking background and they use their equivalent of a word differently than English-spea... (read more)
What are your thoughts on Newcomb's, etc.?
Update: I should further clarify that even though I provided a rough indication of how important I consider various approaches, this is off-the-cuff and I could be persuaded an approach was more valuable than I think, particularly if I saw good quality work. I guess my ultimate interest is normative, as the whole point of investigating this area is to figure out what we should do.
However, I am interested in descriptive theories insofar as they can contribute to this investigation (and not insofar as the details aren't useful for normative theories). For exam... (read more)
Sorry, when you say A is solved, you're claiming that the circularity is known to be true, right?
Zack seems to be claiming that Bayesian Networks both draw out the implications and show that the circularity is false.
So unless I'm misunderstanding you, your answer seems to be at odds with Zack.
Which part are you claiming is a solved problem? Is it:
a) That counterfactuals can only be understood within the counterfactual perspective, OR
b) The implications of this for decision theory, OR
c) Both
Concepts are generally clusters, and I would say that being well-predicted by the Intentional Strategy is one aspect of what is meant by agency. Another aspect relates to the interior functioning of an object. A very simple model would be to say that we generally expect the object to have a) some goals, b) counterfactual modeling abilities and c) to pursue the goals based on these modeling abilities. This definition is less appealing because it is much more vague and each of the elements in the previous sentence would need further clarification; however, this... (read more)
How do you define alignment theory?
"Even if understanding completely something like agency would basically solve the problem, how long will it take (if it is ever reached)? Historical examples in both natural sciences and computer science show that the original paradigm of a field isn’t usually adapted to tackle questions deemed fundamental by later paradigms. And this progress of paradigms takes decades in the best of cases, and centuries in the worst! With the risk of short timelines, we can’t reasonably decide that this is the only basket to put our research eggs."
Yeah, this is one of the core problems that we need to solve. AI safety would seem much more tractable if we had more time to iterate through a series of paradigms.
That's an interesting point. I suppose it might be viable to acknowledge that the problem taken literally doesn't require the prediction to be correct outside of the factual, but nonetheless claim that we should resolve the vagueness inherent in the question about what exactly the counterfactual is by constructing it to meet this condition. I wouldn't necessarily be strongly against this - my issue is confusion about what an Oracle's prediction necessarily entails. Regarding your notion about things being magically stipulated, I suppose there's some possib... (read more)
I presume Vladimir and I are likely discussing this from within the determinist paradigm in which "either the Oracle is wrong, or the choice is illusory" doesn't apply (although I propose a similar idea in Why 1-boxing doesn't imply backwards causation).
Isn't that prediction independent of your decision to grab your coat or not?