Redwood Research’s current project

Similarly, you might think that a promising approach is to look for snippets which cause the generator to generate violent completions with particularly high probability.

It seems like there could be a problem with "prompts that are especially likely to have a violent continuation": for such prompts, non-violent continuations would consistently seem unrealistic and thus fail the 50% goal.

The Blackwell order as a formalization of knowledge

Blackwell’s theorem says that the conditions under which κ1 can be said to be more generally useful than κ2 are precisely the situations where κ1 is a post-garbling of κ2.

Is this reversed?

Where are intentions to be found?

It's not obvious to me that the information you're looking for is not present in a single toe. In the same way that an advanced AI could discover General Relativity by carefully examining a few frames of a falling apple, couldn't it infer something about human/rabbit/rainforest values by observing the behavior of a toe? My concern would instead be that there is too much information and that the AI would pick out some values but not necessarily the ones you expect.

Donald Hobson's Shortform

It seems like the button-works action will usually be some variety of "take preemptive action to ensure the button won't be pressed," so the AI will have a high chance of shutting down at each decision step.

Limiting Causality by Complexity Class

I guess that makes sense. Thanks for clarifying!

Limiting Causality by Complexity Class

Computational complexity only makes sense in terms of varying sizes of inputs. Are some Y events "bigger" than others in some way so that you can look at how the program runtime depends on that "size"?
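To illustrate the point about varying input sizes, here is a minimal sketch (not from the original post; the function names are hypothetical) showing that a complexity class describes how cost grows as the input size n varies, not the cost of any single run:

```python
def linear_scan(xs):
    """O(n): one pass over the input."""
    steps = 0
    for _ in xs:
        steps += 1
    return steps

def all_pairs(xs):
    """O(n^2): visit every ordered pair of elements."""
    steps = 0
    for _a in xs:
        for _b in xs:
            steps += 1
    return steps

# Only by comparing counts across sizes does the growth rate emerge:
for n in [10, 100, 1000]:
    xs = list(range(n))
    print(n, linear_scan(xs), all_pairs(xs))
```

Without some notion of "size" for the Y events to vary, there is no analogous growth rate to bound, which is what the question above is getting at.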

Limiting Causality by Complexity Class

What do X and Y represent in this construction? What is the scaling parameter used to define the complexity class?