Measure - AI Alignment Forum

Steering GPT-2-XL by adding an activation vector

"party", "ceremony", "dress", "with", "photographer"

While these aren't syntactically valid continuations of the prompt, they are highly likely (and syntactically valid) continuations for "wedding ". More than just being wedding-related, these seem like direct continuations.

Redwood Research’s current project

Measure3y00

Similarly, you might think that a promising approach is to look for snippets which cause the generator to generate violent completions with particularly high probability.

It seems like there could be a problem if you're working with "prompts that are especially likely to have a violent continuation" where non-violent continuations would consistently seem unrealistic and thus fail the 50% goal.

The Blackwell order as a formalization of knowledge

Measure3y00

Blackwell’s theorem says that the conditions under which κ1 can be said to be more generally useful than κ2 are precisely the situations where κ1 is a post-garbling of κ2.

Is this reversed?

Where are intentions to be found?

Measure4y30

It's not obvious to me that the information you're looking for is not present in a single toe. In the same way that an advanced AI could discover General Relativity by carefully examining a few frames of a falling apple, couldn't it infer something about human/rabbit/rainforest values by observing the behavior of a toe? My concern would instead be that there is too much information and that the AI would pick out some values but not necessarily the ones you expect.

Donald Hobson's Shortform

Measure4y00

It seems like the button-works action will usually be some variety of "take preemptive action to ensure the button won't be pressed", so the AI will have a high chance of shutting down at each decision step.

Limiting Causality by Complexity Class

Measure4y00

I guess that makes sense. Thanks for clarifying!

Limiting Causality by Complexity Class

Measure4y00

Computational complexity only makes sense in terms of varying sizes of inputs. Are some Y events "bigger" than others in some way so that you can look at how the program runtime depends on that "size"?

Limiting Causality by Complexity Class

Measure4y00

What do X and Y represent in this construction? What is the scaling parameter used to define the complexity class?
