The question is: how far can we get with in-context learning? If we filled Gemini's 10-million-token context with Sudoku rules and examples, showing where it went wrong each time, would it generalize? I'm not sure, but I think it's possible.
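
To make the setup concrete, here's a minimal sketch of the feedback loop I have in mind. `generate` is a hypothetical stand-in for a long-context model API, and `parse` (which turns the model's text back into a 9x9 grid) is assumed rather than shown; only the Sudoku checker is fully real:

```python
# Sketch of an iterated in-context-learning loop for Sudoku.
# `generate(prompt)` is a hypothetical placeholder for any
# long-context model API (e.g. a Gemini call).

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in a real model API call")

RULES = (
    "Solve the Sudoku. Each row, column, and 3x3 box must "
    "contain the digits 1-9 exactly once.\n"
)

def first_violation(grid: list[list[int]]) -> str | None:
    """Return a description of the first rule violation, or None if solved."""
    units = []
    for i in range(9):
        units.append((f"row {i}", grid[i]))
        units.append((f"column {i}", [grid[r][i] for r in range(9)]))
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = [grid[br + r][bc + c] for r in range(3) for c in range(3)]
            units.append((f"box at ({br},{bc})", box))
    for name, unit in units:
        if sorted(unit) != list(range(1, 10)):
            return f"{name} does not contain 1-9 exactly once"
    return None

def solve_with_feedback(puzzle: str, parse, max_rounds: int = 20) -> str:
    """Keep showing the model its own mistakes inside one growing context."""
    prompt = RULES + puzzle + "\n"
    attempt = ""
    for _ in range(max_rounds):
        attempt = generate(prompt)
        error = first_violation(parse(attempt))
        if error is None:
            return attempt
        # Append the failed attempt and the specific violation, so the
        # model sees exactly where it went wrong on the next pass.
        prompt += f"\nYour attempt:\n{attempt}\nError: {error}\nTry again:\n"
    return attempt
```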

It seems likely to me that you could create a prompt that would have a transformer do this.

Anthropic is making a big deal of this and of what it means for AI safety - it sort of reminds me of the excitement MIRI had when it discovered logical inductors. I've read through the paper, and it does seem very exciting to have this sort of "dial" that can find interpretable features at different levels of abstraction.
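
For concreteness, my understanding is that the "dial" is the dictionary size of a sparse autoencoder trained on the model's activations: larger dictionaries split the same activations into finer-grained features. A toy PyTorch sketch of that setup (my own simplification, not Anthropic's actual code):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning setup in the spirit of Anthropic's SAE work.

    `dict_size` is the "dial": larger dictionaries tend to recover
    finer-grained (less abstract) features from the same activations.
    """

    def __init__(self, act_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(act_dim, dict_size)
        self.decoder = nn.Linear(dict_size, act_dim)

    def forward(self, acts: torch.Tensor):
        features = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

def sae_loss(recon, acts, features, l1_coeff: float = 1e-3):
    # Reconstruction term keeps the dictionary faithful to the activations;
    # the L1 term pushes toward sparse, hopefully interpretable, features.
    return torch.mean((recon - acts) ** 2) + l1_coeff * features.abs().mean()
```

Training this on the same activations with, say, dict_size = 512 versus dict_size = 131072 is (as I read it) what produces the coarse-to-fine "levels of abstraction" effect.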

I'm curious about the takes of other people who work in alignment. It seems like if something fundamental is being touched on here, it could provide large boons to research agendas such as Mechanistic Interpretability, Natural Abstractions, Shard Theory, and Ambitious Value Learning.

But it's also possible there are hidden gotchas I'm not seeing, or that this still doesn't solve the hard problem that people see in going from "inscrutable matrices" to "aligned AI".

What are people's takes?

I'm especially interested in hearing from full-time alignment researchers.

If you can come up with an experimental setup that does that, it would be sufficient for me.

In my experience, larger models often become aware that they are an LLM generating text rather than predicting an existing distribution. This is possible because generated text drifts off distribution and can be distinguished from text in the training corpus.

I'm quite skeptical of this claim at face value and would love to see examples.

I'd be very surprised if current models, absent the default prompts telling them they are an LLM, would spontaneously output text identifying themselves as an LLM unless steered in that direction.
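
One experiment that would update me: check whether the model assigns systematically different likelihoods to its own generations versus held-out human text, since distinguishability-by-likelihood is the mechanism the claim leans on. A rough sketch with HuggingFace transformers, using GPT-2 purely as a stand-in (the claim is about much larger models):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def mean_logprob(text: str) -> float:
    """Average per-token log-probability the model assigns to `text`."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()  # loss is mean negative log-likelihood

prompt = "Once upon a time"
ids = tokenizer(prompt, return_tensors="pt").input_ids
sample = model.generate(ids, max_new_tokens=50, do_sample=True)
generated = tokenizer.decode(sample[0], skip_special_tokens=True)

# Placeholder: substitute real held-out human-written text here.
corpus_text = "Replace this with a passage of held-out human-written text."

# If generated text really "drifts off distribution" in a way the model
# can detect, we'd expect a systematic gap between these two numbers
# (averaged over many samples, not a single pair).
print("model-generated:", mean_logprob(generated))
print("human-written: ", mean_logprob(corpus_text))
```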

FWIW, I think it's way more likely there are gravitational inversion stories than lead stories.

Don't you think it's possible there are many stories involving gravitational inversion in its training corpus, and it can recognize the pattern?