AI ALIGNMENT FORUM

Matt Goldenberg · 465 · Ω4003

Posts

No posts to display.

Matt Goldenberg's Short Form Feed · 6y

Comments

METR: Measuring AI Ability to Complete Long Tasks
Matt Goldenberg · 7mo

I'm not at all convinced it has to be something discrete like "skills" or "achieved general intelligence".

There are many continuous factors I can imagine that would help with planning long tasks.

Have LLMs Generated Novel Insights?
Matt Goldenberg · 8mo

I think this is one of the most important questions we currently have in relation to time to AGI, and one of the most important "benchmarks" for telling us where we are on timelines.

Modern Transformers are AGI, and Human-Level
Matt Goldenberg · 2y

The question is: how far can we get with in-context learning? If we filled Gemini's 10 million token context with Sudoku rules and examples, showing it where it went wrong each time, would it generalize? I'm not sure, but I think it's possible.
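
Concretely, the experiment I'm imagining is something like the sketch below. It's only a sketch: `ask_model` is a placeholder for whatever long-context completion API you'd actually call, and `first_mismatch` is a toy checker I made up for illustration.

```python
# Sketch of the experiment: pack the Sudoku rules, some worked examples, and
# feedback on the model's own earlier mistakes into one long prompt, then see
# whether accuracy on new puzzles improves as the context fills up.

SUDOKU_RULES = "Each row, column, and 3x3 box must contain the digits 1-9 exactly once."

def first_mismatch(attempt: str, solution: str) -> str:
    """Point at the first place the attempt diverges from the solution."""
    for i, (a, s) in enumerate(zip(attempt, solution)):
        if a != s:
            return f"first error at position {i}: wrote {a!r}, should be {s!r}"
    return "attempt is a different length than the solution"

def build_prompt(solved_examples, past_attempts, new_puzzle):
    parts = [SUDOKU_RULES]
    for puzzle, solution in solved_examples:
        parts.append(f"Puzzle:\n{puzzle}\nSolution:\n{solution}")
    for puzzle, attempt, error in past_attempts:
        # Show the model exactly where its own previous answer went wrong.
        parts.append(f"Puzzle:\n{puzzle}\nYour attempt:\n{attempt}\nMistake: {error}")
    parts.append(f"Puzzle:\n{new_puzzle}\nSolution:")
    return "\n\n".join(parts)

def run_experiment(ask_model, puzzles, solved_examples):
    """`ask_model` is a callable prompt -> completion wrapping your API of choice."""
    past_attempts, results = [], []
    for puzzle, solution in puzzles:
        attempt = ask_model(build_prompt(solved_examples, past_attempts, puzzle))
        if attempt.strip() == solution.strip():
            results.append(True)
        else:
            past_attempts.append((puzzle, attempt, first_mismatch(attempt, solution)))
            results.append(False)
    return results  # does the success rate climb as feedback accumulates?
```

If the success rate climbs as the accumulated feedback fills the context, that would look like the kind of in-context generalization I mean.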

Modern Transformers are AGI, and Human-Level
Matt Goldenberg · 2y

It seems likely to me that you could create a prompt that would have a transformer do this.

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
Matt Goldenberg · 2y

Anthropic is making a big deal of this and what it means for AI safety; it reminds me a bit of the excitement MIRI had when they discovered logical inductors. I've read through the paper, and it does seem very exciting to have this sort of "dial" that can find interpretable features at different levels of abstraction.
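
For concreteness, the "dial" here is roughly the dictionary size and sparsity penalty in a sparse autoencoder trained on a layer's activations. Here's a toy sketch of that setup (my own simplification, not Anthropic's code; the sizes are made up):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning setup: learn an overcomplete set of features
    for a layer's activations. dict_size and the L1 coefficient are the 'dial'
    controlling how many and how fine-grained the learned features are."""
    def __init__(self, act_dim: int, dict_size: int):
        super().__init__()
        self.encoder = nn.Linear(act_dim, dict_size)
        self.decoder = nn.Linear(dict_size, act_dim, bias=False)

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))  # sparse, non-negative feature activations
        recon = self.decoder(features)             # reconstruct the original activations
        return recon, features

def sae_loss(recon, acts, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty pushing features toward sparsity.
    return ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()

# Example: a 512-dim MLP activation decomposed into 4096 candidate features.
sae = SparseAutoencoder(act_dim=512, dict_size=4096)
acts = torch.randn(64, 512)  # stand-in for real model activations
recon, features = sae(acts)
loss = sae_loss(recon, acts, features)
loss.backward()
```

Turning the dial up (bigger dictionary, stronger sparsity) is what seems to give finer-grained, more interpretable features.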

I'm curious about other people's takes on this who work in alignment. It seems like if something fundamental is being touched on here, it could provide large boons to research agendas such as Mechanistic Interpretability, Natural Abstractions, Shard Theory, and Ambitious Value Learning.

But it's also possible there are hidden gotchas I'm not seeing, or that this still doesn't solve the hard problem that people see in going from "inscrutable matrices" to "aligned AI".

What are people's takes? 

I'm especially interested in takes from full-time alignment researchers.

How LLMs are and are not myopic
Matt Goldenberg · 2y

If you can come up with an experimental setup that does that, it would be sufficient for me.

How LLMs are and are not myopic
Matt Goldenberg · 2y

> In my experience, larger models often become aware that they are a LLM generating text rather than predicting an existing distribution. This is possible because generated text drifts off distribution and can be distinguished from text in the training corpus.

I'm quite skeptical of this claim at face value, and would love to see examples.

I'd be very surprised if current models, absent the default prompts telling them they are an LLM, would spontaneously output text predicting they are an LLM unless steered in that direction.
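
The kind of quick check I'd want to see is something like the sketch below: sample unprompted continuations and measure how often the model spontaneously identifies itself as a language model. Here "gpt2" is just a stand-in for a base model with no system prompt, and the marker list is a crude placeholder I made up.

```python
# Rough check of whether an unprompted model spontaneously says it is an LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

MARKERS = ["language model", "i am an ai", "as an ai"]

def self_reference_rate(prompts, n_samples=5):
    hits, total = 0, 0
    for prompt in prompts:
        outputs = generator(
            prompt,
            max_new_tokens=100,
            num_return_sequences=n_samples,
            do_sample=True,
        )
        for out in outputs:
            total += 1
            # Count completions that self-identify as an AI / language model.
            if any(m in out["generated_text"].lower() for m in MARKERS):
                hits += 1
    return hits / total

print(self_reference_rate(["Once upon a time", "The results of the experiment were"]))
```

My prediction is that, without a prompt steering it that way, this rate stays near zero even for much larger models.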

To what extent is GPT-3 capable of reasoning?
Matt Goldenberg · 5y

FWIW, I think it's way more likely that there are gravitational inversion stories than lead stories.

To what extent is GPT-3 capable of reasoning?
Matt Goldenberg · 5y

Don't you think it's possible there are many stories involving gravitational inversion in its training corpus, and that it can recognize the pattern?

Wikitag Contributions

Memory Reconsolidation · 2 years ago · (+541)
Case Study · 5 years ago · (+77)
Organization Updates · 5 years ago · (+53)