AI ALIGNMENT FORUM

Machine Learning (ML), AI

"Decision Transformer" (Tool AIs are secret Agent AIs)

by gwern
9th Jun 2021

This is a linkpost for https://sites.google.com/berkeley.edu/decision-transformer
John Schulman, 4y, 7 points

Basically agree. I think a model trained by maximum likelihood on offline data is less goal-directed than one trained by an iterative process that reinforces its own samples (i.e., online RL), but it is still somewhat goal-directed: to do a good job at maximum likelihood, it needs to simulate a goal-directed agent. On the other hand, maximum likelihood is mostly concerned with covering all possibilities, so the goal-directed reasoning isn't emphasized. With multiple iterations, though, the model can improve quality (and so become more goal-directed) at the expense of coverage/diversity.
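
A toy illustration of the quality-versus-coverage point, offered only as a sketch: it is not the Decision Transformer method or anything from the linked work. It fits a categorical "policy" by maximum likelihood on uniformly sampled offline actions, then repeatedly samples from the fitted model, keeps the samples whose reward is at least the batch mean, and refits on those. The reward table, the filtering rule, and all function names are hypothetical choices made for the example.

```python
import math
import random
from collections import Counter

# Hypothetical reward for each of five discrete actions (an assumption).
REWARDS = [0.0, 1.0, 2.0, 3.0, 4.0]


def fit_mle(samples, n_actions):
    """Maximum-likelihood categorical fit: just the empirical frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return [counts[a] / total for a in range(n_actions)]


def entropy(probs):
    """Shannon entropy in nats; used here as a crude proxy for coverage."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_reward(probs):
    """Expected reward under the model; used as a crude proxy for quality."""
    return sum(p * r for p, r in zip(probs, REWARDS))


def run(iterations=3, n_samples=2000, seed=0):
    rng = random.Random(seed)
    actions = list(range(len(REWARDS)))

    # Step 0: offline data from a uniform behaviour policy, fitted by MLE.
    data = [rng.choice(actions) for _ in range(n_samples)]
    probs = fit_mle(data, len(REWARDS))
    history = [{"reward": expected_reward(probs), "entropy": entropy(probs)}]

    # Iterations: sample from the model, keep at-or-above-average samples,
    # and refit by MLE on the kept samples (reinforcing the model's own output).
    for _ in range(iterations):
        samples = rng.choices(actions, weights=probs, k=n_samples)
        mean_r = sum(REWARDS[a] for a in samples) / len(samples)
        kept = [a for a in samples if REWARDS[a] >= mean_r]
        probs = fit_mle(kept, len(REWARDS))
        history.append(
            {"reward": expected_reward(probs), "entropy": entropy(probs)}
        )
    return history


if __name__ == "__main__":
    for step, stats in enumerate(run()):
        print(step, round(stats["reward"], 3), round(stats["entropy"], 3))
```

In this toy setting the pure maximum-likelihood fit at step 0 spreads probability over every action, while each filter-and-refit round shifts mass toward high-reward actions and shrinks the entropy, which is the trade of coverage for quality described above.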

Evan Hubinger, 4y, 4 points

(Moderation note: added to the Alignment Forum from LessWrong.)

gwern, 4y, 2 points

Rewards need not be written in natural language as crudely as "REWARD: +10 UTILONS". Something to think about as you continue to write text online.

And what of the dead? I own that I thought of myself, at times, almost as dead. Are they not locked below ground in chambers smaller than mine was, in their millions of millions? There is no category of human activity in which the dead do not outnumber the living many times over. Most beautiful children are dead. Most soldiers, most cowards. The fairest women and the most learned men – all are dead. Their bodies repose in caskets, in sarcophagi, beneath arches of rude stone, everywhere under the earth. Their spirits haunt our minds, ears pressed to the bones of our foreheads. Who can say how intently they listen as we speak, or for what word?
