The recent publication of Gato spurred a lot of discussion on whether we may be witnessing the first example of AGI. Regardless of this debate, Gato makes use of a recent development in reinforcement learning: applying supervised learning to reinforcement learning trajectories, exploiting the ability of transformer architectures to handle sequential data proficiently.

Reading the comments, it seems that this point created some confusion for readers not familiar with these techniques. Some time ago I wrote an introductory article on how transformers can be used in reinforcement learning, which may help clarify some of these doubts: https://lorenzopieri.com/rl_transformers/
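To make the idea concrete, here is a minimal sketch of "supervised learning on RL trajectories" in the spirit of Decision Transformer. This is not Gato's actual architecture: the module names, sizes and the fake data batch below are all illustrative assumptions, just to show the shape of the approach (embed trajectory steps as tokens, run a causal transformer, train with a plain supervised loss on the logged actions).

```python
# Minimal sketch (not Gato's architecture): RL as sequence modelling.
# Each timestep contributes (return-to-go, state, action) tokens; a causal
# transformer is trained with supervised learning to predict the logged action.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, state_dim, num_actions, d_model=128, n_layers=2,
                 n_heads=4, max_len=3 * 100):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)          # return-to-go token
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Embedding(num_actions, d_model)
        self.pos = nn.Embedding(max_len, d_model)       # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.to_action = nn.Linear(d_model, num_actions)

    def forward(self, rtg, states, actions):
        # rtg: (B, T, 1), states: (B, T, state_dim), actions: (B, T) int64
        B, T, _ = states.shape
        # Interleave (rtg, state, action) tokens per timestep -> (B, 3T, d_model)
        tokens = torch.stack([self.embed_rtg(rtg),
                              self.embed_state(states),
                              self.embed_action(actions)], dim=2).reshape(B, 3 * T, -1)
        tokens = tokens + self.pos(torch.arange(3 * T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T).to(tokens.device)
        h = self.encoder(tokens, mask=mask)             # causal self-attention
        return self.to_action(h[:, 1::3])               # predict a_t from the state token

# One supervised training step on a fake batch of logged trajectories.
model = TrajectoryTransformer(state_dim=4, num_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
rtg = torch.randn(8, 20, 1)                             # 8 trajectories, 20 steps each
states = torch.randn(8, 20, 4)
actions = torch.randint(0, 2, (8, 20))
logits = model(rtg, states, actions)
loss = nn.functional.cross_entropy(logits.reshape(-1, 2), actions.reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```

The point is that there is no policy gradient or value bootstrapping anywhere: the transformer is simply trained to imitate the actions in the dataset, conditioned on the past trajectory (and here on the return-to-go), which is what lets the usual supervised-learning machinery carry the load.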


Somewhat tangential to your post (which I only just started reading), but what would you suggest for people mostly-new to ML/DL – learn about all/most of the historical models/frameworks/architectures, or focus only/mostly on the 'SOTA' (state of the art)?

For concreteness, I've been following 'AI' for decades, though mostly at an abstract, high level. (Neural networks were mostly impractical on my first PCs.) I recently decided to get some 'hands on' training/practice and completed Andrew Ng's Machine Learning course on Coursera a few years ago. I'm tentatively reconsidering looking for a job opportunity in AI alignment/safety, and so it occurred to me to learn some more about working with the actual models that have been or are being used. But your intro about how transformers have 'obsoleted' various older models/frameworks/architectures made me think that it might be better to just skip a lot of the 'survey' I would otherwise have done.

In research there are a lot of publications, but few stand the test of time. I would suggest looking at the architectures which brought significant changes and ideas; those are still very relevant, as they:

- often form the building blocks of current solutions

- help you build intuition about how architectures can be improved

- are often assumed knowledge in the field

- are often still useful, especially when resources are limited

You should not need to look at more than 1-2 architectures per year in each field (computer vision, NLP, RL). Only then would I focus on SOTA.

You may want to check https://fullstackdeeplearning.com/spring2021/. It should have enough historical material to cover the basics and let you expand from there, while also moving quickly to modern topics.

Thanks for the reply! This seems helpful and, I think, matches what I expected might be a good heuristic.

I'm not sure I know how to identify "the architectures which brought significant changes and ideas" – beyond what I've already been doing, i.e. following some 'feeds' and 'skimming headlines' with an occasional full read of posts like this.

What would you think about mostly focusing on SOTA and then, as needed, and potentially recursively, learning about the 'prior art' on which the current SOTA is built/based? Or do the "Full Stack Deep Learning" course materials provide a (good-enough) outline of all of the significant architectures worth learning about?

A side project I briefly started a little over a year ago, but have since mostly abandoned, was to re-implement the examples/demos from the Machine Learning course I took. I found the practical aspect to be very helpful – it was also my primary goal for taking the course: getting some 'practice'. Any suggestions about that for this 'follow-up survey'? For my side project, I was going to re-implement the basic models covered by that first course in a new environment/programming-language, but maybe that's too much 'yak shaving' for a broad survey.

Yea, what I meant is that the slides of the Full Stack Deep Learning course provide a decent outline of all the significant architectures worth learning about.

I would personally not go down to that low a level of abstraction (e.g. implementing NNs in a new language) unless you really feel your understanding is shaky. Try building an actual side project instead, e.g. an object classifier for cars, and problems will arise naturally.
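In case it helps, a first pass at that kind of project could look roughly like the sketch below (PyTorch/torchvision, fine-tuning a pretrained ResNet). The dataset path and folder layout are placeholders you would replace with whatever images you collect; treat it as an illustration, not a recipe.

```python
# Rough sketch: fine-tune a pretrained ResNet-18 head as a car classifier.
# "data/cars" is a hypothetical folder laid out as data/cars/<class_name>/*.jpg.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
dataset = datasets.ImageFolder("data/cars", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad_(False)                 # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # new trainable head

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
model.train()
for images, labels in loader:               # one epoch over the (placeholder) data
    loss = nn.functional.cross_entropy(model(images), labels)
    opt.zero_grad(); loss.backward(); opt.step()
```

Even a toy version of this forces you through data collection, preprocessing, transfer learning and evaluation, which is where most of the practical learning happens.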

Wonderful – I'll keep that in mind when I get around to reviewing/skimming that outline. Thanks for sharing it.

I have a particularly idiosyncratic set of reasons for the particular kind of 'yak shaving' I'm thinking of, but your advice, i.e. to NOT do any yak shaving, is noted and appreciated.