maximkazhenkov

AI Tracker: monitoring current and near-future risks from superscale models

Interesting; I hadn't heard of DreamerV2. From a quick look at the paper, it looks like one might describe it as a step on the way to something like EfficientZero. Does that sound roughly correct?

Yes. They don't share a common lineage, but are similar in that they're both recent advances in efficient model-based RL. Personally speaking, I think this is the subfield to be closely tracking progress in, because 1) it has far-reaching implications in the long term and 2) it has garnered relatively little attention compared to other subfields.

We may extend this to older models in the future. But our goal right now is to focus on these models' public safety risks as standalone (or nearly standalone) systems.

I see. If you'd like to visualize trends though, you'll need more historical data points, I think.

AI Tracker: monitoring current and near-future risks from superscale models

DreamerV2 seems worthy of inclusion to me. In general, it would be great to see older models incorporated as well; I know this has been done before, but having it integrated into a live tracker like yours would be super convenient as a one-stop shop for historical context. It would save people from making lots of new lists every time an important new model gets released.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Oh I see, did I misunderstand point 1 from Razied then, or was it mistaken? I thought the representation network and the dynamics network were trained separately, with a self-supervised loss kept apart from the RL loss.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Learn the environment dynamics by self-supervision instead of relying only on reward signals. Meaning that they don't learn the dynamics end-to-end as in MuZero; for them, the loss function for the environment dynamics is completely separate from the RL loss function.

I wonder how they prevent the latent state representation of observations from collapsing into a zero-vector, thus becoming completely uninformative and trivially predictable, and whether this was the reason MuZero did things its way.
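
For intuition, here's a minimal sketch of the standard fix for that failure mode: a SimSiam-style consistency loss with a stop-gradient on the target branch, which is the kind of self-supervised objective EfficientZero uses. All module names and sizes below are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

repr_net = nn.Linear(16, 8)    # stand-in for the representation network
dynamics = nn.Linear(8, 8)     # stand-in for the dynamics network
                               # (the real one also takes the action; omitted here)
predictor = nn.Linear(8, 8)    # prediction head, as in SimSiam

obs_t = torch.randn(4, 16)     # batch of observations at time t
obs_t1 = torch.randn(4, 16)    # batch of observations at time t+1

z_pred = predictor(dynamics(repr_net(obs_t)))  # predicted next latent
z_target = repr_net(obs_t1).detach()           # stop-gradient on the target branch

# Negative cosine similarity between prediction and target. Empirically
# (per the SimSiam paper), the stop-gradient plus the prediction head is
# what keeps the encoder from collapsing to a constant, trivially
# predictable output.
loss = -F.cosine_similarity(z_pred, z_target, dim=-1).mean()
loss.backward()
```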

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Unless I misinterpreted the results, the "500 times less data" claim is kind of clickbait, because they're referencing DQN with that statement, not SOTA or even MuZero.

Unfortunately, they didn't include any performance-over-timesteps graph. I imagine it looks something like DreamerV2's, where the new algorithm simply converges at a higher performance; but if you clip the whole graph at DQN's maximum performance level and ask how long the new algorithm took to reach that level, the answer will be tens or hundreds of times quicker, because at that point the new algorithm's curve hasn't yet left the steep part of the hockey stick.
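
To make the clipping argument concrete, here's a toy sketch with two invented learning curves; the shapes and constants are placeholders, not real EfficientZero or DQN data:

```python
import math

def dqn_score(frames):   # slow hypothetical curve, converging to 1.0
    return 1.0 * (1 - math.exp(-frames / 50e6))

def new_score(frames):   # fast hypothetical curve, converging to 5.0
    return 5.0 * (1 - math.exp(-frames / 2e6))

def frames_to_reach(curve, target, step=100_000):
    frames = 0
    while curve(frames) < target:
        frames += step
    return frames

# Clip both curves at (just under) DQN's ceiling and compare time-to-reach.
target = 0.95
ratio = frames_to_reach(dqn_score, target) / frames_to_reach(new_score, target)
print(f"sample-efficiency ratio measured at DQN's ceiling: ~{ratio:.0f}x")
```

With these made-up curves the ratio comes out around 300x, purely because the comparison point sits on the steep part of the new curve and the flat tail of the old one.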

I'd love to see how this algorithm combines with curiosity-driven exploration. They benchmarked on only 26 of the 57 classic Atari games and didn't include the notoriously hard Montezuma's Revenge, which I assume EfficientZero can't tackle yet. Otherwise, since it is a visual RL algorithm, why not just throw something like Dota 2 at it? If it's truly a "500 times less data"-level revolutionary change, running the experiment should already be cheap enough.
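
For reference, one common form of curiosity bonus that could in principle be bolted onto an agent like this is Random Network Distillation (Burda et al., 2018), where the intrinsic reward is the prediction error against a fixed random network. A minimal sketch, with arbitrary network sizes:

```python
import torch
import torch.nn as nn

target_net = nn.Linear(16, 8)     # fixed, randomly initialized network
predictor = nn.Linear(16, 8)      # trained to imitate target_net's outputs
for p in target_net.parameters():
    p.requires_grad_(False)

def intrinsic_reward(obs):
    # Prediction error is large on novel observations, so adding it to the
    # extrinsic reward pushes the agent toward unexplored states.
    return (predictor(obs) - target_net(obs)).pow(2).mean(dim=-1)

obs = torch.randn(4, 16)          # batch of (encoded) observations
bonus = intrinsic_reward(obs)     # would be added to the environment reward
```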

Matt Botvinick on the spontaneous emergence of learning algorithms

Isn't evolution a better analogy for deep learning anyway? All natural selection does is gradient descent (hill climbing, technically), with no capacity for lookahead. And we've known about this one for 150 years!
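
In code, the analogy amounts to a (1+1)-style hill climber: mutate, keep the mutant if it's no worse, repeat; no gradients and no lookahead. The fitness function below is a toy stand-in:

```python
import random

def fitness(x):
    return -(x - 3.0) ** 2        # toy fitness landscape, optimum at x = 3

x = 0.0
for _ in range(10_000):
    mutant = x + random.gauss(0, 0.1)   # blind random mutation
    if fitness(mutant) >= fitness(x):   # selection: keep it if it's no worse
        x = mutant
print(round(x, 2))                # ends up near 3.0, no gradient ever computed
```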

Are we in an AI overhang?

If you extrapolate those straight lines further, doesn't that mean even small businesses will be able to afford training their own quadrillion-parameter models just a few years after Google?
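
Back-of-the-envelope, the lag depends only on the budget gap and the slope of the line. All numbers below are illustrative assumptions, not claims about actual trends:

```python
import math

google_budget = 1e9          # hypothetical frontier training budget
small_biz_budget = 1e6       # hypothetical small-business budget
cost_decline_per_year = 10   # assumed annual drop in the cost of a fixed run

# With costs falling 10x/year, a 1000x budget gap closes in log10(1000) years.
lag = math.log(google_budget / small_biz_budget, cost_decline_per_year)
print(f"a small business affords the same training run ~{lag:.0f} years later")
```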

Are we in an AI overhang?

Is density even relevant when your computations can be run in parallel? I feel like price-performance will be the only relevant measure, even if that means slower clock speeds.
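
As a toy illustration: for an embarrassingly parallel workload, two hypothetical chips with equal price-performance deliver identical aggregate throughput per dollar, regardless of per-chip speed or density. Both chips below are invented:

```python
budget = 1_000_000
# Two hypothetical chips with identical price-performance (FLOP/s per dollar)
# but very different per-chip speed/density.
chips = {
    "fast, dense": {"flops": 1e15, "price": 40_000},
    "slow, cheap": {"flops": 1e14, "price": 4_000},
}
for name, chip in chips.items():
    n = budget // chip["price"]
    print(f"{name}: {n} chips, {n * chip['flops']:.1e} FLOP/s total")
```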