NVIDIA and Microsoft release 530B-parameter transformer model, Megatron-Turing NLG

by Ozyrus
11th Oct 2021
1 min read
This is a linkpost for https://developer.nvidia.com/blog/using-deepspeed-and-megatron-to-train-megatron-turing-nlg-530b-the-worlds-largest-and-most-powerful-generative-language-model/?fbclid=IwAR1XidfXX_6bis17pbPqLc9iTq5-a21Bwipnod3kziubSWEiWUGCyw2tRg0

Comment by gwern:

Paper is out: https://arxiv.org/abs/2201.11990

From the linked announcement:

In addition to reporting aggregate metrics on benchmark tasks, we also qualitatively analyzed model outputs and have intriguing findings (Figure 4). We observed that the model can infer basic mathematical operations from context (sample 1), even when the symbols are badly obfuscated (sample 2). While far from claiming numeracy, the model seems to go beyond only memorization for arithmetic.

We also show samples (the last row in Figure 4) from the HANS task where we posed the task containing simple syntactic structures as a question and prompted the model for an answer. Despite the structures being simple, existing natural language inference (NLI) models often have a hard time with such inputs. Fine-tuned models often pick up spurious associations between certain syntactic structures and entailment relations from systemic biases in NLI datasets. MT-NLG performs competitively in such cases without finetuning.
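
There is no public API for MT-NLG, so the closest one can get to reproducing this kind of qualitative probe is to run the same style of prompts against an open model. The minimal sketch below uses GPT-2 via Hugging Face transformers purely as a stand-in; the prompt wording, the obfuscated "@" operator, and the HANS-style example are my own illustrations, not NVIDIA's actual test cases.

```python
# Illustrative only: MT-NLG is not publicly available, so GPT-2 is used as a
# stand-in for the *style* of prompts described in the announcement.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Few-shot arithmetic with an obfuscated operator symbol
# (analogous to samples 1-2 in the announcement's Figure 4).
arithmetic_prompt = (
    "Q: What is 4 @ 3, if @ means addition? A: 7\n"
    "Q: What is 9 @ 5, if @ means addition? A: 14\n"
    "Q: What is 6 @ 2, if @ means addition? A:"
)

# A HANS-style inference posed as a plain question (lexical overlap does not
# imply entailment here), analogous to the last row of Figure 4.
nli_prompt = (
    "Premise: The doctor near the actor danced.\n"
    "Question: Does this mean that the actor danced? Answer yes or no.\n"
    "Answer:"
)

for prompt in (arithmetic_prompt, nli_prompt):
    out = generator(prompt, max_new_tokens=5, do_sample=False)
    # The pipeline returns the prompt plus the continuation; print only the continuation.
    print(out[0]["generated_text"][len(prompt):].strip())
```

A model as small as GPT-2 will get most of these wrong; the claim in the announcement is that at 530B dense parameters this kind of in-context inference starts to work.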

Seems like the next big transformer model is here. There is no way to test it out yet, but scaling seems to continue; see the passage quoted above.
It is not a mixture-of-experts model, so the parameter count means something, unlike WuDao's (in an MoE model only a fraction of the parameters are active for any given token). It also beats GPT-3 on PIQA and LAMBADA.

How big of a deal is that?