AI ALIGNMENT FORUM

Google Gemini Announced

by Jacob G-W
6th Dec 2023
1 min read

This is a linkpost for https://blog.google/technology/ai/google-gemini-ai/
2 comments, sorted by top scoring
Nisan

I wonder why Gemini used RLHF instead of Direct Preference Optimization (DPO). DPO was written up 6 months ago; it's simpler and apparently more compute-efficient than RLHF.

  • Is the Gemini org structure so sclerotic that it couldn't switch to a more efficient training algorithm partway through a project?
  • Is DPO inferior to RLHF in some way? Lower quality, less efficient, more sensitive to hyperparameters?
  • Maybe they did use DPO, even though they claimed it was RLHF in their technical report?
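For context on why DPO is often described as simpler: the whole objective is one supervised loss over preference pairs, with no separately trained reward model and no RL rollout loop. A minimal PyTorch-style sketch (the tensor names and the beta value are illustrative, not taken from the DPO paper's code or anything Gemini-related):

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss on a batch of preference pairs.

    Each input is a 1-D tensor of summed per-token log-probabilities of the
    chosen / rejected completion under the policy being trained and under a
    frozen reference model (e.g. the SFT checkpoint).
    """
    # Implicit "reward" of each completion: beta-scaled log-probability ratio
    # between the policy and the reference model.
    chosen_reward = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_reward = beta * (policy_rejected_logps - ref_rejected_logps)

    # Bradley-Terry preference likelihood: push the chosen completion to
    # out-score the rejected one. No separate reward model, no RL rollout
    # loop -- just a classification-style loss over preference pairs.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```

The claimed compute savings come from skipping both the reward-model training stage and the on-policy sampling that PPO-style RLHF needs.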
Vanessa Kosoy

"in each of the 50 different subject areas that we tested it on, it's as good as the best expert humans in those areas"

That sounds like an incredibly strong claim, but I suspect that the phrasing is very misleading. What kind of tests is Hassabis talking about here? Maybe those are tests that rely on remembering known facts much more than on making novel inferences? Surely Gemini is not (say) as good as the best mathematicians at solving open problems in mathematics?


Google just announced Gemini, and Hassabis claims that "in each of the 50 different subject areas that we tested it on, it's as good as the best expert humans in those areas".

State-of-the-art performance

We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.

With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.

Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.
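The blog doesn't say what this "new benchmark approach" actually is; the Gemini technical report describes it as uncertainty-routed chain-of-thought: sample a number of reasoning chains, take the majority answer only when the samples agree strongly, and otherwise fall back to the ordinary greedy answer (the "first impression"). A rough sketch of that idea, with placeholder function names rather than any real API:

```python
from collections import Counter
from typing import Callable, List

def uncertainty_routed_answer(sample_cot_answers: Callable[[str, int], List[str]],
                              greedy_answer: Callable[[str], str],
                              question: str,
                              k: int = 32,
                              consensus_threshold: float = 0.7) -> str:
    """Sketch of uncertainty-routed chain-of-thought.

    `sample_cot_answers` and `greedy_answer` stand in for model calls
    (sampled chain-of-thought decoding vs. plain greedy decoding); they are
    placeholders, and k / consensus_threshold are illustrative values.
    """
    answers = sample_cot_answers(question, k)            # k sampled reasoning chains
    top_answer, votes = Counter(answers).most_common(1)[0]
    if votes / k >= consensus_threshold:
        return top_answer                                # high agreement: trust the vote
    return greedy_answer(question)                       # low agreement: fall back to greedy
```

The specific values of k and the threshold are assumptions for illustration; the point is only that extra test-time reasoning is used when the model's own samples disagree.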

It also seems like it can understand video, which is new for multimodal models (GPT-4 cannot do this currently).