AI ALIGNMENT FORUM

Language Models (LLMs)

Edited by Yaakov T, plex, et al. last updated 30th Dec 2024

Language models are computer programs that estimate the likelihood of a piece of text. "Hello, how are you?" is likely; "Hello, fnarg horses" is unlikely.
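The idea of scoring text by likelihood can be sketched with a toy character-bigram model. This is an illustrative stand-in, not a real language model: the tiny corpus and the smoothing constant are assumptions made for the example.

```python
from collections import Counter
import math

# Toy corpus standing in for "text written by humans" (an assumption
# for illustration; real language models train on vastly more data).
corpus = "hello, how are you? hello there. how are you today?"

# Count character bigrams to build a simple conditional model P(next | prev).
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def log_likelihood(text, alpha=0.5):
    """Additive-smoothed log-probability of `text` under the bigram model."""
    vocab = len(unigrams)
    total = 0.0
    for prev, nxt in zip(text, text[1:]):
        num = bigrams[(prev, nxt)] + alpha
        den = unigrams[prev] + alpha * vocab
        total += math.log(num / den)
    return total

# Text resembling human-written text scores higher than gibberish.
print(log_likelihood("hello, how are you?") > log_likelihood("hello, fnarg horses"))
```

The same principle scales up: a large language model is a far more sophisticated estimator of the same quantity, the probability of a piece of text.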

Language models can answer questions by estimating the likelihood of possible question-and-answer pairs and selecting the most likely one. "Q: How are you? A: Very well, thank you" is a likely question-and-answer pair. "Q: How are you? A: Correct horse battery staple" is an unlikely question-and-answer pair.
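Answer selection by likelihood can be sketched as an argmax over candidate question-and-answer pairs. Here the "language model" is a hypothetical word-bigram scorer over a tiny corpus; any real model's log-likelihood function could take its place.

```python
from collections import Counter
import math

# Toy stand-in for a trained language model's likelihood function:
# a word-bigram model over a tiny corpus (an illustrative assumption).
corpus = ("q : how are you ? a : very well , thank you . "
          "q : how is it going ? a : quite well , thanks .").split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def log_likelihood(words, alpha=0.1):
    vocab = len(unigrams)
    return sum(
        math.log((bigrams[(p, n)] + alpha) / (unigrams[p] + alpha * vocab))
        for p, n in zip(words, words[1:])
    )

def answer(question, candidates):
    """Pick the answer whose full question-and-answer pair scores highest."""
    return max(candidates,
               key=lambda a: log_likelihood((question + " " + a).split()))

best = answer("q : how are you ?",
              ["a : very well , thank you .",
               "a : correct horse battery staple"])
print(best)  # the fluent answer outscores the gibberish one
```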

The language models most relevant to AI safety are based on deep learning. Deep-learning-based language models can be "trained" to model language better by exposing them to text written by humans. The internet contains a vast amount of human-written text, providing ample training material.
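What "training" means here can be sketched as gradient descent on a next-token prediction loss. The setup below is a deliberately minimal, hypothetical toy (a single learned weight matrix over characters); real deep-learning language models use deep networks over far larger corpora, but the loop has the same shape.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "hello hello hello world "          # toy "human-written text"
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V = len(chars)

# One-layer model: logits for the next char are a learned function of
# the current char (a V x V weight matrix, i.e. a learned bigram table).
W = rng.normal(scale=0.1, size=(V, V))

xs = np.array([idx[c] for c in text[:-1]])  # current characters
ys = np.array([idx[c] for c in text[1:]])   # next characters to predict

def loss_and_grad(W):
    logits = W[xs]                                   # (N, V)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)        # softmax
    n = len(xs)
    loss = -np.log(probs[np.arange(n), ys]).mean()   # cross-entropy
    dlogits = probs
    dlogits[np.arange(n), ys] -= 1
    dlogits /= n
    grad = np.zeros_like(W)
    np.add.at(grad, xs, dlogits)                     # accumulate per input char
    return loss, grad

losses = []
for step in range(200):
    loss, grad = loss_and_grad(W)
    losses.append(loss)
    W -= 1.0 * grad                                  # plain gradient descent

print(losses[0] > losses[-1])  # loss falls as the model fits the text
```

The key point is that "exposure to human-written text" means repeatedly nudging the model's parameters so that the text it has seen becomes more probable under the model.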

Deep-learning-based language models are getting bigger and are trained on more data. As they scale, they acquire new skills, including arithmetic, explaining jokes, programming, and solving math problems.

There is a risk that these models develop dangerous capabilities as they grow larger and are trained further. What additional skills will they develop given a few years?

See also

  • GPT - A family of large language models created by OpenAI
Posts tagged Language Models (LLMs)
  • Simulators by janus, 3y (137 karma, 90 comments)
  • Alignment Implications of LLM Successes: a Debate in One Act by Zack M. Davis, 2y (81 karma, 5 comments)
  • How LLMs are and are not myopic by janus, 2y (61 karma, 7 comments)
  • Inverse Scaling Prize: Round 1 Winners by Ethan Perez, Ian McKenzie, 3y (39 karma, 10 comments)
  • How to Control an LLM's Behavior (why my P(DOOM) went down) by Roger Dearnaley, 2y (19 karma, 0 comments)
  • A "Bitter Lesson" Approach to Aligning AGI and ASI by Roger Dearnaley, 1y (18 karma, 0 comments)
  • Transformer Circuits by Evan Hubinger, 4y (67 karma, 3 comments)
  • Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor by Roger Dearnaley, 2y (8 karma, 0 comments)
  • SolidGoldMagikarp (plus, prompt generation) by Jessica Rumbelow, mwatkins, 2y (138 karma, 17 comments)
  • The Waluigi Effect (mega-post) by Cleo Nardo, 2y (58 karma, 26 comments)
  • Representation Tuning by Christopher Ackerman, 1y (14 karma, 3 comments)
  • Self-fulfilling misalignment data might be poisoning our AI models by Alex Turner, 4mo (50 karma, 13 comments)
  • Testing PaLM prompts on GPT3 by Yitzi Litt, 3y (26 karma, 0 comments)
  • LLM Modularity: The Separability of Capabilities in Large Language Models by Nicky Pochinkov, 2y (38 karma, 0 comments)
  • Modulating sycophancy in an RLHF model via activation steering by Nina Panickssery, 2y (33 karma, 19 comments)