AI ALIGNMENT FORUM
Language Models
• Applied to LLMs seem (relatively) safe by JustisMills 1d ago
• Applied to At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan” by Bill Benzon 3d ago
• Applied to How LLMs Work, in the Style of The Economist by Rocket Drew 4d ago
• Applied to What's up with all the non-Mormons? Weirdly specific universalities across LLMs by mwatkins 8d ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 8d ago
• Applied to An examination of GPT-2's boring yet effective glitch by niplav 9d ago
• Applied to Claude 3 Opus can operate as a Turing machine by Gunnar Zarncke 10d ago
• Applied to Experiments with an alternative method to promote sparsity in sparse autoencoders by Eoin Farrell 11d ago
• Applied to Claude wants to be conscious by Joe Kwon 14d ago
• Applied to Barcoding LLM Training Data Subsets. Anyone trying this for interpretability? by right..enough? 14d ago
• Applied to Is LLM Translation Without Rosetta Stone possible? by cubefox 16d ago
• Applied to End-to-end hacking with language models by Timothée Chauvin 22d ago
• Applied to Language and Capabilities: Testing LLM Mathematical Abilities Across Languages by Ethan Edwards 23d ago
• Applied to How To Choose an OpenAI Alternative LLM in 2024? by Kristy_Poole 1mo ago
• Applied to How do LLMs give truthful answers? A discussion of LLM vs. human reasoning, ensembles & parrots by Owain Evans 1mo ago
• Applied to Your LLM Judge may be biased by Rachel Freedman 1mo ago
• Applied to Enhancing biosecurity with language models: defining research directions by jacobjacob 1mo ago
• Applied to Could LLMs Help Generate New Concepts in Human Language? by Pekka Lampelto 1mo ago
• Applied to [Linkpost] Vague Verbiage in Forecasting by trevor 1mo ago
• Applied to Inferring the model dimension of API-protected LLMs by Ege Erdil 1mo ago