AI ALIGNMENT FORUM
Eliciting Latent Knowledge (ELK)
• Applied to AXRP Episode 29 - Science of Deep Learning with Vikrant Varma by DanielFilan 8h ago
• Applied to Auditing LMs with counterfactual search: a tool for control and ELK by Jacob Pfau 2mo ago
• Applied to Striking Implications for Learning Theory, Interpretability — and Safety? by Roger Dearnaley 4mo ago
• Applied to Measurement tampering detection as a special case of weak-to-strong generalization by Ryan Greenblatt 4mo ago
• Applied to Discussion: Challenges with Unsupervised LLM Knowledge Discovery by Seb Farquhar 4mo ago
• Applied to Betting on what is un-falsifiable and un-verifiable by Abhimanyu Pallavi Sudhir 5mo ago
• Applied to Eliciting Latent Knowledge in Comprehensive AI Services Models by acabodi 5mo ago
• Applied to Robustness of Contrast-Consistent Search to Adversarial Prompting by Nandi 6mo ago
• Applied to Discovering Latent Knowledge in the Human Brain: Part 1 – Clarifying the concepts of belief and knowledge by Joseph Emerson 6mo ago
• Applied to Attributing to interactions with GCPD and GWPD by Jenny Nitishinskaya 7mo ago
• Applied to A personal explanation of ELK concept and task. by Zeyu Qin 7mo ago
• Applied to Uncovering Latent Human Wellbeing in LLM Embeddings by ChengCheng 7mo ago
• Applied to Ground-Truth Label Imbalance Impairs the Performance of Contrast-Consistent Search (and Other Contrast-Pair-Based Unsupervised Methods) by Tom Angsten 9mo ago
• Applied to Still no Lie Detector for LLMs by ben_levinstein 9mo ago
• Applied to Goal-misgeneralization is ELK-hard by rokosbasilisk 11mo ago
• Applied to Contrast Pairs Drive the Empirical Performance of Contrast Consistent Search (CCS) by Scott Emmons 1y ago