AI ALIGNMENT FORUM
Language Models
• Applied to Hamiltonian Dynamics in AI: A Novel Approach to Optimizing Reasoning in Language Models by Javier Marin Valenzuela 3d ago
• Applied to Musings on Text Data Wall (Oct 2024) by Vladimir Nesov 7d ago
• Applied to Exploring SAE features in LLMs with definition trees and token lists by mwatkins 7d ago
• Applied to Biasing VLM Response with Visual Stimuli by Jaehyuk Lim 9d ago
• Applied to Do Sparse Autoencoders (SAEs) transfer across base and finetuned language models? by Taras Kutsyk 12d ago
• Applied to In-Context Learning: An Alignment Survey by Alfie Lamerton 12d ago
• Applied to Base LLMs refuse too by Connor Kissane 14d ago
• Applied to Evaluating LLaMA 3 for political sycophancy by alma.liezenga 14d ago
• Applied to Two new datasets for evaluating political sycophancy in LLMs by alma.liezenga 14d ago
• Applied to Avoiding jailbreaks by discouraging their representation in activation space by Guido Bergman 15d ago
• Applied to Self location for LLMs by LLMs: Self-Assessment Checklist. by weightt an 16d ago
• Applied to The Geometry of Feelings and Nonsense in Large Language Models by Satvik Golechha 16d ago
• Applied to Characterizing stable regions in the residual stream of LLMs by Jett Janiak 16d ago
• Applied to [Linkpost] Play with SAEs on Llama 3 by Raymond Arnold 16d ago
• Applied to [Paper] Hidden in Plain Text: Emergence and Mitigation of Steganographic Collusion in LLMs by Yohan Mathew 17d ago
• Applied to If I ask an LLM to think step by step, how big are the steps? by Ryan Blough 1mo ago
• Applied to [Paper] Programming Refusal with Conditional Activation Steering by Bruce W. Lee 1mo ago
• Applied to Checking public figures on whether they "answered the question" quick analysis from Harris/Trump debate, and a proposal by david reinstein 1mo ago
• Applied to Redundant Attention Heads in Large Language Models For In Context Learning by skunnavakkam 1mo ago