This is a brief literature review of Text AutoEncoders, as I used them in a recent project and did not find a good resource covering them. TL;DR: There exist models that take some text -> encode it into a single vector -> decode back into approximately the same text. Meta's...
Apply to AI Safety Camp 2024 by 1st December 2023. All mistakes here are my own. Below are summaries of each project proposal, listed in the order in which they appear on the website. These are edited by me, and most have not yet been reviewed by the project leads....
Interpreting Models by Ablation. Image generated by DALL-E 3. Introduction Interpretability in machine learning, especially in language models, is an area with a large number of contributions. While this can be quite useful for improving our understanding of models, one issue is the lack of robust benchmarks...
[Epistemic Status: Exploratory, and I may have confusions] Introduction LLMs and other possible RL agents have the property of taking many actions iteratively. However, not all possible short-term outputs are equally likely, and I think that better modelling what these possible outcomes might look like could give better insight into what...
Separating out different capabilities. Post format: First, a 30-second TL;DR, next a 5-minute summary, and finally the ~40-minute full-length technical report. Special thanks to Lucius Bushnaq for inspiring this work with his work on modularity. TL;DR One important aspect of modularity is that there are different components of...
This post explains a misconception I had about transformer embeddings when I was getting started. Thanks to Stephen Fowler for the discussion last August that made me realise the misconception, and to others for helping me refine my explanation. Any mistakes are my own. Thanks to...
Epistemic Status: Highly Speculative. I spent less than a day thinking about this in particular, and though I have spent a few months studying large language models, I have never trained a language model. I am likely wrong about many things. I have not seen research on this, so it...