NickyP — AI Alignment Forum

Literature Review of Text AutoEncoders

This is a brief literature review of Text AutoEncoders, as I used them in a recent project and did not find a good resource covering them. TL;DR: There exist models that take some text -> encode it into a single vector -> decode back into approximately the same text. Meta's...

Feb 19, 2025 · 22

AISC 2024 - Project Summaries

Apply to AI Safety Camp 2024 by 1st December 2023. All mistakes here are my own. Below are some summaries for each project proposal, listed in order of how they appear on the website. These are edited by me, and most have not yet been reviewed by the project leads....

Nov 27, 2023 · 48

Machine Unlearning Evaluations as Interpretability Benchmarks

Interpreting Models by Ablation. Image generated by DALL-E 3. Introduction Interpretability in machine learning, especially in language models, is an area with a large number of contributions. While this can be quite useful for improving our understanding of models, one issue is the lack of robust benchmarks...

Oct 23, 2023 · 33

Ideation and Trajectory Modelling in Language Models

[Epistemic Status: Exploratory, and I may have confusions] Introduction LLMs and other possible RL agents have the property of taking many actions iteratively. However, not all possible short-term outputs are equally likely, and I think better modelling what these possible outcomes might look like could give better insight into what...

Oct 5, 2023 · 16

LLM Modularity: The Separability of Capabilities in Large Language Models

Separating out different capabilities. Post format: First, a 30-second TL;DR, next a 5-minute summary, and finally the ~40-minute full-length technical report. Special thanks to Lucius Bushnaq for inspiring this work with his work on modularity. TL;DR One important aspect of Modularity is that there are different components of...

Mar 26, 2023 · 103

LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space

This post is written as an explanation of a misconception I had about transformer embeddings when I was getting started. Thanks to Stephen Fowler for the discussion last August that made me realise the misconception, and others for helping me refine my explanation. Any mistakes are my own. Thanks to...

Feb 13, 2023 · 85

Speculation on Path-Dependance in Large Language Models.

Epistemic Status: Highly Speculative. I spent less than a day thinking about this in particular, and though I have spent a few months studying large language models, I have never trained a language model. I am likely wrong about many things. I have not seen research on this, so it...

Jan 15, 2023 · 16

Nicky Pochinkov
