AI Alignment Forum: Tags

Gradient Hacking

• Applied to Crystalizing an agent's objective: how inner-misalignment could work in our favor by D0TheMath at 10d
• Applied to A Toy Model of Gradient Hacking by Oam Patel at 11d
• Applied to Is Fisherian Runaway Gradient Hacking? by Ryan Kidd at 3mo
• Applied to Gradient Hacking via Schelling Goals by Adam Scherlis at 6mo
• Applied to Some motivations to gradient hack by Multicore at 6mo
• Applied to Understanding Gradient Hacking by Peter Barnett at 7mo
• Applied to Obstacles to gradient hacking by Peter Barnett at 7mo
• Applied to Some real examples of gradient hacking by Ruben Bloom at 7mo
• Applied to Approaches to gradient hacking by Ruben Bloom at 8mo
• Applied to Thoughts on gradient hacking by Ruben Bloom at 8mo
• Applied to How does Gradient Descent Interact with Goodhart? by Ruben Bloom at 8mo
• Applied to Gradient hacking by Ruben Bloom at 8mo
• Applied to Meta learning to gradient hack by Quintin Pope at 9mo
• Applied to Towards Deconfusing Gradient Hacking by leogao at 9mo
• Created by leogao at 9mo