All Posts

Friday, July 10th 2020

Shortform
Alex Turner · 1 point · 2d
Transparency Q: how hard would it be to ensure a neural network doesn't learn any explicit NANDs?
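
One rough way to make the question testable: read an "explicit NAND" as a hidden unit whose thresholded activation reproduces the NAND truth table over some pair of binary input features. The sketch below, in PyTorch, scans a layer for such units; the stand-in network, the `nand_units` helper, and the 0.5 threshold are all hypothetical illustration choices, not a method from the post.

```python
# Probe a layer for units that behave like NAND(x_i, x_j), as a minimal
# sketch under the truth-table reading of "explicit NAND" described above.
import itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU())  # hypothetical stand-in for a trained layer

def nand_units(layer, n_inputs, threshold=0.5):
    """Return (unit, i, j) triples where a unit's thresholded activation
    matches NAND(x_i, x_j), holding all other inputs at zero."""
    hits = []
    for i, j in itertools.combinations(range(n_inputs), 2):
        # The four binary settings of inputs i and j: (0,0), (0,1), (1,0), (1,1).
        X = torch.zeros(4, n_inputs)
        for row, (a, b) in enumerate(itertools.product([0.0, 1.0], repeat=2)):
            X[row, i], X[row, j] = a, b
        acts = layer(X) > threshold                  # (4, n_units) booleans
        nand = torch.tensor([True, True, True, False])  # NAND truth table, same row order
        for u in range(acts.shape[1]):
            if torch.equal(acts[:, u], nand):
                hits.append((u, i, j))
    return hits

print(nand_units(net, n_inputs=4))  # likely empty for this random network
```

Scanning like this is cheap after the fact; turning it into a guarantee enforced during training is where the question seems to get hard.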

Tuesday, July 7th 2020

Shortform
Alex Turner · 8 points · 5d
I think instrumental convergence also occurs in the model space for machine learning. For example, many different architectures likely learn edge detectors in order to minimize classification loss on MNIST. But wait - you'd also learn edge detectors to maximize classification loss on MNIST (loosely, getting 0% on a multiple-choice exam requires knowing all of the right answers). I bet you'd learn these features for a wide range of cost functions. I wonder if that's already been empirically investigated? And, same for adversarial features. And perhaps, same for mesa optimizers (understanding how to stop mesa optimizers from being instrumentally convergent seems closely related to solving inner alignment). What can we learn about this?
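
On "I wonder if that's already been empirically investigated": here is a minimal sketch of one such experiment, assuming PyTorch and torchvision. The tiny model, one-epoch budget, and the Sobel-correlation heuristic for "edge detector" are all hypothetical choices for illustration, not an established protocol. It trains the same CNN twice, once minimizing and once maximizing cross-entropy on MNIST, then compares first-layer filters.

```python
# Train the same architecture to minimize vs. maximize MNIST loss, then
# check whether both runs produce edge-detector-like first-layer filters.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3)  # filters we will inspect
        self.fc = nn.Linear(8 * 13 * 13, 10)

    def forward(self, x):
        h = F.max_pool2d(F.relu(self.conv(x)), 2)
        return self.fc(h.flatten(1))

def train(sign, loader, epochs=1):
    """sign=+1 minimizes cross-entropy; sign=-1 maximizes it."""
    model = TinyCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            loss = sign * F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

loader = torch.utils.data.DataLoader(
    datasets.MNIST(".", train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=128, shuffle=True)

min_model = train(+1, loader)   # ordinary classifier
max_model = train(-1, loader)   # "get everything wrong" classifier

# Heuristic edge-detector test: max absolute correlation of each learned
# filter with a vertical Sobel kernel.
sobel = torch.tensor([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
for name, m in [("min-loss", min_model), ("max-loss", max_model)]:
    w = m.conv.weight.detach().squeeze(1)           # shape (8, 3, 3)
    corr = [torch.corrcoef(torch.stack([f.flatten(), sobel.flatten()]))[0, 1]
            for f in w]
    print(name, "max |corr| with vertical Sobel:",
          max(abs(c).item() for c in corr))
```

If the shortform's conjecture holds, both printed correlations should be comparably high; sweeping over a wider family of cost functions would test the stronger "wide range of cost functions" claim.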
