All Posts


Monday, July 13th 2020

No posts for July 13th 2020

Sunday, July 12th 2020

No posts for July 12th 2020

Friday, July 10th 2020

No posts for July 10th 2020
Shortform
1 · Alex Turner · 2d
Transparency Q: how hard would it be to ensure a neural network doesn't learn any explicit NANDs?
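
For concreteness, here is a minimal sketch of one possible reading of "explicit NAND": a single threshold unit over two binary inputs whose truth table is NAND. The helper names `threshold_unit` and `computes_nand` and the specific weights are assumptions made for illustration. They are not taken from the post, and a real network could implement NAND in far less legible ways (spread across many units, or over non-binary activations).

```python
from itertools import product

def threshold_unit(w1, w2, b):
    """Return a two-input unit that fires (1) when w1*a + w2*c + b > 0."""
    return lambda a, c: int(w1 * a + w2 * c + b > 0)

def computes_nand(unit):
    """Hypothetical check: does the unit match NAND on all binary inputs?"""
    return all(unit(a, c) == 1 - (a & c) for a, c in product((0, 1), repeat=2))

# Illustrative weights: output is 0 only when both inputs are 1.
nand_unit = threshold_unit(-2.0, -2.0, 3.0)
# A unit computing AND, for contrast.
and_unit = threshold_unit(1.0, 1.0, -1.5)

print(computes_nand(nand_unit))
print(computes_nand(and_unit))
```

A truth-table check like this only catches the most literal case, which is part of why the question of ruling NANDs out seems hard.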

Tuesday, July 7th 2020

No posts for July 7th 2020
Shortform
8 · Alex Turner · 5d
I think instrumental convergence also occurs in the model space for machine learning. For example, many different architectures likely learn edge detectors in order to minimize classification loss on MNIST. But wait: you'd also learn edge detectors to maximize classification loss on MNIST (loosely, getting 0% on a multiple-choice exam requires knowing all of the right answers). I bet you'd learn these features for a wide range of cost functions. I wonder if that's already been empirically investigated? The same may hold for adversarial features, and perhaps for mesa-optimizers (understanding how to stop mesa-optimizers from being instrumentally convergent seems closely related to solving inner alignment). What can we learn about this?
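
The exam analogy can be sketched on a toy problem. The code below is a hedged illustration, not an MNIST experiment: the eight-point dataset, the bias-free logistic model, and the names `train` and `accuracy` are all assumptions chosen for brevity. The label depends only on the first input `x0`, while `x1` is an uninformative distractor. Running gradient descent and gradient ascent on the same cross-entropy loss should leave both models relying on `x0`, with opposite signs.

```python
import math

# Toy data (assumed for illustration): label is 1 exactly when x0 > 0.
DATA = [((1, 1), 1), ((1, -1), 1), ((2, 1), 1), ((2, -1), 1),
        ((-1, 1), 0), ((-1, -1), 0), ((-2, 1), 0), ((-2, -1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(sign, steps=20, lr=0.5):
    """sign=+1 minimizes cross-entropy; sign=-1 maximizes it."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in DATA:
            p = sigmoid(w[0] * x[0] + w[1] * x[1])
            for i in range(2):
                grad[i] += (p - y) * x[i] / len(DATA)
        w = [w[i] - sign * lr * grad[i] for i in range(2)]
    return w

def accuracy(w, flip=False):
    """Fraction of correct predictions; flip=True inverts every prediction."""
    correct = 0
    for x, y in DATA:
        pred = int(w[0] * x[0] + w[1] * x[1] > 0)
        if flip:
            pred = 1 - pred
        correct += int(pred == y)
    return correct / len(DATA)

w_min = train(+1)   # loss minimizer
w_max = train(-1)   # loss maximizer

print("minimizer weights:", w_min, "accuracy:", accuracy(w_min))
print("maximizer weights:", w_max, "accuracy:", accuracy(w_max))
print("maximizer with flipped predictions:", accuracy(w_max, flip=True))
```

In this toy setting the maximizer is wrong on every point precisely because it has picked up the same informative feature, so inverting its predictions recovers a perfect classifier. Whether anything similar holds for learned features in deep networks is the open empirical question the post raises.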

Monday, July 6th 2020

No posts for July 6th 2020

Sunday, July 5th 2020

No posts for July 5th 2020
