Jesse Hoogland

Currently a research assistant at the Krueger Lab and a SERI MATS 3.1 scholar.

Website: jessehoogland.com

Twitter: @jesse_hoogland

Sequences

Developmental Interpretability

Comments

There are three natural reward functions that are plausible:

  • , which is linear in the number of times  is pressed.
  • , which is linear in the number of times  is pressed.
  • , where  is the indicator function for  being pressed an even number of times,  being the indicator function for  being pressed an odd number of times.

Why are these reward functions "natural" or more plausible than, say, some constant (independent of button presses), the total number of button presses, etc.?