DeepMind claims that their recent agents learned tool use; in particular, they learned to build ramps to access inaccessible areas even though none of their training environments contained any inaccessible areas!
This is the result that impresses me most. However, I'm uncertain whether it's really true, or just cherry-picked.
Their results showreel depicts multiple instances of the AIs accessing inaccessible areas via ramps and other clever methods. However, in the actual paper page 51, they list all the hand-authored tasks they used to test the agents on, and mention that for some of them the agents did not get >0 reward. One of the tasks that the agents got 0 reward on is:
Tool Use Climb 1: The agent must use the objects to reach a higher floor with the target object.
So... what gives? They have video of the agents using objects to reach a higher floor containing the target object, multiple separate times. But then in the chart it says failed to achieve reward >0.
EDIT: Maybe the paper just has a typo? The blog post contains this image, which appears to show non-zero reward for the "tool use" task, zero-shot:
Nice, I missed that! Thanks!