Review of Recent Reinforcement Learning Studies #6
In the rapidly evolving field of artificial intelligence, reinforcement learning (RL) has been making significant strides. Here's a roundup of some recent research in this area.
Paper 1: MT-Opt for Continuous Multi-Task Robotic Reinforcement Learning
Kalashnikov et al. have introduced MT-Opt, a method that lets robots learn new tasks quickly by reusing experience collected for other tasks. The approach is noteworthy because it distributes data collection and learning across several agents: the authors demonstrate it by learning 12 real-world tasks with training distributed over 7 robots.
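The data-sharing idea can be sketched in a few lines: a transition collected while attempting one task is relabelled with every task's reward function, so all tasks learn from the shared experience. The class names and reward-function hook below are illustrative assumptions, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    reward_fn: Callable  # (obs, action, next_obs) -> float

class SharedReplay:
    """Replay buffer shared across tasks: every transition is
    relabelled with each task's reward so all tasks can learn from it."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.buffers = {t.name: [] for t in tasks}

    def add(self, obs, action, next_obs):
        for task in self.tasks:
            r = task.reward_fn(obs, action, next_obs)
            self.buffers[task.name].append((obs, action, r, next_obs))

    def sample(self, task_name, batch_size):
        buf = self.buffers[task_name]
        return random.sample(buf, min(batch_size, len(buf)))

# Usage: a transition collected for "lift" also feeds "push" training.
tasks = [Task("lift", lambda o, a, n: float(n > o)),
         Task("push", lambda o, a, n: float(abs(n - o) > 0.1))]
replay = SharedReplay(tasks)
replay.add(obs=0.0, action=1, next_obs=0.2)
batch = replay.sample("push", batch_size=4)
```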
Paper 2: Learning Athletic Skills with Reinforcement Learning
Yin and colleagues have developed a learning framework for athletic skills, focusing on the high jump. The framework applies RL to a simulated character model, and the agent discovers effective strategies for clearing the bar without any demonstrations. Interestingly, the Fosbury flop, a more efficient technique in which the athlete clears the bar back-first, was only executed for the first time at the 1968 Olympic Games.
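To make the no-demonstration setting concrete, here is a toy sketch of the kind of sparse task reward such an agent could be trained on: it is only told whether it cleared the bar, plus a small effort penalty. All names, arguments, and weights are illustrative assumptions, not the paper's reward.

```python
def high_jump_reward(max_height, touched_bar, total_torque, bar_height):
    # Sparse success signal: did the character clear the bar cleanly?
    cleared = (max_height > bar_height) and not touched_bar
    # Small effort penalty discourages wasteful, jerky motions.
    effort_penalty = 1e-3 * total_torque
    return (1.0 if cleared else 0.0) - effort_penalty
```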
Paper 3: Reset-Free Reinforcement Learning via Multi-Task Learning
The authors of this paper propose a method for reset-free reinforcement learning: several tasks are learned together so that completing one task leaves the environment in a valid start state for another, removing the need for a human to reset the scene between attempts. This allows dexterous manipulation behaviours to be learned without human intervention.
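A minimal sketch of that core idea, assuming a toy task graph and stub environment/policy interfaces (none of which come from the paper): the agent cycles through tasks, and finishing one task leaves the scene set up for the next, so no human reset is needed.

```python
# Toy stubs so the loop below runs; the interfaces are assumptions
# for illustration, not the paper's API.
class DummyEnv:
    def __init__(self): self.state = 0.0
    def observe(self): return self.state
    def step(self, action):
        self.state += action
        return self.state, 0.0, False  # obs, reward, done

class DummyPolicy:
    def act(self, obs, task): return 0.1
    def record(self, obs, action, reward, task): pass

# Tasks arranged so that finishing one sets the scene up for the next,
# replacing the human reset (the graph itself is a toy assumption).
TASK_GRAPH = {"pickup": "place", "place": "reorient", "reorient": "pickup"}

def reset_free_loop(env, policies, steps_per_task=100, n_phases=9):
    task, obs = "pickup", env.observe()
    for _ in range(n_phases):
        policy = policies[task]
        for _ in range(steps_per_task):
            action = policy.act(obs, task)
            obs, reward, done = env.step(action)
            policy.record(obs, action, reward, task)
        task = TASK_GRAPH[task]  # the next task doubles as the reset

reset_free_loop(DummyEnv(), {t: DummyPolicy() for t in TASK_GRAPH})
```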
Paper 4: Introducing Reanalyse for Model-Based Reinforcement Learning
Schrittwieser et al. have proposed Reanalyse, a method that directly uses a learned model for policy and value improvement in model-based RL, both online and offline. Because the same mechanism works on freshly collected and on stored data, the method covers the whole spectrum between fully online and fully offline learning. Notably, combining Reanalyse with a model-based algorithm such as MuZero yields MuZero Unplugged, which reaches a new state of the art in online and offline RL, outperforming previous baselines on the Atari benchmark across data budgets spanning several orders of magnitude.
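In pseudocode terms, Reanalyse boils down to revisiting stored trajectories and recomputing their training targets with the current model. The sketch below captures that loop; run_mcts stands in for MuZero-style search and, like the dict-based trajectory format, is an assumption for illustration.

```python
def reanalyse(buffer, model, run_mcts):
    """Recompute policy/value targets for stored trajectories using the
    current learned model, so old data keeps providing fresh signal."""
    for trajectory in buffer:
        for step in trajectory:
            # A fresh search with the latest model gives better targets
            # than those stored when the data was first collected.
            policy_target, value_target = run_mcts(model, step["obs"])
            step["policy_target"] = policy_target
            step["value_target"] = value_target
```

Run over a fixed dataset, this loop gives the offline regime; interleaved with fresh data collection, it gives the online one.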
Bonus Paper: Backpropagation and Handwritten Zip Code Recognition
In 1989, Yann LeCun and colleagues presented backpropagation applied to handwritten zip code recognition, an early demonstration of a convolutional network trained end-to-end. This approach underpins modern computer vision, including reinforcement learning whenever the observation space is an image. Despite its computational intensity, it became far more practical in the 2000s thanks to GPU parallelization.
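As a nod to how this lineage looks today, here is a minimal convolutional network with one backpropagation step, written in PyTorch. The layer sizes are illustrative assumptions, not the 1989 architecture.

```python
import torch
import torch.nn as nn

class DigitNet(nn.Module):
    """Small conv net for 28x28 digit images."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
            nn.Conv2d(8, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),
        )
        self.classifier = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# One backpropagation step on a dummy batch of images.
net = DigitNet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
x, y = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
opt.zero_grad()
loss = nn.functional.cross_entropy(net(x), y)
loss.backward()  # gradients computed via backpropagation
opt.step()
```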
These papers represent a snapshot of the current state of reinforcement learning research, showcasing a range of innovative approaches and applications. As the field continues to evolve, we can expect to see even more exciting developments in the future.