
Latest Roundup of Recent Reinforcement Learning Studies #6

In the realm of artificial intelligence, recent developments in reinforcement learning (RL) have been making waves. Two notable papers are leading the charge: "MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale" by Kalashnikov et al. (2021) and "Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention" by Gupta et al. (2021).

MT-Opt: Learning Across Multiple Tasks and Robots

MT-Opt, a proposed method for continuous multi-task robotic reinforcement learning, distributes data collection and learning across several agents. This approach enabled MT-Opt to learn 12 real-world tasks, with experience gathered across 7 robots. The authors demonstrate tasks learned with MT-Opt such as object alignment and rearrangement. Remarkably, the robots can pick up new tasks quickly by reusing past experience collected for other tasks.
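To make the experience-sharing idea concrete, here is a minimal Python sketch: transitions gathered for one task are relabeled with another task's success signal so they can serve as training data for it. All names here (Transition, SharedReplayBuffer, success_fns) are illustrative placeholders, not the authors' implementation.

```python
import random

# Hypothetical sketch of task-conditioned experience sharing, loosely
# inspired by MT-Opt's data-sharing idea: transitions collected for one
# task can be rescored and reused as training data for another task.

class Transition:
    def __init__(self, obs, action, next_obs, task_id):
        self.obs, self.action, self.next_obs = obs, action, next_obs
        self.task_id = task_id  # task the transition was collected for

class SharedReplayBuffer:
    def __init__(self, success_fns):
        # success_fns: task_id -> function judging success from next_obs,
        # so any transition can be relabeled under any task's objective.
        self.success_fns = success_fns
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def sample_for_task(self, task_id, batch_size):
        # Relabel experience from *all* tasks with this task's reward.
        batch = random.sample(self.transitions,
                              min(batch_size, len(self.transitions)))
        return [
            (t.obs, t.action, t.next_obs,
             float(self.success_fns[task_id](t.next_obs)))
            for t in batch
        ]
```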

Reanalyse: A New Approach to Reinforcement Learning

Schrittwieser et al. (2021) introduced Reanalyse, a method that directly uses a learned model for policy and value improvement in model-based reinforcement learning, both offline and online. Combined with MuZero, Reanalyse yields MuZero Unplugged, which set a new state of the art in online and offline reinforcement learning, outperforming previous baselines in the Atari Learning Environment across data budgets spanning several orders of magnitude.
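The core of Reanalyse can be sketched in a few lines: stored trajectories are periodically re-processed with the latest learned model and search procedure, yielding fresh policy and value targets. The sketch below assumes hypothetical model and run_mcts interfaces; it illustrates the idea, not the MuZero Unplugged code.

```python
# A minimal sketch of the Reanalyse idea, under stated assumptions: stored
# trajectories are re-processed with the *current* learned model and search
# procedure to produce fresh policy and value targets. `model`, `run_mcts`,
# and the node interface are hypothetical placeholders.

def reanalyse(trajectories, model, run_mcts):
    """Yield (observation, policy_target, value_target) training tuples."""
    for trajectory in trajectories:
        for step in trajectory:
            # Searching with the latest model gives an improved policy...
            root = run_mcts(model, step.observation)
            policy_target = root.visit_distribution()
            # ...and the root's search value gives an updated value target.
            value_target = root.value()
            yield step.observation, policy_target, value_target
```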

Learning the High Jump

Yin et al. (2021) presented a learning framework for athletic skills, specifically the high jump. The authors applied reinforcement learning to train a simulated character to perform the high jump without relying on demonstrations. One of the strategies that emerged was the Fosbury flop, which proved the best strategy in simulation, clearing a bar height of 2.0 m.
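As a rough illustration of this kind of training setup, the sketch below shows a simple bar-height curriculum: the bar is raised each time the policy learns to clear it. This is a generic sketch with a hypothetical environment API and training callable, not the authors' actual strategy-discovery method, which is considerably more involved.

```python
# Illustrative sketch only: a bar-height curriculum for a jumping policy.
# env.set_bar_height and train_one_stage are hypothetical stand-ins.

def train_high_jump(env, train_one_stage, start_height=0.5,
                    step=0.05, max_failures=3):
    """Raise the bar each time the policy learns to clear it."""
    height, failures, best_cleared = start_height, 0, None
    while failures < max_failures:
        env.set_bar_height(height)   # hypothetical environment call
        if train_one_stage(env):     # returns True once the bar is cleared
            best_cleared = height    # record progress, then raise the bar
            height += step
            failures = 0
        else:
            failures += 1            # stop after repeated failures
    return best_cleared
```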

The Legacy of Backpropagation

Backpropagation, applied to convolutional networks by LeCun et al. (1989), has had a significant impact on computer vision, including reinforcement learning whenever the observation space is an image. The approach initially drew limited interest because of its computational cost; once GPU parallelization became widespread in the 2000s, however, it gained popularity, and it now underpins virtually every computer vision system. LeCun et al. (1989) also presented the first use of backpropagation to learn convolution kernel coefficients directly from images, for handwritten zip code recognition.
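The same idea is easy to reproduce with modern tools. The toy PyTorch sketch below (not the 1989 architecture, and with random stand-in data instead of zip code images) shows that convolution kernels are ordinary learnable parameters, so a backward pass fits them to image data directly.

```python
import torch
import torch.nn as nn

# Toy illustration of the core idea in LeCun et al. (1989): convolution
# kernel weights are learnable parameters, so backpropagation can fit
# them to images directly. Architecture and data are placeholders.

class TinyDigitNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),  # kernels learned by backprop
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(8 * 12 * 12, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One gradient step on random stand-in data: the loss gradient flows
# back into the convolution kernels exactly as into any other weight.
model = TinyDigitNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
images = torch.randn(4, 1, 28, 28)       # placeholder for 28x28 digits
labels = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
opt.step()
```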

These advancements in reinforcement learning are paving the way for machines to learn and adapt more effectively, opening up exciting possibilities for the future of AI.
