UofT-Developed RL System Drives Cost Down for Robotics Research

--

Written by Natalie Volk. A discussion of the paper titled “Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger”.

For decades, an important goal within the worlds of AI and robotics has been to develop a robotic hand that matches a human hand’s dexterity, precision, and versatility. The current state of the art is still significantly less dexterous than human hands, but advanced robotic hand technology has myriad applications in remote exploration, manufacturing, accessibility, and beyond.

Image credit: OpenAI

Unfortunately, robotics research is incredibly expensive, which keeps smaller companies and labs from working on it. Training a robotic hand can require thousands of years’ worth of training data; OpenAI’s Dactyl, one of the world’s most advanced robotic hands, required 13,000 years of it. Labs cope by parallelizing, running many training instances simultaneously and later combining the results, but this is where the research becomes expensive. Researchers can either build thousands of physical robot hands and train them concurrently (obviously infeasible for most labs) or, more commonly, train in a simulated environment. Simulation brings its own costs and challenges, because accurately representing all the relevant real-world interactions is hugely computationally expensive: to train Dactyl, OpenAI needed 30,000 CPU cores for its simulated environment. Without that kind of computational power, there is a large “sim2real” gap, the drop in performance that occurs when a reinforcement learning model is taken out of simulation and run on a real robotic hand.

To combat this high cost and enable smaller labs to take part in robotic hand development, a team of researchers from Nvidia, the University of Toronto, and other institutions has developed a system built on highly efficient deep-learning techniques and simulation environments. The system uses Nvidia’s Isaac Gym, a GPU-accelerated simulator that can run thousands of environments in parallel on a single GPU, bringing this kind of training within reach of most machine learning labs.
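The key idea behind this style of GPU-based simulation is to keep every environment’s state in large batched tensors on the GPU, so that stepping thousands of environments is just a few vectorized operations. The sketch below is not Isaac Gym’s actual API; it is a minimal, hypothetical PyTorch illustration of how a batched environment and policy rollout can live entirely on one GPU (the environment dynamics and dimensions here are placeholders).

```python
import torch

class BatchedToyEnv:
    """Hypothetical batched environment: every tensor holds one row per environment."""
    def __init__(self, num_envs, obs_dim=16, act_dim=9, device="cpu"):
        self.num_envs, self.obs_dim, self.act_dim = num_envs, obs_dim, act_dim
        self.device = device
        self.state = torch.zeros(num_envs, obs_dim, device=device)

    def reset(self):
        self.state.normal_(0.0, 0.1)            # random initial states for all envs at once
        return self.state

    def step(self, actions):
        # Stand-in for real physics: one vectorized update advances every environment.
        self.state = 0.99 * self.state + 0.01 * actions.mean(dim=1, keepdim=True)
        reward = -self.state.pow(2).sum(dim=1)  # per-env reward, shape (num_envs,)
        return self.state, reward

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    env = BatchedToyEnv(num_envs=4096, device=device)   # thousands of envs on one device
    policy = torch.nn.Linear(env.obs_dim, env.act_dim).to(device)
    obs = env.reset()
    with torch.no_grad():                                # collect a rollout without gradients
        for _ in range(100):                             # each step simulates all envs at once
            actions = policy(obs)
            obs, reward = env.step(actions)
```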

Image credit: A. Allshire, M. Mittal, et al

The system is trained with a reinforcement learning algorithm called Proximal Policy Optimization (PPO). PPO is model-free, meaning it learns a policy without modeling the dynamics of the environment; instead, it estimates a value function directly from the interaction between the hand and its surroundings. In this case, the policy receives sensory data from the simulated robot hand along with eight keypoints that represent the pose of the object being manipulated. The reward balances the distance from the fingers to the object, the distance from the object to its intended final location, and how closely the object matches its goal pose. Because the algorithm never needs to evaluate every detail of the environment, it saves significant computational power.
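To make the algorithm concrete, the snippet below sketches PPO’s clipped surrogate objective, the update rule at the heart of the method. It is a generic PyTorch illustration rather than the researchers’ actual training code; the tensor names, batch size, and clipping coefficient are assumptions for the sake of the example.

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss from PPO.

    new_log_probs: log pi_theta(a|s) under the current policy
    old_log_probs: log pi_theta_old(a|s) from the rollout (detached)
    advantages:    advantage estimates, e.g. from GAE
    """
    ratio = torch.exp(new_log_probs - old_log_probs)                 # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective so overly large policy updates are discouraged.
    return -torch.min(unclipped, clipped).mean()

# Illustrative usage with random data standing in for a rollout batch.
batch = 8192
advantages = torch.randn(batch)
old_log_probs = torch.randn(batch)
new_log_probs = old_log_probs + 0.05 * torch.randn(batch)            # current policy close to old
loss = ppo_clipped_loss(new_log_probs, old_log_probs.detach(), advantages)
```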

The results published by the researchers seem promising. The paper claims that all of the hardware needed to run the system can be purchased for less than $10,000, an impressively low bill considering its apparent effectiveness. When the trained policy was transferred to a real robotic hand and tested on moving a cube to a target position and orientation, the sim2real gap was only 7%.

References

  1. Allshire, A., Mittal, M., Lodaya, V., & Makoviychuk, V. (2021, August 22). Transferring dexterous manipulation from GPU simulation to a remote real-world TriFinger. arXiv. Retrieved October 10, 2021, from https://arxiv.org/pdf/2108.09779.pdf.
  2. Dickson, B. (2021, September 27). Nvidia, University of Toronto are making robotics research available to small firms. TechTalks. Retrieved October 7, 2021, from https://bdtechtalks.com/2021/09/27/nvidia-robotic-hand-simulation-training/.
  3. Negrello, F., Stuart, H. S., & Catalano, M. G. (2020). Hands in the real world. Frontiers in Robotics and AI. Retrieved October 7, 2021, from https://www.frontiersin.org/articles/10.3389/frobt.2019.00147/full.
  4. Schulman, J. (2020, September 2). Proximal policy optimization. OpenAI. Retrieved October 7, 2021, from https://openai.com/blog/openai-baselines-ppo/.
