End-to-End Training of Deep Visuomotor Policies

The following Great Innovative Idea is from Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel in the Electrical Engineering and Computer Sciences (EECS) Department at the University of California, Berkeley. Their paper, End-to-End Training of Deep Visuomotor Policies, was one of the winners of the Blue Sky Ideas Track Competition, sponsored by the Computing Community Consortium (CCC), at the AAAI-RSS Special Workshop on the 50th Anniversary of Shakey: The Role of AI to Harmonize Robots and Humans in Rome, Italy. The half-day workshop was held on July 16th during the Robotics: Science and Systems (RSS) 2015 Conference.

The Innovative Idea

Techniques like reinforcement learning and optimal control offer the promise of automating robotic decision making by using optimization and machine learning. However, practical applications of these techniques typically require a range of manually designed components to support the automated learning and optimization algorithms. For example, policy search might be used to autonomously acquire a policy for robotic grasping, so long as the algorithm is supplied with a manually designed vision system that detects the object of interest, a control system that moves the arm to the commanded position, and so forth. In practice, these manually designed components limit the applicability of robotic learning to relatively controlled situations, falling short of the promise of fully autonomous robotic learning.

In this work, we take a step toward greater autonomy by proposing a learning algorithm that can be used to acquire robotic control policies that map directly from raw inputs, in the form of sensor readings from joint encoders and RGB cameras, to raw outputs, in the form of motor currents. This is enabled by recent advances in deep learning, a branch of machine learning that deals with training rich function approximators, such as neural networks, to directly process high-dimensional sensory input, such as images and sounds. Applying deep learning techniques to robotic learning has traditionally been very challenging, due to the high dimensionality of the learned function approximators. We address this challenge by using an approach called guided policy search, where the difficult reinforcement learning problem is transformed into a supervised learning problem, with supervision provided by an optimal control “teacher” that solves simplified versions of the task to generate training data for a neural network policy.
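The teacher-student idea described above can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes a linear point-mass system, an LQR controller standing in for the optimal control teacher, a random linear map `C` standing in for raw sensor readings, and a linear least-squares fit in place of a neural network policy. All function names and parameters are hypothetical, and the alternating optimization that a full guided policy search method would use is omitted.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=200):
    """Iterate the discrete-time Riccati recursion; returns K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def collect_teacher_data(A, B, K, C, rng, episodes=20, horizon=30):
    """Roll out the teacher from random start states.

    The teacher acts on the true state x; only the observation C @ x
    and the teacher's action are recorded as training data.
    """
    obs, acts = [], []
    for _ in range(episodes):
        x = rng.normal(size=A.shape[0])
        for _ in range(horizon):
            u = -K @ x
            obs.append(C @ x)
            acts.append(u)
            x = A @ x + B @ u
    return np.array(obs), np.array(acts)

def fit_policy(obs, acts):
    """Supervised step: least-squares fit of a linear policy u = o @ W."""
    W, *_ = np.linalg.lstsq(obs, acts, rcond=None)
    return W

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])   # point mass: position, velocity
    B = np.array([[0.0], [dt]])
    Q, R = np.eye(2), np.array([[0.1]])
    C = rng.normal(size=(8, 2))             # assumed map: 2-D state -> 8 "sensor" values

    K = lqr_gain(A, B, Q, R)                # teacher with access to the full state
    obs, acts = collect_teacher_data(A, B, K, C, rng)
    W = fit_policy(obs, acts)               # student sees observations only

    # Compare student and teacher actions on fresh states.
    x_test = rng.normal(size=(100, 2))
    teacher_u = -(x_test @ K.T)
    student_u = (x_test @ C.T) @ W
    return float(np.max(np.abs(teacher_u - student_u)))

if __name__ == "__main__":
    print("max |teacher - student| action gap:", run_demo())
```

In this toy setting the observations are a linear function of the state, so the student can reproduce the teacher almost exactly and the printed gap should be close to zero. With camera images as input, the regression step would instead train a neural network, and the match would be only approximate.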

Aside from improving the autonomy of robotic learning by removing the need for manually engineered vision and control systems, our results show that the resulting method actually achieves better performance on a number of robotic manipulation tasks than the more traditional approach, where the control policy and perception system are trained separately. By combining control and perception (in our case, vision using an RGB camera) into a single problem, our method is able to discover visual features that are more relevant to the task, which allows the policy to avoid the kinds of errors in perception that are most harmful for task performance. In contrast, the more standard modular approach requires the perception system to be accurate at all times, since there is no feedback to the perception modules that indicates when errors are more or less harmful.

Impact

This kind of end-to-end robotic learning holds considerable promise for improving the ability of robots to handle complex, unstructured environments. Learning is already the dominant technique in fields like computer vision, where simple decisions, such as choosing the label of an object in the current camera frame, are typically made by classifiers trained on large prior datasets of labeled images. However, adapting the successes of computer vision to robotic control has often proven less than straightforward, because many of the more subtle robotic tasks require close cooperation between the perception and control systems. An effective robotic perception system is not necessarily one that achieves high average accuracy, but one that is accurate at the times when it is most important. By training the perception and control systems end-to-end to map from raw sensory inputs to raw motor commands, methods such as the one proposed in our paper can bridge the gap between perception and control and provide robots with effective, autonomously learned sensorimotor skills.

Robots that can acquire sensorimotor skills automatically can build up the rich repertoires of behaviors that are necessary to succeed in complex, unstructured environments. Many of the domains in which robots are most effective today, for instance manufacturing, are quite structured and require the robot to perform only a small number of different tasks. More ambitious domains, such as household robotics or robots deployed in disaster relief scenarios, require the ability to deploy a wide array of skills to deal with large variation in the environment and goal. Engineering each of these skills by hand presents a considerable burden, and learning offers the promise of acquiring them with minimal manual effort.

Other Research

Our work focuses on the intersection of robotics, machine learning, and computer vision. In particular, we focus on methods that enable robots to acquire greater autonomy through learning. Aside from combining robotic perception and control, we also work on methods that allow robots to learn from humans through learning from demonstration, computer vision and perception techniques that incorporate active decision making, and robotic control methods that leverage and recombine prior experience to tackle new tasks. Robotic control offers an excellent venue for exploring core questions in artificial intelligence. In the course of understanding the algorithmic questions behind learning and decision making, it also allows us to produce technologies that will enable near-term real-world impact.

Researchers’ Background

Sergey Levine is a postdoctoral researcher at UC Berkeley. He holds a PhD from Stanford University.
Chelsea Finn is a PhD student at UC Berkeley.
Trevor Darrell is on the faculty of the CS Division of the EECS Department at UC Berkeley and also holds an appointment at the UC-affiliated International Computer Science Institute (ICSI). He completed his PhD at MIT.
Pieter Abbeel is an associate professor at UC Berkeley. He completed his PhD at Stanford University.