Robot In-hand Dexterous manipulation by extracting data from human manipulation of objects to improve robotic autonomy and dexterity - InDex

Lead Research Organisation: Aston University
Department Name: College of Engineering and Physical Sci


Humans excel at handling everyday objects and manipulation tasks, learning new skills, and adapting to different or complex environments. This is a basic skill for our survival as well as a key feature of our world of artefacts and human-made devices. Our expert ability to use our hands results from a lifetime of learning, both by observing other skilled humans and by discovering for ourselves how to handle objects first hand. Unfortunately, today's robotic hands are still far from human-level dexterity, nor are robotic systems fully able to understand their own potential. For robots to truly operate in a human world and fulfil expectations as intelligent assistants, they must be able to manipulate a wide variety of unknown objects by mastering their capabilities of strength, finesse and subtlety. Achieving such dexterity with robotic hands requires the cognitive capacity to deal with real-world uncertainty and to generalise previously learned skills to new objects and tasks. Furthermore, we assert that the complexity of programming must be greatly reduced and that robot autonomy must become much more natural. The InDex project aims to understand how humans perform in-hand object manipulation and to replicate the observed skilled movements with dexterous artificial hands, merging the concepts of deep reinforcement learning and transfer learning to generalise in-hand skills across multiple objects and tasks. In addition, an abstract representation of previous knowledge will be fundamental for reproducing learned skills on different hardware. Learning will use multimodal data that will be collected, annotated and assembled into a large dataset. The data and our methods will be shared with the wider research community to allow testing against benchmarks and reproduction of results.
More concretely, the core objectives are: (i) to build a multimodal artificial perception architecture that extracts data on object manipulation by humans; (ii) to create a multimodal dataset of in-hand manipulation tasks such as regrasping, reorienting and finely repositioning objects; (iii) to develop an advanced object modelling and recognition system, including the characterisation of object affordances and grasping properties, in order to encapsulate both explicit information and possible implicit object usages; (iv) to autonomously learn and precisely imitate human strategies in handling tasks; and (v) to build a bridge between observation and execution, allowing deployment that is independent of the robot architecture.

Planned Impact

Not Applicable
Description InDex's ultimate aim is to drastically enhance the ability of anthropomorphic robotic hands to perform in-hand manipulation tasks in a human-like way with regard to sensing, reasoning and action. One of the significant barriers to the uptake of highly sophisticated manipulators is the major overhead of controlling these systems so as to obtain the full benefit of their capabilities and features for fine movements. We are still defining multiple imitation learning (learning from human experience) strategies based on deep and transfer learning, canonical action encoding and decoding, and object affordances, in order to reduce the re-adjustment of constraints and give the robot increased autonomy. The robot will be able to imitate human motion; however, it will also be able to improve its performance through its own experience, by exploration. This will be indispensable for the advent of a new generation of robots capable of grasping and sophisticated in-hand manipulation, and hence of efficient and useful interaction.
Exploitation Route It is expected that the outcomes of InDex will allow dexterous hands to operate in a much more autonomous mode for in-hand manipulation of objects. This will improve change-management strategies for certain robots in industrial production chains, as well as increase the flexibility of factories using the system. The approach is also intended to be integrated into robots operating autonomously in domestic environments (replacing humans in service tasks), on difficult terrain, or in dangerous and hazardous tasks. To spur this significant breakthrough, the consortium will develop a novel path for robot learning strategies. The outcomes of this project will be relevant to industry.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Construction; Healthcare; Other

Description Sim2Real: From Simulation to Real Robotic Application using Deep Reinforcement Learning and Knowledge Transfer
Amount £20,000 (GBP)
Funding ID RGS\R2\192498 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 10/2019 
End 10/2020
Title Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning
Description We propose an experience prioritization algorithm for the hindsight experience replay (HER) method that improves learning efficiency in reinforcement learning. In particular, we exploit the density distribution of the achieved points of a desired trajectory. The proposed Goal Density-based hindsight experience Prioritization (GDP) method prioritizes achieved points that are rarely seen in the replay buffer, using them as substitute goals for the HER algorithm in order to find the best trajectory for a robot task. We evaluated our method in simulation on several robot manipulation tasks, where it outperformed HER.
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact Reinforcement learning needs the environment to provide rewards as feedback for training. In the real world, however, most rewards are sparse, meaning that a reward signal is only given when a trajectory is successful. Dealing with sparse rewards is particularly challenging for most classical RL algorithms. In robot RL tasks, most experiences in the early stage of learning are unsuccessful and therefore provide no positive feedback. This method introduces the idea of goal density and points out that rare experiences are more valuable for learning; it therefore prioritizes the achieved points of a given trajectory that are rarely seen in the replay buffer as substitute goals, in order to find the best trajectory for a given task.
Description Sorbonne
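The goal-density idea can be illustrated with a small sketch. It is not the published GDP implementation: it assumes a Gaussian kernel density estimate over the achieved goals already in the replay buffer (the `bandwidth` is a hypothetical parameter) and turns low density into high priority by ranking, so that rarely seen achieved points are more likely to be drawn as substitute goals. All function names are illustrative.

```python
import math
import random

def gaussian_kde(points, query, bandwidth):
    """Unnormalised Gaussian kernel density of `query` w.r.t. `points` (assumed density model)."""
    total = 0.0
    for p in points:
        sq = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / len(points)

def rarity_priorities(achieved_goals, buffer_goals, bandwidth=0.05):
    """Rank-based priorities: the rarer an achieved goal is in the buffer, the higher its priority."""
    densities = [gaussian_kde(buffer_goals, g, bandwidth) for g in achieved_goals]
    # Sort from densest to rarest, so rank 1 = densest and rank n = rarest.
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    ranks = [0] * len(densities)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = sum(ranks)
    return [r / total for r in ranks]

def sample_substitute_goal(achieved_goals, buffer_goals, rng=random):
    """Pick a hindsight (substitute) goal from one trajectory, favouring rare achieved points."""
    probs = rarity_priorities(achieved_goals, buffer_goals)
    return rng.choices(achieved_goals, weights=probs, k=1)[0]
```

For a buffer dominated by one region, an achieved point that the buffer has never visited receives the largest priority. Ranking rather than using raw inverse densities is a design choice of this sketch: it keeps priorities bounded when a density is close to zero.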
Organisation Sorbonne University
Country France 
Sector Academic/University 
PI Contribution Providing learning algorithms using data extracted from human manipulation
Collaborator Contribution
• Formalisation of an in-hand manipulation algorithm split into two parts. In the first part, the algorithm decides which fingers should be in contact in order to perform the manipulation, using a probabilistic approach based on manipulability.
• In the second part, it finds the best combination of motion primitives, under hand constraints, to go from the initial position to the final position.
• The motion primitives are to be extracted from in-hand manipulation data using Non-negative Matrix Factorization (NMF), which promotes sparsity. Ongoing work: finding suitable data to which the NMF method can be applied, so that the extracted motion primitives can be used in the second step of the algorithm.
Impact under development; outputs expected within a few months
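The first part (choosing contact fingers probabilistically from manipulability) can be pictured with a toy model. This is an assumption-laden sketch, not the collaborators' formalisation: each finger is modelled as a planar two-link chain, for which Yoshikawa's manipulability sqrt(det(J J^T)) reduces to |l1 * l2 * sin(q2)|, and a softmax over normalised scores with a hypothetical `temperature` turns the scores into contact probabilities. Finger names and link lengths are illustrative.

```python
import math

def two_link_manipulability(l1, l2, q2):
    """Yoshikawa manipulability of a planar two-link finger.

    The Jacobian is square, so sqrt(det(J J^T)) = |det J| = |l1 * l2 * sin(q2)|.
    """
    return abs(l1 * l2 * math.sin(q2))

def contact_probabilities(fingers, temperature=0.1):
    """Softmax over normalised manipulability: well-conditioned fingers are more likely contacts.

    `fingers` maps a (hypothetical) finger name to (l1, l2, q2).
    """
    scores = {name: two_link_manipulability(*params) for name, params in fingers.items()}
    m = max(scores.values())
    if m < 1e-12:
        # Every finger is (numerically) singular: fall back to a uniform choice.
        return {name: 1.0 / len(scores) for name in scores}
    # Normalise to [0, 1] so the temperature does not depend on the units of the link lengths.
    exps = {name: math.exp((s / m - 1.0) / temperature) for name, s in scores.items()}
    z = sum(exps.values())
    return {name: e / z for name, e in exps.items()}
```

A finger near full extension (q2 close to 0) is close to a kinematic singularity and therefore gets a low probability of being selected for contact.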
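The NMF step can be sketched with the classic Lee-Seung multiplicative updates for the Frobenius loss, which factor a non-negative matrix V (n x m) into W (n x k) and H (k x m). Here we assume, hypothetically, that rows of V are time samples and columns are non-negative joint features, so rows of H play the role of motion primitives and W holds their activations over time. The rank `k`, iteration count and data layout are assumptions, and the sparsity-promoting NMF variant intended by the collaborators may differ from this plain version.

```python
import random

def matmul(a, b):
    """Plain-list matrix product."""
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, k, iters=500, eps=1e-9, seed=0):
    """Factor non-negative V ~= W @ H with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def reconstruction_error(v, w, h):
    """Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what makes the extracted primitives additive "parts" of the recorded motion.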
Start Year 2019
Description TUW 
Organisation Vienna University of Technology
Country Austria 
Sector Academic/University 
PI Contribution Discussion and definition of methodology for object-centric perception.
Collaborator Contribution
• The primary focus in the first year has been on developing the hand and object tracking system, as this is a fundamental capability for the other tasks. We have made significant steps in single-frame object pose estimation using RGB or depth information. In particular, our most noteworthy achievement is Pix2Pose (Park et al. 2019), which was the best-performing method on the YCB-Video and Rutgers Amazon Picking Challenge datasets in the Benchmark for 6D Object Pose Estimation Challenge at ICCV 2019. Pix2Pose uses an auto-encoder architecture to estimate the 3D coordinates of object pixels, which are then passed to a PnP algorithm to derive the 6D object pose. By applying generative adversarial training, Pix2Pose better recovers occluded object parts, which is essential for the in-hand manipulation scenario of InDex. The outstanding results in the ICCV challenge establish our work as the state of the art (SotA) for RGB-based pose estimation. Recent work has extended the concept to a tracking system by fusing the single-frame estimates with a Kalman filter. We have also made progress with depth data. Specifically, we have extended the leading depth-based method, which uses Point-Pair Features, by exploiting object symmetries. Identifying object symmetries reduces both the number of false hypotheses and the computation time. This work received the best paper award at the International Conference on Vision Systems 2019. A further addition to our pose estimation work incorporates physics-based verification to refine and improve initial estimates. For hand tracking, we have investigated the feasibility of existing methods using the HO3D dataset, a labelled dataset closely aligned with the scenario studied in InDex.
• This year we have also made significant progress in generalising grasps from known to unknown objects, as well as in achieving task-relevant grasps.
We introduced a dense geometric correspondence matching network that applies metric learning to encode geometric similarity. Using this network, we can construct 3D-3D correspondences between pairs of objects, which are used in the grasping pipeline to transfer the grasp associated with a known object, observed in a 2.5D image, to an unseen object. In grasping experiments, our approach outperforms baseline methods on objects from the YCB dataset. Furthermore, we demonstrated that known grasps can be annotated so that they are relevant to the task; as a result, the robot executes grasps that enable the functional use of objects. A dataset of the grasp annotations is released with the publication.
Impact Dataset (see the publication listed in the project portfolio)
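The tracking extension (fusing single-frame pose estimates with a Kalman filter) can be illustrated with a deliberately simplified sketch: one constant-velocity filter per translation axis, treating each single-frame estimate as a noisy position measurement. The noise parameters `q` and `r`, the frame period `dt` and the per-axis decoupling are assumptions; filtering the rotational part of a 6D pose properly needs a different state representation (for example quaternions with an error-state filter), which this sketch omits.

```python
class ConstantVelocityKF:
    """1-D constant-velocity Kalman filter with state (position, velocity)."""

    def __init__(self, x0, dt=1 / 30, q=1e-3, r=1e-2):
        self.x = [x0, 0.0]                 # state estimate
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.dt, self.q, self.r = dt, q, r

    def predict(self):
        dt, p = self.dt, self.p
        # x <- F x with F = [[1, dt], [0, 1]]
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + Q, with the simplified process noise Q = q * I
        p00 = p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]

    def update(self, z):
        p = self.p
        # Measurement model H = [1, 0]: only the position is observed.
        s = p[0][0] + self.r               # innovation covariance
        k0, k1 = p[0][0] / s, p[1][0] / s  # Kalman gain
        y = z - self.x[0]                  # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]
        return self.x[0]

def track_positions(measurements, **kf_args):
    """Filter a sequence of (x, y, z) single-frame estimates with one filter per axis."""
    filters = [ConstantVelocityKF(c, **kf_args) for c in measurements[0]]
    out = [tuple(measurements[0])]
    for z in measurements[1:]:
        for f in filters:
            f.predict()
        out.append(tuple(f.update(c) for f, c in zip(filters, z)))
    return out
```

Because the measurement noise `r` is positive, the gain stays below one, so a sudden jump in a single-frame estimate is only partly followed; this is the smoothing effect that makes a tracker more stable than independent per-frame estimates.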
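The correspondence-based grasp transfer can be pictured with a toy sketch. We assume that each object is given as 3D points with per-point descriptors (in the described pipeline these would come from the metric-learning network; here they are supplied by hand) and that a stored grasp is a set of contact-point indices on the known object. Each contact is matched to the unseen object's point with the nearest descriptor, yielding 3D-3D correspondences. The function names, descriptor format and nearest-neighbour rule are illustrative assumptions, not the published method.

```python
import math

def nearest_by_descriptor(query_desc, target_descs):
    """Index of the target descriptor closest (Euclidean) to `query_desc`."""
    return min(range(len(target_descs)), key=lambda j: math.dist(query_desc, target_descs[j]))

def transfer_grasp(grasp_contacts, known_points, known_descs, new_points, new_descs):
    """Map each grasp contact on the known object to a point on the unseen object.

    grasp_contacts: indices into known_points / known_descs used by the stored grasp.
    Returns (known 3D point, matched 3D point on the unseen object) pairs.
    """
    pairs = []
    for i in grasp_contacts:
        j = nearest_by_descriptor(known_descs[i], new_descs)
        pairs.append((known_points[i], new_points[j]))
    return pairs
```

Matching in descriptor space rather than in Euclidean 3D space is what lets a contact transfer to a geometrically similar part of an object with a different size or pose.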
Start Year 2019
Description University of Genoa 
Organisation University of Genoa
Country Italy 
Sector Academic/University 
PI Contribution Collaboration on the specification of the data acquisition protocol, including the definition of scenarios and of an in-hand manipulation dictionary
Collaborator Contribution A series of activities has been conducted to define the techniques and methodologies involved in the data acquisition process. In particular:
• A literature review was performed to formally define the hand-object interaction task in relation to other subtasks such as grasping, transporting, manipulation and releasing.
• A literature review was conducted to determine which information is needed to model the hand-object interaction. The results suggest that properly modelling the hand-object interaction requires both motor-system information and sensory feedback.
• Since this work suggested that a multisensory dataset is needed to model the hand-object interaction, a systematic literature review was conducted to identify the state of the art on this subject. The literature mainly presents computer-vision datasets for object grasping and manipulation, but a multimodal dataset is missing.
• A software architecture to record a multimodal dataset has been developed, using the Robot Operating System (ROS) as the communication framework. The architecture synchronises data coming from different sensors such as RGB-D cameras (Kinect), a motion capture (MoCap) system and a data glove that can be equipped with inertial measurement units (IMUs) and tactile sensors.
• A first version of a data glove integrating IMUs and MoCap markers for hand motion perception has been built. A second version is currently under development.
• The experimental scenario is currently being finalised with input from the other partners.
Impact under development; outcomes are expected in the coming months from the acquisition of data on human in-hand manipulation of objects (motion patterns of humans and objects) for imitation learning strategies.
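Synchronising streams that arrive at different rates (RGB-D frames, MoCap, glove IMUs and tactile data) is commonly done by approximate timestamp matching; in ROS 1, the `message_filters` package provides `ApproximateTimeSynchronizer` for this purpose. The standalone sketch below only illustrates the idea without ROS and uses a simpler rule than that class: for each frame of a reference stream it picks the nearest sample in every other stream and drops the frame if any match is further away than a tolerance `slop`. The stream names, rates and the 10 ms tolerance are hypothetical.

```python
import bisect

def nearest_sample(timestamps, t):
    """Index of the timestamp closest to t in a sorted list."""
    i = bisect.bisect_left(timestamps, t)
    candidates = [c for c in (i - 1, i) if 0 <= c < len(timestamps)]
    return min(candidates, key=lambda c: abs(timestamps[c] - t))

def synchronise(reference, others, slop=0.01):
    """Align other sensor streams to the reference stream's timestamps.

    reference: sorted timestamps in seconds (e.g. camera frames).
    others: dict of stream name -> sorted timestamps (e.g. MoCap, glove).
    Returns one dict {stream name: matched timestamp} per kept reference frame;
    frames with any match farther than `slop` seconds are dropped.
    """
    synced = []
    for t in reference:
        match = {"reference": t}
        for name, stamps in others.items():
            if not stamps:
                break
            j = nearest_sample(stamps, t)
            if abs(stamps[j] - t) > slop:
                break
            match[name] = stamps[j]
        else:  # runs only if no stream triggered a `break`
            synced.append(match)
    return synced
```

Dropping incomplete frames keeps every record in the dataset fully multimodal, at the cost of discarding frames when one sensor stalls.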
Start Year 2019
Description University of Tartu 
Organisation University of Tartu
Country Estonia 
Sector Academic/University 
PI Contribution Definition of strategies for software integration (perception, learning/reasoning, robot reaction) using the Robot Operating System (ROS).
Collaborator Contribution
• UTARTU has been participating in the meetings organised in Vienna. They set up the Slack channel for discussions about software architecture and testing procedures, and have been in contact with partners to discuss technical details. Their PhD student Robert Valner has prepared the testing procedure and the requirements for software integration. The interfaces and data exchange protocols are currently under discussion.
• They are currently working on TeMoto software development for collaborative robotics solutions.
Impact under development
Start Year 2019
Description Meetings - IEEE Special Interest Group on Grasp Ontology 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Special interest group within the IEEE, set up to define a world standard for robot grasping/manipulation ontology. We are working to engage the InDex project team in the definition of this new standard for robot grasping.
Year(s) Of Engagement Activity 2019