Robot In-hand Dexterous manipulation by extracting data from human manipulation of objects to improve robotic autonomy and dexterity - InDex

Lead Research Organisation: Aston University
Department Name: College of Engineering and Physical Sci


Humans excel when dealing with everyday objects and manipulation tasks, learning new skills, and adapting to different or complex environments. This is a basic skill for our survival as well as a key feature in our world of artefacts and human-made devices. Our expert ability to use our hands results from a lifetime of learning by both observing other skilled humans and ourselves as we discover how to handle objects first hand. Unfortunately, today's robotic hands are still unable to achieve such a high level of dexterity in comparison to humans nor are systems entirely able to understand their own potential. In order for robots to truly operate in a human world and fulfil the expectations as intelligent assistants, they must be able to manipulate a wide variety of unknown objects by mastering their capabilities of strength, finesse and subtlety. To achieve such dexterity with robotic hands, cognitive capacity is needed to deal with uncertainties in the real world and to generalise previously learned skills to new objects and tasks. Furthermore, we assert that the complexity of programming must be greatly reduced and robot autonomy must become much more natural. The InDex project aims to understand how humans perform in-hand object manipulation and to replicate the observed skilled movements with dexterous artificial hands, merging the concepts of deep reinforcement and transfer learning to generalise in-hand skills for multiple objects and tasks. In addition, an abstraction and representation of previous knowledge will be fundamental for the reproducibility of learned skills to different hardware. Learning will use data across multiple modalities that will be collected, annotated and assembled into a large dataset. The data and our methods will be shared with the wider research community to allow testing against benchmarks and reproduction of results. 
More concretely, the core objectives are: (i) to build a multi-modal artificial perception architecture that extracts data on human object manipulation; (ii) to create a multi-modal dataset of in-hand manipulation tasks such as regrasping, reorienting and finely repositioning; (iii) to develop an advanced object modelling and recognition system, including the characterisation of object affordances and grasping properties, in order to encapsulate both explicit information and possible implicit object usages; (iv) to autonomously learn and precisely imitate human strategies in handling tasks; and (v) to build a bridge between observation and execution, allowing deployment that is independent of the robot architecture.

Planned Impact

Not Applicable


Description InDex's ultimate goal is to drastically enhance the ability of anthropomorphic robotic hands to perform in-hand manipulation tasks as humans do, with regard to sensing, reasoning and action. One of the significant barriers to the uptake of highly sophisticated manipulators is the major overhead of controlling these systems to obtain the full benefit of their capabilities and features for fine movements. We are still defining multiple imitation learning (learning from human experience) strategies based on deep and transfer learning, canonical action encoding and decoding, and object affordances, in order to reduce the re-adjustment of constraints and give the robot increased autonomy. The robot will be able to imitate human motion, and it will also be able to improve its performance through its own experience, by exploration. This will be indispensable to the advent of a new generation of robots capable of grasping and sophisticated in-hand manipulation, so that they can perform efficient and useful interaction activities.
Exploitation Route It is expected that the outcomes of InDex will allow dexterous hands to operate much more autonomously during in-hand manipulation of objects. This will improve the change management strategy for certain robots in industrial chains and increase the flexibility of factories using this system. The intention is also to provide an approach that can be integrated into robots operating autonomously in domestic environments (replacing humans in service tasks), in difficult terrain, or in dangerous, hazardous tasks. In order to spur this significant breakthrough, the consortium will develop a novel path for robot learning strategies. The outcomes of this project will be relevant for industry.
Sectors Aerospace

Defence and Marine


Food and Drink




Description Sim2Real: From Simulation to Real Robotic Application using Deep Reinforcement Learning and Knowledge Transfer
Amount £20,000 (GBP)
Funding ID RGS\R2\192498 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2019 
End 10/2021
Title Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning
Description We propose an experience prioritization algorithm for the hindsight experience replay (HER) method that can improve learning efficiency in reinforcement learning. We focus in particular on utilizing the density distribution of the achieved points of a desired trajectory. The Goal Density-based hindsight experience Prioritization (GDP) method prioritizes achieved points that are rarely seen in the replay buffer for use as substitute goals in the HER algorithm, in order to find the best trajectory for a robot task. We evaluate our method in simulation environments on several robot manipulation tasks, where it outperforms HER.
Type Of Material Computer model/algorithm 
Year Produced 2020 
Provided To Others? Yes  
Impact Reinforcement learning (RL) needs the environment to provide rewards as feedback for training. However, in the real world most rewards are provided sparsely, which means a reward signal is only given when a trajectory is successful. Dealing with sparse rewards is especially challenging for most classical RL algorithms. For robot RL tasks, most experiences in the early stage of learning are unsuccessful and thus cannot provide positive feedback. This method introduces the idea of goal density and points out that rare experiences are more valuable to learn from, so it prioritises achieved points of a given trajectory that are rarely seen in the replay buffer for use as substitute goals, in order to find the best trajectory for a given task.
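The idea of favouring rarely seen achieved points can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel density estimate, the `bandwidth` value and all function names below are assumptions chosen only to show how a lower goal density can translate into a higher sampling priority for substitute goals.

```python
import math
import random


def goal_densities(goals, bandwidth=0.5):
    """Gaussian-kernel density estimate of each goal against all stored goals."""
    densities = []
    for g in goals:
        total = 0.0
        for h in goals:
            sq_dist = sum((a - b) ** 2 for a, b in zip(g, h))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / len(goals))
    return densities


def rarity_priorities(goals, bandwidth=0.5):
    """Sampling probabilities inversely proportional to density (they sum to 1)."""
    inverse = [1.0 / d for d in goal_densities(goals, bandwidth)]
    norm = sum(inverse)
    return [w / norm for w in inverse]


def sample_substitute_goal(goals, rng, bandwidth=0.5):
    """Draw one achieved goal, favouring those in sparse regions of the buffer."""
    weights = rarity_priorities(goals, bandwidth)
    return rng.choices(goals, weights=weights, k=1)[0]


# Hypothetical achieved goals: three clustered points and one isolated point.
achieved = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0)]
priorities = rarity_priorities(achieved)
substitute = sample_substitute_goal(achieved, random.Random(0))
```

In this toy buffer the isolated point `(3.0, 3.0)` receives the largest priority, so it is the most likely to be chosen as a substitute goal during hindsight relabelling.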
Description Sorbonne
Organisation Sorbonne University
Country France 
Sector Academic/University 
PI Contribution Providing learning algorithms using data extracted from human manipulation
Collaborator Contribution • Formalisation of an in-hand manipulation algorithm, which is split into two parts. In the first part, the algorithm decides which finger should be in contact in order to perform the manipulation. This part uses a probabilistic approach based on manipulability. • In the second part, the algorithm finds the best combination of motion primitives, under hand constraints, to go from the initial position to the final position. • The motion primitives to be used should be extracted from in-hand manipulation data using Non-negative Matrix Factorization (NMF), which ensures sparsity. Ongoing work: finding suitable data to which the NMF method can be applied, so that motion primitives can be extracted and used in the second step of the algorithm.
Impact under development - outputs in a few months
Start Year 2019
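As an illustration of the NMF step mentioned in this collaboration, the sketch below factorises a small non-negative data matrix V into W (per-sample activations) and H (candidate primitives) using the standard multiplicative update rules. The data matrix, the rank and the function names are hypothetical; the record does not state which NMF variant or data layout the collaborators use.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical data: 4 recorded hand postures x 3 joint values, all non-negative.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [3.0, 4.0, 7.0]]
w, h = nmf(v, rank=2)
```

Each row of `h` can be read as one candidate motion primitive and each row of `w` as the non-negative weights that combine the primitives to reproduce a recorded posture.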
Description TUW 
Organisation Vienna University of Technology
Country Austria 
Sector Academic/University 
PI Contribution Discussion and definition of methodology for object-centric perception.
Collaborator Contribution • The primary focus in the first year has been on developing the hand and object tracking system, as this is a fundamental capability for the other tasks. We have made significant steps in single-frame object pose estimation using RGB or depth information. In particular, our most noteworthy achievement is Pix2Pose (Park et al. 2019), which won best performing method on the YCB-Video and Rutgers Amazon Picking Challenge at the Benchmark for 6D Object Pose Estimation Challenge at ICCV 2019. Pix2Pose uses an auto-encoder architecture to estimate the 3D coordinates of object pixels, which are then passed to the PnP algorithm to derive a 6D object pose. By applying generative adversarial training, Pix2Pose better recovers occluded object parts, which is essential for the in-hand manipulation scenario of InDex. The outstanding results in the ICCV challenge establish our work as state of the art for RGB-based pose estimation. Recent work has extended the concept to a tracking system by fusing the single-frame estimates with a Kalman filter. For depth data, we have also made progress. Specifically, we have extended the leading depth-based method, which uses point-pair features, by exploiting object symmetries. By identifying object symmetries, we reduce both the number of false hypotheses and the computation time. This work was awarded best paper at the International Conference on Vision Systems, 2019. A further addition to our pose estimation work incorporates physics-based verification to refine and improve initial estimates. For hand tracking, we have investigated the feasibility of existing methods using the HO3D dataset, which is a labelled dataset closely aligned to the scenario studied in InDex. • This year we have also made significant progress in generalising grasps from known to unknown objects, as well as in achieving task-relevant grasps.
We introduced the dense geometric correspondence matching network, which applies metric learning to encode geometric similarity. Using this network, we can construct 3D-3D correspondences between pairs of objects, which are used in the grasping pipeline to transfer an associated grasp for a known 2.5D image to an unseen object. Grasping experiments with our approach outperform baseline methods for objects in the YCB dataset. Furthermore, we demonstrated that known grasps can be annotated in such a way that they are relevant for the task. As a result, the robot executes grasps that enable the functional use of objects. A dataset of the grasp annotations is released with the publication.
Impact dataset: publication listed in the project portfolio
Start Year 2019
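The fusion of single-frame estimates with a Kalman filter, mentioned in this collaboration, can be sketched for one translation coordinate. This is only an illustrative scalar filter with a random-walk motion model; the noise variances and the measurement values are assumptions, and a full 6D tracker would also need to handle orientation, which is not shown here.

```python
def kalman_track(measurements, process_var=1e-3, meas_var=0.05):
    """Smooth a sequence of noisy single-frame estimates of one coordinate."""
    estimate = measurements[0]   # initialise from the first frame
    variance = meas_var          # initial uncertainty equals measurement noise
    track = [estimate]
    for z in measurements[1:]:
        # Predict: random-walk model, state unchanged, uncertainty grows.
        variance += process_var
        # Update: blend prediction and new measurement using the Kalman gain.
        gain = variance / (variance + meas_var)
        estimate = estimate + gain * (z - estimate)
        variance = (1.0 - gain) * variance
        track.append(estimate)
    return track


# Hypothetical per-frame x-positions (metres) of an object held roughly still.
frames = [0.52, 0.47, 0.55, 0.44, 0.51, 0.49]
smoothed = kalman_track(frames)
```

A smaller `process_var` makes the track smoother but slower to follow real motion, which is the main tuning trade-off for in-hand manipulation, where objects move in short bursts.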
Description University of Genoa 
Organisation University of Genoa
Country Italy 
Sector Academic/University 
PI Contribution Collaboration on specification of Protocol of Data Acquisition, including definition of scenarios and In-hand Manipulation dictionary
Collaborator Contribution A series of activities has been conducted to define the techniques and methodologies involved in the data acquisition process. In particular: • A literature review has been performed to formally define the hand-object interaction task in relation to other sub-tasks such as grasping, transporting, manipulation and releasing. • A literature review has been conducted to determine which information is necessary to model the hand-object interaction. The results suggest that both motor system information and sensory feedback are necessary to model the hand-object interaction properly. • Since the previous work suggested the need for a multi-sensory dataset to model the hand-object interaction, a systematic literature review has been conducted to identify the state of the art on this subject. The literature mainly presents computer vision datasets for object grasping and manipulation, but a multi-modal dataset is missing. • A software architecture to record a multi-modal dataset, using the Robot Operating System (ROS) as the communication framework, has been developed. The architecture allows the synchronization of data coming from different sensors such as RGB-D cameras (Kinect), a motion capture (MoCap) system and a data glove, which can be equipped with inertial measurement units (IMUs) and tactile sensors. • A first version of a data glove integrating inertial measurement units and MoCap markers for hand motion perception has been built. The second version is currently under development. • The experimental scenario is currently being finalized, with input from the other partners.
Impact under development - outcomes in the following months with data acquisition of human in-hand manipulation of objects (motion patterns of humans and objects) for imitation learning strategies.
Start Year 2019
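The synchronization of data from different sensors described in this collaboration can be illustrated with a small timestamp-matching sketch. It is not the project's recording architecture and does not use any ROS API: the stream names, timestamps and the `tolerance` value are hypothetical, and the matching rule (nearest sample within a tolerance of a reference stream) is one common choice among several.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the timestamp closest to t in a sorted, non-empty list."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def synchronise(ref_name, reference, others, tolerance):
    """Pair each reference sample with the nearest sample of every other stream.

    reference: sorted list of (timestamp, sample)
    others:    dict mapping stream name -> sorted list of (timestamp, sample)
    Frames lacking a partner within `tolerance` seconds in any stream are dropped.
    """
    stamps = {name: [t for t, _ in stream] for name, stream in others.items()}
    frames = []
    for t_ref, sample in reference:
        frame = {"t": t_ref, ref_name: sample}
        for name, stream in others.items():
            if not stream:
                break
            j = nearest_index(stamps[name], t_ref)
            if abs(stream[j][0] - t_ref) > tolerance:
                break
            frame[name] = stream[j][1]
        else:
            frames.append(frame)  # reached only if no stream caused a break
    return frames


# Hypothetical recordings: (timestamp in seconds, sample label).
camera = [(0.00, "c0"), (0.10, "c1"), (0.20, "c2")]
glove = [(0.01, "g0"), (0.05, "g1"), (0.09, "g2"), (0.13, "g3"), (0.21, "g4")]
mocap = [(0.00, "m0"), (0.09, "m1"), (0.30, "m2")]
frames = synchronise("camera", camera, {"glove": glove, "mocap": mocap}, 0.02)
```

Here the camera frame at 0.20 s is discarded because the closest MoCap sample is 0.10 s away, so only two complete multi-modal frames are produced.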
Description University of Tartu 
Organisation University of Tartu
Country Estonia 
Sector Academic/University 
PI Contribution Definition of strategies for Software Integration (perception, learning/reasoning, robot reaction) using Robot Operating System.
Collaborator Contribution • UTARTU has been participating in meetings organized in Vienna. They set up a Slack channel for discussions about software architecture and testing procedures. They have been in contact with partners and discussed technical details. Their PhD student Robert Valner has prepared the testing procedure and requirements for software integration. The interfaces and data exchange protocols are currently under discussion. • Currently, they are working on TeMoto software development for collaborative robotics solutions.
Impact under development
Start Year 2019
Title DGCM-Net for learning robotic grasping from experience 
Description Successful attempts are remembered and then used to guide future grasps such that more reliable grasping is achieved over time. To generalise the learned experience to unseen objects, we introduce the dense geometric correspondence matching network (DGCM-Net). This applies metric learning to encode objects with similar geometry nearby in feature space. Retrieving relevant experience for an unseen object is thus a nearest neighbour search with the encoded feature maps. DGCM-Net also reconstructs 3D-3D correspondences using the view-dependent normalised object coordinate space to transform grasp configurations from retrieved samples to unseen objects. In comparison to baseline methods, our approach achieves an equivalent grasp success rate. However, the baselines are significantly improved when fusing the knowledge from experience with their grasp proposal strategy. Offline experiments with a grasping dataset highlight the capability to generalise within and between object classes as well as to improve success rate over time from increasing experience. Lastly, by learning task-relevant grasps, our approach can prioritise grasps that enable the functional use of objects. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Useful for InDex project, WP2, WP3 and WP4 
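The retrieval step described for DGCM-Net, a nearest neighbour search over encoded features, can be sketched as follows. The feature vectors here are made-up three-dimensional stand-ins for the network's learned encodings, and the grasp labels and function name are hypothetical; the sketch only shows the lookup, not the network or the grasp transfer.

```python
import math


def nearest_experience(query, memory):
    """Return the stored grasp whose feature vector is closest to the query.

    memory: list of (feature_vector, grasp_record) pairs from past successes.
    """
    features, grasp = min(memory, key=lambda item: math.dist(query, item[0]))
    return grasp, math.dist(query, features)


# Hypothetical experience memory: (encoded geometry, remembered successful grasp).
memory = [
    ((0.9, 0.1, 0.0), "mug: handle grasp"),
    ((0.0, 1.0, 0.2), "box: top grasp"),
    ((0.1, 0.0, 1.0), "bottle: side grasp"),
]
unseen_object = (0.8, 0.2, 0.1)  # assumed encoding of a new, mug-like object
grasp, distance = nearest_experience(unseen_object, memory)
```

Because metric learning places similar geometry close together in feature space, a plain Euclidean lookup of this kind is enough to select which remembered grasp to transfer.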
Title Pix2Pose for object pose estimation from RGB 
Description Pix2Pose is a novel pose estimation method that predicts the 3D coordinates of each object pixel without textured models. An auto-encoder architecture was designed to estimate the 3D coordinates and expected errors per pixel. These pixel-wise predictions are then used in multiple stages to form 2D-3D correspondences to directly compute poses with the PnP algorithm with RANSAC iterations. This method is robust to occlusion by leveraging recent achievements in generative adversarial training to precisely recover occluded parts. Furthermore, a novel loss function, the transformer loss, is proposed to handle symmetric objects by guiding predictions to the closest symmetric pose. This method will assist WP1 and WP2 for object detection and tracking for in-hand manipulation. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Won best performing method on the YCB-Video and Rutgers Amazon Picking Challenge at the Benchmark for 6D Object Pose Estimation Challenge at ICCV 2019. 
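The idea of guiding predictions to the closest symmetric pose can be illustrated with a loss that takes the minimum error over a set of symmetry transformations of the target. This is a simplified sketch rather than the Pix2Pose transformer loss itself: the mean L1 error, the single 180-degree symmetry about the z-axis and the tiny coordinate lists are all assumptions made for illustration.

```python
def identity(point):
    return point


def rotate_z_180(point):
    """Rotate a 3D point by 180 degrees about the z-axis."""
    x, y, z = point
    return (-x, -y, z)


def mean_l1(predicted, target):
    """Mean per-point L1 distance between two lists of 3D coordinates."""
    total = sum(abs(a - b)
                for p, t in zip(predicted, target)
                for a, b in zip(p, t))
    return total / len(predicted)


def symmetry_aware_loss(predicted, target, symmetries):
    """Smallest error over all symmetric versions of the target coordinates."""
    return min(mean_l1(predicted, [s(t) for t in target]) for s in symmetries)


# Hypothetical per-pixel object coordinates for an object with a 180-degree symmetry.
target = [(1.0, 0.0, 0.5), (0.0, 1.0, 0.2)]
predicted = [(-1.0, 0.0, 0.5), (0.0, -1.0, 0.2)]  # the symmetric counterpart

plain = mean_l1(predicted, target)
symmetric = symmetry_aware_loss(predicted, target, [identity, rotate_z_180])
```

The plain loss penalises the prediction heavily even though it corresponds to a visually identical pose, whereas the symmetry-aware loss is zero, so training is not pushed toward an arbitrary one of the equivalent poses.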
Title SyDPose: Object detection and pose estimation in cluttered real-world depth images trained using only synthetic data 
Description Synthetically created depth data, with domain-relevant background and randomized noise heuristics, are used to train an end-to-end, multi-task network for pose estimation. We simultaneously detect, classify and estimate the poses of texture-less objects in cluttered real-world depth images containing an arbitrary number of objects.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This technique will be relevant for InDex (WP1 and WP2) to assist with object detection and pose estimation for grasping purposes. 
Description IEEE RO-MAN HOBI Workshop (Hand-Object Interaction: From Human Demonstrations to Robot Manipulation) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Workshop aimed at gathering new approaches and experience from different fields to discuss which conceptual and engineering tools are better suited to sense human hand motions, to recognise objects and their physical characteristics, as well as to model and encode this knowledge to develop new robot behaviours.
At this event we discussed with participants, researchers in robotics, computer science and engineering from around the world, the importance of focusing on how humans use their hands, with the aim of developing novel robot capabilities for tasks usually considered a human prerogative and, more generally, of enabling robots to interact, collaborate or communicate with humans in a socially acceptable and safe manner. For example, a robot should be able to dextrously make use of tools, to synchronise its movements with the human it is collaborating with, either for joint work or turn-taking, or to manipulate objects in a way that enhances a sense of trust. The workshop included plenary talks, paper presentation sessions and a round table to discuss advances in this field.
Year(s) Of Engagement Activity 2020
Description Meetings - IEEE Special Interest Group on Grasp Ontology 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A special interest group within the IEEE that aims to define a world standard for robot grasping and manipulation ontology. We are working to engage the InDex project team in the definition of this new standard for robot grasping.
Year(s) Of Engagement Activity 2019