iSee - Intelligent Vision for Grasping

Lead Research Organisation: University of Glasgow
Department Name: School of Computing Science

Abstract

Intelligent vision is a key enabler for future robotics technology. Shadow's recent development work on a disruptive universal gripper (launched at Innovate16) has identified a need for better vision to permit the automation of grasping. Building on over 20 years of R&D at the University of Glasgow, the iSee project will establish concrete robot vision benchmarks, based on commercially relevant scenes, and then develop, validate and integrate vision sensors and processing algorithms into the Smart Grasping System (SGS) to enable it to reach significant new markets in automation, logistics and service robotics. The following candidate sensors have been selected for benchmarking:

A. Low-cost Time-of-Flight (ToF) 3D cameras (available from various companies)
B. Stereo pairs of off-the-shelf HD cameras and stereo pairs of embedded vision sensors in conjunction with CVAS's existing custom stereo-pair image matching and photogrammetry software
C. An Asus Xtion RGB-D camera will serve as a benchmark reference sensor

We propose to build an integrated hand-eye system for each sensor listed above, along with appropriate lighting, and to develop complete integrated pipelines to benchmark the different combinations of capture and analysis systems on the specified scenarios. This investigation will allow us to assess 2D and 3D sensing methods in terms of image-capture quality and hand-eye performance.

We also propose to evaluate the new and highly disruptive Deep Convolutional Neural Network (DCNN) technology, which has the potential to leapfrog the best algorithmic vision methods and to provide a fast, accurate and complete vision solution that meets the demands of advanced robotic grasping and manipulation. We will thus augment the evaluation with efficient, high-speed DCNN algorithms for interpreting images from potentially low-cost sensors for:

* Detecting and localising known objects and estimating their pose for grasping purposes
* Estimating depth, size and surface normals directly from single monocular images, using transfer methods
* Recovering depth from binocular and monocular camera systems using stereo matching and structure from motion (optical flow) respectively

Once trained, DCNNs can analyse images very quickly and are now becoming suitable for low-cost embedded platforms, such as smartphones. This aspect of the proposed investigation has the potential to simplify the sensor hardware dramatically: only single cameras, or stereo pairs of cameras, are required in combination with DCNNs as the basis for a vision system that could potentially provide all of the functionality required to control the hand in a wide range of scenarios.
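As a hedged illustration of the kind of DCNN inference step envisaged here (not the project's own networks), the sketch below runs a pre-trained torchvision detector over a single monocular image. The model choice, file name and confidence threshold are assumptions, and a further pose-estimation stage would still be needed to recover a full grasp pose.

    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Pre-trained general-purpose detector, used purely as a stand-in for the
    # task-specific networks that would be trained on the project's benchmark data.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = Image.open("scene.jpg").convert("RGB")   # hypothetical monocular frame
    with torch.no_grad():
        detections = model([to_tensor(image)])[0]

    # Bounding boxes give 2D detection and localisation; a separate pose-estimation
    # stage (not shown) would be needed to recover a 6-DoF grasp pose.
    for box, label, score in zip(detections["boxes"], detections["labels"], detections["scores"]):
        if score > 0.8:                              # assumed confidence threshold
            print(int(label), [round(float(v)) for v in box], float(score))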

Benchmark results will let us develop specific camera-algorithm combinations that improve performance for the specified use cases over a number of test-evaluate-improve iterations. The core 3D sensing approaches will be integrated with the SGS, and we shall evaluate additional off-hand cameras that provide critical ancillary visual input when objects are close to the gripper camera prior to a grasp, or during in-hand manipulation. Both camera systems will be used to acquire different views of the scene, with both mounted on a robot arm for interactive perception of the scene and the objects contained within it.

In parallel, we will develop a showcase demonstration system at Shadow based on Shadow's current grasp planning software coupled to the 3D images captured by the benchmarked 3D vision systems. The developed vision modules will be encapsulated within the "Blocky" programming system to afford a simple and direct method for end-users to take advantage of this capability.

In conclusion, we believe that the robotics hand-eye pipelines proposed within the iSee project have the potential to play an important role in maintaining market leadership in the development of complete robotics system solutions.

Planned Impact

The iSee project will provide a technology that can be utilised in assistive and social settings, that will underpin fundamental research and commercial collaborations, and that will deliver impacts in the Knowledge, Economy, Society and People areas.

Knowledge
In iSee, we will ensure that the understanding of problems and solutions generated by this activity flows in both directions between Shadow and the CVAS group. Direct scientific impacts will occur if the benefits of iSee's vision library lead to its adoption within the Smart Grasping System. Indirect impacts will occur if iSee's integrated smart sensing is adopted in other robotic scenarios outside the current scope of the project -- e.g. assistive and care roles, human-robot interaction, etc.

Robotics and autonomous systems are recognised by the UK as one of the eight great technologies of the future, and iSee's vision and grasping will serve as a founding robotic platform for the design and development of new robotic and autonomous technologies. In the longer term, enhanced and integrated visual sensing technologies within smart grasping systems will encourage the development of new types of robots and robotic systems. Specifically, robots will become capable of working in new areas, such as constrained places in manufacturing that are challenging for humans to access, and of operating in environments not suited to conventional industrial vision.

Economy
Robotic technologies are increasingly used in high-wage economies such as the UK, and robotics is anticipated to be one of the drivers of the fourth industrial revolution. The technology developed in iSee will provide the industrial sector with immediate economic impact through new product sales and profitability, tightly coupled with design, research, production, sales, user feedback and field trials.

It is anticipated that robotics will fundamentally reduce labour costs by replacing a large proportion of routine roles. The Copenhagen study shows that UK adoption of industrial automation will produce a long-term increase in productivity of 22% and a workforce increase of 7.4% as staff are re-skilled and moved to higher-skilled roles. In this context, iSee will be extremely significant, since the project will have an impact on the development of future service robots, unlocking new industries. Vision for service robots is a significant challenge, and if we can deploy an effective sensorised solution, we have the potential to enable a new wave of startups creating vision-enabled service robots across multiple market domains.

People
iSee will facilitate the development of new vision and robotics skills in the research associates and CVAS academic staff. People with robotics and autonomous systems expertise are in high demand in both industry and academia and are significant economic contributors. Likewise, the ability to deploy robots to perform a wider range of repetitive manual tasks will reduce the incidence of industrial injuries due to tiredness and boredom and, hence, improve the quality of people's working environments.

Society
Robots are also key to addressing social challenges in high-wage economies, e.g. increasing healthcare demands and the ageing population. The enhanced reliability that iSee will deliver is essential for disruptive new robotic applications that are not currently possible, such as deployment in hospitals and care facilities. A new generation of robots may be deployed in assistive and care roles, which could have a significant impact on social care and the challenges of an ageing society. End effectors that can see mean that robots can work in areas where illumination cannot be controlled and where access is constrained or dangerous, such as inside storage shelving or large workpieces, or in land-mine clearance and bomb disposal.
 
Description The most significant achievements of the award comprise the construction of a robotic hand-eye testbed system that has demonstrated state-of-the-art performance in detecting and localising objects for robotic grasping and manipulation. A novel training pipeline has been developed that supports the practical use of deep learning technology for learning object appearance and can be deployed to serve the use-cases envisioned for the technology, namely bin-picking, materials handling and order completion in warehouse and manufacturing scenarios. We have demonstrated object recognition using both conventional 3D computer vision techniques and state-of-the-art deep learning methods. We have also shown that these latter Deep Net based methods are capable of segmenting objects in challenging situations where they are partially occluded, piled in heaps or located inside transport totes. In addition to the above, we have demonstrated that it is possible to couple a high-resolution software retina to the deep net to allow large images to be processed in a single pass of the network, greatly improving the efficiency and utility of deep nets. Finally, we have improved our 3D imaging algorithms to a level where they appear to be capturing 3D surface information of a quality rivalling laser scanning technology, opening the potential for very low-cost and highly compact 3D vision systems.
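To illustrate the data-reduction idea behind the software retina finding above, the sketch below uses a crude OpenCV log-polar warp to shrink a large frame to a fixed-size image before it would be passed to a DCNN. This is only a stand-in: the Glasgow software retina uses a biologically derived receptive-field tessellation rather than this warp, and the image sizes and file name are assumptions.

    import cv2

    frame = cv2.imread("large_scene.png")          # e.g. a ~1 Mpixel captured frame
    h, w = frame.shape[:2]
    centre = (w / 2.0, h / 2.0)                    # fixation point (assumed image centre)

    # Log-polar resampling: dense near the fixation point, coarse in the periphery.
    cortical = cv2.warpPolar(frame, (256, 256), centre,
                             maxRadius=min(h, w) / 2.0,
                             flags=cv2.INTER_LINEAR + cv2.WARP_POLAR_LOG)

    # The resampled image has far fewer pixels than the original, so a DCNN can
    # process the whole scene in a single forward pass.
    print(frame.shape, "->", cortical.shape)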
Exploitation Route The technology we have developed could be applied in a wide variety of industries, manufacturing processes and warehouse order-processing systems. The vision components are sufficiently generic to improve the operation of a wide variety of autonomous systems, including drone aircraft and driverless vehicles. The current consortium is bidding for further funding support to take the system we have developed to the next level of industrial integration and to more extensive trials in manufacturing and warehouse order-completion scenarios. Separate consortia are bidding for funds to develop the vision technology in the context of drone aircraft control and navigation, and also driverless vehicles. In a new collaboration with a number of Institutes of Cognitive Neuroscience, we are investigating the use of the retina-brain model explored in iSee to develop functional visual-pathway models linked to real subject data collected using fMRI scanners.
Sectors Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Communities and Social Services/Policy,Construction,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Energy,Environment,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Retail,Transport

 
Description Our findings are being used to support developments in advanced manufacturing and also in the analysis of aerial landscape images to identify sites of historic interest. However, these are still at the research and development stage, as opposed to deployed practice or products.
First Year Of Impact 2018
Sector Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections
Impact Types Cultural,Policy & public services

 
Description Doctoral Training Account
Amount £62,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start  
 
Title Deep Net Training Pipeline
Description New training pipeline that allows large numbers of segmented and labelled images depicting views of objects to be extracted from video sequences, providing the large-scale data required to train Deep Nets to recognise objects for robotic grasping and manipulation. (A minimal illustrative sketch follows this entry.)
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? No  
Impact Too early as the method has not yet been published. 
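As a hedged sketch of the idea behind this pipeline (the project's own segmentation method is unpublished), the following uses simple OpenCV background subtraction purely as a stand-in to harvest segmented, labelled object views from a video sequence; the file names and object label are assumptions.

    import cv2

    label = "object_01"                             # assumed object label
    capture = cv2.VideoCapture("object_01.mp4")     # assumed video sequence of the object
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        mask = cv2.medianBlur(subtractor.apply(frame), 5)   # rough foreground mask
        # Store an (image, mask, label) triple as one training example.
        cv2.imwrite(f"{label}_{index:05d}_rgb.png", frame)
        cv2.imwrite(f"{label}_{index:05d}_mask.png", mask)
        index += 1
    capture.release()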
 
Title Deep Software Retina
Description This is an Engineering/Technology project. We have produced a software retina-DCNN combination that improves the speed at which deep nets can process images and enables them to process large images.
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact It is now possible to process images of the order of 1M pixels and larger in a single forward pass through a DCNN, allowing real-time computer vision on embedded computer platforms. 
URL http://www.eyewear-computing.org/EPIC_ICCV17/Short_Papers/EPIC17_id21.pdf
 
Title Retina pipeline codes 
Description Codes to implement a model of the retino-cortical transformation for improving the efficiency of Deep Learning artificial neural networks for visual image processing. 
Type Of Material Model of mechanisms or symptoms - human 
Year Produced 2017 
Provided To Others? Yes  
Impact Support for ~20 undergraduate and MSc student projects and a number of PhD projects. Primary URL: https://github.com/Pozimek/RetinaVision 
URL https://github.com/Pozimek/RetinaVision
 
Title Deep Software Retina 
Description Novel combination of a software retina with a Deep Neural Network
Type Of Material Computer model/algorithm 
Year Produced 2017 
Provided To Others? Yes  
Impact Enables deep nets to process large images > 1Mpixel in a single pass and therefore train and execute 10-100x faster. 
URL http://www.eyewear-computing.org/EPIC_ICCV17/Short_Papers/EPIC17_id21.pdf
 
Title Object DCNN Training Set 
Description Collection of segmented and labelled RGB images depicting views of objects for training Deep Nets (7,500 images per object for 10 objects)
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact The ability to recognise and segment objects which are partially occluded, in heaps or in totes. 
 
Description Shadow-Glasgow-iSee 
Organisation Shadow Robot Company
Country United Kingdom 
Sector Private 
PI Contribution The GU research team has demonstrated functional computer vision systems, integrated within the Robot Operating System (ROS), that are capable of detecting, identifying and localising the pose of household objects so that these can be grasped and manipulated using a Shadow SGS (Smart Grasping System) robot manipulator. We have further demonstrated a Deep Learning based vision system capable of identifying and segmenting objects in challenging scenarios (partially occluded, piled in heaps or in a transport tote). We have advanced passive stereo-based 3D sensing to operate in industrial robot scenarios and demonstrated an in-hand wide-angle camera suitable for use with a software retina developed by GU. Under iSee we developed a high-resolution software retina and have demonstrated its potential to accelerate DL vision systems by reducing the input data to the DCNN by a factor of 17, with the potential for greater efficiencies. (A minimal ROS integration sketch follows this collaboration record.)
Collaborator Contribution Shadow developed the Smart Grasping System 3-fingered hand and integrated this with ROS and the camera and vision systems being developed by GU.
Impact An integrated hand-eye robot manipulation system is in the final stages of development.
Start Year 2017
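For illustration of how a vision module of this kind can be wired into a ROS-based hand-eye system such as the one described in the collaboration above, the minimal sketch below subscribes to a camera topic, runs a placeholder detection and pose-estimation stage, and publishes a grasp target. The topic names, node name and detect_pose() function are assumptions, not the project's actual interfaces.

    #!/usr/bin/env python
    import rospy
    from sensor_msgs.msg import Image
    from geometry_msgs.msg import PoseStamped
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def detect_pose(cv_image):
        """Placeholder for the DCNN detection and pose-estimation stage."""
        return None   # would return ((x, y, z), (qx, qy, qz, qw)) when an object is found

    def on_image(msg, pose_pub):
        cv_image = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
        result = detect_pose(cv_image)
        if result is None:
            return
        (x, y, z), (qx, qy, qz, qw) = result
        target = PoseStamped()
        target.header = msg.header                     # keep the camera frame and timestamp
        target.pose.position.x, target.pose.position.y, target.pose.position.z = x, y, z
        (target.pose.orientation.x, target.pose.orientation.y,
         target.pose.orientation.z, target.pose.orientation.w) = qx, qy, qz, qw
        pose_pub.publish(target)

    if __name__ == "__main__":
        rospy.init_node("isee_object_pose")            # hypothetical node name
        pub = rospy.Publisher("grasp_target", PoseStamped, queue_size=1)
        rospy.Subscriber("camera/image_raw", Image, on_image, callback_args=pub)
        rospy.spin()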
 
Description ARM Research Summit 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presentation at the 2017 ARM Research Summit in Cambridge describing our software retina work and its application to robot vision in the iSee project.
Year(s) Of Engagement Activity 2017
URL https://developer.arm.com/research/summit/previous-summits/2017
 
Description Human Brain Project 2017 Meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation describing the software retina work being undertaken and applications in robotics.
Year(s) Of Engagement Activity 2017
URL https://sos.exo.io/public-website-production/filer_public/e0/1c/e01cdb4e-5590-46fb-ae4b-46c38ce0db1d...
 
Description KESS Public Engagement Lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact The iSee PI gave an invited lecture on 3D vision for robotics to the Kilmarnock Engineering Science Society. This produced debate about future developments in automation and AI.
Year(s) Of Engagement Activity 2018
URL http://www.kess2012.org/
 
Description Keynote presentation opening ICMMI 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Personal invitation to the PI to give the keynote presentation opening ICMMI 2017, the International Conference on Man-Machine Interactions, 3-6 October 2017, Cracow, Poland
Year(s) Of Engagement Activity 2017
URL http://icmmi.polsl.pl/