iSee - Intelligent Vision for Grasping
Lead Research Organisation:
University of Glasgow
Department Name: School of Computing Science
Abstract
Intelligent vision is a key enabler for future robotics technology. Shadow's recent development work on a disruptive universal gripper (launched at Innovate16) has identified a need for better vision to permit the automation of grasping. Building on R&D at the University of Glasgow for over 20 years, the iSee project will establish concrete robot vision benchmarks, based on commercially relevant scenes, and then develop, validate and integrate vision sensors and processing algorithms into the Smart Grasping System (SGS) to enable it to reach significant new markets in automation, logistics and service robotics. The following candidate sensors have been selected for benchmarking:
A. Low-cost time-of-flight (ToF) 3D cameras (available from various companies)
B. Stereo pairs of off-the-shelf HD cameras and stereo pairs of embedded vision sensors in conjunction with CVAS's existing custom stereo-pair image matching and photogrammetry software
C. An Asus Xtion RGB-D camera will serve as a benchmark reference sensor
We propose to build an integrated hand-eye system for each sensor listed above, along with appropriate lighting, and to develop complete integrated pipelines to benchmark the different combinations of capture and analysis systems on the specified scenarios. This investigation will allow us to determine the performance of 2D and 3D sensing methods in terms of quality of image capture and hand-eye performance.
We also propose to evaluate the new and highly disruptive deep convolutional neural network (DCNN) technology, which has the potential to leapfrog the best algorithmic vision methods and to provide a fast, accurate and complete vision solution that meets the demands of advanced robotic grasping and manipulation. We will thus augment the evaluation with potentially very efficient, high-speed DCNN algorithms for interpreting images from potentially low-cost sensors for:
* Detecting and localising known objects and estimating their pose for grasping purposes
* Estimating depth, size and surface normals directly from single monocular images, using transfer methods
* Recovering depth from binocular and monocular camera systems using stereo matching and structure from motion (optical flow) respectively
Once trained, DCNNs can analyse images very quickly and are now becoming suitable for low-cost embedded platforms, such as smartphones. This aspect of the proposed investigation has the potential to simplify the sensor hardware dramatically: only single cameras, or stereo pairs of cameras, are required in combination with DCNNs as the basis for a vision system that could potentially provide all of the functionality required to control the hand in a wide range of scenarios.
Benchmark results will let us develop specific camera-algorithm combinations to improve performance for the specified use cases over a number of test-evaluate-improve iterations. The core 3D sensing approaches will be integrated with the SGS, and we shall evaluate additional cameras situated off-hand providing critical ancillary visual input when objects are close to the gripper camera prior to a grasp, or during in-hand manipulation. Both camera systems will be used to acquire different views of the scene, with both systems mounted on a robot arm for interactive perception of the scene and objects contained within it.
In parallel, we will develop a showcase demonstration system at Shadow based on Shadow's current grasp planning software coupled to the 3D images captured by the benchmarked 3D vision systems. The developed vision modules will be encapsulated within the "Blocky" programming system to afford a simple and direct method for end-users to take advantage of this capability.
In conclusion, we believe that the robotics hand-eye pipelines proposed within the iSee project have the potential to play an important role in maintaining market leadership in the development of complete robotics system solutions.
Planned Impact
The iSee project will provide a technology that can be utilised in assistive and social settings, that will underpin fundamental research and commercial collaborations, and that will deliver impacts in the Knowledge, Economy, Society and People areas.
Knowledge
In iSee, we will ensure that the understanding of problems and solutions generated by this activity flows in both directions between Shadow and the CVAS group. Direct scientific impacts will occur if the benefits of iSee's vision library lead to its adoption within the smart grasping system. Indirect impacts will occur if iSee's integrated smart sensing is adopted in other robotic scenarios outside the current scope of the project -- e.g. assistive and care roles, human-robot interaction, etc.
Robotics and autonomous systems are recognised by the UK as one of the Eight Great Technologies of the future, for which iSee's vision and grasping will serve as a founding robotic platform for the design and development of new robotic and autonomous technologies. In the longer term, enhanced and integrated visual sensing technologies within smart grasping systems will encourage the development of new types of robots and robotic systems. Specifically, robots will be capable of working in new areas, such as constrained places in manufacturing that are challenging for humans to access, and of operating in environments not suited to industrial vision.
Economy
Robotic technologies are increasingly used in high-wage economies such as the UK, and they are anticipated to be one of the drivers of the fourth industrial revolution. The technology developed in iSee will provide the industrial sector with immediate economic impact in terms of new product sales and profitability, tightly coupled with design, research, production, sales, user feedback and field trials.
It is anticipated that robotics will fundamentally reduce labour costs by replacing a large proportion of routine roles. The Copenhagen study shows that UK adoption of industrial automation will produce a long-term increase in productivity of 22% and a workforce increase of 7.4% as staff are re-skilled and moved into higher-skilled roles. In this context, iSee will be extremely significant, since the project will have impact on the development of future service robots, unlocking new industries. Vision for service robots is a significant challenge, and if we can deploy an effective sensorised solution, we have the potential to enable a new wave of startups creating vision-enabled service robots across multiple market domains.
People
iSee will facilitate the development of new vision and robotics skills in the research associates and CVAS academic staff. People with robotics and autonomous systems expertise are in high demand in both industry and academia and are significant economic contributors. Likewise, the ability to deploy robots to perform a wider range of repetitive manual tasks will reduce the incidence of industrial injuries due to tiredness and boredom and, hence, improve the quality of people's working environments.
Society
Robots are also key to addressing social challenges in high-wage economies, e.g. increasing healthcare demands and the ageing population. The enhanced reliability that iSee will deliver is essential for disruptive new robotic technologies in settings where deployment is not currently possible, such as hospitals and care facilities. A new generation of robots may be deployed in assistive and care roles, which could have significant impact on social care and the challenges of an ageing society. End effectors that can see mean that robots can work in areas where illumination cannot be controlled and where access is constrained or dangerous, such as inside storage shelving or inside large workpieces, or in land-mine clearance and bomb disposal.
Publications
Hristozova, N.
(2018)
Efficient Egocentric Visual Perception Combining Eye-tracking, a Software Retina and Deep Learning
in arXiv e-prints
Ozimek P
(2019)
A Space-Variant Visual Pathway Model for Data Efficient Deep Learning.
in Frontiers in Cellular Neuroscience
Siebert, J.P.
(2018)
Smart Visual Sensing Using a Software Retina Model
Siebert, J.P.
(2017)
Advances in Hand-Eye Robot Interactions
Description | The most significant achievements of the award comprise the construction of a robotic hand-eye testbed system that has demonstrated state-of-the-art performance in detecting and localising objects for robotic grasping and manipulation. A novel training pipeline has been developed that supports the practical use of deep learning technology for learning object appearance and can be deployed to serve the use-cases envisioned for the technology, namely bin-picking, materials handling and order completion in warehouse and manufacturing scenarios. We have demonstrated object recognition using both conventional 3D computer vision techniques and state-of-the-art deep learning methods. We have also shown that these latter Deep Net based methods are capable of segmenting objects in challenging situations where they are partially occluded, piled in heaps or located inside transport totes. In addition, we have demonstrated that it is possible to couple a high-resolution software retina to the deep net to allow large images to be processed in a single pass of the network, greatly improving the efficiency and utility of deep nets. Finally, we have improved our 3D imaging algorithms to a level where they appear to capture 3D surface information of a quality rivalling laser scanning technology, opening the potential for very low-cost, highly compact 3D vision systems. |
Exploitation Route | The technology we have developed could be applied in a wide variety of industries, manufacturing processes and warehouse order processing systems. The vision components are sufficiently generic to improve operation of a wide variety of autonomous systems including drone aircraft and driverless vehicles. The current consortium is bidding for further funding support to take the system we have developed to the next level of industrial integration and more extensive trials in manufacturing and warehouse order completion scenarios. Separate consortia are bidding for funds to develop the vision technology in the context of drone aircraft control and navigation and also driverless vehicles. In a new collaboration with a number of Institutes of Cognitive Neuroscience we are investigating using the retina-brain model explored in iSee for developing functional visual pathway models linked to real subject data collected using fMRI scanners. |
Sectors | Aerospace Defence and Marine Agriculture Food and Drink Chemicals Communities and Social Services/Policy Construction Creative Economy Digital/Communication/Information Technologies (including Software) Education Electronics Energy Environment Healthcare Leisure Activities including Sports Recreation and Tourism Manufacturing including Industrial Biotechnology Culture Heritage Museums and Collections Pharmaceuticals and Medical Biotechnology Retail Transport |
Description | Our findings are being used to support developments in advanced manufacturing and also in how to analyse aerial images of landscape to identify sites of historic interest. However, these are still in the research and development stage, as opposed to deployed practice or products. |
First Year Of Impact | 2018 |
Sector | Manufacturing, including Industrial Biotechnology, Culture, Heritage, Museums and Collections
Impact Types | Cultural Policy & public services |
Description | Doctoral Training Account |
Amount | £62,000 (GBP) |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start |
Title | Deep Net Training Pipeline
Description | A new training pipeline that allows large numbers of segmented and labelled images, depicting views of objects, to be extracted from video sequences, providing the large-scale data required to train Deep Nets to recognise objects for robotic grasping and manipulation.
Type Of Material | Improvements to research infrastructure |
Year Produced | 2018 |
Provided To Others? | No |
Impact | Too early to assess, as the method has not yet been published.
Title | Deep Software Retina
Description | This is an Engineering/Technology project. We have produced a software retina-DCNN combination that improves the speed at which deep nets can process images and also process large images. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | It is now possible to process images of the order of 1M pixels and larger in a single forward pass through a DCNN, allowing real-time computer vision on embedded computer platforms. |
URL | http://www.eyewear-computing.org/EPIC_ICCV17/Short_Papers/EPIC17_id21.pdf |
Title | Retina pipeline codes |
Description | Codes to implement a model of the retino-cortical transformation for improving the efficiency of Deep Learning artificial neural networks for visual image processing. |
Type Of Material | Model of mechanisms or symptoms - human |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | Support for ~20 undergraduate and MSc student projects and a number of PhD projects. Primary URL: https://github.com/Pozimek/RetinaVision |
URL | https://github.com/Pozimek/RetinaVision |
Title | Deep Software Retina |
Description | Novel combination of a software retina with a Deep Neural Network
Type Of Material | Computer model/algorithm |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | Enables deep nets to process large images > 1Mpixel in a single pass and therefore train and execute 10-100x faster. |
URL | http://www.eyewear-computing.org/EPIC_ICCV17/Short_Papers/EPIC17_id21.pdf |
Title | Object DCNN Training Set |
Description | Collection of segmented and labelled RGB images depicting views of objects for training Deep Nets (7,500 images per object for 10 objects)
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | No |
Impact | The ability to recognise and segment objects which are partially occluded, in heaps or in totes. |
Description | Shadow-Glasgow-iSee |
Organisation | Shadow Robot Company |
Country | United Kingdom |
Sector | Private |
PI Contribution | The GU research team has demonstrated functional computer vision systems, integrated within the Robot Operating System, that are capable of detecting, identifying and localising the pose of household objects so that these can be grasped and manipulated using a Shadow SGS (Smart Grasping System) robot manipulator. We have further demonstrated a Deep Learning based vision system capable of identifying and segmenting objects in challenging scenarios (partially occluded, in piles or in a transport tote). We have advanced passive stereo-based 3D sensing to operate in industrial robot scenarios and demonstrated an in-hand wide-angle camera suitable for use with a software retina developed by GU. Under iSee we developed a high-resolution software retina and have demonstrated its potential to accelerate DL vision systems by reducing the input data to the DCNN by 17x, with the potential for greater efficiencies. |
Collaborator Contribution | Shadow developed the Smart Grasping System 3-fingered hand and integrated this with ROS and the camera and vision systems being developed by GU. |
Impact | An integrated hand-eye robot manipulation system is in the final stages of development. |
Start Year | 2017 |
Description | ARM Research Summit 2017 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Presentation at the 2017 ARM Research Summit in Cambridge describing our software retina work and its application to robot vision in the iSee project. |
Year(s) Of Engagement Activity | 2017 |
URL | https://developer.arm.com/research/summit/previous-summits/2017 |
Description | Human Brain Project 2017 Meeting |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation describing the software retina work being undertaken and applications in robotics. |
Year(s) Of Engagement Activity | 2017 |
URL | https://sos.exo.io/public-website-production/filer_public/e0/1c/e01cdb4e-5590-46fb-ae4b-46c38ce0db1d... |
Description | KESS Public Engagement Lecture |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Public/other audiences |
Results and Impact | The iSee PI gave a lecture in 3D vision for Robotics by invitation to the Kilmarnock Engineering Science Society. This produced debate about future developments in automation and AI. |
Year(s) Of Engagement Activity | 2018 |
URL | http://www.kess2012.org/ |
Description | Keynote presentation opening ICMMI 2017 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Personal invitation to the PI to give the keynote presentation opening ICMMI 2017 International Conference on Man-Machine Interactions October 3-6, 2017 Cracow, Poland |
Year(s) Of Engagement Activity | 2017 |
URL | http://icmmi.polsl.pl/ |