Self-supervised reinforcement learning for robotics manipulation and grasping tasks

Lead Research Organisation: University of Bristol
Department Name: Aerospace Engineering

Abstract

The overall aim of the project is to develop novel reinforcement learning algorithms inspired by neuroscience and to apply them in robotic environments to improve performance on manipulation tasks.

Recent advances in deep learning suggest that self-supervised learning has a close connection with the way the brain learns. By exploiting this learning paradigm, robotics could benefit from vast amounts of unlabelled data. The main advantage of self-supervised learning, and of unsupervised learning more generally, is that it does not rely on labelled data, which is expensive to obtain; instead, such methods define their own labels so that they can extract meaningful features for downstream tasks [1]. However, the relevance of the extracted features to the final task must be ensured. In reinforcement learning, this can be achieved by combining self-supervised, exploratory (intrinsically motivated) behaviour with a task-specific (extrinsic) reward given when the task is completed. Combining intrinsic and extrinsic rewards makes it possible to gather features cheaply through self-supervised learning, while the extrinsic reward signal ensures that those features remain relevant to the task.
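As an illustration only, the sketch below (in PyTorch, with purely illustrative module names and an assumed weighting factor beta) shows one way such a combined reward could be formed: the intrinsic term is the prediction error of a learned forward model on latent states, added to the extrinsic task reward. This is a minimal sketch under these assumptions, not the project's final method.

# Minimal sketch (illustrative, not the project's actual method): combine an
# extrinsic task reward with a self-supervised intrinsic reward. The intrinsic
# signal here is a forward-model prediction error (curiosity-style); `beta`
# and all class/function names are assumptions made for this example.
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, z: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, a], dim=-1))

def combined_reward(forward_model: ForwardModel,
                    z: torch.Tensor, a: torch.Tensor, z_next: torch.Tensor,
                    r_extrinsic: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Return r_extrinsic + beta * intrinsic prediction error (no gradient)."""
    with torch.no_grad():
        pred_next = forward_model(z, a)
        r_intrinsic = (pred_next - z_next).pow(2).mean(dim=-1)
    return r_extrinsic + beta * r_intrinsic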

The goal of these approaches is to allow a robotic arm to learn from vision-based images without relying only on low-level features such as joint angles and velocities. Specific tasks will include reaching a target location, pushing an object to a destination, and picking and placing objects of different sizes and shapes. Although these are fairly standard tasks in robotics and reinforcement learning, the aim is to achieve them through enhanced representation learning. While developing the algorithm, further inspiration will be taken from neuroscience, such as the biological plausibility of backpropagation: artificial neural networks rely on backpropagating the final error all the way to the first layer, whereas the brain most likely propagates a local error at each layer. The computation of local errors has inspired decoupled networks, in which each module is loosely independent of the others and does not perform full backpropagation, yet such networks achieve satisfactory performance in vision and speech processing [2]. These methods have not yet been applied to reinforcement learning and robotics specifically. They could potentially improve generalisation across the objects the robot interacts with, as well as across the tasks performed by the robotic arm. Moreover, they could speed up learning, since complex network architectures can be split into multiple modules, avoiding backpropagation through all layers.
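As a rough illustration of this decoupled-training idea, the sketch below (PyTorch; the layer sizes, auxiliary objective and optimiser settings are illustrative assumptions, not the project's architecture) trains each module against its own local loss and stops gradients between modules, so no error is backpropagated through the full stack.

# Minimal sketch of module-wise ("decoupled") training with local losses.
# Each module has its own auxiliary head and optimiser; .detach() between
# modules blocks backpropagation into earlier layers.
import torch
import torch.nn as nn

class LocalModule(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_classes: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU())
        self.aux_head = nn.Linear(out_dim, n_classes)  # local auxiliary classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

modules = nn.ModuleList([LocalModule(784, 256, 10),
                         LocalModule(256, 128, 10)])
optimisers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in modules]
criterion = nn.CrossEntropyLoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> None:
    h = x
    for module, opt in zip(modules, optimisers):
        h = module(h)                            # forward through this module only
        loss = criterion(module.aux_head(h), y)  # local error, computed per module
        opt.zero_grad()
        loss.backward()                          # gradients stay inside the module
        opt.step()
        h = h.detach()                           # block backprop into earlier modules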

Since the focus is on vision-based features, this will enable us to explore both camera images and visual-tactile sensors.
All experiments will be carried out both in simulation and on real physical arms (UR10, Franka Panda and Dobot). For simulation, PyBullet will be preferred.
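For reference, a minimal PyBullet sketch of this kind of setup is given below. It loads one of the arm models bundled with pybullet_data (the Franka Panda here, purely as an example; UR10 and Dobot models would be loaded from their own URDF files) and reads a camera image and the joint states.

# Minimal PyBullet sketch (illustrative assumptions only): load an arm,
# step the simulation, and collect a vision-based observation plus the
# low-level joint states the project aims not to rely on exclusively.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                      # use p.GUI for a visual session
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")
arm = p.loadURDF("franka_panda/panda.urdf", useFixedBase=True)

for _ in range(240):                     # simulate one second at 240 Hz
    p.stepSimulation()

# Vision-based observation: a simple RGB-D camera image of the scene.
width, height, rgb, depth, seg = p.getCameraImage(width=128, height=128)

# Low-level features: joint positions, velocities and reaction forces.
joint_states = [p.getJointState(arm, j) for j in range(p.getNumJoints(arm))]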

Milestones:
- Develop a new model of intrinsically-driven learning (reinforcement learning and self-supervised learning) inspired by neuroscience.
- Test in simulation (as a step towards sim-to-real transfer).
- Test on a simple robot.
- Gradually increase the complexity of the tests through more challenging tasks (in terms of control behaviour).
- Additionally, develop another model (based on reinforcement learning, self-supervised learning and neuroscience) to improve the overall performance.

Planned Impact

FARSCOPE-TU will deliver a step change in UK capabilities in robotics and autonomous systems (RAS) by elevating technologies from niche to ubiquity. It meets the critical need for advanced RAS, placing the UK in prime position to capture a significant proportion of the estimated $18bn global market in advanced service robotics. FARSCOPE-TU will provide an advanced training network in RAS, pump priming a generation of professional and adaptable engineers and leaders who can integrate fundamental and applied innovation, thereby making impact across all the "four nations" in EPSRC's Delivery Plan. Specifically, it will have significant immediate and ongoing impact in the following six areas:
1. Training: The FARSCOPE-TU coherent strategy will deliver five cohorts trained in state-of-the-art RAS research, enterprise, responsible innovation and communication. Our students will be trained with wide knowledge of all robotics, and deep specialist skills in core domains, all within the context of the 'innovation pipeline', meeting the need for 'can-do' research engineers, unafraid to tackle new and emergent technical challenges. Students will graduate as future thought leaders, ready for deployment across UK research and industrial innovation.
2. Partner and industrial impact: The FARSCOPE-TU programme has been designed in collaboration with our industrial and end-user partners, including: DSTL; Thales; Atkins; Toshiba; Roke Manor Research; Network Rail; BT; National Nuclear Lab; AECOM; RNTNE Hospital; Designability; Bristol Heart Inst.; FiveAI; Ordnance Survey; TVS; Shadow Robot Co.; React AI; RACE (part of UKAEA) and Aimsun. Partners will deliver context and application-oriented training direct to the students throughout the course, ensuring graduates are perfectly placed to transition into their businesses and deliver rapid impact.
3. RAS community: FARSCOPE-TU will act as a multidisciplinary centre in robotics and autonomous systems for the whole RAS community, provide an inclusive model for future research and training centres and bring new opportunities for networking with other centres. These include a joint annual conference with other RAS CDTs and training exchanges. FARSCOPE-TU will generate significant international exposure within and beyond the RAS community, including major robotics events such as ICRA and IROS, and will interface directly with the UK-RAS network.
4. Societal Impact: FARSCOPE-TU will promote an informed debate on the adoption of autonomous robotics in society, cutting through hype and fear while promoting the highest levels of ethics and safety. All students will design and deliver public engagement events to schools and the public, generating knock-on impact in two ways: greater STEM uptake enhances future economic potential, and greater awareness makes people better users of robots, amplifying societal benefits.
5. Economic impact: FARSCOPE-TU will not only train cohorts in fundamental and applied research but will also demonstrate how to bridge the "technology valley of death" between lower and higher TRL. This will enable students to exploit their ideas in technology incubators (incl. BRL incubator, SetSquared and EngineShed) and through IP protection. FARSCOPE-TU's vision of ubiquitous robotics will extend its impact across all UK industrial and social sectors, from energy suppliers, transport and agriculture to healthcare, aging and human-machine interaction. It will pump-prime ubiquitous UK robotics, inspiring and enabling myriad new businesses and economic and social impact opportunities.
6. Long-term Impact: FARSCOPE-TU will have long-term impact beyond the funded lifetime of the Centre through a network for alumni, enabling knowledge exchange and networking between current and past students, and with partners and research groups. FARSCOPE-TU will have significant positive impact on the 80-strong non-CDT postgraduate student body in BRL, extending best-practice in supervision and training.

Publications


Studentship Projects

Project reference: EP/S021795/1 (01/10/2019 to 31/03/2028)
Studentship 2261278 (related to EP/S021795/1), 01/10/2019 to 15/09/2023, Student: Dabal Pedamonti