MIMIc: Multimodal Imitation Learning in MultI-Agent Environments

Lead Research Organisation: Loughborough University
Department Name: Loughborough University in London

Abstract

In the UK, we are not allowed to drive a vehicle until we are 17. This is because driving is a complex and safety-critical activity that requires many advanced cognitive skills, such as recognising possible threats, anticipating the behaviour of other road users and reacting quickly to emerging situations. Think about a football player making decisions on the field. A good player can sense opportunities by anticipating what other players will do, and select an action that will increase the odds of scoring. It takes humans a long time to develop these advanced cognitive skills and become expert at such complex real-world tasks. Artificial Intelligence has made significant progress during the last decade, demonstrated by breakthroughs in cancer detection, computers beating 'Go' masters and intelligent robotics. However, if AI is to live up to its science-fiction promise of assisting humanity, or even superseding human intelligence, it should at least be equipped with cognitive skills such as those possessed by humans. This project aims to develop ground-breaking algorithms that equip autonomous systems with the human-like cognitive skills required to thrive in real-world environments.
We focus on applications that require an autonomous agent (e.g. a robot or driverless car) to interact with multiple intelligent agents in the environment to accomplish a task (known as Multi-Agent Environments: MAEs). Such applications require an agent to anticipate the behaviour of other agents and to select the most appropriate course of action. Equipping agents with such autonomous decision-making capability is known as policy learning. Compared to policy learning in single-agent domains (teaching a robot to walk or a computer to play a video game), recent progress in policy learning in MAEs has been quite modest. This is due to several reasons: 1) the environment is dynamic because of the agents' actions; 2) multi-agent policy learning suffers from a theoretical limitation known as the curse of dimensionality (CoD); 3) utility functions that capture agent objectives are difficult to define; and 4) there is a significant lack of adequate multi-agent datasets that allow meaningful research. This project proposes to undertake research into policy learning in MAEs by addressing the above limitations.
Our unique approach to policy learning in MAEs is motivated by how humans thrive in similar settings. Firstly, we perceive the world through multiple senses (i.e. vision, audition, touch), enabling a rich perception of the world. Secondly, when acting in a MAE, humans do not pay attention to all stimuli but only to key stimuli: e.g. when a football player is attacking with the ball, the player pays attention only to the teammates capable of contributing to a goal and to the key defenders. Finally, the learning paradigm we employ, known as imitation learning, is an emerging methodology for learning by observing experts, which is also a productive approach that humans use to learn new skills. Accordingly, we propose to learn realistic policies in MAEs through imitation learning by leveraging multimodal data fusion and selective-attention modelling. Multimodal data fusion captures the high-dimensional context of the real world, and selective-attention modelling alleviates the issue of CoD. An elite football club is facilitating this ambitious research project by providing a unique multimodal multi-agent dataset and access to state-of-the-art data capture facilities.
The project outputs will be subjectively validated in two ways: as a tool to answer "what-if" questions related to game play in football, assisting coaching staff to visualize speculative game strategies, and as a computational benchmark to quantify the cognitive skills of football players. The planned impact activities will ensure the project leaves a legacy in AI development, benefiting UK PLC through significant contributions in multiple high-growth areas such as driverless vehicles, video gaming and assistive robots.

Planned Impact

AI is set to contribute £232Bn to the UK economy by 2030. To facilitate such growth, and in alignment with the connected nation prosperity outcome of EPSRC (more info in Sec. 3 of CfS), this project develops advanced human-like decision-making algorithms. The beneficiaries of this project are those who could benefit from advanced decision-making algorithms and from assessing human skills using AI models. Examples of societal and economic impact in different sectors are given below.

Commercial private sector:
Sports content broadcasters can use MIMIc algorithms to visualize speculative game play with realistic simulations, assisting expert commentators in discussing different game strategies.
The video gaming industry (worth ~£3Bn to the UK economy) can benefit from algorithms capable of learning realistic strategies and player behaviour from real games, enabling creative video games where the strategies are realistic and where gamers can adopt the behaviours of professional players.
Algorithms with human-like cognitive skills will benefit autonomous vehicle and collaborative robot design, enabling humans and autonomous systems to coexist. Imitation learning algorithms developed in the project will equip autonomous vehicles and robots with intelligent control algorithms that are empathetic towards humans.
The sports analytics industry, set to be worth $4.5Bn by 2024, can benefit from the project outcomes through algorithms that fuse multimodal data sources to gain insights into game play, and to gain a competitive edge through creative game strategies.
The project demonstrates the use of AI models as a benchmark to assess human decision-making skills. Such a methodology can enable industries such as manufacturing, sports and education to identify and develop talent, thereby improving the workforce.

Policy-makers, public authorities & third sector:
Government-funded agencies and charities promoting physical activity and participation in sports can benefit from this project by using the demonstrator to raise awareness of sports and to facilitate the measurement of physical activity.
Urban planning authorities can gain insights into human behaviour in cities through multimodal imitation learning and simulation, e.g. realistic insights into the impact of traffic or adverse weather conditions. Intelligent infrastructure in smart cities can learn from human interactions to enforce safe and efficient strategies: e.g. roadside units can learn policies by analysing safe driving patterns, and then enforce such policies through vehicle-to-infrastructure communication.
The project algorithms can be used to assess the skills of individuals and teams, benefiting the selection and training of military personnel. Combining the models with virtual reality will facilitate realistic virtual training regimes.
Science promotion organizations will benefit from the project demonstrators, which show the impact of AI on society through a topic of broad public interest, i.e. football.

Societal impacts
The public who will eventually encounter robots, such as driverless vehicles, will benefit because the robots will be more aware of human actions.
Football fans will benefit from novel visualizations of player strategies, which enable a better-informed enjoyment of the game.
Coaches in schools who want to use examples from professional players and discuss consequences of actions can benefit from the project demonstrators.
Video gamers (37Mn in the UK) who wish to play games that are realistic and correspond to real game strategies will also benefit.
Urban dwellers, city workers and local councils will benefit from improved city planning that makes journeys more enjoyable, reduces pollution and improves safety.
Finally, it is recognized that improved skills in machine learning, simulation and research methodologies have a critical role in bridging the gap between academia and industry, especially in the fields of multi-agent systems and data-driven policy learning.

Description As set out in the project proposal, one of the key challenges for my research is to develop intelligent agents that make decisions in real-world multi-agent environments.

We have made two key practical engineering developments during the last year to overcome this challenge.
Firstly, when making decisions, each agent considers only the nearby agents, which significantly reduces the complexity of the learning task. Secondly, by leveraging very large computational units known as deep neural networks, and by designing a suitable architecture, we have developed an algorithm to learn such agents. The team has carried out three major investigations and proposed algorithms (an algorithm is a sequence of computer instructions) as possible solutions to the following questions:
1) How can agents learn to coordinate their actions to accomplish a collective goal, by acting independently but learning efficiently from each other's experience?
2) How can agents learn to cooperate when they cannot learn exactly from each other's experience? The algorithm aims to make agents more self-aware using causality estimation. We have further incorporated a state-of-the-art deep learning method for causality estimation to automate the resulting improvements to multi-agent reinforcement learning.
3) How can we efficiently train a large group of agents to cooperate, by structuring the learning process in an appropriate manner? This is similar to designing a curriculum to train the agents efficiently.
These major breakthroughs have resulted in two major publications, one in IEEE Transactions on Neural Networks and Learning Systems and one in the archived conference proceedings of AAMAS 2023, the premier multi-agent systems conference in the world. Furthermore, applied findings have been published in multiple conferences and journals, as reported.
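
The first development above, restricting each agent's attention to nearby agents, can be illustrated with a minimal sketch. It is not the project's algorithm: the function names, the Euclidean distance measure and the fixed neighbour count k are all assumptions made for illustration. It shows only how limiting an agent's observation to its k nearest neighbours keeps the input size fixed however many agents are present.

```python
import math


def nearest_neighbours(positions, agent_index, k):
    """Return the indices of the k agents closest to the given agent.

    Hypothetical helper: uses Euclidean distance as the notion of 'nearby'.
    """
    own = positions[agent_index]
    others = [
        (math.dist(own, pos), idx)
        for idx, pos in enumerate(positions)
        if idx != agent_index
    ]
    others.sort()  # closest first; ties broken by agent index
    return [idx for _, idx in others[:k]]


def local_observation(positions, agent_index, k):
    """Build a fixed-size observation for one agent.

    The observation is the agent's own position followed by the relative
    offsets of its k nearest neighbours, so its length does not grow with
    the total number of agents.
    """
    own = positions[agent_index]
    obs = list(own)
    for idx in nearest_neighbours(positions, agent_index, k):
        other = positions[idx]
        obs.extend(o - s for o, s in zip(other, own))
    return obs


# Five agents on a 2D pitch (illustrative coordinates only).
positions = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 2.0), (10.0, 0.0)]
print(nearest_neighbours(positions, 0, 2))  # [1, 3]
print(local_observation(positions, 0, 2))   # [0.0, 0.0, 1.0, 0.0, 0.0, 2.0]
```

With k fixed, the observation has 2 + 2k entries regardless of team size, which is one simple way to limit the growth in input dimensionality that the abstract refers to as the curse of dimensionality.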

Furthermore, to bridge the gap between theoretical developments and real-world deployment, the project set out to demonstrate the developments within a use case of football training in collaboration with Chelsea Football Club. This involves developing artificial models of footballers, which may then act as benchmarks against which to compare the real performances of players. One key stream of information for developing these artificial models is an understanding of what individual players do with their bodies on the field. As such, 3-dimensional body pose estimation is an important development in this area. Such developments also contribute to other applications, such as intelligent mobility involving driverless vehicles (a driverless vehicle needs to be aware of what the humans around it are doing).
There were two aspects of development here. Firstly, we developed a multimodal human activity recognition module for intelligent mobility applications, which can select and understand human activity from point cloud data. Secondly, we have developed a 3D body pose estimation technique to be used for sports. The work on understanding human activity from point cloud data is now published in IEEE Transactions on Cybernetics. The work on 3D pose estimation for sports will be sent for review shortly. Specifically, within the development of 3D pose estimation, we are mostly investigating how existing estimation algorithms can be transferred to different data sets in a very labour-efficient manner (we call this active learning). A major piece of activity involved the development of a single-camera 3D pose estimation technique, which has important implications for democratising AI developments for grassroots-level football clubs, and also finds applications in fields such as robotic surgery, body vibration analysis and human activity recognition in pedestrianised areas. Three publications have originated from this activity, which are currently under review at the International Joint Conference on AI and two other IAPR-sponsored conferences.

Combining the developments in multi-agent learning and body pose estimation, three PhD students associated with the project as Research Assistants are developing a simulation framework to visualize player movements and discuss different "what-if" questions in football. Specifically, these projects try to evaluate how player movements affect the overall outcome of games. One PhD student, whose work had significant links to the development of autonomous systems, has completed, and these three PhD students are nearing completion of their investigations into new AI methods and the application of AI to team sports analytics.
Exploitation Route We are using these theoretical developments to develop intelligent agents for football skill measurement. This work is carried out alongside our project partner, the Chelsea FC Academy.

Specifically, we are working with the following external stakeholders:
Chelsea FC Academy: with a view to utilize the developments as a player action reflection tool
Toyota Manufacturing UK: utilize the findings and experimental platforms for creating knowledge around autonomous vehicles.
Breakaway data: to utilize some of the football related findings in their suite of products
Sportsology: a consultancy eager to popularize the gamification for player training

Since the last submission, the PI has engaged with the EPSRC national security and defence sandpit, through which he gained further funding to apply the foundations of multi-agent RL and multimodal machine learning in the context of healthcare delivery in contested battlefields. This is a major step forward originating from this activity, in which the team will be working alongside stakeholders including DSTL and the Ministry of Defence.

The PI is also seeking to apply some of these techniques within the context of medical decision making, for which he works with the NHS Barts Trust and the University of Cambridge.

The PI has also identified a useful avenue of investigation on how to incorporate imitation learning into intangible cultural heritage preservation, and is working in a cross-disciplinary team on this.

Some of the methods from our research that embed a team strategy into artificial football models through reward machines have been identified as a way to incorporate ethics and legal frameworks into machine learning and AI methods in the military. For this, we work with the TC Beirne School of Law at the University of Queensland and the Australian Defence Force.
Sectors Aerospace, Defence and Marine,Creative Economy,Digital/Communication/Information Technologies (including Software),Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Culture, Heritage, Museums and Collections,Transport

 
Description As part of this project, we are engaging with the Chelsea Football Club Academy and several other related companies, such as Toyota Manufacturing UK, Breakaway and Sportsology. The findings of the project are being discussed with the stakeholders at Chelsea, and we are assisted by their data. The developments of this project were demonstrated to partners through a series of three workshops. Stakeholders from Chelsea FC and Breakaway participated in these to identify how to integrate the project outcomes into their player assessment frameworks. The companies have indicated their willingness to continue to cooperate. Apart from the discussions with Chelsea FC, the PI has taken part in several public engagement activities to explain the research to a wider audience. The highlights are a session with BBC Radio on the use of AI in driverless cars and a panel discussion at the Responsible AI in the Military Summit in The Hague, organised by the Ministry of Foreign Affairs of the Netherlands.
First Year Of Impact 2021
Sector Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Leisure Activities, including Sports, Recreation and Tourism,Transport
Impact Types Societal,Economic

 
Description ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage
Amount £869,031 (GBP)
Funding ID EP/X028631/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2023 
End 04/2026
 
Description ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage 
Organisation Edge Hill University
Country United Kingdom 
Sector Academic/University 
PI Contribution I provide the autonomous capabilities to the drone technology used, including multimodal machine learning, reinforcement learning based control and casualty triaging.
Collaborator Contribution Computer vision, drone control and ethical principles of trustworthiness.
Impact We are still preparing to start the funded project. We have been awarded a grant of £869,000 from EPSRC for this work, and collaboration agreements are now in place.
Start Year 2022
 
Description ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage 
Organisation University of Portsmouth
Country United Kingdom 
Sector Academic/University 
PI Contribution I provide the autonomous capabilities to the drone technology used, including multimodal machine learning, reinforcement learning based control and casualty triaging.
Collaborator Contribution Computer vision, drone control and ethical principles of trustworthiness.
Impact We are still preparing to start the funded project. We have been awarded a grant of £869,000 from EPSRC for this work, and collaboration agreements are now in place.
Start Year 2022
 
Description Preservation of Intangible Cultural Heritage 
Organisation Loughborough University
Country United Kingdom 
Sector Academic/University 
PI Contribution I contribute in the area of Machine Learning and data-driven imitation model building.
Collaborator Contribution Prof. Massimiliano Secca, Dr. Russell Lock and Dr. Arianna Miorani, all from different schools of Loughborough University.
Impact The idea is to utilize multimodal machine learning and multi-agent imitation learning to develop models of people performing intangible cultural heritage. These are then to be preserved as virtualized models with which users can interact. The hypothesis is that this allows the intangible cultural heritage to be preserved through interaction with it. This will change the way museums preserve cultural heritage.
Start Year 2022
 
Description Testing the feasibility of the technology in developing countries 
Organisation Sri Lanka Institute of Information Technology
Country Sri Lanka 
Sector Academic/University 
PI Contribution Assisted with setting up a data collection platform for collecting vehicular data; supply of sensors for the project; discussion of research.
Collaborator Contribution Setting up of a road vehicle, including modifications; getting legal permission from Sri Lankan authorities; data collection with assistance from Sri Lanka Police; data curation.
Impact Data set for analysis of driving conditions in unstructured traffic environments. Data set is now being used in a PhD student project.
Start Year 2020
 
Description BBC Interview 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact A special episode looking at AI and why it is still far from perfect. We discuss what would happen if you took a driverless car from the streets of California and put it on roads in a developing country, why deep fakes are so difficult to detect, and how the images that are used to teach machines to recognise things are biased against women and ethnic minorities.
Year(s) Of Engagement Activity 2019
URL https://www.bbc.co.uk/programmes/w3csy67d
 
Description REAIM Summit Panel 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Organised and participated in a panel discussion titled "Responsible AI Framework for Military Domain," which featured a group of prominent panelists: Peter Lee, Tasos Dagiuklas, Varuna De Silva, Ioannis A. Kakadiaris and Prof Rosaria Taddeo. The panel discussion focused on critical aspects of responsible AI governance, which included international collaboration, multidisciplinary stakeholder dialogue, AI quantifying standards, ethical considerations, transparency, and accountability.

The panelists emphasized the importance of international collaboration in responsible AI governance. They highlighted the need for a multidisciplinary and multi-stakeholder approach that involves input from a wide range of experts from different fields, including technology, ethics, and international relations.

The panelists highlighted the importance of AI quantifying standards, which should be put in place to ensure that AI is developed and deployed in a responsible manner. The standards should be able to measure and evaluate the impact of AI on society, and they should also include mechanisms for ensuring ethical considerations, transparency, and accountability.

In addition to the critical aspects discussed in the panel, the panelists also underlined the importance of ethical considerations in responsible AI governance. They emphasized the need for transparency and accountability in the development and deployment of AI technologies, specifically in the military domain.
Year(s) Of Engagement Activity 2023
URL https://reaim2023.org/events/responsible-ai-framework-for-military-domain-development-and-critical-a...
 
Description Security Advisory 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact To deepen knowledge about concrete applications of AI in the military domain, The Hague Center for Strategic Studies (HCSS) is completing a research project together with Datenna for the Netherlands Ministry of Defence. This research focuses specifically on the building blocks, such as advanced semiconductors, that will enable the most promising AI applications in the military field. To set up this theoretical framework, HCSS interviews 20 experts to answer one or more of the following questions:
a. What are the most promising applications of AI to enhance the capabilities of militaries?
b. How can AI applications enhance specific key military capabilities, especially sensing, swarming, targeting?
c. Which militaries are currently world leading in developing these capabilities?
d. What properties should semiconductors have to unlock artificial intelligence applications, which can be used, among other things, to enhance important military capabilities, in particular sensing, swarming, targeting?
e. What are the most important developments in the designing and manufacturing of AI chips?
f. Which building blocks - in addition to advanced semiconductors - are important to strengthen the aforementioned three important military capabilities by means of AI?
Year(s) Of Engagement Activity 2023
URL https://hcss.nl/europe-in-the-indo-pacific-hub/