
Effective Algorithm Deployment to Hardware with Resource Aware Dynamic Optimisation

Lead Research Organisation: Newcastle University
Department Name: Sch of Computing

Abstract

The commercial electronics industry, spanning smartphones, wearables, virtual and augmented reality, robotics, autonomous vehicles, and UAVs, continues to demand greater computational performance, energy efficiency, and real-time responsiveness. These demands are met by heterogeneous systems that combine CPUs, GPUs, SoCs, and FPGAs. Each processor type has distinct strengths: CPUs offer general-purpose flexibility, GPUs handle parallel workloads efficiently, SoCs provide tight integration for specific functions, and FPGAs deliver low-latency, reconfigurable acceleration with high energy efficiency.
However, leveraging these platforms for computer vision and machine learning workloads introduces significant development complexity. Efficient deployment requires understanding the programming models, memory hierarchies, and execution behaviour of each component. For instance, control logic may be handled by a CPU, convolutional layers offloaded to a GPU, and real-time filtering executed on an FPGA. This mapping must be optimised to balance performance and energy consumption, particularly in constrained edge environments such as spaceborne imaging systems or portable medical devices, where power and compute budgets are tightly limited.
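The kind of mapping decision described above can be sketched as a simple cost-model search. The following is a minimal illustration, not the project's actual method: the pipeline stages, the per-stage latency and energy figures, and the greedy selection policy are all invented for the example.

```python
# Hypothetical sketch: greedy energy-aware mapping of vision-pipeline stages
# to heterogeneous devices under a per-frame latency budget.
# All cost figures below are illustrative, not profiled measurements.

from dataclasses import dataclass

@dataclass
class Cost:
    latency_ms: float   # per-frame execution time on this device
    energy_mj: float    # per-frame energy on this device

# Invented per-stage costs for each device.
PROFILE = {
    "control": {"cpu": Cost(1.0, 0.8),   "gpu": Cost(3.0, 9.0),  "fpga": Cost(2.5, 1.5)},
    "conv":    {"cpu": Cost(40.0, 80.0), "gpu": Cost(4.0, 9.0),  "fpga": Cost(8.0, 12.0)},
    "filter":  {"cpu": Cost(6.0, 12.0),  "gpu": Cost(2.0, 8.0),  "fpga": Cost(1.5, 2.0)},
}

def map_stages(latency_budget_ms: float) -> dict:
    """Pick, per stage, the lowest-energy device whose cumulative latency
    still fits the frame budget; fall back to the fastest device if none fits."""
    mapping, spent = {}, 0.0
    for stage, options in PROFILE.items():
        by_energy = sorted(options.items(), key=lambda kv: kv[1].energy_mj)
        fastest = min(options.items(), key=lambda kv: kv[1].latency_ms)
        chosen = next(
            (d for d in by_energy if spent + d[1].latency_ms <= latency_budget_ms),
            fastest,
        )
        mapping[stage] = chosen[0]
        spent += chosen[1].latency_ms
    return mapping

print(map_stages(16.0))  # roughly a 60 fps frame budget
```

With these illustrative numbers the greedy search reproduces the mapping used as an example in the abstract: control logic on the CPU, convolution on the GPU, and filtering on the FPGA.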
This research addresses the lack of a unified framework for deploying vision workloads across heterogeneous platforms. It proposes a language-independent, energy-aware compilation flow that automates profiling, graph partitioning, and runtime tuning of computer vision pipelines targeting CPUs, GPUs, and FPGAs. The goal is to abstract hardware-specific concerns from developers while enabling adaptive deployment based on workload characteristics and device constraints.
The approach includes developing high-level abstractions for common vision operations, enabling portability across hardware backends. These abstractions will be compiled into optimised low-level code using intermediate representations and hardware-aware scheduling strategies. Dynamic optimisation techniques such as runtime load balancing and just-in-time reconfiguration of FPGA logic will be integrated to adapt to changing performance and power requirements.
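Runtime load balancing of the kind mentioned above can be illustrated with a small proportional-share sketch; the device names, throughput figures, and the rebalancing rule are assumptions made for the example, not part of the proposed framework.

```python
# Hypothetical sketch of runtime load balancing: re-split a batch of frames
# across devices in proportion to their recently measured throughput.
# Device names and throughput figures are illustrative, not profiled values.

def rebalance(throughput_fps: dict, batch: int) -> dict:
    """Assign each device a share of `batch` proportional to its throughput,
    handing any rounding remainder to the fastest device."""
    total = sum(throughput_fps.values())
    shares = {d: int(batch * t / total) for d, t in throughput_fps.items()}
    remainder = batch - sum(shares.values())
    fastest = max(throughput_fps, key=throughput_fps.get)
    shares[fastest] += remainder
    return shares

# Suppose measurement shows the GPU sustains 120 fps, the FPGA 60, the CPU 20.
print(rebalance({"gpu": 120.0, "fpga": 60.0, "cpu": 20.0}, batch=32))
```

A real runtime would re-run this split whenever measured throughput drifts, which is the adaptive behaviour the paragraph above describes.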
This methodology will be built by extending the Tensor Virtual Machine (TVM) framework, which provides modularity, automatic tuning, and broad hardware support. The enhanced compiler will coordinate execution across devices, manage data movement, and apply device-specific optimisations while presenting a unified development interface.
The anticipated outcomes include improved developer productivity, better energy-performance trade-offs, and scalable deployment of computer vision algorithms in embedded AI systems. The framework aims to accelerate innovation in consumer electronics, autonomous systems, healthcare diagnostics, and scientific computing by enabling seamless and efficient utilisation of heterogeneous hardware.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/W524700/1                                      30/09/2022   29/09/2028
2934691             Studentship    EP/W524700/1   30/09/2023   29/09/2027