
Robust FPGA-Centric CNN Visual Localisation for GPS-Denied UAVs

SHIPPED
Updated: 1 December 2025 | Started: 1 February 2025

End-to-end development of a compact CNN for absolute visual localisation, quantised and deployed on the DPU of an AMD Kria KV260 FPGA for real-time, low-power inference in GPS-denied UAV scenarios.

Gallery

UQ Illuminate with supervisor Matthew D'Souza
UQ Illuminate presentation to Boeing - 1
UQ Illuminate presentation to Boeing - 2

Videos

Presentation @ UQ Fintech Project Showcase

Notes

Overview

A compact CNN-based localisation pipeline for UAVs operating in GPS-denied environments, deployed on an FPGA to achieve real-time inference under tight power constraints.

Abstract

Unmanned aerial vehicles (UAVs) often operate in environments where GPS signals are unavailable or unreliable, creating significant localisation challenges. Traditional feature-based methods such as ORB matching with RANSAC can run in real time but are highly sensitive to changes in lighting, viewpoint, and scene structure.

This project investigated an alternative approach based on lightweight convolutional neural networks (CNNs) deployed on field-programmable gate array (FPGA) hardware. Using the AMD Kria KV260 Vision AI platform, a custom CNN was trained, quantised, and compiled through the Vitis AI tool flow for execution on a Deep Learning Processor Unit (DPU). The model directly regresses image coordinates from monocular input, enabling on-board visual localisation without reliance on external compute resources.

Preliminary results demonstrate real-time inference at approximately 32 frames per second with total board power consumption below 11 watts, highlighting the efficiency of FPGA acceleration. Accuracy is reduced by quantisation effects and unoptimised datasets, with error rates above the full-precision baseline, but ongoing work is directed towards quantisation-aware training and improved data handling.

In preliminary real-world testing, localisation accuracy was noticeably degraded, most likely because of differences between the training datasets and the imaging characteristics of the deployed camera system. Ideally, datasets would be calibrated and augmented to match the specific properties of the imaging hardware, including field of view, expected flight altitude, and colour and lighting variation, ensuring consistency between training and deployment conditions.

The outcomes of this research contribute towards practical, energy-efficient localisation pipelines for small UAVs operating in GPS-denied scenarios.

Problem

UAVs operating in GPS-denied environments require reliable localisation under tight compute and power constraints.

Key constraints

  • On-board compute only (no cloud dependency)
  • Real-time throughput with stable latency
  • Limited power budget suitable for small UAV platforms
  • Robustness to viewpoint, lighting, and scene variation

Approach

The work was structured around three progressive studies, each addressing a stage of the pipeline from data design through deployment to real-time integration.

Study 1: Dataset diversity and augmentation

Localisation performance was strongly governed by the diversity and realism of the training data.

  • Models trained on a single static image overfitted and generalised poorly
  • Augmented datasets incorporating AI-generated imagery and altitude variation improved accuracy and robustness
  • Data-centric strategies were as influential as architecture choices in this visual regression task
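
As a rough illustration of the data-centric strategy above, the sketch below simulates a lower flight altitude by centre-cropping an aerial tile and resizing it back to full resolution (which leaves the tile-centre coordinate label unchanged), and applies photometric jitter for lighting variation. The function name, crop range, and file path are hypothetical rather than the project's actual preprocessing code.

```python
import random
from PIL import Image, ImageEnhance

def augment_tile(img: Image.Image) -> Image.Image:
    """Altitude-style and lighting augmentation for an aerial tile whose
    label is the map coordinate of the tile centre (label stays the same)."""
    w, h = img.size

    # Simulate a lower altitude: keep the same centre, crop a smaller window,
    # then resize back to the original resolution.
    zoom = random.uniform(0.6, 1.0)              # 1.0 corresponds to the original altitude
    cw, ch = int(w * zoom), int(h * zoom)
    left, top = (w - cw) // 2, (h - ch) // 2
    img = img.crop((left, top, left + cw, top + ch)).resize((w, h), Image.BILINEAR)

    # Simulate lighting and colour variation (photometric only, coordinates unaffected).
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    img = ImageEnhance.Color(img).enhance(random.uniform(0.8, 1.2))
    return img

# augmented = augment_tile(Image.open("tiles/uq_0001.png"))   # hypothetical tile path
```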

Study 2: Quantisation and hardware deployment

This study evaluated 8-bit integer quantisation and FPGA execution via the AMD Vitis AI tool flow.

  • Trained a compact CoordinateCNN-style architecture for coordinate regression
  • Applied post-training quantisation and compiled the model for DPU execution on the Kria KV260
  • Observed only minor degradation in mean and median error (approximately 1 to 2 percent) while enabling efficient deployment
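
For concreteness, the outline below shows roughly how a compact coordinate-regression model is quantised with the Vitis AI PyTorch flow. CoordNet is a hypothetical stand-in for the CoordinateCNN-style architecture, the calibration loader and output paths are assumptions, and the exact torch_quantizer arguments and compiler invocation can differ between Vitis AI releases, so treat this as a sketch rather than the project's actual script.

```python
import torch
import torch.nn as nn
from pytorch_nndct.apis import torch_quantizer   # Vitis AI PyTorch quantiser

class CoordNet(nn.Module):
    """Hypothetical stand-in for the compact coordinate-regression CNN."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)               # regressed (x, y) coordinate

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = CoordNet().eval()
dummy = torch.randn(1, 3, 224, 224)                # assumed input resolution

# Calibration pass: run representative images through the wrapped model so
# activation ranges can be collected, then export the quantisation config.
quantizer = torch_quantizer("calib", model, (dummy,), output_dir="quant")
quant_model = quantizer.quant_model
# for images, _ in calib_loader:                   # hypothetical calibration loader
#     quant_model(images)
quantizer.export_quant_config()

# A second pass in "test" mode exports the quantised xmodel, which is then
# compiled for the KV260 DPU (arch.json is board specific), e.g.
#   vai_c_xir -x quant/CoordNet_int.xmodel -a arch.json -o build -n coordnet
```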

Study 3: Real time inference and emulation

This study validated end-to-end behaviour in a live streaming pipeline.

  • Integrated the quantised model into a real-time video streaming pipeline
  • Confirmed deterministic and repeatable behaviour across runs
  • Verified stable latency and coherent spatial error patterns aligned with scene structure
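
For reference, a minimal sketch of the on-board inference loop using the Vitis AI runtime (VART) Python API is shown below. The xmodel path, tensor dtype, and pre/post-processing are assumptions, and input scaling to the DPU's fixed-point format is omitted for brevity.

```python
import time
import numpy as np
import vart
import xir

def dpu_subgraphs(graph):
    """Return the DPU-mapped subgraphs of a compiled xmodel."""
    root = graph.get_root_subgraph()
    return [s for s in root.toposort_child_subgraph()
            if s.has_attr("device") and s.get_attr("device").upper() == "DPU"]

graph = xir.Graph.deserialize("coordnet.xmodel")            # hypothetical path
runner = vart.Runner.create_runner(dpu_subgraphs(graph)[0], "run")

in_dims = tuple(runner.get_input_tensors()[0].dims)         # e.g. (1, 224, 224, 3)
out_dims = tuple(runner.get_output_tensors()[0].dims)       # e.g. (1, 2)

frame = np.zeros(in_dims, dtype=np.int8)                    # pre-processed, quantised frame
coords = np.zeros(out_dims, dtype=np.int8)                  # raw fixed-point output

start = time.time()
job = runner.execute_async([frame], [coords])
runner.wait(job)
print(f"latency {1e3 * (time.time() - start):.1f} ms, raw output {coords.flatten()}")
```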

Results

Throughput and power

  • Inference: approximately 32 to 33 fps on the Kria KV260
  • Total board power: below 11 W
  • Latency: tightly clustered around 22 to 24 ms in real-time emulation

Accuracy and generalisation

  • Post-training quantisation introduced only minor accuracy degradation relative to the full-precision baseline
  • Real-world testing showed a noticeable drop in localisation accuracy, likely due to dataset mismatch with the deployed camera system
  • This highlights the importance of camera-aligned calibration and augmentation (field of view, altitude distribution, colour and lighting variation)

Discussion

Overall interpretation of findings

Across all studies, the results show that dataset diversity is the dominant driver of localisation performance. Training on single or overly narrow datasets led to poor generalisation, while augmented datasets improved robustness. Quantised deployment retained high accuracy at 8-bit precision while delivering more than 30 fps at under 11 W on the KV260, demonstrating FPGA inference as a practical pathway for on-board visual localisation.

Real-time emulation confirmed predictable behaviour, stable latency of around 22 to 24 ms, and stable error distributions, supporting use in navigation pipelines where consistency is essential.

Critical evaluation

Key limitations and considerations:

  • Training data was limited to a single geographic region (UQ and surrounding suburbs)
  • Absolute coordinate regression assumes a fixed reference map, limiting adaptability in evolving environments
  • Single-frame localisation does not exploit temporal coherence that could smooth estimates
  • DPU profiling was coarse, limiting precise co-design optimisation
  • AI-generated imagery can introduce bias and should be supported by automated diversity and verification metrics

Practical implications

  • Demonstrates high-performance visual localisation without GPUs or cloud connectivity
  • Shows that compact FPGA systems can deliver real-time performance within strict energy budgets
  • Provides a reproducible pipeline from data generation through to hardware deployment
  • Deterministic latency and repeatable behaviour suit sensor fusion frameworks (for example, visual-inertial fusion)

Future work

  1. Quantisation-aware training to close the remaining accuracy gap
  2. Mixed-precision inference to improve accuracy in sensitive layers while retaining efficiency
  3. Expanded datasets across multiple regions, seasons, and environments, with automated quality metrics
  4. Sensor fusion with inertial, barometric, and magnetometer data for continuous navigation
  5. Temporal modelling through lightweight filtering or recurrent components to exploit frame continuity (see the sketch after this list)
  6. Hardware co-design, including pruning, pipeline tuning, and DPU scheduling optimisation
  7. Field validation on a UAV platform with the KV260 or equivalent Zynq UltraScale+ hardware
  8. Real-time mapping or incremental map adaptation beyond the static reference assumption
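
As a toy illustration of the temporal-modelling direction in item 5, per-frame regressions could be smoothed with something as simple as exponential filtering before fusion. The class below is illustrative only and is not part of the current pipeline.

```python
import numpy as np

class EmaSmoother:
    """Exponentially weighted smoothing of per-frame (x, y) estimates."""
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # higher alpha trusts the newest frame more
        self.state = None

    def update(self, xy):
        xy = np.asarray(xy, dtype=float)
        if self.state is None:
            self.state = xy
        else:
            self.state = self.alpha * xy + (1.0 - self.alpha) * self.state
        return self.state

# smoother = EmaSmoother(alpha=0.3)
# for frame_xy in per_frame_predictions:    # hypothetical stream of DPU outputs
#     fused_xy = smoother.update(frame_xy)
```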