Robust FPGA Centric CNN Visual Localisation for GPS Denied UAVs
End to end development of a compact CNN for absolute visual localisation, quantised and deployed on an AMD Kria KV260 FPGA DPU for real time, low power inference in GPS denied UAV scenarios.
Overview
A compact CNN based localisation pipeline designed for UAVs operating in GPS denied environments, deployed on an FPGA to achieve real time inference under tight power constraints.
Abstract
Unmanned aerial vehicles (UAVs) often operate in environments where GPS signals are unavailable or unreliable, creating significant localisation challenges. Traditional feature based methods, such as ORB feature matching combined with RANSAC outlier rejection, can run in real time but are highly sensitive to changes in lighting, viewpoint and scene structure.
This project investigated an alternative approach based on lightweight convolutional neural networks (CNNs) deployed on field programmable gate array (FPGA) hardware. Using the AMD Kria KV260 Vision AI platform, a custom CNN model was trained, quantised and compiled through the Vitis AI tool flow for execution on a deep learning processing unit (DPU). The model directly regresses image coordinates from monocular input, enabling on board visual localisation without reliance on external compute resources.
Preliminary results demonstrate real time inference at approximately 32 frames per second with total board power consumption below 11 watts, highlighting the efficiency of FPGA acceleration. Accuracy remains below the full precision baseline, with the gap driven mainly by unoptimised datasets and to a lesser extent by quantisation effects; ongoing work is directed towards quantisation aware training and improved data handling.
In preliminary real world testing, localisation accuracy dropped noticeably, most likely because of differences between the training datasets and the imaging characteristics of the deployed camera system. Ideally, datasets would be calibrated and augmented to match the imaging hardware, including its field of view, expected flight altitude, and colour or lighting variation, so that training and deployment conditions remain consistent.
The outcomes of this research contribute towards practical, energy efficient localisation pipelines for small UAVs operating in GPS denied scenarios.
Problem
UAVs operating in GPS denied environments require reliable localisation under tight compute and power constraints.
Key constraints
- On board compute only (no cloud dependency)
- Real time throughput with stable latency
- Limited power budget suitable for small UAV platforms
- Robustness under viewpoint, lighting, and scene variation
Approach
The work was structured around three progressive studies, each addressing a stage of the pipeline from data design through to deployment and real time integration.
Study 1: Dataset diversity and augmentation
Localisation performance was strongly governed by the diversity and realism of the training data.
- Models trained on a single static image displayed overfitting and poor generalisation
- Augmented datasets incorporating AI generated imagery and altitude variation improved accuracy and robustness (a sampling and augmentation sketch follows this list)
- Data centric strategies were as influential as architecture choices in this visual regression task
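The sketch below illustrates one way such samples can be generated: random crops of a reference orthophoto, with the crop centre as the regression target, scale jitter standing in for altitude variation and colour jitter for lighting changes. Function names, transform parameters and crop ranges are illustrative assumptions rather than the exact configuration used in the study.

```python
# Minimal sketch of crop-based sample generation from a large reference orthophoto.
# The crop centre (normalised to the reference image) is the regression target;
# scale jitter approximates altitude variation, colour jitter approximates lighting.
# All names and ranges are illustrative assumptions.
import random
from PIL import Image
import torchvision.transforms as T

photometric = T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2, hue=0.05)

def sample_patch(reference: Image.Image, out_size=224):
    W, H = reference.size                      # assumes a large orthophoto
    # Vary crop size to emulate different flight altitudes over the same scene
    crop = int(out_size * random.uniform(1.0, 2.0))
    x = random.randint(0, W - crop)
    y = random.randint(0, H - crop)
    patch = reference.crop((x, y, x + crop, y + crop)).resize((out_size, out_size))
    patch = photometric(patch)
    # Normalised centre coordinates serve as the (x, y) regression target
    target = ((x + crop / 2) / W, (y + crop / 2) / H)
    return patch, target
```

Deriving the label directly from the crop geometry keeps geometric augmentation consistent with the regression targets, which a naive image-only augmentation pipeline would not.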
Study 2: Quantisation and hardware deployment
This study evaluated 8 bit integer quantisation and FPGA execution via the AMD Vitis AI tool flow.
- Trained a compact CoordinateCNN style architecture for coordinate regression
- Applied post training quantisation and compiled for DPU execution on the Kria KV260 (a representative flow is sketched after this list)
- Observed only minor degradation in mean and median error (approximately 1 to 2 percent) while enabling efficient deployment
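A minimal sketch of this step is shown below, assuming the Vitis AI PyTorch quantiser (pytorch_nndct) and the vai_c_xir compiler. The model stub, file names and arch.json path are placeholders, and exact arguments vary between Vitis AI releases.

```python
# Hedged sketch of post training quantisation with the Vitis AI PyTorch quantiser.
# The CoordinateCNN stub below is a stand-in for the trained network, not the
# actual architecture used in the study.
import torch
import torch.nn as nn
from pytorch_nndct.apis import torch_quantizer

class CoordinateCNN(nn.Module):
    """Small stand-in for the compact coordinate regression network."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)   # (x, y) in normalised image coordinates

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = CoordinateCNN().eval()         # in practice, load the trained weights here
dummy = torch.randn(1, 3, 224, 224)

# 1. Calibration pass over a small, representative set of images
quantizer = torch_quantizer("calib", model, (dummy,), output_dir="quant_out")
quant_model = quantizer.quant_model
quant_model(dummy)                     # replace with a loop over calibration images
quantizer.export_quant_config()

# 2. A second pass in "test" mode exports the deployable xmodel, which is then
#    compiled for the KV260 DPU, for example:
#    vai_c_xir -x quant_out/CoordinateCNN_int.xmodel -a arch.json \
#              -o compiled -n coordinate_cnn
```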
Study 3: Real time inference and emulation
This study validated end to end behaviour in a live streaming pipeline.
- Integrated the quantised model into a real time video streaming pipeline (see the runtime sketch after this list)
- Confirmed deterministic and repeatable behaviour across runs
- Verified stable latency and coherent spatial error patterns aligned with scene structure
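The emulation loop can be approximated as below, assuming a compiled xmodel served through the VART runner with OpenCV supplying frames. File names, input resolution and preprocessing are illustrative and would follow the compiled model's actual tensor shapes and fixed point scaling.

```python
# Hedged sketch of the real time emulation loop on the KV260: OpenCV frames fed
# to the compiled xmodel through the VART runner. Paths and preprocessing are
# illustrative assumptions.
import cv2
import numpy as np
import vart
import xir

graph = xir.Graph.deserialize("compiled/coordinate_cnn.xmodel")
dpu_subgraph = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(dpu_subgraph, "run")

in_t, out_t = runner.get_input_tensors()[0], runner.get_output_tensors()[0]
in_scale = 2 ** in_t.get_attr("fix_point")        # float -> int8 input scaling
out_scale = 2 ** -out_t.get_attr("fix_point")     # int8 -> float output scaling

cap = cv2.VideoCapture("flight_emulation.mp4")    # emulated camera stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    img = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    inp = np.expand_dims((img * in_scale).astype(np.int8), 0)
    out = np.empty(tuple(out_t.dims), dtype=np.int8)
    job = runner.execute_async([inp], [out])
    runner.wait(job)
    x, y = out.reshape(-1)[:2] * out_scale        # predicted normalised coordinates
```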
Results
Throughput and power
- Inference: approximately 32 to 33 fps on Kria KV260
- Total board power: below 11 W
- Latency: tightly clustered around 22 to 24 ms in real time emulation (a simple profiling sketch follows)
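A minimal sketch of how per frame latency figures of this kind can be collected; run_inference is a hypothetical stand in for a single DPU call such as the execute_async and wait pair shown above.

```python
# Minimal latency profiling sketch; run_inference is a stand-in for one DPU call.
import time
import numpy as np

def profile(run_inference, frames, warmup=20):
    latencies = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        run_inference(frame)
        if i >= warmup:                            # discard warm-up iterations
            latencies.append((time.perf_counter() - start) * 1e3)
    lat = np.array(latencies)
    print(f"mean {lat.mean():.1f} ms, p50 {np.percentile(lat, 50):.1f} ms, "
          f"p99 {np.percentile(lat, 99):.1f} ms, fps {1e3 / lat.mean():.1f}")
```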
Accuracy and generalisation
- Post training quantisation introduced only minor accuracy degradation compared to the full precision baseline
- Real world testing showed a noticeable drop in localisation accuracy, likely due to dataset mismatch with the deployed camera system
- This highlights the importance of camera aligned calibration and augmentation (field of view, altitude distribution, colour and lighting variation); a field of view matching sketch follows this list
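One concrete form of camera aligned augmentation is to size reference image crops so their ground footprint matches what the deployed camera sees at the expected altitude. The sketch below shows the geometry; all parameter names and values are illustrative assumptions, not measured system values.

```python
# Hedged sketch: choose the reference-image crop size so training patches match
# the ground footprint of the deployed camera. Values are illustrative.
import math

def fov_matched_crop_px(altitude_m, fov_deg, gsd_m_per_px):
    """Side length (pixels) of a reference orthophoto crop whose ground
    footprint matches the camera's view from the given altitude."""
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    return int(round(footprint_m / gsd_m_per_px))

# Example: 60 m altitude, 70 degree horizontal FoV, 0.1 m/px orthophoto
print(fov_matched_crop_px(60.0, 70.0, 0.1))   # ~840 px before resizing to 224
```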
Discussion
Overall interpretation of findings
Across all studies, the results show that dataset diversity is the dominant driver of localisation performance. Training on single or overly narrow datasets led to poor generalisation, while augmented datasets improved robustness. Quantised deployment retained high accuracy at 8 bit precision while delivering more than 30 fps under 11 W on the KV260, demonstrating FPGA inference as a practical pathway for on board visual localisation.
Real time emulation confirmed predictable behaviour, latency stable at around 22 to 24 ms, and repeatable error distributions, supporting use in navigation pipelines where consistency is essential.
Critical evaluation
Key limitations and considerations:
- Training data was limited to a single geographic region (UQ and surrounding suburbs)
- Absolute regression assumes a fixed reference map and limits adaptability in evolving environments
- Single frame localisation does not leverage temporal coherence that could smooth estimates
- DPU profiling was coarse, limiting precise co design optimisation
- AI generated imagery can introduce bias and should be supported by automated diversity and verification metrics
Practical implications
- Demonstrates high performance visual localisation without GPUs or cloud connectivity
- Shows compact FPGA systems can deliver real time performance within strict energy budgets
- Provides a reproducible pipeline from data generation through to hardware deployment
- Deterministic latency and repeatable behaviour suit sensor fusion frameworks (for example visual inertial fusion)
Future work
- Quantisation aware training to close the remaining accuracy gap
- Mixed precision inference to improve accuracy in sensitive layers while retaining efficiency
- Expanded datasets across multiple regions, seasons, and environments with automated quality metrics
- Sensor fusion with inertial, barometric, and magnetometer data for continuous navigation
- Temporal modelling through lightweight filtering or recurrent components to exploit frame continuity (a minimal filtering sketch follows this list)
- Hardware co design including pruning, pipeline tuning, and DPU scheduling optimisation
- Field validation on a UAV platform with KV260 or equivalent Zynq UltraScale+ hardware
- Real time mapping or incremental map adaptation beyond static reference assumptions
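As an illustration of the lightweight temporal filtering mentioned above, a constant velocity alpha beta filter over the per frame (x, y) predictions could look like the sketch below; the gains and frame interval are illustrative assumptions rather than tuned values.

```python
# Illustrative sketch of lightweight temporal smoothing for per-frame (x, y)
# predictions: a constant velocity alpha-beta filter. Gains are hypothetical.
class AlphaBetaFilter:
    def __init__(self, alpha=0.5, beta=0.1):
        self.alpha, self.beta = alpha, beta
        self.pos = None                 # smoothed position estimate
        self.vel = (0.0, 0.0)           # estimated velocity in image coordinates

    def update(self, measurement, dt=1.0 / 30.0):
        if self.pos is None:
            self.pos = measurement
            return self.pos
        # Predict with constant velocity, then correct towards the measurement
        pred = tuple(p + v * dt for p, v in zip(self.pos, self.vel))
        resid = tuple(m - p for m, p in zip(measurement, pred))
        self.pos = tuple(p + self.alpha * r for p, r in zip(pred, resid))
        self.vel = tuple(v + (self.beta / dt) * r for v, r in zip(self.vel, resid))
        return self.pos
```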