Drone-Guard: A Self-Supervised Deep Learning Framework for Spatiotemporal Anomaly Detection in UAV Surveillance Videos

1SECROM Laboratory, EPT, University of Carthage, La Marsa 2078, Tunisia
2ISITCOM, University of Sousse, Hammam Sousse 4011, Tunisia

Abstract

Anomaly detection is a cornerstone of intelligent video surveillance, facilitating the early identification of irregular or potentially dangerous events. While deep learning has significantly advanced this domain, current approaches often face a fundamental trade-off between detection accuracy and computational efficiency. High-accuracy models are typically computationally intensive and unsuitable for real-time deployment on resource-constrained platforms such as unmanned aerial vehicles (UAVs). Conversely, lightweight alternatives often lack the representational capacity to model complex spatiotemporal patterns in dynamic environments. To address this challenge, we present Drone-Guard, a self-supervised deep learning framework designed for real-time spatiotemporal anomaly detection in UAV surveillance systems. Drone-Guard introduces three core innovations: (1) a lightweight residual autoencoder architecture with multi-stage feature extraction to jointly capture fine-grained and high-level spatial structures; (2) a novel Multi-Scale Grouped Query Attention (MS-GQA) mechanism for efficient fusion of hierarchical spatial and temporal features, enabling context-aware anomaly modeling; and (3) a Residual Vector Quantization (RVQ) module that enhances latent representation compactness and reconstruction fidelity, both crucial for discriminative anomaly detection. To overcome the lack of labeled anomalies, a central limitation in self-supervised learning, we further propose a latent-space pseudo-anomaly synthesizer. This component perturbs the learned representations of normal samples to generate synthetic anomalies, thereby facilitating effective decision boundary learning without requiring manual annotations. Extensive experiments and ablation studies on multiple benchmark datasets demonstrate that Drone-Guard outperforms state-of-the-art methods in both accuracy and efficiency. Its low computational footprint and robust anomaly localization capabilities make it well-suited for real-time deployment in edge-aware, IoT-enabled UAV surveillance applications.

Code and pre-trained models are publicly available at: https://github.com/slitiWassim/Drone-Guard
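To make two of the ideas in the abstract concrete, the following is a minimal NumPy sketch of generic residual vector quantization and of a latent-space perturbation. It is not taken from the Drone-Guard code base: the function names, the codebook shapes, and the use of additive Gaussian noise as the perturbation are all assumptions chosen for illustration, since the abstract only states that normal latents are perturbed to synthesize anomalies.

```python
import numpy as np


def residual_vector_quantize(z, codebooks):
    """Quantize each row of z with a stack of codebooks (generic RVQ).

    z: (n, d) array of latent vectors.
    codebooks: list of (k, d) arrays, one per quantization stage.
    Returns the quantized latents (n, d) and the chosen indices (n, stages).
    """
    residual = z.astype(np.float64)  # astype returns a copy, so z is untouched
    quantized = np.zeros_like(residual)
    indices = []
    for codebook in codebooks:
        # Squared distance from every residual to every codeword: shape (n, k).
        diffs = residual[:, None, :] - codebook[None, :, :]
        dists = (diffs ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        chosen = codebook[idx]
        quantized += chosen   # accumulate the stage-wise approximation
        residual -= chosen    # the next stage quantizes what is left over
        indices.append(idx)
    return quantized, np.stack(indices, axis=1)


def synthesize_pseudo_anomaly(z, rng, noise_scale=0.5):
    """Hypothetical perturbation: add Gaussian noise to normal latents.

    The noise model and noise_scale are illustrative assumptions only.
    """
    return z + noise_scale * rng.standard_normal(z.shape)


# Toy usage with hand-made 2-D codebooks (illustrative values).
z = np.array([[1.0, 2.0]])
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),  # coarse stage
    np.array([[0.0, 0.0], [0.0, 1.0]]),  # refinement stage
]
z_q, codes = residual_vector_quantize(z, codebooks)
z_fake = synthesize_pseudo_anomaly(z, np.random.default_rng(0))
```

In this toy case the first stage picks the codeword `[1, 1]`, leaving a residual of `[0, 1]` that the second stage matches exactly, so the two stages together reconstruct the input. In a trained model the codebooks would be learned, and each additional stage would typically only reduce the quantization error rather than remove it.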

Video

Ground-based Videos

UCSD Ped2

In this scene, a herd of cows walking on the highway is anomalous.

ShanghaiTech Campus

Vehicles moving on the roundabout reserved for bikes are anomalous in this scene.

Aerial Videos

Bike

Vehicles moving on the roundabout reserved for bikes are anomalous in this scene.

Anomaly score

In each image, the red line shows the anomaly score, which changes rapidly at transitions between normal and abnormal events.

Ground-based Datasets

UCSD Ped2 Dataset

CUHK Avenue Dataset

ShanghaiTech Dataset

Drone-anomaly Dataset

Bike roundabout

Tunisia Polytechnic School (EPT)

Laboratory of Electronic Systems and Communications Networks

BibTeX

@Article{}