MURAL - Maynooth University Research Archive Library



    Learned Descriptors for Scalable and Efficient Visual Place Recognition


    Ramachandran, Saravanabalagi (2025) Learned Descriptors for Scalable and Efficient Visual Place Recognition. PhD thesis, National University of Ireland Maynooth.


    Abstract

    Visual Place Recognition (VPR) is a computer vision task that involves matching images of an environment to previously visited locations, enabling systems to identify and recognise places based on visual information. VPR has emerged as a widely studied topic in computer vision and mobile robotics, driven by its applications in autonomous navigation, image retrieval, and loop closure detection. Over the past decade, the field has witnessed significant progress, fuelled by improvements in camera hardware, the proliferation of mobile devices, and the growing availability of public image datasets. Researchers have increasingly applied deep learning techniques to tackle the challenges of VPR, particularly the appearance changes and varying viewpoints that traditional descriptors struggled to address. Despite these advances, several interconnected challenges hinder the deployment of reliable and scalable VPR systems in automotive applications. Utilising large-scale sequential datasets poses significant difficulties due to diverse recording conventions, redundant visual content, and limited viewpoint variance, complicating the training of deep learning models. Additionally, efficiently categorising scenes without explicit object identification introduces considerable computational and methodological complexity. Furthermore, VPR systems face scalability challenges, primarily due to the computational demands of rapid image retrieval for localisation along trajectories spanning several kilometres. This thesis addresses these critical challenges.

    First, we introduce OdoViz, a comprehensive and unified framework for efficient dataset exploration, visualisation, analysis, curation, and preparation of bespoke training data from heterogeneous datasets. OdoViz streamlines the creation of standardised, tailored datasets essential for robust VPR model training.

    Second, we describe the development of robust learned image descriptors from large sequential datasets. We introduce a novel discretisation approach that segments trajectories into visually similar regions, facilitating efficient online sampling of triplets for contrastive learning. We present a detailed training regime involving tailored data subsets, a modified architecture, and a custom loss function for stable contrastive training, optimised to generate robust learned image representations.

    Third, we propose an efficient scene categorisation method leveraging Variational Autoencoders (VAEs). Our approach encodes images into compact, disentangled latent spaces without explicit object recognition, enabling rapid categorisation into urban, rural, and suburban contexts. This method achieves exceptional computational efficiency, with inference times under 100 μs, making it suitable as a pretext task in real-time automotive applications.

    Finally, to address scalability, we introduce a hierarchical framework that uses learned global descriptors to enable rapid retrieval over extensive distances while maintaining robust localisation performance. Through extensive experimentation, we identify continuity and distinctiveness as key properties of effective global descriptors for scalable hierarchical mapping, and propose a systematic method to quantify and compare these characteristics across descriptor types. Our VAE-based scene descriptors achieve up to a 9.5× speedup on the longest evaluated track, St Lucia (17.6 km), while maintaining the same recall performance over longer trajectories, demonstrating their effectiveness in hierarchical localisation.

    Together, these contributions address the identified VPR challenges, laying the groundwork for scalable and efficient VPR systems built on learned representations and suited for deployment in diverse real-world automotive environments.
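    The hierarchical localisation scheme described in the abstract — a cheap coarse descriptor shortlisting candidate places before an expensive fine descriptor ranks only that shortlist — can be sketched roughly as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the function name, Euclidean distance metric, and shortlist size `k` are all assumptions for demonstration.

    ```python
    import numpy as np

    def coarse_to_fine_match(query_coarse, query_fine, db_coarse, db_fine, k=5):
        """Two-stage retrieval: a compact coarse descriptor (e.g. a VAE scene
        descriptor) shortlists k candidates over the full map, then a
        high-dimensional fine descriptor ranks only that shortlist."""
        # Stage 1: coarse filter over the whole database (cheap, low-dimensional)
        coarse_dist = np.linalg.norm(db_coarse - query_coarse, axis=1)
        shortlist = np.argsort(coarse_dist)[:k]
        # Stage 2: fine matching restricted to the shortlist (costly, high-dimensional)
        fine_dist = np.linalg.norm(db_fine[shortlist] - query_fine, axis=1)
        return shortlist[np.argmin(fine_dist)]
    ```

    The speedup comes from the second stage touching only `k` entries instead of the whole map, which is why descriptor continuity and distinctiveness matter: the coarse stage must reliably keep the true match inside the shortlist.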
    Item Type: Thesis (PhD)
    Keywords: Learned Descriptors; Scalable and Efficient Visual Place Recognition;
    Academic Unit: Faculty of Science and Engineering > Computer Science
    Item ID: 20668
    Depositing User: IR eTheses
    Date Deposited: 09 Oct 2025 16:14
    Funders: Science Foundation Ireland 13/RC/2094, Lero, Irish Software Research Centre 16/RI/3399
    URI: https://mural.maynoothuniversity.ie/id/eprint/20668
    Use Licence: This item is available under a Creative Commons Attribution Non Commercial Share Alike Licence (CC BY-NC-SA).
