Ramachandran, Saravanabalagi (2025) Learned Descriptors for Scalable and Efficient Visual Place Recognition. PhD thesis, National University of Ireland Maynooth.
SaravanabalagiRamachandran_PhD2025.pdf (30MB)
Available under a Creative Commons Attribution Non-Commercial Share Alike licence.
Abstract
Visual Place Recognition (VPR) is the computer vision task of matching
images of an environment to previously visited locations, enabling systems to
recognise places from visual information alone. VPR has emerged as a widely studied
topic in computer vision and mobile robotics, driven by its applications in autonomous
navigation, image retrieval, and loop closure detection. Over the past decade, the field
has witnessed significant progress, fuelled by improvements in camera hardware, the
proliferation of mobile devices, and the growing availability of public image datasets.
Researchers have increasingly utilised deep learning techniques to tackle the challenges
of VPR, particularly those related to appearance changes and varying viewpoints that
traditional descriptors struggled to address.
Despite these advancements, several interconnected challenges hinder the deployment
of reliable and scalable VPR systems in automotive applications. Utilising large-scale
sequential datasets poses significant difficulties due to diverse recording conventions,
redundant visual content, and limited viewpoint variance, complicating the training
process for deep learning models. Additionally, efficiently categorising scenes without explicit
object identification introduces considerable computational and methodological
complexities. Furthermore, VPR systems face challenges related to scalability, primarily
due to the computational demands associated with rapid retrieval of images
for localisation along extensive trajectories spanning several kilometres. This thesis
specifically addresses these critical challenges.
First, we introduce OdoViz, a comprehensive and unified framework designed
for efficient dataset exploration, visualisation, analysis, curation, and preparation of
bespoke training data from heterogeneous datasets. OdoViz streamlines the creation of
standardised, tailored datasets essential for robust VPR model training.
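As an illustration of the kind of curation step OdoViz enables, the sketch below subsamples a sequential trajectory by pose spacing to discard redundant, visually near-identical frames. This is a hypothetical Python/NumPy example, not OdoViz's actual API; the function name and the 5 m spacing threshold are assumptions.

```python
import numpy as np

def subsample_by_distance(positions, min_spacing_m=5.0):
    """Keep only frames whose pose lies at least `min_spacing_m` metres from
    the last kept frame. `positions` is an (N, 2) or (N, 3) array of
    odometry/GPS coordinates in metres, one row per frame (hypothetical helper)."""
    kept = [0]
    last = positions[0]
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - last) >= min_spacing_m:
            kept.append(i)
            last = positions[i]
    return kept

# Example: a straight 100 m trajectory recorded every 0.5 m
traj = np.stack([np.arange(0, 100, 0.5), np.zeros(200)], axis=1)
indices = subsample_by_distance(traj, min_spacing_m=5.0)
print(len(indices))  # 20 frames kept instead of 200
```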
Second, we develop robust learned image descriptors using large sequential
datasets. We introduce a novel discretisation approach that segments trajectories
into visually similar regions, facilitating efficient online sampling of triplets for
contrastive learning. We present a detailed training regime involving tailored data
subsets, a modified architecture, and a custom loss function for stable contrastive
training, optimised to generate robust learned image representations.
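A minimal sketch of segment-based online triplet sampling, assuming anchors and positives are drawn from the same visually similar trajectory region and negatives from a different one. The toy backbone and the standard PyTorch triplet margin loss are placeholders for the modified architecture and custom loss described in the thesis.

```python
import random
import torch
import torch.nn as nn

def sample_triplet(segments):
    """segments: list of lists of frame indices, each covering one visually
    similar region of the trajectory. Anchor and positive come from the same
    segment, the negative from a different one."""
    seg_a, seg_n = random.sample(range(len(segments)), 2)
    anchor, positive = random.sample(segments[seg_a], 2)
    negative = random.choice(segments[seg_n])
    return anchor, positive, negative

# Toy embedding network standing in for the descriptor backbone
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
criterion = nn.TripletMarginLoss(margin=0.5)

segments = [list(range(0, 50)), list(range(50, 100)), list(range(100, 150))]
images = torch.randn(150, 3, 32, 32)  # placeholder frames

a, p, n = sample_triplet(segments)
loss = criterion(embed(images[a:a+1]), embed(images[p:p+1]), embed(images[n:n+1]))
loss.backward()
```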
Third, we propose an efficient scene categorisation method leveraging Variational
Autoencoders (VAEs). Our approach encodes images into compact, disentangled
latent spaces without explicit object recognition, enabling rapid categorisation into
urban, rural, and suburban contexts. This method achieves exceptional computational
efficiency, with inference times under 100μs, making it suitable for use as a pretext task
in real-time automotive applications.
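A minimal sketch of a VAE-style scene encoder with a lightweight classification head over the latent mean, assuming a small convolutional encoder, a 16-dimensional latent space, and 64x64 inputs; these choices are illustrative and not taken from the thesis.

```python
import torch
import torch.nn as nn

class SceneVAEEncoder(nn.Module):
    """Compact convolutional encoder: image -> (mu, logvar) of a small latent
    code. A lightweight head over mu classifies the scene type (hypothetical)."""
    def __init__(self, latent_dim=16, num_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 8 * 8, latent_dim)
        self.fc_logvar = nn.Linear(64 * 8 * 8, latent_dim)
        self.classifier = nn.Linear(latent_dim, num_classes)  # urban/suburban/rural

    def forward(self, x):
        h = self.conv(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        return mu, logvar, self.classifier(mu)

model = SceneVAEEncoder().eval()
with torch.no_grad():
    mu, logvar, logits = model(torch.randn(1, 3, 64, 64))
print(mu.shape, logits.argmax(dim=1))  # torch.Size([1, 16]) and a scene-class index
```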
Finally, to address scalability concerns, we introduce a hierarchical framework
utilising learned global descriptors to facilitate rapid retrieval over extensive distances
while maintaining robust localisation performance. Through extensive experimentation,
we identify continuity and distinctiveness as key properties of effective global descriptors
for scalable hierarchical mapping, and propose a systematic method to quantify and
compare these characteristics across various descriptor types. Our VAE-based scene descriptors
achieve up to 9.5x speedup on the longest evaluated track, St Lucia (17.6km),
while maintaining the same recall performance over longer trajectories, demonstrating
their effectiveness in hierarchical localisation.
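A minimal sketch of coarse-to-fine hierarchical retrieval, assuming the map is partitioned into segments, a compact scene descriptor ranks segments first, and a higher-dimensional descriptor is matched only within the top-ranked segments. All names, dimensions, and the toy map below are assumptions for illustration, not the thesis's exact pipeline.

```python
import numpy as np

def hierarchical_retrieve(query_coarse, query_fine,
                          map_coarse, map_fine, segment_ids, top_segments=2):
    """Two-stage retrieval: rank map segments by mean coarse (scene) descriptor
    similarity, then match the fine descriptor only against frames belonging
    to the best-scoring segments."""
    seg_labels = np.unique(segment_ids)
    seg_centroids = np.stack([map_coarse[segment_ids == s].mean(axis=0)
                              for s in seg_labels])
    seg_scores = seg_centroids @ query_coarse              # dot-product similarity
    best_segs = seg_labels[np.argsort(-seg_scores)[:top_segments]]

    candidates = np.where(np.isin(segment_ids, best_segs))[0]
    fine_scores = map_fine[candidates] @ query_fine
    return candidates[np.argmax(fine_scores)]

# Toy map: 10 segments of 100 frames; coarse descriptors cluster per segment
rng = np.random.default_rng(0)
segment_ids = np.repeat(np.arange(10), 100)
prototypes = rng.normal(size=(10, 16))
map_coarse = prototypes[segment_ids] + 0.1 * rng.normal(size=(1000, 16))
map_coarse /= np.linalg.norm(map_coarse, axis=1, keepdims=True)
map_fine = rng.normal(size=(1000, 256))
map_fine /= np.linalg.norm(map_fine, axis=1, keepdims=True)

best = hierarchical_retrieve(map_coarse[42], map_fine[42],
                             map_coarse, map_fine, segment_ids)
print(best)  # 42, because the query is itself a map frame
```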
Together, these contributions address the identified VPR challenges, laying the
groundwork for scalable and efficient VPR systems leveraging learned representations,
suited for deployment in diverse real-world automotive environments.
| Item Type: | Thesis (PhD) |
|---|---|
| Keywords: | Learned Descriptors; Scalable and Efficient Visual Place Recognition |
| Academic Unit: | Faculty of Science and Engineering > Computer Science |
| Item ID: | 20668 |
| Depositing User: | IR eTheses |
| Date Deposited: | 09 Oct 2025 16:14 |
| Funders: | Science Foundation Ireland 13/RC/2094, Lero, Irish Software Research Centre 16/RI/3399 |
| URI: | https://mural.maynoothuniversity.ie/id/eprint/20668 |
| Use Licence: | This item is available under a Creative Commons Attribution Non-Commercial Share Alike licence (CC BY-NC-SA). |