
Dialnet


Abstract of Topological place recognition for life-long visual localization

Roberto Arroyo Contera

  • Spanish

    The navigation of intelligent vehicles or mobile robots over long periods of time has attracted great interest from the research community in recent years. Camera-based systems have become widespread in the recent past thanks to improvements in their features, price and size, together with progress in computer vision techniques. For this reason, vision-based localization is a key aspect of developing robust autonomous navigation in long-term situations. With this in mind, identifying locations by means of topological place recognition techniques can complement other approaches, such as solutions based on the Global Positioning System (GPS), or even supplement them when the GPS signal is not available.

    The state of the art in topological place recognition has shown satisfactory performance in the short term. However, long-term visual localization is problematic because of the large appearance changes that a place undergoes as a consequence of dynamic elements, illumination or weather, among others. The goal of this thesis is to tackle the difficulties of carrying out efficient and robust topological localization over time. To that end, two new approaches based on visual place recognition are contributed to solve the different problems associated with long-term visual localization.

    On the one hand, a visual place recognition method based on binary descriptors is proposed. The innovation of this approach lies in the global description of image sequences as binary codes, which are extracted by means of a descriptor based on the technique known as Local Difference Binary (LDB). The descriptors are efficiently matched using the Hamming distance and a search method known as Approximate Nearest Neighbors (ANN). In addition, an illumination-invariant technique is applied to improve performance under changing lighting conditions. The use of this binary description reduces computational and memory costs.

    On the other hand, a visual place recognition method based on deep learning is also presented, in which the applied descriptors are processed by a Convolutional Neural Network (CNN). This is a concept recently popularized in computer vision that has obtained impressive results in image classification problems. The novelty of our approach lies in fusing image information from multiple convolutional layers at several levels and granularities. In addition, the redundant data of the CNN-based descriptors are compressed into a reduced number of bits for more efficient localization.

  • English

    The navigation of intelligent vehicles or mobile robots over long periods of time has attracted great interest from the research community in recent years. In this context, visual information has become a valuable asset in any perception scheme designed to improve scene understanding for autonomous driving. Camera-based systems have become widespread in the recent past thanks to improvements in camera features, price and size, together with progress in computer vision. For this reason, vision-based localization is a key aspect of developing robust automated navigation in long-term situations. Accordingly, identifying locations by means of topological place recognition techniques can complement other sensing technologies, such as solutions based on the Global Positioning System (GPS), or even supplement them when the GPS signal is not fully available or is denied.

    The state of the art in topological place recognition has shown satisfactory performance in short-term problems. However, life-long visual localization is more challenging because of the strong appearance changes that a place undergoes due to dynamic elements, illumination or weather, among others. In this regard, one of the research areas that has gained most popularity recently is the identification of places across the four seasons of the year. The goal of this dissertation is to cope with the main difficulties of carrying out efficient and robust topological localization over time. To that end, we contribute two novel approaches based on visual place recognition that address the different problems associated with life-long visual localization.

    On the one hand, a visual place recognition method based on hand-crafted binary descriptors is proposed. The innovation of this approach lies in the global description of sequences of images as binary codes, which are extracted with a Local Difference Binary (LDB) descriptor and efficiently matched using the Hamming distance in an Approximate Nearest Neighbors (ANN) search. In addition, an illumination-invariant technique is applied to improve performance under changing lighting conditions. The binary description and matching method reduce memory and computational costs, which is necessary for long-term operation. Three versions of this proposal are designed to exploit the advantages of different types of cameras: monocular, stereo and panoramic.
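
    As a rough illustration of matching sequence-level binary codes with the Hamming distance, the sketch below concatenates per-image codes into one descriptor per sequence and finds the closest stored place. It is not the method of the dissertation: the 8-bit codes are toy values rather than real LDB output, the function names are hypothetical, and an exhaustive search stands in for the ANN search mentioned above.

```python
from typing import Sequence, Tuple


def concat_codes(codes: Sequence[int], bits_per_image: int) -> int:
    """Concatenate per-image binary codes into one sequence descriptor."""
    descriptor = 0
    for code in codes:
        descriptor = (descriptor << bits_per_image) | code
    return descriptor


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def best_match(query: int, database: Sequence[int]) -> Tuple[int, int]:
    """Exhaustive nearest-neighbour search (assumption: replaces ANN)."""
    distances = [hamming(query, d) for d in database]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return idx, distances[idx]


BITS = 8  # toy code length per image (assumption; real descriptors are longer)

# Three stored places, each described by a sequence of three image codes.
database = [
    concat_codes([0b10110010, 0b11110000, 0b00001111], BITS),
    concat_codes([0b00000001, 0b01010101, 0b10101010], BITS),
    concat_codes([0b11111111, 0b00000000, 0b11001100], BITS),
]

# A revisit of place 1 with two bits flipped by appearance changes.
query = concat_codes([0b00000011, 0b01010101, 0b10101000], BITS)

print(best_match(query, database))  # (1, 2): place index 1 at distance 2
```

    Because the comparison reduces to an XOR followed by a bit count, each match is cheap in both time and memory, which is the property the paragraph above relies on for long-term operation.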

    On the other hand, we also present a visual place recognition method based on deep learning, in which the applied features are processed by a Convolutional Neural Network (CNN). This is a concept recently popularized in the computer vision community that has obtained impressive results in image classification problems. Here, we take advantage of CNNs to improve the accuracy of topological localization under seasonal changes, because traditional solutions based on hand-crafted descriptors have more difficulties in these conditions. The novelty of our approach lies in fusing the image information from multiple convolutional layers at several levels and granularities. In addition, the redundant data of the CNN features is compressed into a tractable number of bits for more efficient and robust life-long localization. The final descriptor is reduced by applying simple compression and binarization techniques, so that it can again be matched quickly using the Hamming distance. Throughout the dissertation, we discuss the pros and cons of this place recognition proposal with respect to the one based on hand-crafted features. In general terms, methods based on CNNs improve precision by generating more detailed visual representations of locations. However, their disadvantage is that computational costs also increase compared to those required for processing hand-crafted descriptors.
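
    The fuse, compress and binarize pipeline described above can be sketched as follows. The abstract does not say which compression or binarization is used, so this example makes its own assumptions: features from each layer are L2-normalized and concatenated, a fixed random subset of dimensions is kept, and each kept value is thresholded at the mean. The feature values and all function names are hypothetical.

```python
import random
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (left unchanged if all zero)."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_layers(layer_features: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate normalized features from several layers (assumed fusion)."""
    fused: List[float] = []
    for feats in layer_features:
        fused.extend(l2_normalize(feats))
    return fused


def make_selector(dim: int, keep: int, seed: int = 0) -> List[int]:
    """Pick a fixed random subset of dimensions (assumed compression)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(dim), keep))


def compress_and_binarize(fused: Sequence[float], selector: Sequence[int]) -> List[int]:
    """Keep the selected dimensions and threshold them at their mean."""
    kept = [fused[i] for i in selector]
    threshold = sum(kept) / len(kept)
    return [1 if x > threshold else 0 for x in kept]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Toy activations from two layers for two images of the same place.
image_a = [[0.5, 1.2, 0.0, 3.1], [2.0, 0.1, 0.4, 0.9, 1.5, 0.0]]
image_b = [[0.6, 1.0, 0.1, 2.9], [1.8, 0.2, 0.5, 1.0, 1.4, 0.1]]

fused_a = fuse_layers(image_a)
fused_b = fuse_layers(image_b)

# The same selector must be used for every image so that bits line up.
selector = make_selector(dim=len(fused_a), keep=6)
code_a = compress_and_binarize(fused_a, selector)
code_b = compress_and_binarize(fused_b, selector)

print(code_a, code_b, hamming(code_a, code_b))
```

    Reusing one selector for every image is what keeps the resulting bit strings comparable; once binarized, they can be matched with the same Hamming distance used for the hand-crafted descriptors.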

    Both topological place recognition approaches are extensively evaluated on several publicly available datasets. These tests yield satisfactory precision in long-term conditions: the reported results compare our methods against the main state-of-the-art algorithms and show better results in all cases. The experiments validate our proposals for life-long visual localization, especially considering that more than 3000 km are traversed across the seasons in the tests performed.

    Additionally, we analyze the applicability of our topological place recognition to different localization problems. These applications include the detection of loop closures based on the recognized places and the correction of the accumulated drift in visual odometry estimates using the loop closure information. The Simultaneous Localization And Mapping (SLAM) problem is also studied by taking into account the corrected measurements obtained by means of camera-based localization and the information provided by other sensing technologies, such as GPS or Light Detection And Ranging (LiDAR). We also consider the application of geometric change detection across seasons, which is essential for map updates in autonomous driving systems focused on long-term operation. All these contributions are discussed at the end of the dissertation, together with several conclusions about the presented work and some future research lines.

