Learnt Representation of Spatio-Temporal Point Clouds for Event Transformers

MARTIN TURRERO, CARMEN
2022/2023

Abstract

Event Vision Sensors (EVS) offer a different approach to imaging by capturing visual information in an event-driven, asynchronous manner, providing high temporal resolution, low latency, and sparse data. However, current computer vision methods often transform event streams into grid-based frames, compromising these advantages. This thesis introduces a novel approach to processing event-based data that converts an event stream into a sequence of learnt high-dimensional vectors usable in conventional computer vision pipelines. The proposed sequence is denoted Learnt Embedding Representation of Spatio-Temporal Point Clouds (LERT) and, combined with application-specific machine learning modules, enables end-to-end learning systems from sensing pixels to downstream processing. Furthermore, this work also introduces ALERT (Asynchronous Learnt Embedding Representation of Spatio-Temporal Point Clouds), the asynchronous version of this representation, which can be used during inference for real-time, event-driven processing of triggered events. ALERT makes it possible to leverage the sparse, low-latency, event-driven nature of event data without compromising the precision and accuracy of standard dense models. The proposed representation is evaluated on classification tasks by leveraging Vision Transformers [1]. The LERT-Transformer and ALERT-Transformer models are presented, and experiments on established datasets demonstrate their versatility and potential for real-time applications. This work advances the understanding and utilization of EVS technology in computer vision, paving the way for further research towards highly efficient event-driven processing.
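To make the described pipeline concrete, below is a minimal, illustrative sketch in PyTorch of how an event point cloud could be embedded into learnt tokens and classified with a Transformer encoder. All names, the per-event MLP encoder, and the hyperparameters here are hypothetical placeholders chosen for the example, not the LERT-Transformer architecture defined in the thesis; the only assumption carried over from the abstract is that raw events are (x, y, t, polarity) tuples mapped to high-dimensional vectors before downstream processing.

import torch
import torch.nn as nn

class EventEmbedding(nn.Module):
    """Maps a spatio-temporal point cloud of events to a sequence of tokens."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        # Hypothetical per-event encoder: 4 raw features -> embed_dim vector.
        self.mlp = nn.Sequential(
            nn.Linear(4, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (batch, num_events, 4) with columns (x, y, t, polarity)
        return self.mlp(events)  # -> (batch, num_events, embed_dim)

class EventTransformerClassifier(nn.Module):
    """Illustrative classifier: learnt event tokens fed to a Transformer encoder."""
    def __init__(self, num_classes: int, embed_dim: int = 128):
        super().__init__()
        self.embed = EventEmbedding(embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        tokens = self.encoder(self.embed(events))
        # Mean-pool the token sequence, then classify.
        return self.head(tokens.mean(dim=1))

# Usage: classify a batch of 2 clouds of 1024 events each into 10 classes.
model = EventTransformerClassifier(num_classes=10)
logits = model(torch.rand(2, 1024, 4))  # -> shape (2, 10)

Because each event is embedded independently by the MLP, a sketch like this hints at the asynchronous (ALERT-style) inference mode: newly triggered events can be embedded as they arrive, without re-rendering the stream into dense frames.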
Keywords: DeepLearning, ComputerVision, EventData

Files in this item:

File: MartinTurrero_Carmen.pdf
Access: restricted
Size: 12.53 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/54904