Learnt Representation of Spatio-Temporal Point Clouds for Event Transformers
MARTIN TURRERO, CARMEN
2022/2023
Abstract
Event Vision Sensors (EVS) offer a different approach to imaging, capturing visual information in an event-driven, asynchronous manner that provides high temporal resolution, low latency, and sparse data. However, current computer vision methods often transform event streams into grid-based frames, compromising these advantages. This thesis introduces a novel approach to processing event-based data that converts an event stream into a sequence of learnt high-dimensional vectors usable in conventional computer vision pipelines. The proposed sequence is denoted the Learnt Embedding Representation of Spatio-Temporal Point Clouds (LERT) and, combined with application-specific machine learning modules, enables end-to-end learning systems from sensing pixels to downstream processing. This work also introduces ALERT (Asynchronous Learnt Embedding Representation of Spatio-Temporal Point Clouds), an asynchronous version of the representation that can be used during inference for real-time, event-driven processing of triggered events. ALERT makes it possible to exploit the sparse, low-latency, event-driven nature of event data without compromising the precision and accuracy of standard dense models. The proposed representation is evaluated on classification tasks using Vision Transformers [1]. The LERT-Transformer and ALERT-Transformer models are presented, and experiments on established datasets demonstrate their versatility and potential for real-time applications. This work advances the understanding and utilization of EVS technology in computer vision, paving the way for further research towards highly efficient event-driven processing.
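The abstract describes the pipeline only at a high level, so the sketch below is one plausible reading of it, not the thesis's actual architecture: raw events treated as a spatio-temporal point cloud, embedded into learnt tokens by a shared MLP, and classified by a standard Transformer encoder. All module names (`EventEmbedder`, `EventClassifier`), the patch-pooling scheme, and every hyperparameter are illustrative assumptions.

```python
# Hypothetical sketch of an event stream -> learnt tokens -> Transformer
# classifier pipeline, in the spirit of LERT-Transformer. Not the author's code.
import torch
import torch.nn as nn

class EventEmbedder(nn.Module):
    """Maps raw events, grouped into fixed-size groups of consecutive events,
    to learnt high-dimensional tokens via a small shared MLP (assumed design)."""
    def __init__(self, events_per_patch=32, dim=128):
        super().__init__()
        self.events_per_patch = events_per_patch
        self.mlp = nn.Sequential(
            nn.Linear(4, dim),  # each event is a (x, y, t, polarity) tuple
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, events):  # events: (B, N, 4), N divisible by patch size
        B, N, _ = events.shape
        feats = self.mlp(events)  # per-event features: (B, N, dim)
        # Pool each patch of consecutive events into a single token.
        feats = feats.view(B, N // self.events_per_patch,
                           self.events_per_patch, -1)
        return feats.max(dim=2).values  # tokens: (B, N / patch, dim)

class EventClassifier(nn.Module):
    def __init__(self, dim=128, num_classes=10):
        super().__init__()
        self.embed = EventEmbedder(dim=dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, events):
        tokens = self.encoder(self.embed(events))
        return self.head(tokens.mean(dim=1))  # mean-pool tokens, then classify

# Example: a batch of 2 streams with 256 normalized events each.
model = EventClassifier()
logits = model(torch.rand(2, 256, 4))
print(logits.shape)  # torch.Size([2, 10])
```

In this sketch the (x, y, t) coordinates are fed directly to the embedding MLP, so no separate positional encoding is added; a faithful implementation would follow whatever tokenization and asynchronous update rule the thesis actually specifies for LERT and ALERT.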
| File | Access | Size | Format |
|---|---|---|---|
| MartinTurrero_Carmen.pdf | restricted access | 12.53 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/54904