Modeling Spatio-Temporal Data via Learnable Axonal Delays in Convolutional Recurrent Spiking Neural Networks
PLANTAMURA, DOMENICO
2025/2026
Abstract
Spiking Neural Networks (SNNs) have emerged as a promising, energy-efficient paradigm for spatiotemporal processing: unlike standard Artificial Neural Networks (ANNs), which rely on dense, continuous real-valued computation, SNNs process information through sparse, binary events called "spikes". Despite these advantages, effectively capturing complex temporal dynamics remains a significant challenge. In this work, we investigate the integration of two established and high-performing mechanisms: the Convolutional Spiking Gated Recurrent Unit (CS-GRU) architecture and learnable axonal delays. While CS-GRU models are well suited to handling long-term dependencies and extracting local spatiotemporal features through convolutional operations, learnable delays provide a complementary mechanism for fine-grained temporal alignment of asynchronous spike events. Building on these observations, we propose an architecture that embeds learnable axonal delays directly within the recurrent core of a CS-GRU. This design aims to jointly leverage structured spatial feature extraction, recurrent memory, and adaptive temporal synchronization within a unified framework. We conduct an extensive empirical evaluation on three challenging spatiotemporal benchmarks: the Spiking Heidelberg Digits (SHD), Neuromorphic-TIDIGITS (N-TIDIGITS), and Spiking Speech Commands (SSC) datasets. Our analysis focuses on both the integrability and scalability of the proposed approach. Results indicate that a simple single-layer integration struggles to reconcile the coupled optimization of weights and delay parameters, often leading to sub-optimal convergence. In contrast, the transition to a two-layer architecture yields improved classification accuracy and more stable training dynamics.
Although the proposed method does not yet surpass current state-of-the-art performance, this work provides a detailed investigation of the underlying architectural bottlenecks and of the interaction between recurrent delay learning and the Convolutional Spiking GRU architecture. Our findings suggest that decoupling convolutional, recurrent, and delay-based operations into specialized components may be a crucial design principle for future delay-augmented SNNs, offering a promising direction for the development of more expressive and efficient neuromorphic systems.
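The abstract describes learnable axonal delays as per-connection temporal shifts that align asynchronous spike events before they enter the recurrent core. As a minimal illustrative sketch only (not the thesis's implementation; the function and variable names here are hypothetical, and this integer-shift formulation omits the differentiable relaxation one would need to actually learn the delays by gradient descent), a per-channel delay can be realized by shifting each channel of a binary spike train along the time axis:

```python
import numpy as np

def apply_axonal_delays(spikes, delays):
    """Delay each channel of a binary spike train by its own offset.

    spikes: (T, C) array of 0/1 spike events over T time steps, C channels.
    delays: length-C sequence of non-negative integer delays (time steps).
    Spikes shifted past the final time step are dropped, mimicking a
    finite-length axonal delay line.
    """
    T, C = spikes.shape
    delayed = np.zeros_like(spikes)
    for c in range(C):
        d = int(delays[c])
        if d < T:
            # Shift channel c forward in time by d steps.
            delayed[d:, c] = spikes[:T - d, c]
    return delayed

# Two asynchronous input channels: channel 0 fires at t=0, channel 1 at t=2.
spikes = np.zeros((5, 2))
spikes[0, 0] = 1.0
spikes[2, 1] = 1.0

# A delay of 2 steps on channel 0 aligns both events at t=2, so a
# downstream recurrent unit would see them coincide.
aligned = apply_axonal_delays(spikes, delays=[2, 0])
```

In practice, published delay-learning approaches make the shift differentiable (e.g. by interpolating over fractional delays) so the delay parameters can be trained jointly with the synaptic weights, which is exactly the coupled optimization the abstract reports as difficult in the single-layer setting.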
File: Plantamura_Domenico.pdf (open access, 1.3 MB, Adobe PDF)
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License
https://hdl.handle.net/20.500.12608/108237