
Convolutional Recurrence in Spiking Neural Networks: A Parameter-Efficient Approach to Learnable Delays for Audio Classification

FOLLY SANCHES ZEBENDO, LÚCIO
2025/2026

Abstract

Audio classification involves complex temporal dependencies spanning multiple time scales, requiring models that integrate information over time while preserving fine-grained temporal structure. Spiking Neural Networks (SNNs) offer a biologically inspired framework for such processing by encoding information as sparse, asynchronous binary events (spikes). Their event-driven nature makes them particularly well suited to temporal data streams, enabling high responsiveness and significant energy savings compared to conventional neural networks. Extending SNNs with recurrent connections yields Recurrent Spiking Neural Networks (RSNNs), which are better able to capture long-range temporal dependencies. However, as in standard RNNs, training RSNNs with gradient-based optimization remains challenging due to vanishing and exploding gradients over long sequences, limiting the scalability of deep architectures. A promising solution is the introduction of learnable delays in recurrent connections. These delays model the propagation time of spikes between neurons and effectively act as temporal skip connections, facilitating gradient flow. The DELREC method learns such axonal delays jointly with the network weights using surrogate gradient learning, achieving state-of-the-art performance on the Spiking Speech Commands (SSC) dataset and competitive results on the Spiking Heidelberg Digits (SHD) dataset. Despite its effectiveness, DELREC relies on fully dense recurrent connections, whose parameter count grows quadratically with the number of neurons and hinders scalability. In this thesis, we propose replacing dense recurrent connectivity with lightweight one-dimensional (1D) convolutions. This design leverages the strong local correlations present in audio representations, where adjacent frequency channels exhibit similar activation patterns due to the harmonic structure of speech, the spectral continuity of acoustic events, and overlapping cochlear filter responses.
By exploiting this locality, the model maintains expressive power while significantly reducing computational cost. The proposed approach achieves a 99.995% reduction in recurrent parameters, yielding 25–52× faster inference than the original DELREC method. Evaluated on the SHD and SSC datasets, it attains test accuracies of 91.51% ± 0.70% and 78.59% ± 0.39%, respectively. An ablation study further highlights the importance of learnable delays, which improve test accuracy by 5.23 (SHD) and 3.50 (SSC) percentage points. Learnable delays also significantly reduce cross-seed variance, yielding more stable and reliable training than fixed-delay approaches. These results demonstrate that convolutional recurrence with learnable delays constitutes an efficient and scalable alternative to fully connected RSNN architectures.
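To make the scale of the stated parameter reduction concrete, the following sketch compares the parameter count of a fully connected recurrent layer with that of a single shared 1D convolution kernel. The hidden width (1024) and kernel size (51) are illustrative assumptions chosen here, not values taken from the thesis; under these assumptions the reduction works out to roughly 99.995%.

```python
# Illustrative parameter-count comparison for dense vs. convolutional
# recurrence. The chosen hidden width and kernel size are assumptions
# for the sake of the example, not values reported in the thesis.

def dense_recurrent_params(n_hidden: int) -> int:
    # A fully connected recurrent layer needs an n x n weight matrix,
    # so its parameter count grows quadratically with the layer width.
    return n_hidden * n_hidden

def conv_recurrent_params(kernel_size: int) -> int:
    # A depthwise 1D convolution with a single shared kernel uses only
    # kernel_size parameters, independent of the layer width.
    return kernel_size

n = 1024   # assumed number of recurrent neurons
k = 51     # assumed convolution kernel size

dense = dense_recurrent_params(n)
conv = conv_recurrent_params(k)
reduction = 100 * (1 - conv / dense)
print(f"dense: {dense:,} params, conv: {conv} params, "
      f"reduction: {reduction:.3f}%")
```

Note that the dense count scales as O(n²) while the convolutional count is constant in n, so the relative saving only grows with wider layers.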
SNN
Learnable Delays
Convolutional Recurrence
Audio Classification
Parameter Efficiency
File: Zebendo_Lucio.pdf (Adobe PDF, 492.21 kB, open access)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106018