Convolutional Recurrence in Spiking Neural Networks: A Parameter-Efficient Approach to Learnable Delays for Audio Classification
FOLLY SANCHES ZEBENDO, LÚCIO
2025/2026
Abstract
Audio classification involves complex temporal dependencies spanning multiple time scales, requiring models that integrate information over time while preserving fine-grained temporal structure. Spiking Neural Networks (SNNs) offer a biologically inspired framework for such processing by encoding information as sparse, asynchronous binary events (spikes). Their event-driven nature makes them particularly well suited to temporal data streams, enabling high responsiveness and significant energy savings compared to conventional neural networks. Extending SNNs with recurrent connections yields Recurrent Spiking Neural Networks (RSNNs), which are better able to capture long-range temporal dependencies. However, as with standard RNNs, training RSNNs with gradient-based optimization remains challenging due to vanishing and exploding gradients over long sequences, limiting the scalability of deep architectures. A promising solution is the introduction of learnable delays in recurrent connections. These delays model the propagation time of spikes between neurons and effectively act as temporal skip connections, facilitating gradient flow. The DELREC method learns axonal delays jointly with the network weights using surrogate gradient learning, achieving state-of-the-art performance on the Spiking Speech Commands (SSC) dataset and competitive results on the Spiking Heidelberg Digits (SHD) dataset. Despite its effectiveness, DELREC relies on fully dense recurrent connections, resulting in quadratic parameter complexity that hinders scalability. In this thesis we propose replacing dense recurrent connectivity with lightweight one-dimensional (1D) convolutions. This design leverages the strong local correlations present in audio representations, where adjacent frequency channels exhibit similar activation patterns due to the harmonic structure of speech, the spectral continuity of acoustic events, and overlapping cochlear filter responses.
By exploiting this locality, the model maintains expressive power while significantly reducing computational cost. The proposed approach achieves a 99.995% reduction in recurrent parameters, yielding 25–52× faster inference than the original DELREC method. Evaluated on the SHD and SSC datasets, it attains test accuracies of 91.51% ± 0.70% and 78.59% ± 0.39%, respectively. An ablation study further highlights the importance of learnable delays, which improve test accuracy by 5.23 (SHD) and 3.50 (SSC) percentage points. Learnable delays also significantly reduce cross-seed variance, yielding more stable and reliable training than fixed-delay approaches. These results demonstrate that convolutional recurrence with learnable delays constitutes an efficient and scalable alternative to fully connected RSNN architectures.
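To make the delay mechanism concrete, the following is a minimal forward-pass sketch of recurrence with per-neuron axonal delays, using a ring buffer of past spikes. All sizes, the diagonal (per-neuron) recurrence, and the leaky-integrate-and-fire dynamics are illustrative assumptions; DELREC's actual delay parameterization and its surrogate-gradient training are not shown.

```python
import numpy as np

# Toy RSNN step with per-neuron axonal delays (hypothetical sizes and dynamics).
rng = np.random.default_rng(0)
N, T, D_MAX = 8, 20, 5                        # neurons, timesteps, max delay

delays = rng.integers(1, D_MAX + 1, size=N)   # learnable in DELREC; fixed here
w = rng.normal(scale=0.5, size=N)             # one recurrent weight per neuron (toy diagonal case)
spike_history = np.zeros((D_MAX + 1, N))      # ring buffer of the last D_MAX+1 spike vectors
v = np.zeros(N)                               # membrane potentials
threshold = 1.0

for t in range(T):
    # each neuron receives the spike it emitted `delays[i]` timesteps ago
    delayed = spike_history[(t - delays) % (D_MAX + 1), np.arange(N)]
    v = 0.9 * v + w * delayed + rng.normal(scale=0.3, size=N)  # leak + recurrence + input noise
    spikes = (v >= threshold).astype(float)    # hard threshold (surrogate gradient in training)
    v = np.where(spikes > 0, 0.0, v)           # reset membrane after a spike
    spike_history[t % (D_MAX + 1)] = spikes    # write current spikes into the ring buffer
```

Because a delayed spike reaches the loss through far fewer recurrent steps than an undelayed one, each delay acts as a temporal skip connection, which is the gradient-flow argument made above.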
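The scale of the parameter reduction can be sketched with back-of-the-envelope arithmetic: a dense recurrent layer needs N² weights, while a single 1D kernel shared across channels needs only K. The values N = 1024 and K = 51 below are illustrative assumptions chosen to land near the reported figure, not sizes taken from the thesis.

```python
# Illustrative parameter count: dense recurrence vs. a shared 1D convolution.
# N and K are hypothetical, not the thesis's actual configuration.
N = 1024                         # hidden neurons / frequency channels
K = 51                           # 1D convolution kernel size, shared across channels

dense_params = N * N             # full N x N recurrent weight matrix
conv_params = K                  # one shared kernel replaces the whole matrix

reduction = 100 * (1 - conv_params / dense_params)
print(f"dense: {dense_params}, conv: {conv_params}, reduction: {reduction:.3f}%")
```

With these assumed sizes the reduction comes out to roughly 99.995%, matching the order of magnitude reported above; the key point is that the conv cost is independent of N², so the gap widens as the layer grows.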
File: Zebendo_Lucio.pdf (open access), 492.21 kB, Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/106018