Single-Channel Speaker Distance Estimation using the Short-Term Power of the Autocorrelation

TOMADA, RICCARDO
2025/2026

Abstract

Sound distance estimation is a key, yet comparatively underexplored, component of sound source localization within sound event localization and detection systems, particularly in single-channel (monaural) scenarios where spatial cues are limited. This work investigates the integration of a reverberation-oriented feature, the short-term power of the autocorrelation coefficients, into an existing convolutional recurrent neural network model for monaural speaker distance estimation. The proposed approach replaces phase-based features with the new feature while retaining the magnitude of the short-time Fourier transform. Experiments are conducted on a synthetic dataset with added real background noise. Results show that, although phase-based features achieve higher accuracy in noiseless conditions, the reverberation-oriented feature provides more stable performance across varying noise levels.
2025
Keywords: Autocorrelation; SELD; Distance Estimation
Files in this item:
File: Tomada_Riccardo.pdf
Access: open access
Size: 489.09 kB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/104356