Single-Channel Speaker Distance Estimation using the Short-Term Power of the Autocorrelation
TOMADA, RICCARDO
2025/2026
Abstract
Sound distance estimation is a key, yet comparatively underexplored, component of sound source localization within sound event localization and detection systems, particularly in single-channel (monaural) scenarios where spatial cues are limited. This work investigates the integration of a reverberation-oriented feature, the short-term power of the autocorrelation coefficients, into an existing convolutional recurrent neural network model for monaural speaker distance estimation. The proposed approach replaces phase-based features with the new feature while retaining the magnitude of the short-time Fourier transform. Experiments are conducted on a synthetic dataset with added real background noise. Results show that, although phase-based features achieve higher accuracy in noiseless conditions, the reverberation-oriented feature provides more stable performance across varying noise levels.

| File | Access | Size | Format |
|---|---|---|---|
| Tomada_Riccardo.pdf | open access | 489.09 kB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/104356