Detection of Bombina Vocalizations in Environmental Recordings Using Convolutional Neural Networks
SMIRNOV, ILIA
2025/2026
Abstract
This thesis studies the automatic detection of Bombina sounds in environmental audio recordings. The task is formulated as a binary classification problem on short audio segments derived from long field recordings. The approach is based on convolutional neural networks trained on log-mel spectrograms. Model performance is evaluated using date-based 5-fold cross-validation, which prevents data leakage between training and test segments from the same recording day and accounts for strong variability between recording days. Additional robustness is achieved through on-the-fly data augmentation applied to the minority class. Two neural network architectures are investigated: a baseline convolutional neural network and an enhanced model with a temporal attention mechanism. Experimental results show that both models achieve high classification performance across all folds, with ROC–AUC values consistently close to 1.0. The attention-based model provides slightly better results while maintaining a comparable number of parameters. These findings suggest that convolutional neural networks with temporal attention are well suited to bioacoustic classification of Bombina sounds in real-world field recordings.

| File | Size | Format |
|---|---|---|
| thesis_unipd_is (1).pdf (restricted access) | 2.57 MB | Adobe PDF |
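The date-based cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `date_based_folds`, the greedy balancing strategy, and the example dates are all assumptions made for illustration. The only property it is meant to show is that every segment from a given recording date lands in the same fold, so no date contributes to both training and test data.

```python
from collections import defaultdict


def date_based_folds(segment_dates, n_folds=5):
    """Return (train_indices, test_indices) pairs grouped by recording date.

    Hypothetical helper: each distinct date is assigned to exactly one fold.
    """
    # Group segment indices by the date of the recording they came from.
    by_date = defaultdict(list)
    for idx, date in enumerate(segment_dates):
        by_date[date].append(idx)

    if len(by_date) < n_folds:
        raise ValueError("need at least as many distinct dates as folds")

    # Greedy balancing (an assumption, not the thesis procedure):
    # place the largest date groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    for date in sorted(by_date, key=lambda d: (-len(by_date[d]), d)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_date[date])

    # Each fold serves once as the test set; the rest form the training set.
    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j in range(n_folds) if j != k for i in folds[j])
        splits.append((train, test))
    return splits


# Made-up example: 15 segments recorded on 6 different days.
dates = (
    ["2024-05-01"] * 3
    + ["2024-05-02"] * 2
    + ["2024-05-03"] * 4
    + ["2024-05-07"]
    + ["2024-05-09"] * 2
    + ["2024-05-12"] * 3
)
splits = date_based_folds(dates, n_folds=5)
for train, test in splits:
    # No recording date appears on both sides of a split.
    assert not {dates[i] for i in train} & {dates[i] for i in test}
```

In practice, a library routine such as scikit-learn's `GroupKFold`, with the recording date passed as the group label, provides the same guarantee; whether the thesis used it is not stated in the abstract.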
https://hdl.handle.net/20.500.12608/108240