Detection of Bombina Vocalizations in Environmental Recordings Using Convolutional Neural Networks

SMIRNOV, ILIA
2025/2026

Abstract

This thesis studies the automatic detection of Bombina vocalizations in environmental audio recordings. The task is formulated as binary classification of short audio segments extracted from long field recordings. The approach uses convolutional neural networks (CNNs) trained on log-mel spectrograms. Model performance is evaluated with date-based 5-fold cross-validation, which prevents data leakage between training and test sets and accounts for the strong variability between recording days. Robustness is further improved by on-the-fly data augmentation applied to the minority class. Two architectures are investigated: a baseline CNN and an enhanced model with a temporal attention mechanism. Both models achieve high classification performance across all folds, with ROC–AUC values consistently close to 1.0. The attention-based model performs slightly better while keeping a comparable number of parameters. These findings indicate that CNNs with temporal attention can be well suited to bioacoustic classification of Bombina vocalizations in real-world field recordings.
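Date-based cross-validation keeps every segment from a given recording day in the same fold, so a model is never tested on a day it saw during training. The sketch below illustrates one way to build such folds in plain Python. The function name `date_based_folds`, the input format (one date string per segment), and the greedy size-balancing rule are assumptions made for illustration; they are not taken from the thesis.

```python
from collections import defaultdict


def date_based_folds(segment_dates, n_folds=5):
    """Hypothetical helper: split segment indices into folds by recording date.

    segment_dates: list with one date string per audio segment (assumed format).
    Returns a list of (train_indices, test_indices) tuples, one per fold.
    No date ever appears in both the training and the test indices of a fold.
    """
    # Group segment indices by recording day.
    by_date = defaultdict(list)
    for idx, date in enumerate(segment_dates):
        by_date[date].append(idx)

    if len(by_date) < n_folds:
        raise ValueError("need at least as many distinct dates as folds")

    # Greedy balancing (an assumption): place the largest days first,
    # each into the fold that currently holds the fewest segments.
    fold_members = [[] for _ in range(n_folds)]
    for date in sorted(by_date, key=lambda d: (-len(by_date[d]), d)):
        smallest = min(range(n_folds), key=lambda f: len(fold_members[f]))
        fold_members[smallest].extend(by_date[date])

    # Each fold serves once as the test set; all other segments form the training set.
    all_indices = set(range(len(segment_dates)))
    splits = []
    for test in fold_members:
        train = sorted(all_indices - set(test))
        splits.append((train, sorted(test)))
    return splits


if __name__ == "__main__":
    dates = (["2023-05-01"] * 4 + ["2023-05-02"] * 3 + ["2023-05-03"] * 2
             + ["2023-05-04"] * 2 + ["2023-05-05"] + ["2023-05-06"])
    for k, (train, test) in enumerate(date_based_folds(dates)):
        print(k, "test dates:", sorted({dates[i] for i in test}))
```

In scikit-learn, `GroupKFold(n_splits=5)` with the recording dates passed as `groups` achieves the same leakage guarantee. The manual version above only makes the grouping step explicit.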
2025
Keywords: Bombina Vocalization; CNN; Detection
Files in this item: thesis_unipd_is (1).pdf (Adobe PDF, 2.57 MB, restricted access)

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108240