Target Speaker Separation on edge devices
KHOSRAVI, ELHAM
2024/2025
Abstract
This thesis tackles the growing need for privacy‑preserving target‑speaker separation by designing an ultralight algorithm that can run entirely on sub‑120 MHz earbud microcontrollers with under 100 KB of memory. Building on compact FiLM‑conditioned convolutional masks and global layer normalisation, the system isolates a chosen voice from real‑world acoustic scenes and feeds it to OpenSMILE for downstream psychometric analysis. By keeping all computation on‑device, the approach eliminates cloud‑processing risks, satisfying stringent privacy requirements while still enabling rich mental‑health and affective‑computing applications. The thesis therefore bridges the gap between large‑model separation research and the practical constraints of wearable hardware, laying the groundwork for privacy‑safe hearing aids, context‑aware virtual assistants, and next‑generation biometric wearables.
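To make the two named building blocks concrete, here is a minimal PyTorch sketch of a convolutional masking layer whose activations are modulated by a target‑speaker embedding via FiLM, together with global layer normalisation (gLN). All class names, layer sizes, and the embedding dimension are illustrative assumptions, not details taken from the thesis.

```python
# Sketch of a FiLM-conditioned convolutional mask block with gLN.
# Layer sizes and names are illustrative, not from the thesis.
import torch
import torch.nn as nn

class GlobalLayerNorm(nn.Module):
    """gLN: normalise over the channel and time axes of a (B, C, T) tensor."""
    def __init__(self, channels: int, eps: float = 1e-8):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=(1, 2), keepdim=True)
        var = x.var(dim=(1, 2), keepdim=True, unbiased=False)
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta

class FiLMConvMask(nn.Module):
    """One conv block whose features are scaled and shifted per channel
    by a speaker embedding (FiLM), ending in a sigmoid mask."""
    def __init__(self, channels: int = 64, emb_dim: int = 128):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm = GlobalLayerNorm(channels)
        # FiLM generator: speaker embedding -> per-channel scale and shift.
        self.film = nn.Linear(emb_dim, 2 * channels)
        self.mask = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, feats: torch.Tensor, spk_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, T) mixture features; spk_emb: (B, emb_dim) target voice.
        h = self.norm(self.conv(feats))
        gamma, beta = self.film(spk_emb).chunk(2, dim=-1)
        h = gamma.unsqueeze(-1) * h + beta.unsqueeze(-1)  # FiLM modulation
        m = torch.sigmoid(self.mask(torch.relu(h)))       # mask in [0, 1]
        return m * feats                                  # masked features

# Toy usage: batch of 1, 64 feature channels, 200 frames, 128-d embedding.
block = FiLMConvMask()
out = block(torch.randn(1, 64, 200), torch.randn(1, 128))
print(out.shape)  # torch.Size([1, 64, 200])
```

FiLM conditioning adds only two small vectors per block (a per‑channel scale and shift derived from the speaker embedding), which is one reason this style of conditioning suits tight memory budgets like the sub‑100 KB target described above.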
| File | Size | Format |
|---|---|---|
| UniPD_KIT_Thesis.pdf (open access) | 1.96 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/91175