EEG Representation Learning via Contrastive Embedding Alignment for Imagined Speech Decoding
VOLPATO, PIETRO
2025/2026
Abstract
This work investigates the potential of electroencephalography (EEG)-based brain–computer interfaces as an alternative communication system for individuals with speech impairments. Rather than focusing on isolated words or phonemes, the study adopts the syllabic structure of the Italian language as the fundamental unit of speech representation. A dataset was collected from 21 participants during the pronunciation and syllabification of a set of sentences. EEG segments were then aligned with acoustic representations by training a neural network to learn a shared latent space through a contrastive learning approach, using embeddings extracted from a pre-trained automatic speech recognition (ASR) model. The evaluation was formulated as an information retrieval task, in which a k-nearest neighbors search was performed within the learned EEG embedding space, and a language model was subsequently applied to refine the predictions. Both a multi-subject model and subject-specific models were trained and compared. The results show an average Top-10 accuracy of 37.4% for the multi-subject model and 38.1% for the subject-specific models over a 192-class syllabic vocabulary. Overall, this work demonstrates the feasibility of leveraging syllable-based representations and cross-modal alignment techniques for non-invasive speech decoding. The proposed framework represents a step toward more expressive and scalable assistive communication systems based on EEG signals.

| File | Size | Format | Access |
|---|---|---|---|
| Volpato_Pietro.pdf | 1.84 MB | Adobe PDF | Open access |
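The retrieval-style evaluation described in the abstract can be illustrated with a minimal sketch: EEG embeddings are compared against syllable reference embeddings in a shared space, and a prediction counts as correct if the true syllable appears among the k nearest neighbors. All names, shapes, and the synthetic data below are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch of k-nearest-neighbors retrieval in a shared embedding space,
# evaluated with Top-k accuracy. Embeddings here are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_classes, dim, n_queries = 192, 64, 500   # 192-class syllabic vocabulary (illustrative dim)

# Hypothetical reference embeddings: one L2-normalized vector per syllable class.
refs = rng.normal(size=(n_classes, dim))
refs /= np.linalg.norm(refs, axis=1, keepdims=True)

# Hypothetical EEG query embeddings: noisy copies of their class reference.
labels = rng.integers(0, n_classes, size=n_queries)
queries = refs[labels] + 0.5 * rng.normal(size=(n_queries, dim))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

def top_k_accuracy(queries, refs, labels, k=10):
    """Fraction of queries whose true class is among the k most similar references."""
    sims = queries @ refs.T                   # cosine similarity (unit-norm vectors)
    top_k = np.argsort(-sims, axis=1)[:, :k]  # indices of the k nearest classes
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

acc = top_k_accuracy(queries, refs, labels, k=10)
print(f"Top-10 accuracy on synthetic data: {acc:.3f}")
```

In the thesis this search is run over embeddings learned by contrastive alignment with ASR features, and a language model then rescores the candidate syllables; the sketch covers only the nearest-neighbor lookup and the Top-10 metric.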
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/106238