Statistical Learning Methods for Psychiatric Disorder Classification Using Resting-State Electroencephalography Recordings

This thesis applies supervised learning to electroencephalography (EEG) data to classify major psychiatric disorders using Random Forest models. It reworks an approach proposed in the literature, aiming to distinguish healthy subjects from patients with a clinical diagnosis on a real dataset of 945 subjects. The dataset includes sociodemographic and clinical covariates, as well as variables derived from EEG recordings via the Fast Fourier Transform: measures of spectral power and phase coherence, computed across the main frequency bands of brain activity. The main objective is to evaluate how reducing the dimensionality of the EEG-derived variables via Principal Component Analysis (PCA) affects the performance of Random Forest classifiers. The principal components are used, together with the sociodemographic variables, to train models that classify subjects by clinical condition. The models are trained under different configurations, distinguished by parameter type (spectral power or phase coherence) and reference frequency band. The best configurations are selected by 5-fold cross-validation, with the Area Under the Curve (AUC) as the evaluation metric. The results highlight limited stability of the estimates, attributable to the small sample sizes of several diagnostic categories. Models built using only sociodemographic covariates are also compared with models based exclusively on EEG-derived variables. This comparison shows that the EEG-derived variables contribute only marginally to classification performance, whereas predictive ability is largely driven by the sociodemographic variables.
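The workflow summarized in the abstract (PCA on the EEG-derived variables, sociodemographic variables added alongside the components, a Random Forest classifier, 5-fold cross-validation scored by AUC) can be sketched roughly as follows. This is a minimal illustration, not the thesis code: it assumes Python with scikit-learn, reads "Area Under the Curve" as the ROC AUC, standardizes the EEG variables before PCA, and runs on synthetic data. The function name `cv_auc`, the column layout, the number of components, and the forest size are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def cv_auc(X, y, eeg_cols, socio_cols, n_components=5, seed=0):
    """Return the 5-fold ROC AUC scores of a PCA + Random Forest pipeline."""
    # EEG-derived columns: standardize, then keep the leading principal components.
    eeg_branch = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=n_components)),
    ])
    # Sociodemographic columns are passed to the classifier unchanged.
    features = ColumnTransformer([
        ("eeg", eeg_branch, eeg_cols),
        ("socio", "passthrough", socio_cols),
    ])
    model = Pipeline([
        ("features", features),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ])
    # Stratified folds keep both classes represented in every test split.
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=folds, scoring="roc_auc")


# Synthetic stand-in for the real data: 40 "EEG" columns and 3 "sociodemographic" columns.
rng = np.random.default_rng(0)
n_subjects, n_eeg, n_socio = 200, 40, 3
X = rng.normal(size=(n_subjects, n_eeg + n_socio))
# Hypothetical binary label (healthy vs. patient) driven by the last column plus noise.
y = (X[:, -1] + rng.normal(scale=1.0, size=n_subjects) > 0).astype(int)

eeg_cols = list(range(n_eeg))
socio_cols = list(range(n_eeg, n_eeg + n_socio))
scores = cv_auc(X, y, eeg_cols, socio_cols)
print("mean AUC over 5 folds:", scores.mean())
```

Placing the scaler and PCA inside the pipeline means they are refit on the training portion of each fold, so the held-out fold does not influence the components. Restricting `socio_cols` or `eeg_cols` to one group of variables would give a rough analogue of the sociodemographic-only versus EEG-only comparison.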


VINCENZI, MARGHERITA

2025/2026


Record

DC record

Faculty/Department	Dipartimento di Scienze Statistiche (Department of Statistical Sciences)
Degree programme	STATISTICA PER L'ECONOMIA E L'IMPRESA (Statistics for Economics and Business), first-level degree (D.M. 270/2004)
Academic year	2025
English title	Statistical Learning Methods for Psychiatric Disorder Classification Using Resting-State Electroencephalography Recordings
Keywords	Random Forest; Classification; Psychiatry; EEG
Supervisor	DENTI, FRANCESCO
Appears in collections	Bachelor's theses (Lauree triennali)

Files in this item:

File	Size	Format
Vincenzi_Margherita.pdf (open access)	1.71 MB	Adobe PDF

The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106088