Statistical Learning Methods for Psychiatric Disorder Classification Using Resting-State Electroencephalography Recordings

VINCENZI, MARGHERITA
2025/2026

Abstract

This thesis applies supervised learning to electroencephalography (EEG) data in order to classify major psychiatric disorders with Random Forest models. It reworks an approach proposed in the literature, with the aim of distinguishing healthy subjects from patients with a clinical diagnosis, using a real dataset of 945 subjects. The dataset includes sociodemographic and clinical covariates, together with variables derived from the EEG recordings via the Fast Fourier Transform: measures of spectral power and phase coherence, computed across the main frequency bands of brain activity. The main objective is to evaluate how reducing the dimensionality of the EEG-derived variables with Principal Component Analysis (PCA) affects the performance of Random Forest classifiers. The principal components are used, together with the sociodemographic variables, to train models that classify subjects by clinical condition. Models are trained under different configurations, distinguished by parameter type (spectral power or phase coherence) and reference frequency band. The best configurations are selected by 5-fold cross-validation, with the Area Under the Curve (AUC) as the evaluation metric. The results reveal limited stability of the estimates, attributable to the small sample sizes of several diagnostic categories. Models built only on sociodemographic covariates are also compared with models based exclusively on EEG-derived variables. This comparison shows that the EEG-derived variables contribute only marginally to classification performance, while predictive ability is largely driven by the sociodemographic variables alone.
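The pipeline described in the abstract (Principal Component Analysis on the EEG-derived variables, the resulting components combined with sociodemographic covariates, a Random Forest classifier, and 5-fold cross-validation scored by Area Under the Curve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis code: it assumes scikit-learn and NumPy, uses synthetic random data in place of the real dataset, and the column counts, the number of components, the number of trees, and the use of stratified folds are all hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the thesis dataset): 200 subjects,
# 3 sociodemographic columns followed by 40 EEG-derived columns.
n_subjects, n_socio, n_eeg = 200, 3, 40
X = rng.normal(size=(n_subjects, n_socio + n_eeg))
# Binary label (patient vs. healthy), loosely tied to the first covariate.
y = (X[:, 0] + rng.normal(size=n_subjects) > 0).astype(int)

socio_cols = list(range(n_socio))
eeg_cols = list(range(n_socio, n_socio + n_eeg))

# Standardize and reduce only the EEG block; 5 components is an assumption.
eeg_reduction = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5)),
])

# Sociodemographic columns pass through unchanged, next to the components.
features = ColumnTransformer([
    ("socio", "passthrough", socio_cols),
    ("eeg", eeg_reduction, eeg_cols),
])

model = Pipeline([
    ("features", features),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# 5-fold cross-validation scored by AUC; stratification is an assumption.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", np.round(auc_scores, 3))
print("mean AUC:", auc_scores.mean(), "std:", auc_scores.std())
```

Placing the scaler and PCA inside the pipeline means they are refit on the training portion of each fold, so the held-out fold does not influence the components. The spread of the per-fold scores gives a rough view of the estimate instability that the abstract attributes to small diagnostic groups.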
2025
Random Forest
Classification
Psychiatry
EEG
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106088