Comparison of Weakly Supervised Models for Choroid Plexus Segmentation Using Non-Expert Annotations

The Choroid Plexus (ChP), located in the brain's ventricles, is implicated in various neurological disorders, including Multiple Sclerosis (MS), where increased ChP volume due to brain inflammation could serve as a biomarker for disease monitoring. Deep learning has proven effective for medical image segmentation but depends on large amounts of labeled data. Fully supervised models require expert ground truth (GT) labels, which are costly and time-consuming to produce. Weakly supervised learning (WSL) offers an alternative: by training on less precise annotations, it can reduce the need for expert input while maintaining reasonable model performance. This study investigates WSL using annotations produced by bioengineering students on T1-weighted MRI images of MS patients and healthy controls. The student annotations were compared with the GT labels and then used as weak supervision for training deep neural networks (DNNs). Images were defaced with Pydeface to protect privacy, and manual segmentations were performed with MATLAB's Medical Imaging Toolbox. Consensus maps built from these annotations highlighted challenging regions that were often left unannotated. An attempt to extend the segmentations to a larger dataset by coregistering consensus labels from a small set of annotated images gave inconsistent results, indicating the limits of this approach. For DNN training, the DAST framework was first applied within the PyMIC environment to handle weak labels and refine noisy annotations. This approach did not yield meaningful results, suggesting that the PyMIC framework may not be suited to the complex task of ChP segmentation from 3D MRI images. Several DNN architectures (3D U-Net, nnU-Net, and UNETR) were then tested within the MONAI framework, using different loss functions (Dice and DiceCE) and patch sizes (96×96×96 and 128×128×128).
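The abstract does not specify how the consensus maps were computed. A common construction, sketched below in NumPy, is the voxel-wise fraction of annotators who marked each voxel as ChP; thresholding that fraction gives a single weak label, and voxels with partial agreement correspond to the kind of challenging regions described above. This assumes the student masks are binary arrays already aligned in the same image space; the function names and the 0.5 threshold are hypothetical illustrations, not the implementation used in the thesis.

```python
import numpy as np


def consensus_map(masks):
    """Voxel-wise fraction of annotators who labelled each voxel as ChP.

    Assumption: `masks` is a sequence of co-registered binary arrays of
    identical shape, one per annotator.
    """
    # Stack the masks along a new leading "annotator" axis.
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)
    # The mean over annotators is the agreement fraction in [0, 1].
    return stack.mean(axis=0)


def threshold_consensus(masks, threshold=0.5):
    """Hypothetical weak label: voxels marked by at least `threshold` of annotators."""
    return consensus_map(masks) >= threshold


def uncertain_voxels(cmap):
    """Voxels where annotators disagree (some, but not all, marked them)."""
    return (cmap > 0) & (cmap < 1)
```

With three toy 2×2 masks where all annotators mark one voxel, two of three mark a second, and one of three marks a third, the consensus map is [[1, 2/3], [1/3, 0]]; the 0.5 threshold keeps the first two voxels, and the two partially agreed voxels are flagged as uncertain.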
Predictions on the test set were generated with a sliding-window approach using overlap ratios of 0.5 and 0.8. Although the weakly supervised models generally performed worse than the GT-trained models, their results were still competitive. Notably, nnU-Net performed well even with weak labels. The best-performing configuration used the DiceCE loss, a 128×128×128 patch size, and an overlap of 0.8. Trained with GT labels, it achieved a mean Dice Coefficient of 0.79 ± 0.07 and a mean Percentage Volume Difference of 10.09% ± 9.17%, while its weakly trained counterpart performed worse, with a mean Dice Coefficient of 0.70 ± 0.08 and a mean Percentage Volume Difference of 20.75% ± 16.14%. Despite the observed variability in volume estimation and boundary accuracy, weakly supervised learning for ChP segmentation remains a practical option when expert data are limited, complementing or replacing fully supervised methods. Although it does not yet match the performance of GT-trained models, further advances in WSL techniques and model optimization could improve its effectiveness. Future research should validate these results on independent datasets, expand the annotator pool, and explore hybrid approaches that combine GT and weak labels.
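The two evaluation metrics reported above can be sketched as follows. The Dice Coefficient uses its standard definition; for the Percentage Volume Difference the abstract gives only the name, so taking the absolute difference relative to the GT volume is an assumption here. The function names are hypothetical, and this is a minimal NumPy sketch rather than the evaluation code used in the thesis.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice = 2|P ∩ R| / (|P| + |R|) for binary masks P (prediction) and R (reference)."""
    p = np.asarray(pred, dtype=bool)
    r = np.asarray(ref, dtype=bool)
    denom = p.sum() + r.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(p, r).sum() / denom


def percentage_volume_difference(pred, ref):
    """Volume difference as a percentage of the reference volume.

    Assumption: the absolute difference is used. The voxel volume cancels
    in the ratio, so voxel counts are sufficient.
    """
    v_pred = int(np.asarray(pred, dtype=bool).sum())
    v_ref = int(np.asarray(ref, dtype=bool).sum())
    if v_ref == 0:
        raise ValueError("reference mask is empty; PVD is undefined")
    return 100.0 * abs(v_pred - v_ref) / v_ref
```

For example, a 5-voxel prediction that overlaps 3 voxels of a 4-voxel reference gives Dice = 2·3/(5+4) ≈ 0.67 and PVD = 100·|5 - 4|/4 = 25%. Per-subject values would then be averaged to obtain the mean ± standard deviation figures.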

DALL'OSTO, NICOLA

2023/2024

Department: Dipartimento di Ingegneria dell'Informazione - DEI (Department of Information Engineering)
Degree programme: BIOINGEGNERIA (Bioengineering), Laurea Magistrale (Master's Degree, D.M. 270/2004)
Academic year: 2023
English title: Comparison of Weakly Supervised Models for Choroid Plexus Segmentation Using Non-Expert Annotations
Keywords: choroid plexus; segmentation; weak supervision
Supervisor: CASTELLARO, MARCO
Appears in collections: Lauree magistrali (Master's theses)

Files in this item:

File	Size	Format
DallOsto_Nicola.pdf (Open Access from 04/09/2025)	23.03 MB	Adobe PDF

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/69265