Transforming Histopathology: A Preprocessing Pipeline for Extracting Features from H&E Whole-Slide Images Using a Transformer-Based Model.

GIROTTO, MATTEO
2023/2024

Abstract

This thesis develops a preprocessing pipeline for feature extraction, the first step in diagnosing colorectal cancer from histopathological slides. Colorectal cancer (CRC) is a major global health challenge that calls for advanced diagnostic and therapeutic strategies. Traditional methods rely heavily on convolutional neural networks (CNNs), which perform well but still have limitations. This work introduces a transformer-based pipeline designed to improve the accuracy, generalizability, and efficiency of biomarker prediction from pathology slides. The pipeline consists of three modules. The first, patchextraction, uses RGB thresholding and Canny edge detection to isolate tissue regions in whole-slide images (WSIs) and extract patches from them, discarding patches with minimal tissue content. The second, patchnormalization, applies the Macenko method to stain-normalize the extracted patches, ensuring consistent color representation across images. The third and final module, featuresextraction, uses CTransPath, a transformer-based model, to extract features from the normalized patches. The workflow is automated with Nextflow, which coordinates the modules and lets users customize parameters such as threshold values and whether to save the extracted or normalized patches. The pipeline supports multi-threading, which speeds up the entire process. Evaluation on large cohorts has shown that this transformer-based pipeline is a viable option in terms of performance, data efficiency, and interpretability. These results support the potential of transformers to advance CRC diagnostics and provide a robust tool for precision oncology. Beyond enabling rapid and accurate prediction of prognostic biomarkers, the pipeline addresses the limitations of conventional genetic testing, making personalized care more accessible and effective.
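The patch-filtering idea behind patchextraction can be illustrated with a simplified NumPy sketch. It covers only the RGB-thresholding part and omits the Canny edge-detection step.

In bright-field H&E scans the background is near-white. A pixel therefore counts as tissue when any channel is darker than a threshold, and a patch is kept only if its tissue fraction reaches a minimum. The sketch makes a few assumptions:

- The slide is already loaded as an RGB array.
- The function name `tile_and_filter` is hypothetical.
- The default values of `patch_size`, `rgb_threshold`, and `min_tissue_fraction` are also hypothetical, not the thesis's settings.

```python
import numpy as np


def tile_and_filter(slide, patch_size=224, rgb_threshold=220, min_tissue_fraction=0.5):
    """Yield (y, x, patch) for non-overlapping patches with enough tissue.

    Hypothetical helper: only the RGB-threshold tissue test is implemented;
    the Canny edge check used by the thesis pipeline is not reproduced here.
    """
    h, w, _ = slide.shape
    # Walk the slide in a regular grid, dropping incomplete border tiles.
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = slide[y:y + patch_size, x:x + patch_size]
            # Background is near-white, so a pixel is treated as tissue
            # when at least one channel falls below the threshold.
            tissue = np.any(patch < rgb_threshold, axis=-1)
            # Keep the patch only if tissue covers enough of its area.
            if tissue.mean() >= min_tissue_fraction:
                yield y, x, patch
```

Real WSIs are usually too large to load at once. A practical version would therefore read tiles lazily from the slide file (for example with OpenSlide) instead of slicing an in-memory array.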
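The Macenko normalization step can be sketched in NumPy as follows. This is a minimal illustration, not the thesis's patchnormalization code. The sketch relies on a few assumptions:

- The reference stain matrix `HE_REF`, the reference concentrations `MAX_C_REF`, and the defaults `io=240`, `alpha=1`, `beta=0.15` are assumed. They are values commonly used in open-source Macenko implementations.
- The function name `macenko_normalize` is hypothetical.

The method works in four stages:

1. Convert RGB to optical density (OD).
2. Estimate the two stain vectors from the extreme angles of the OD cloud, projected onto its principal plane.
3. Solve for per-pixel stain concentrations.
4. Re-render the patch with the reference stains.

```python
import numpy as np

# Assumed reference H&E stain matrix (columns: hematoxylin, eosin) and
# 99th-percentile reference concentrations; common open-source defaults,
# not values taken from the thesis.
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])
MAX_C_REF = np.array([1.9705, 1.0308])


def macenko_normalize(img, io=240, alpha=1.0, beta=0.15):
    """Map an RGB uint8 patch (H x W x 3) onto the reference H&E stain space."""
    h, w, _ = img.shape
    # 1. RGB -> optical density (Beer-Lambert law); +1 avoids log(0).
    od = -np.log((img.reshape(-1, 3).astype(np.float64) + 1.0) / io)

    # 2. Ignore near-transparent (background) pixels when estimating stains.
    od_hat = od[np.all(od > beta, axis=1)]

    # 3. Plane of the two largest principal directions of the OD cloud.
    _, eigvecs = np.linalg.eigh(np.cov(od_hat, rowvar=False))
    v = eigvecs[:, [2, 1]]  # eigh returns eigenvalues in ascending order
    for k in range(2):  # fix the arbitrary sign of each eigenvector
        if v[0, k] < 0:
            v[:, k] *= -1

    # 4. Robust extreme angles of the projected pixels give the stain vectors.
    proj = od_hat @ v
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    min_phi, max_phi = np.percentile(phi, [alpha, 100 - alpha])
    v_min = v @ np.array([np.cos(min_phi), np.sin(min_phi)])
    v_max = v @ np.array([np.cos(max_phi), np.sin(max_phi)])

    # 5. Heuristic: hematoxylin has the larger red-channel OD component.
    he = np.column_stack([v_min, v_max] if v_min[0] > v_max[0] else [v_max, v_min])

    # 6. Per-pixel concentrations by least squares: od.T ~= he @ conc.
    conc = np.linalg.lstsq(he, od.T, rcond=None)[0]

    # 7. Match concentration percentiles to the reference, then re-render RGB.
    max_c = np.percentile(conc, 99, axis=1)
    conc *= (MAX_C_REF / max_c)[:, None]
    out = io * np.exp(-HE_REF @ conc)  # shape (3, N)
    return np.clip(out.T, 0, 255).reshape(h, w, 3).astype(np.uint8)
```

This sketch re-estimates stain vectors for every patch, which is the simplest option. Fitting the stains once per slide, or to a fixed reference image, is a common alternative that the sketch does not cover.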
2023
Extracting Features
Histopathology
Transformer-Based
Files in this item:

Girotto_Matteo.pdf (embargoed until 08/07/2027), 23.2 MB, Adobe PDF

Website text © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/66488