Statistical Considerations for Lung Cancer Staging Systems

BRIGIARI, GLORIA
2022/2023

Abstract

Background: Lung cancer remains the leading cause of cancer-related mortality globally, with non–small cell lung cancer (NSCLC) accounting for approximately 80–85% of cases. Despite recent advances in targeted therapies and immunotherapies, survival rates remain low, highlighting the need for robust predictive models based on real-world, population-level data.

Methods: Using data from the Surveillance, Epidemiology, and End Results (SEER) database, we employed Random Survival Forests (RSF), a nonparametric ensemble machine-learning approach, to model cancer-specific survival for 135,969 patients diagnosed with NSCLC, integrating demographic, clinical, and treatment-related variables. Model performance was evaluated on out-of-bag (OOB) predictions using Harrell's concordance index (C-index) and the continuous ranked probability score (CRPS).

Results: The final RSF model achieved a C-index of 0.793 and a CRPS of 17.28, indicating strong discrimination and calibration for survival prediction in a large and heterogeneous patient cohort. Important predictors included stage, tumor characteristics, treatments received, and demographic factors, which together captured complex relationships influencing NSCLC survival.

Conclusions: The RSF model applied to population-level SEER data performed well and could identify subgroups of patients at heightened risk of cancer-specific mortality. Despite the inherent limitations of registry-based observational data, including potential biases and the absence of detailed clinical variables, our results highlight the value of advanced machine-learning methods in real-world oncology analytics. This approach may inform targeted clinical strategies and population health interventions to improve NSCLC outcomes.
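For readers unfamiliar with the discrimination metric named in the Methods, the following is a minimal sketch of how Harrell's C-index can be computed from right-censored data. It is an illustration only, not the thesis's implementation: the function name `harrell_c_index`, its arguments, and the toy values are hypothetical, and pairs with tied survival times are simply skipped.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored survival data.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. It is concordant when the
    model assigned the higher risk to subject i; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the second one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.8, 0.3, 0.1, 0.2]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
print(toy_c)
```

In this toy example three of the four comparable pairs are ordered correctly, so the index is 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.793 should be read.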
Year: 2022
Keywords: Cancer Staging; Prognostic Models; Population Registry
Files in this item:
File: tesi_brigiari (1).pdf
Access: Restricted
Size: 368.94 kB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/86249