Statistical Considerations for Lung Cancer Staging Systems
BRIGIARI, GLORIA
2022/2023
Abstract
Background: Lung cancer remains the leading cause of cancer-related mortality globally, with non–small cell lung cancer (NSCLC) accounting for approximately 80–85% of cases. Despite recent advances in targeted therapies and immunotherapies, survival rates remain low, highlighting the need for robust predictive models based on real-world, population-level data.

Methods: Using data from the Surveillance, Epidemiology, and End Results (SEER) database, we employed Random Survival Forests (RSF), a nonparametric ensemble machine-learning approach, to model cancer-specific survival for 135,969 patients diagnosed with NSCLC, integrating demographic, clinical, and treatment-related variables. Model performance was evaluated on out-of-bag (OOB) predictions using Harrell’s concordance index (C-index) and the continuous ranked probability score (CRPS).

Results: The final RSF model demonstrated robust predictive accuracy, with a C-index of 0.793 and a CRPS of 17.28, reflecting strong model discrimination and calibration for survival prediction in a large and heterogeneous patient cohort. Important predictors included stage, tumor characteristics, treatments received, and demographic factors, effectively capturing complex relationships influencing NSCLC survival.

Conclusions: The RSF model applied to population-level SEER data exhibited high performance and could effectively identify subgroups of patients at heightened risk of cancer-specific mortality. Despite inherent limitations of registry-based observational data, including potential biases and the absence of detailed clinical variables, our results highlight the value of advanced machine-learning methods in real-world oncology analytics. This approach may inform targeted clinical strategies and population health interventions to improve NSCLC outcomes.
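To make the discrimination metric named in the abstract concrete, the following is a minimal sketch of Harrell's C-index in plain Python. It is illustrative only and is not the thesis's implementation, which the abstract does not describe: the function name `harrell_c_index`, the toy data, and the simplified handling of tied survival times (tied times are treated as not comparable) are all assumptions made here. In an RSF setting, the risk score would typically be an out-of-bag ensemble prediction for each patient.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Fraction of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i has an observed event
    (events[i] == 1) strictly before patient j's follow-up time.
    The pair is concordant when i also has the higher risk score;
    tied risk scores count as half-concordant.
    Simplifying assumption: pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: survival time in months, event indicator
# (1 = cancer-specific death, 0 = censored), and predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index of 0.793 should be read.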
https://hdl.handle.net/20.500.12608/86249