Descriptive Analytics and Risk Profiling of Chronic Kidney Disease Patients (using data from the UCI repository)
ADEBAYO, HAZEEZAT ADEBIMPE
2024/2025
Abstract
Chronic Kidney Disease (CKD) is a progressive condition characterised by a gradual decline in kidney function over months or years. Its global burden is estimated at 10–15% of the worldwide population, and it is associated with significant morbidity and mortality if not detected early. CKD is most commonly caused by diabetes mellitus and hypertension, although other factors such as age, cardiovascular disease, and genetic predisposition also contribute to its development. Progression is typically tracked with key laboratory markers, particularly the estimated glomerular filtration rate (eGFR) and markers of albuminuria, alongside complementary indicators such as serum creatinine, blood urea, hemoglobin, and electrolytes. Early recognition of adverse trends in these parameters is crucial for timely intervention, prevention of complications, and slowing progression to end-stage renal disease (ESRD). This thesis uses a clinical dataset of 200 patients with 29 demographic, clinical, and laboratory features from Enam Medical College Hospital in Bangladesh, obtained from the UCI Machine Learning Repository. The study has three main objectives: (1) to perform descriptive analytics exploring the distributions of, and relationships among, key CKD biomarkers, comorbidities, and demographic factors (age); (2) to develop a Risk Profiling Framework that groups patients into three clinically interpretable risk categories (Mild, Moderate, Severe) based on CKD status and stage; and (3) to identify statistically significant associations between biomarkers, comorbidities, and disease severity using non-parametric statistical tests. The methodological approach is cross-sectional and descriptive. Range-encoded laboratory variables were cleaned and converted to numeric midpoint approximations to enable summary statistics, correlation analysis, and non-parametric testing.
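As an illustration of the midpoint step, the following is a minimal sketch and not the thesis's own code. It assumes range-encoded values are stored as strings such as `"1.019 - 1.021"`, with open-ended bins such as `"< 112"` or `"≥ 227"`; the function name `range_to_midpoint` and the choice to map an open-ended bin to its stated bound are assumptions made for this example only.

```python
import re
from typing import Optional

# Hypothetical patterns for the assumed string encoding of binned lab values.
_NUM = r"[-+]?\d+(?:\.\d+)?"
_RANGE = re.compile(rf"^\s*({_NUM})\s*-\s*({_NUM})\s*$")
_OPEN = re.compile(rf"^\s*(?:<=|>=|<|>|≤|≥)\s*({_NUM})\s*$")


def range_to_midpoint(value: str) -> Optional[float]:
    """Convert a range-encoded string to a numeric approximation.

    "low - high"  -> arithmetic midpoint of the two bounds
    "< x", "≥ x"  -> the bound x itself (an assumption: an open-ended
                     bin has no midpoint, so the bound is used instead)
    plain number  -> that number
    anything else -> None (treated as missing)
    """
    text = value.strip()

    closed = _RANGE.match(text)
    if closed:
        low, high = float(closed.group(1)), float(closed.group(2))
        return (low + high) / 2.0

    open_ended = _OPEN.match(text)
    if open_ended:
        return float(open_ended.group(1))

    try:
        return float(text)
    except ValueError:
        return None
```

Values converted this way are approximations of the underlying measurements, which is one reason rank-based methods such as Spearman correlation and the Kruskal-Wallis test suit this kind of data better than methods that assume exact continuous values.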
Associations between CKD status, risk groups, biomarkers, and comorbidities were examined using chi-square tests with Cramér's V, Spearman rank correlations, and Kruskal-Wallis tests. The results highlight expected patterns such as the inverse relationship between eGFR and serum creatinine, a higher prevalence of hypertension and diabetes among CKD patients, and progressively worse biomarker profiles across the Mild, Moderate, and Severe risk groups. These findings contribute to a better understanding of laboratory-driven risk patterns in CKD management and provide a data-driven foundation for identifying high-risk patient subgroups requiring intensive nephrology monitoring.

| File | Size | Format | Access |
|---|---|---|---|
| Thesis (1).pdf | 1.51 MB | Adobe PDF | Restricted access |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/102094