Analysis of Demographic and Residential Data from Milan: Integration, Modeling and Visualization
DI TULLIO, TOMMASO
2024/2025
Abstract
This thesis project, developed in collaboration with Exacon, focuses on the integration, analysis, and visualization of demographic and residential data from the city of Milan. The goal is to explore how data-driven approaches can support urban planning, particularly in understanding population dynamics in relation to the construction of new residential buildings. The entire analytical pipeline was designed and implemented from scratch, following a full-stack perspective.

The first phase of the project consisted of collecting and integrating public datasets provided by the city of Milan. These datasets include historical series of births, deaths, population counts, household compositions, and newly constructed residential buildings at the district level. The data integration process was carried out using PySpark, enabling the development of scalable ETL workflows suitable for handling large and heterogeneous datasets. Special attention was paid to data cleaning and quality assurance, ensuring consistency across multiple sources and time frames.

The second phase focused on both descriptive and predictive analysis. Exploratory data analysis was conducted to identify key trends and patterns in demographic evolution over the past two decades. Subsequently, several predictive models were evaluated to forecast the number of residents in each district over time. The analysis also investigated potential correlations between demographic shifts and the emergence of new residential constructions, with the aim of identifying significant spatial and temporal relationships.

Overall, the project demonstrates the potential of open data, combined with modern data science techniques, to support evidence-based decision-making in the context of urban development.
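The abstract describes a district-level integration step (implemented in the thesis with PySpark) followed by forecasting and correlation analysis. The following is only a minimal sketch of that kind of workflow under stated assumptions: it uses plain Python dictionaries instead of PySpark so it runs without a Spark installation, and every figure, district code ("D1", "D2") and function name is hypothetical. The Pearson correlation and the linear-trend forecast are illustrative stand-ins, not the methods or models reported in the thesis.

```python
# Minimal stdlib sketch of a district-level integration and analysis step.
# ASSUMPTION: all data, district codes and function names are hypothetical.

# (district, year, residents): note the untidy code "d2 " to illustrate cleaning
population = [
    ("D1", 2019, 10000), ("D1", 2020, 10150), ("D1", 2021, 10400),
    ("d2 ", 2019, 8000), ("D2", 2020, 7950), ("D2", 2021, 7990),
]
# (district, year, newly constructed residential buildings)
new_buildings = [
    ("D1", 2020, 3), ("D1", 2021, 5),
    ("D2", 2020, 0), ("D2", 2021, 1),
]


def index_by_key(rows):
    """Normalise district codes and index values by (district, year).

    Later rows with the same key overwrite earlier ones.
    """
    return {(d.strip().upper(), year): value for d, year, value in rows}


def population_change(pop):
    """Year-over-year change in residents, for keys whose previous year exists."""
    return {
        (d, year): value - pop[(d, year - 1)]
        for (d, year), value in pop.items()
        if (d, year - 1) in pop
    }


def inner_join(left, right):
    """Keep only (district, year) keys present in both tables, like an SQL inner join."""
    return {k: (left[k], right[k]) for k in sorted(left.keys() & right.keys())}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def linear_trend_forecast(years, values, target_year):
    """Fit an ordinary least-squares line to (year, value) and evaluate it at target_year."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(years, values)) / sum(
        (x - mx) ** 2 for x in years
    )
    return my + slope * (target_year - mx)


pop_idx = index_by_key(population)
changes = population_change(pop_idx)
joined = inner_join(changes, index_by_key(new_buildings))
r = pearson([c for c, _ in joined.values()], [b for _, b in joined.values()])

# Naive per-district baseline: extrapolate the D1 resident series to 2022
d1 = sorted((year, v) for (d, year), v in pop_idx.items() if d == "D1")
forecast_2022 = linear_trend_forecast([y for y, _ in d1], [v for _, v in d1], 2022)

print(f"correlation(population change, new buildings) = {r:.3f}")
print(f"D1 linear-trend forecast for 2022: {forecast_2022:.0f} residents")
```

In a PySpark version, the same steps would plausibly map onto DataFrame operations, for example `pyspark.sql.functions.lag` over a window partitioned by district for the year-over-year change, and `DataFrame.join(other, on=["district", "year"], how="inner")` for the merge. The linear trend is only a naive baseline, and a correlation computed on a handful of districts says nothing about causation.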
Full text: Di Tullio Tommaso.pdf (open access, 2.55 MB, Adobe PDF)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/102106