Estimation of On-Time performance of Philadelphia's Regional Trains
KARIMI, NAZANIN
2024/2025
Abstract
The "SEPTA - Regional Rail" dataset on Kaggle falls within the field of traffic planning and management in passenger transport. It provides on-time performance data for the regional trains of the Southeastern Pennsylvania Transportation Authority (SEPTA), which serves 13 branches and over 150 active stations in Philadelphia and its suburbs. Covering the period from March 23, 2016, to November 6, 2016, the dataset includes schedules for 1654 unique trains, 157 origin stations, 59 destination stations, and 155 intermediate stations. This project proposes using the XGBoost algorithm to evaluate and predict train delays. New features, such as previous delays and train status at each station, are added to the dataset so that a model can predict, for each train at each station, both the amount of delay (regression) and whether a delay occurs (classification). Each data point includes the current train status, the total delay accumulated along the route, the delays at preceding stations, and the distances between stations. The primary steps are data cleaning and preprocessing, exploratory analysis, feature engineering, and modeling with hyperparameter tuning to forecast delay magnitudes and frequencies. Weather data are also incorporated to identify spatial or seasonal patterns that could improve prediction accuracy.

File | Access | Size | Format |
---|---|---|---|
master_thesis_nk_final_version_Feb (3) (1).pdf | Open access | 1.07 MB | Adobe PDF |
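
The feature engineering described in the abstract (delay at the preceding station, total delay along the route so far, and separate regression and classification targets) can be illustrated with a minimal sketch. This is not the thesis code: the record layout, the field names, the sample values, and the 5-minute cutoff used for the "delayed" label are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (train_id, stop_sequence, delay_minutes).
# Field names and values are illustrative, not taken from the dataset.
records = [
    ("T1", 1, 0.0),
    ("T1", 2, 3.0),
    ("T1", 3, 5.0),
    ("T2", 1, 1.0),
    ("T2", 2, 0.0),
]

# Assumed cutoff (in minutes) for labelling a stop as "delayed".
DELAY_THRESHOLD_MIN = 5.0


def build_features(rows, threshold=DELAY_THRESHOLD_MIN):
    """Add previous-stop delay, cumulative upstream delay, and both targets."""
    # Group the stops of each train so they can be walked in route order.
    by_train = defaultdict(list)
    for train_id, seq, delay in rows:
        by_train[train_id].append((seq, delay))

    out = []
    for train_id, stops in by_train.items():
        prev_delay = 0.0  # the origin has no preceding station
        cum_delay = 0.0   # sum of delays at all preceding stations
        for seq, delay in sorted(stops):
            out.append({
                "train_id": train_id,
                "stop_sequence": seq,
                "prev_delay": prev_delay,
                "cum_prev_delay": cum_delay,
                "delay_minutes": delay,            # regression target
                "is_delayed": delay >= threshold,  # classification target
            })
            prev_delay = delay
            cum_delay += delay
    return out


features = build_features(records)
```

Each resulting row only uses information from stations the train has already passed, which keeps the targets from leaking into the inputs. Rows of this shape could then be fed to a gradient-boosted model such as XGBoost, one model for `delay_minutes` and one for `is_delayed`.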
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/81804