Time Series Event Classification with Machine Learning

ALIJA, VULNET
2021/2022

Abstract

Nanopore-based sensing instruments generate time series of current versus time as analytes are measured. The training dataset contains three classes: "no event" when no analytes are detected, "event A" when analytes of type A are detected, and "event B" when analytes of type B are detected. The unseen time series datasets are unlabeled, but the expected ratio of the classes in each is known. The unlabeled time series are classified into the three classes using machine learning. The measurements are not time-dependent, so the time variable is removed; the resulting univariate time series is then split into overlapping sequences using sliding windows. The data is not normalized, because normalization biases the classifiers toward one class. Four classifiers are trained on the windows and compared: a fully connected neural network, random forest, logistic regression, and long short-term memory (LSTM). Logistic regression with a window size of 0.1 seconds and balanced class weights gives the best results of the four. The predicted ratios for the three unlabeled datasets are 2.4:1, 0.8:1, and 0.5:1, against expected ratios of 3:1, 3:1, and 1:1, respectively. The other classifiers require further hyperparameter tuning to produce satisfactory results.
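The windowing step and the ratio comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names, the stride, the sampling rate used to convert 0.1 seconds into a sample count, and the reading of the reported ratios as "event A" to "event B" are all assumptions made here for illustration.

```python
from collections import Counter


def seconds_to_samples(window_seconds, sampling_rate_hz):
    """Convert a window length in seconds to a sample count (rate is assumed)."""
    return round(window_seconds * sampling_rate_hz)


def sliding_windows(signal, window_size, stride):
    """Split a 1-D sequence into overlapping windows of equal length."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    return [
        signal[start:start + window_size]
        for start in range(0, len(signal) - window_size + 1, stride)
    ]


def event_ratio(labels, numerator="event A", denominator="event B"):
    """Return count(numerator) / count(denominator) over predicted labels."""
    counts = Counter(labels)
    if counts[denominator] == 0:
        raise ValueError("no windows predicted as the denominator class")
    return counts[numerator] / counts[denominator]


# Hypothetical data: 10 current samples, windows of 4 samples, stride of 2.
current = [0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.5, 0.4, 0.1, 0.1]
windows = sliding_windows(current, window_size=4, stride=2)

# Hypothetical per-window predictions from some classifier.
predicted = ["no event", "event A", "event A", "event B", "event A", "no event"]
ratio = event_ratio(predicted)
```

With a stride smaller than the window size, consecutive windows share samples, which is what makes them overlapping. A predicted ratio computed this way can then be compared with the expected ratio of an unlabeled dataset.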
Machine Learning
Time Series
Classification
Files in this item:
Alija_Vulnet.pdf (restricted access, 2.67 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/42440