Time Series Event Classification with Machine Learning

Nanopore-based sensing instruments generate time series measurements of current versus time as analytes are detected. The training dataset contains three classes: "no event" when no analytes are detected, "event A" when analytes of type A are detected, and "event B" when analytes of type B are detected. The unseen time series datasets are unlabeled but come with expected class ratios. The unlabeled time series are analyzed and classified into the three classes using machine learning.

The measurements are not time-dependent, so the time variable is removed. The resulting univariate time series is split into overlapping sequences using sliding windows. The data is not normalized, because normalization biases the classifiers toward one class. Four classifiers are trained on the windows and compared: fully connected neural networks, random forest, logistic regression, and long short-term memory.

Logistic regression with a window size of 0.1 seconds and balanced class weights gives the best results of the four classifiers tested. For the three unlabeled datasets, the predicted ratios are 2.4:1, 0.8:1, and 0.5:1 against expected ratios of 3:1, 3:1, and 1:1, respectively. The other classifiers require further hyperparameter tuning to produce more satisfactory results.
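The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis code. The function names, the window length in samples, the step size, and the reading of each ratio as "event A" windows to "event B" windows are all assumptions made for illustration. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count), and the abstract does not state which weighting scheme was used.

```python
from collections import Counter


def sliding_windows(series, window, step):
    """Split a univariate series into windows of equal length.

    Windows overlap whenever step < window. Both values are counted in
    samples; converting 0.1 seconds to samples needs the sampling rate,
    which the abstract does not give.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = len(series) - window
    return [series[i:i + window] for i in range(0, last_start + 1, step)]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a classifier is not dominated
    by the most frequent class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * k) for cls, k in counts.items()}


def event_ratio(predicted_labels, numerator="event A", denominator="event B"):
    """Ratio of windows predicted as one class to windows predicted as another.

    Assumption: the ratios in the abstract compare event A to event B.
    """
    counts = Counter(predicted_labels)
    if counts[denominator] == 0:
        raise ValueError("no windows predicted as " + denominator)
    return counts[numerator] / counts[denominator]


if __name__ == "__main__":
    # Toy current trace (arbitrary units), not real nanopore data.
    trace = [0.9, 1.0, 0.4, 0.3, 0.9, 1.0, 0.6, 0.5]
    print(sliding_windows(trace, window=4, step=2))
    print(balanced_class_weights(["no event"] * 6 + ["event A"] * 3 + ["event B"]))
    print(event_ratio(["event A", "event A", "event B", "no event"]))
```

In a full pipeline, the windows would be the feature rows given to a classifier and the weights would scale each class's contribution to the training loss. The predicted labels on an unlabeled dataset would then be summarized with a ratio like the one above and compared with the expected ratio.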

ALIJA, VULNET

2021/2022

Record

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme: COMPUTER ENGINEERING Laurea Magistrale (D.M. 270/2004)
Academic year: 2021
English title: Time Series Event Classification with Machine Learning
Keywords: Machine Learning; Time Series; Classification
Supervisor: DI CAMILLO, BARBARA
Appears in: Lauree magistrali (master's degree theses)

Files in this item:

File	Size	Format
Alija_Vulnet.pdf (restricted access)	2.67 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/42440