Unsupervised Anomaly Detection for Industry Cybersecurity Operations

LEON CASTELL, ALEJANDRO

2022/2023

Abstract

In this thesis, we analyze an industry network traffic dataset covering hundreds of sensitive services that support the infrastructure of an oil and gas company. The main objective is to detect network and cybersecurity operations events that manifest as behavioural changes in the traffic. Because the dataset comes from a real-world environment, it contains no labels, so we work in an unsupervised learning framework. We implement an automatic detection system for behavioural changes in server and user traffic. The system proactively detects long-term, subtle events, using an observation window that spans a complete eight-hour user work shift. We begin with the server data, whose comparatively simple behaviour helps ground our intuition: we cluster it in a feature space and attempt to characterize it with a Hidden Markov Model. Then, after exploring the difficulty of automatically learning a discrete Markov chain representation for the user data, we resort to state thresholds estimated by field experts. With these thresholds, we analyze two cases: independent univariate state representations for each observed metric, and a single multivariate state representation. The univariate approach detects uncharacteristic path probabilities for each metric independently, whereas the multivariate approach considers all metrics simultaneously, so that it can detect not only unlikely state transitions but also rare multivariate states. Finally, the system ranks user IP addresses by behavioural change score, allowing network administrators to plan their work capacity more efficiently.
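
The univariate path-probability idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the threshold values, the additive smoothing, the per-transition averaging, the example addresses and all function names are assumptions made only for this illustration. Each metric value is mapped to a discrete state using fixed thresholds, a first-order Markov transition matrix is estimated from a baseline period, and each address is scored by how unlikely its state path is in the observation window.

```python
import math
from bisect import bisect_right


def discretize(values, thresholds):
    """Map each value to a state index, given ascending thresholds."""
    return [bisect_right(thresholds, v) for v in values]


def fit_transitions(state_seq, n_states, alpha=1.0):
    """Estimate a transition matrix with additive smoothing (assumed, alpha=1)."""
    counts = [[alpha] * n_states for _ in range(n_states)]
    for a, b in zip(state_seq, state_seq[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def mean_path_log_prob(state_seq, trans):
    """Average log-probability per transition along one state path."""
    steps = list(zip(state_seq, state_seq[1:]))
    if not steps:
        return 0.0
    return sum(math.log(trans[a][b]) for a, b in steps) / len(steps)


def rank_by_change(baseline, window, thresholds):
    """Rank addresses from most to least uncharacteristic window path."""
    n_states = len(thresholds) + 1
    scores = {}
    for ip, values in window.items():
        trans = fit_transitions(discretize(baseline[ip], thresholds), n_states)
        path = discretize(values, thresholds)
        # Higher score = less likely path under the baseline chain.
        scores[ip] = -mean_path_log_prob(path, trans)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: one metric, three states (low / medium / high).
thresholds = [10.0, 100.0]
baseline = {
    "192.0.2.1": [1, 2, 3, 2, 1, 2, 3, 2, 1, 2],
    "192.0.2.2": [1, 2, 3, 2, 1, 2, 3, 2, 1, 2],
}
window = {
    "192.0.2.1": [2, 1, 3, 2],      # stays in the usual low state
    "192.0.2.2": [2, 500, 3, 700],  # jumps repeatedly to the high state
}
ranking = rank_by_change(baseline, window, thresholds)
print(ranking)
```

In this toy example the second address ranks first, because its window contains low-to-high transitions that never occur in its baseline. A multivariate variant would use tuples of per-metric states as the chain's states, which also makes it possible to flag state combinations that are rare in the baseline.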

	Faculty/Department
	
				Dipartimento di Fisica e Astronomia "Galileo Galilei" - DFA
			
	Degree programme
	
				PHYSICS OF DATA, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2022
			
	English title
	
				Unsupervised Anomaly Detection for Industry Cybersecurity Operations
			
	Keywords
	
				DataAcquisition
				MachineLearning
				DataAnalysis
			
	Supervisor
	
				SODERI, SIMONE
			
	Co-supervisor
	
				CONTI, MAURO
			
	Appears in document types:
	
				Master's theses

Files in this item:

File	Access	Size	Format
Leon_Alejandro.pdf	Restricted access	3.64 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/54843