Ransomware detection with Machine Learning algorithms using file segments and statistics

MIOTTO, EMANUELE
2023/2024

Abstract

The growing danger posed by ransomware has been a significant concern for both the public and private sectors. The emergence of new strains of this malware has outpaced the development of effective defense mechanisms. Despite the numerous proposed frameworks that employ static and dynamic analysis, these approaches frequently prove ineffective in the face of advanced obfuscation and evasion techniques. One characteristic common to different ransomware strains is the need to encrypt filesystem data at some point. The byte distribution of encrypted files appears random, whereas that of normal files tends to be more structured. By measuring this unpredictability with statistical tools, it is possible to distinguish encrypted files from normal ones. One of the metrics used for this task is the Shannon Entropy. Researchers tend to compute the Entropy of the byte distribution over the entire file, which is imprecise, slow, and resource-intensive. To overcome these limits, Davies et al. proposed using only a fixed segment at the start of the file, called the header. The promising results of their ransomware classification method suggest that the header Entropy of files provides enough information to deploy a working defense mechanism. However, computing the Entropy of a byte sequence, whether over the entire file or only the header, is vulnerable to Entropy neutralization techniques. Such attacks aim to reduce the Entropy of the encrypted file by encoding it in a different format, for example Base64. Various works have explored sophisticated neutralization strategies, and over the years it has become clear that any defense mechanism that relies on Entropy values needs to be tested against such techniques. Among past works, only the Entropy-based ransomware detection methods proposed by Lee et al. and Venturini et al. included such verification, leaving all the others potentially vulnerable. This thesis proposes a lightweight, fast, and reliable ransomware detection method based on collecting small fixed-length segments of files. The proposed defense mechanism uses only these small portions of the files, from which the Entropy or the Differential Areas (between a file's real and ideal Entropy graphs) are computed and fed to a machine learning algorithm. These features make it possible to distinguish effectively between ransomware-encrypted and legitimate files while requiring very few system resources. To strengthen the feature extraction process and make it more resistant to Entropy tampering, three additional random segment selection strategies were implemented. Unlike in past works, each feature, machine learning algorithm, and feature extraction strategy was tested against different Entropy neutralization techniques to highlight which combination is the most resilient to such attacks. To do so, the headers of the ransomware-encrypted files were tampered with to lower their Entropy, and the models were tested once more. The result is an Entropy-based ransomware detection method capable of adapting both to known ransomware strains and to future ransomware designed to neutralize the header Entropy of the files it encrypts.
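
To make the ideas in the abstract concrete, the following is a minimal Python sketch of the statistics it relies on: the Shannon Entropy of a file's byte distribution computed over a small fixed-length header segment, the effect of Base64 re-encoding on that value (an Entropy neutralization technique), and one plausible reading of the Differential Area feature. The header size, sampling step, and the use of a random byte sequence as the ideal Entropy graph are illustrative assumptions, not the values or the exact construction used in the thesis.

```python
import base64
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon Entropy of a byte sequence, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def header_entropy(path: str, header_size: int = 256) -> float:
    """Entropy of the fixed-length segment at the start of a file (the header).

    header_size = 256 is an illustrative choice, not the value used in the thesis.
    """
    with open(path, "rb") as f:
        return shannon_entropy(f.read(header_size))


def running_entropy(data: bytes, step: int = 32) -> list[float]:
    """Entropy of the first k bytes, sampled every `step` bytes:
    the 'real' Entropy graph of a header."""
    return [shannon_entropy(data[:k]) for k in range(step, len(data) + 1, step)]


def differential_area(header: bytes, step: int = 32) -> float:
    """One plausible reading of the Differential Area feature: the area between
    the header's real Entropy graph and an 'ideal' graph obtained from a random
    byte sequence of the same length (a stand-in for perfectly encrypted data).
    A small area means the header looks encrypted; a large one, structured."""
    ideal = running_entropy(os.urandom(len(header)), step)
    real = running_entropy(header, step)
    # Approximate the area between the two curves as a sum of point-wise gaps.
    return sum(abs(i - r) for i, r in zip(ideal, real))


if __name__ == "__main__":
    plain = (b"The quick brown fox jumps over the lazy dog. " * 90)[:4096]
    encrypted = os.urandom(4096)               # stand-in for ransomware ciphertext
    neutralized = base64.b64encode(encrypted)  # Base64-encoded ciphertext

    print(f"plain text   : {shannon_entropy(plain):.2f} bits/byte")        # low (structured text)
    print(f"encrypted    : {shannon_entropy(encrypted):.2f} bits/byte")    # close to 8
    print(f"Base64 coded : {shannon_entropy(neutralized):.2f} bits/byte")  # drops towards 6 (64-symbol alphabet)

    print(f"diff. area, plain header    : {differential_area(plain[:256]):.1f}")   # large
    print(f"diff. area, encrypted header: {differential_area(encrypted[:256]):.1f}")  # near zero
```

Under these assumptions, a header-Entropy classifier would see Base64-encoded ciphertext fall from close to 8 to roughly 6 bits per byte, which is why the thesis stresses testing every feature, model, and segment selection strategy against Entropy neutralization.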
Keywords: ransomware, machine learning, file segments, statistics

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/68416