Enhanced Topic Modeling for Textual Data

In this thesis, we present an approach to topic modeling and text classification that combines Non-Negative Matrix Factorization (NMF), a Variational Autoencoder (VAE), and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The text is preprocessed with CountVectorizer using bigram features, so that the resulting representation captures both word frequencies and local word co-occurrence patterns. NMF is then applied to extract latent topics, the VAE reduces dimensionality and learns compact representations, and the Bi-LSTM learns sequential patterns in the text and performs the classification. Through extensive experiments and evaluations, we show that the approach is effective at capturing topics and achieves high classification accuracy. This work contributes to the field of text analysis by offering a methodology for uncovering insights from textual data.
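The first two stages of this pipeline (count features with bigrams, followed by NMF topic extraction) can be illustrated with a small, dependency-free Python sketch. It is a minimal illustration under stated assumptions, not the thesis's implementation: tokenization is a plain lowercase whitespace split, the features are unigram plus adjacent-word bigram counts (roughly what scikit-learn's `CountVectorizer(ngram_range=(1, 2))` would produce), and NMF is fitted with the standard Lee-Seung multiplicative updates for the squared Frobenius error. All function names are hypothetical, and the VAE and Bi-LSTM stages are not shown.

```python
import random
from collections import Counter


def count_matrix(docs):
    """Document-term count matrix over unigrams and adjacent-word bigrams.

    Assumption: a lowercase whitespace split with no stop-word removal; this only
    loosely mirrors CountVectorizer(ngram_range=(1, 2)).
    """
    rows = []
    for doc in docs:
        tokens = doc.lower().split()
        bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
        rows.append(Counter(tokens + bigrams))
    vocab = sorted(set().union(*rows))
    matrix = [[row[term] for term in vocab] for row in rows]
    return matrix, vocab


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def frobenius_error(x, w, h):
    """Squared Frobenius norm ||X - WH||^2."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative X (n x m) into W (n x k) and H (k x m).

    Lee-Seung multiplicative updates: every factor stays non-negative because it is
    only ever multiplied by ratios of non-negative numbers. eps avoids division by
    zero. Each row of H is one topic (a weighting over the vocabulary).
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        # H <- H * (W^T X) / (W^T W H)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[hv * nv / (dv + eps) for hv, nv, dv in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        ht = transpose(h)
        # W <- W * (X H^T) / (W H H^T)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[wv * nv / (dv + eps) for wv, nv, dv in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


def top_terms(h, vocab, n=3):
    """The n highest-weighted vocabulary terms for each topic (row of H)."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in h]


if __name__ == "__main__":
    docs = [
        "neural network training uses gradient descent",
        "gradient descent trains the neural network",
        "the football match ended in a draw",
        "the football team won the match",
    ]
    x, vocab = count_matrix(docs)
    w, h = nmf(x, k=2)
    for i, terms in enumerate(top_terms(h, vocab)):
        print(f"topic {i}: {terms}")
```

The bigram terms let a topic weight a phrase such as "gradient descent" separately from its individual words. In practice one would likely use scikit-learn's `NMF` class (for example with `solver="mu"`) on the `CountVectorizer` output rather than hand-written matrix loops; the loops here only make the update rules explicit.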

JAVIDFAR, MASOUD

2022/2023

Record

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI (Department of Information Engineering)
Degree programme: ICT FOR INTERNET AND MULTIMEDIA - INGEGNERIA PER LE COMUNICAZIONI MULTIMEDIALI E INTERNET, Laurea Magistrale (Master's degree, D.M. 270/2004)
Academic year: 2022
English title: Enhanced Topic Modeling for Textual Data
Supervisor: Professor Tomaso Erseghe (tomaso.erseghe@unipd.it)
Keywords: NMF, VAE, Bi-LSTM, NNDL
Supervisor (relatore): ERSEGHE, TOMASO
Appears in type: Lauree magistrali (Master's degrees)

Files in this item:

File	Size	Format
MsC_Thesis_Report__final-.pdf (open access)	1.05 MB	Adobe PDF

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/58767