Analisi dei foglietti illustrativi dei farmaci: topic modeling e previsione dei prezzi (Analysis of drug information leaflets for patients: topic modeling and price prediction)

In recent years, the analysis of textual data has received growing attention in statistical research. The first part of this thesis applies two of the most widely used topic modeling approaches, Latent Dirichlet Allocation and the Structural Topic Model, to identify latent themes in a corpus of patient information leaflets for drugs. Machine learning methods are then used to predict drug prices, using as covariates first word embeddings from large language models (LLMs) and then the most frequent words in the texts.

PACELLA, SILVIA

2023/2024

Record

	Faculty/Department
		Dipartimento di Scienze Statistiche (Department of Statistical Sciences)
	Degree program
		Scienze Statistiche, Laurea Magistrale (master's degree, D.M. 270/2004)
	Academic year
		2023
	English title
		Analysis of drug information leaflets for patients: topic modeling and price prediction
	Keywords
		Text mining
		Topic modeling
		LLM
		Machine learning
	Supervisor
		SCIANDRA, ANDREA
	Appears in collections:
		Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Pacella_Silvia.pdf (restricted access)	640.25 kB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/77762