In recent years, statistical research has paid increasing attention to the analysis of textual data. In the first part of this thesis, two of the most popular topic modeling approaches, Latent Dirichlet Allocation and the Structural Topic Model, are applied to identify latent themes within a corpus of patient information leaflets for drugs. Machine learning methods are then used to predict drug prices, first using word embeddings based on large language models (LLMs) as covariates, and then using the most frequent words in the texts as covariates.
Analysis of drug information leaflets: topic modeling and price prediction (original title: Analisi dei foglietti illustrativi dei farmaci: topic modeling e previsione dei prezzi)
PACELLA, SILVIA
2023/2024
File | Size | Format | Access
---|---|---|---
Pacella_Silvia.pdf | 640.25 kB | Adobe PDF | Restricted access
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/77762