This work surveys the main experiments conducted on Word Embeddings (WEs), with a focus on those concerning WEs for the Italian language, and then analyzes the gender score of text documents. After an introduction to the history and state of the art of research on the topic, it presents an experimental part aimed at determining the gender score of a text document. An initial analysis used WEs together with text-processing techniques such as stopword and punctuation removal, and was followed by a study of more advanced models such as n-grams and sentence embeddings. The goal was to show that these natural language analysis techniques make it possible to determine the gender score of a text document, giving authors a tool for producing more gender-neutral documents with fewer gender stereotypes and biases.
Pregiudizi di genere nei Word Embeddings: verso un’analisi del gender score nei documenti testuali (Gender bias in Word Embeddings: towards an analysis of the gender score in text documents).
FRISO, LUCA
2021/2022
File | Size | Format |
---|---|---|
Friso_Luca.pdf (open access) | 3.04 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/40248