Bias Analysis in Word Embeddings with Alignment Techniques
DELLA CASA, ELENA
2021/2022
Abstract
In Natural Language Processing (NLP), word embeddings are fundamental tools for representing semantic relations among words. They are built by training learning algorithms on large corpora of text, which often reflect biases and cultural peculiarities inherited from the society that produced them. Because word embeddings are state-of-the-art representations in NLP tasks, these biases are likely to be carried over by Machine Learning algorithms, which may in turn reinforce them. The present work leverages sparse optimization techniques to find a transformation between word embeddings trained on different corpora that highlights different types of bias in the data. It also attempts to analyze the transformed data in order to detect cultural differences, both known and unknown.

File | Size | Format |
---|---|---|
DellaCasa_Elena (2).pdf (open access) | 1.78 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/43380