Bias Analysis in Word Embeddings with Alignment Techniques

DELLA CASA, ELENA
2021/2022

Abstract

In Natural Language Processing, word embeddings are fundamental tools for representing the semantic relations among words. They are built by training learning algorithms on large corpora of text, which often reflect biases and cultural peculiarities inherited from the society that produced them. Since word embeddings are state-of-the-art representations in NLP tasks, these biases are likely to be carried over by Machine Learning algorithms, which may, in turn, reinforce them. The present work leverages sparse optimization techniques to find a transformation between word embeddings trained on different corpora that highlights different types of bias in the data. Moreover, this study attempts to analyze the transformed data in order to detect cultural differences, both known and unknown.
2021
Keywords: Word embedding; Bias; Sparse Optimization; NLP
Files in this item:
File: DellaCasa_Elena (2).pdf
Access: open access
Size: 1.78 MB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/43380