Development and Testing of a Method for Analyzing Gender Bias in BERT Language Models

Within artificial intelligence, Natural Language Processing (NLP) has made notable progress in recent years. A key driver has been the Transformer architecture, which underlies models such as BERT (Bidirectional Encoder Representations from Transformers) and large generative systems such as GPT-3. These models have shown a growing ability to capture complex linguistic nuances and have achieved strong results across many applications. Despite these achievements, concerns remain about the presence of bias in such models and about its implications when their predictions are interpreted in real-world contexts. The issue is critical in areas such as machine translation, discourse interpretation, text generation, and particularly sentiment analysis, where gender differences in a text can lead to incorrect or misleading interpretations. In this study, I analyze gender bias in BERT and its variants, focusing on how it affects predictions in sentiment analysis. To this end, I examined several BERT variants, including BERT Large and DistilBERT, using two datasets. The first, based on IMDB, contains positive and negative movie reviews in their original form. The second is derived from the first: each original review is rewritten in two versions, one with entirely male connotations and one with entirely female connotations, and both versions are included. This second dataset remains balanced between positive and negative reviews. The analysis revealed a clear correlation between the gendered terms used in a text and the predictions obtained through sentiment analysis, the branch of NLP concerned with identifying and classifying the emotions expressed in a text. In particular, the presence of terms associated with a specific gender in a review can skew BERT's sentiment predictions.
Notably, these biases do not stem from task-specific fine-tuning but are inherent in the BERT base model, acquired during pre-training. This finding underscores the importance of addressing and reducing bias from the earliest stages of model training, in order to support more balanced and fair development in NLP.
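
As an illustration only, the gendered rewriting of reviews and the comparison of predictions described above might be sketched as follows. The word list, the function names (`to_female`, `to_male`, `mean_score_gap`) and the `score` callable are assumptions made for this sketch; they are not the study's actual implementation, and a real word list would need to be much larger and to handle ambiguous pronouns more carefully.

```python
import re
from typing import Callable, Dict, List

# Hypothetical, deliberately small word list (an assumption for this sketch).
MALE_TO_FEMALE: Dict[str, str] = {
    "he": "she", "him": "her", "his": "her", "himself": "herself",
    "man": "woman", "men": "women", "boy": "girl", "boys": "girls",
    "actor": "actress", "father": "mother", "son": "daughter",
    "husband": "wife", "brother": "sister",
}

# Inverse mapping. "her" is ambiguous ("him" or "his"); this sketch
# always picks "his", which is a known simplification.
FEMALE_TO_MALE: Dict[str, str] = {f: m for m, f in MALE_TO_FEMALE.items()}
FEMALE_TO_MALE["her"] = "his"


def _swap(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole words found in `mapping`, keeping initial capitals."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in mapping) + r")\b",
        re.IGNORECASE,
    )

    def repl(match: "re.Match[str]") -> str:
        word = match.group(0)
        new = mapping[word.lower()]
        return new.capitalize() if word[0].isupper() else new

    return pattern.sub(repl, text)


def to_female(text: str) -> str:
    """Rewrite a review so that listed male terms become female."""
    return _swap(text, MALE_TO_FEMALE)


def to_male(text: str) -> str:
    """Rewrite a review so that listed female terms become male."""
    return _swap(text, FEMALE_TO_MALE)


def mean_score_gap(reviews: List[str], score: Callable[[str], float]) -> float:
    """Average of score(male version) - score(female version).

    `score` is a stand-in for any sentiment model that returns the
    probability of the positive class, e.g. a fine-tuned BERT variant.
    A value far from zero suggests gendered terms shift the predictions.
    """
    gaps = [score(to_male(r)) - score(to_female(r)) for r in reviews]
    return sum(gaps) / len(gaps)
```

Because both versions of a review differ only in their gendered terms, any systematic gap between their scores can be attributed to those terms rather than to the review's content.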
