Gender Bias in AI: Measuring and Debiasing Occupational Gender Representations in Large Language Models
ANTONY SAHAYAM, ANNE LINDA
2024/2025
Abstract
Gender bias in artificial intelligence (AI), particularly in large language models (LLMs) such as LLaMA 2, presents a significant challenge, as these models are trained on vast datasets that often encode and reflect societal biases and gender inequalities. This thesis explores how LLaMA 2 (Llama-2-7b-hf) and LLaMA 3 (Llama-3.2-1B) exhibit both explicit and implicit gender bias in occupational predictions, using a prompt-based probability evaluation framework. The models were probed with occupation-based templates, and their responses were evaluated on the probabilities they assigned to gendered pronouns. Explicit bias was measured through direct responses to gender-neutral prompts, while implicit bias was assessed through the distribution of gendered pronouns in more conversational contexts. The study uniquely incorporates both English and Italian datasets, with the latter providing additional insight owing to the grammatically gendered nature of the language. To address the observed gender imbalances, zero-shot debiasing was applied via instructional prompts, significantly reducing explicit bias and moderately improving the diversity of gender representation. This work demonstrates how lightweight, language-aware prompt engineering can serve as an effective and reproducible strategy for bias assessment and mitigation in multilingual LLMs, contributing to the development of fairer AI systems.
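The abstract describes the evaluation framework only at a high level, so the following is a minimal sketch of how such a prompt-based pronoun-probability probe could look with Hugging Face `transformers`. Only the model identifier (`meta-llama/Llama-2-7b-hf`) comes from the abstract; the template wording, the pronoun set, and the debiasing instruction are hypothetical stand-ins, not the thesis's actual prompts.

```python
# Minimal sketch of a prompt-based pronoun-probability evaluation.
# Assumptions: the template, pronoun set, and debiasing instruction below
# are illustrative stand-ins, not the prompts used in the thesis.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # one of the two models probed

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def pronoun_probs(prompt: str, pronouns=("he", "she", "they")) -> dict:
    """Return the model's next-token probability for each pronoun after `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    out = {}
    for p in pronouns:
        # Leading space so the pronoun is tokenized as a standalone word;
        # we take the first sub-token as its proxy.
        token_id = tokenizer.encode(" " + p, add_special_tokens=False)[0]
        out[p] = probs[token_id].item()
    return out

# Explicit-bias probe: gender-neutral occupational template (hypothetical wording).
template = "The nurse said that"
print(pronoun_probs(template))

# Zero-shot debiasing: prepend an instructional prompt (hypothetical wording).
debias_prefix = "Answer without assuming a person's gender from their occupation. "
print(pronoun_probs(debias_prefix + template))
```

Comparing the two printed distributions, before and after the instructional prefix, corresponds to the bias measurement and zero-shot mitigation steps the abstract reports.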
| File | Size | Format | Access |
|---|---|---|---|
| AntonySahayam_AnneLinda.pdf | 6.67 MB | Adobe PDF | open access |
https://hdl.handle.net/20.500.12608/87664