Gender Bias in AI: Measuring and Debiasing Occupational Gender Representations in Large Language Models

ANTONY SAHAYAM, ANNE LINDA
2024/2025

Abstract

Gender bias in artificial intelligence (AI), particularly in large language models (LLMs) such as LLaMA 2, presents a significant challenge, as these models are trained on vast datasets that often encode and reflect societal biases and gender inequalities. This thesis explores how LLaMA 2 (Llama-2-7b-hf) and LLaMA 3 (Llama-3.2-1B) exhibit both explicit and implicit gender bias in occupational predictions, using a prompt-based probability evaluation framework. The models were probed with occupation-based templates, and their responses were evaluated according to the probabilities they assign to gendered pronouns. Explicit bias was measured through direct responses to gender-neutral prompts, while implicit bias was assessed through the distribution of gendered pronouns in more conversational contexts. The study incorporates both English and Italian datasets, with the latter providing additional insight because of the grammatically gendered nature of Italian. To address the observed gender imbalances, zero-shot debiasing was applied via instructional prompts, significantly reducing explicit bias and moderately improving diverse gender representation. This work demonstrates how lightweight, language-aware prompt engineering can serve as an effective and reproducible strategy for bias assessment and mitigation in multilingual LLMs, contributing to the development of fairer AI systems.
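
The methodology summarized above can be illustrated with a minimal sketch, assuming a Hugging Face causal language model and PyTorch. The template wording, the occupation examples, the pronoun set, and the debiasing instruction below are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Minimal sketch of prompt-based pronoun-probability evaluation with optional
# zero-shot instructional debiasing. Illustrative only: the template, the
# debiasing instruction, and the pronoun set are assumptions, not the
# thesis's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-3.2-1B"  # or "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

# Hypothetical zero-shot debiasing instruction prepended to the template.
DEBIAS_INSTRUCTION = "Do not assume a person's gender from their occupation. "

def pronoun_probs(occupation: str, debias: bool = False) -> dict[str, float]:
    """Next-token probabilities for gendered pronouns after an occupation template."""
    prompt = f"The {occupation} said that"  # example gender-neutral template
    if debias:
        prompt = DEBIAS_INSTRUCTION + prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    # Use the first sub-token of each pronoun (with a leading space) as a proxy.
    return {
        p.strip(): probs[tokenizer.encode(p, add_special_tokens=False)[0]].item()
        for p in (" he", " she", " they")
    }

for job in ("nurse", "engineer"):
    print(job, pronoun_probs(job), pronoun_probs(job, debias=True))
```

Comparing the gap between the "he" and "she" probabilities with and without the debiasing instruction yields the kind of simple, reproducible bias measure the abstract describes; an Italian variant would additionally need to account for gender-marked job titles and articles.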
Keywords

Gender Bias
Large Language Model
AI Stereotypes
Bias Mitigation
Ethical AI
Files in this item:
AntonySahayam_AnneLinda.pdf (Adobe PDF, 6.67 MB, open access)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/87664