Comparative analysis of gender bias in interview responses generated by Large Language Models in English and Italian
MEMON, HUFSA
2025/2026
Abstract
Over the past few years, Large Language Models (LLMs) have developed quickly and have become increasingly embedded in professional and decision-support systems. Although these models exhibit highly sophisticated linguistic and cognitive capabilities, questions of algorithmic fairness and gender discrimination have taken center stage. Because they are trained on large volumes of text that mirror historical and societal trends, they can reproduce and amplify existing stereotypes, especially in workplace contexts. This thesis presents a technical replication of a seminal study that tested whether advanced LLMs associate male personas with agentic traits (achievement, leadership, and power) and female personas with communal traits (support, empathy, and nurturing) in simulated job interview settings. The research examines whether the same tendencies appear in the recent DeepSeek model and extends the analysis to a bilingual setting covering both English and Italian. The experimental design is a large-scale factorial one, generating model responses across a range of professional roles and gendered personas. Empath, an open-source lexical categorization tool, is used to measure agentic and communal linguistic markers in the outputs. Statistical analysis then tests for the existence and significance of gendered linguistic asymmetries across the two languages. Through this systematic analysis of DeepSeek, the research offers quantitative evidence on whether a modern LLM mitigates, reproduces, or amplifies gendered linguistic stereotypes, contributing to the development of more transparent and responsible AI systems.

| File | Size | Format |
|---|---|---|
| Memon_Hufsa.pdf (open access) | 2.08 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/106858