Scalable Content Analysis of Social Media Videos Using Multimodal Large Language Models: A Video-to-Text Pipeline for Large-Scale Analysis
GORNI SILVESTRINI, MATTEO
2025/2026
Abstract
The analysis of large-scale video content remains a significant challenge in social science research due to the high cost and complexity of manual annotation. This thesis investigates the use of Multimodal Large Language Models (MLLMs) as a scalable solution for the analysis of video data, with a specific focus on the study of sexualization in social media content. A dataset of TikTok videos from Italy, the United States, and South Korea was constructed and analyzed using a structured codebook derived from prior literature on sexual objectification. A multimodal model was employed to generate both content coding annotations and textual descriptions of video content, enabling a unified video-to-text analytical pipeline. Results indicate that MLLMs can support large-scale analysis of video content, capturing consistent patterns aligned with theoretical expectations. In particular, systematic differences in sexualization were observed across genders, and cross-national variation was also identified. Complementary analyses of the generated textual descriptions provide additional evidence that the extracted signal reflects meaningful characteristics of the underlying content.

| File | Size | Format |
|---|---|---|
| dissertation.pdf (open access) | 3.45 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/108228