Scalable Content Analysis of Social Media Videos Using Multimodal Large Language Models: A Video-to-Text Pipeline for Large-Scale Analysis

GORNI SILVESTRINI, MATTEO
2025/2026

Abstract

The analysis of large-scale video content remains a significant challenge in social science research due to the high cost and complexity of manual annotation. This thesis investigates the use of Multimodal Large Language Models (MLLMs) as a scalable solution for the analysis of video data, with a specific focus on the study of sexualization in social media content. A dataset of TikTok videos from Italy, the United States, and South Korea was constructed and analyzed using a structured codebook derived from prior literature on sexual objectification. A multimodal model was employed to generate both content coding annotations and textual descriptions of the videos, enabling a unified video-to-text analytical pipeline. Results indicate that MLLMs can support large-scale analysis of video content, capturing consistent patterns aligned with theoretical expectations. In particular, systematic differences in sexualization were observed across genders, and cross-national variations were also identified. Complementary analyses of the generated textual descriptions provide additional evidence that the extracted signal reflects meaningful characteristics of the underlying content.
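The abstract describes a pipeline in which a multimodal model returns structured codebook annotations alongside a textual description for each video, and the coded results are then compared across genders and countries. The sketch below shows one way such outputs could be validated and aggregated. Everything concrete in it is a hypothetical assumption rather than the thesis's actual method: the JSON response format, the four binary codebook item names, the additive `sexualization_index`, and the `gender`/`country` fields attached to each record.

```python
import json
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL binary codebook items (1 = present, 0 = absent); the thesis's
# actual codebook categories are not reproduced here.
CODEBOOK = ("revealing_clothing", "sexualized_pose", "body_focus_framing", "suggestive_gesture")


def parse_annotation(raw: str) -> dict:
    """Parse one model response (assumed to be a JSON object) and validate its codes."""
    record = json.loads(raw)
    codes = record["codes"]
    missing = [item for item in CODEBOOK if item not in codes]
    if missing:
        raise ValueError(f"missing codebook items: {missing}")
    for item in CODEBOOK:
        if codes[item] not in (0, 1):
            raise ValueError(f"{item} must be coded 0 or 1, got {codes[item]!r}")
    return record


def sexualization_index(codes: dict) -> float:
    """Hypothetical additive index: share of codebook items coded as present."""
    return sum(codes[item] for item in CODEBOOK) / len(CODEBOOK)


def group_means(raw_responses, keys=("gender", "country")) -> dict:
    """Mean index per group, e.g. per (gender, country) pair."""
    groups = defaultdict(list)
    for raw in raw_responses:
        record = parse_annotation(raw)
        group = tuple(record[key] for key in keys)
        groups[group].append(sexualization_index(record["codes"]))
    return {group: mean(values) for group, values in groups.items()}


def fake_response(gender: str, country: str, flags: tuple) -> str:
    """Build a stand-in model response in the assumed JSON format (illustration only)."""
    return json.dumps({
        "gender": gender,
        "country": country,
        "description": "Placeholder textual description of the video.",
        "codes": dict(zip(CODEBOOK, flags)),
    })


if __name__ == "__main__":
    responses = [
        fake_response("female", "IT", (1, 1, 0, 0)),
        fake_response("female", "IT", (1, 0, 0, 0)),
        fake_response("male", "US", (0, 0, 0, 0)),
    ]
    print(group_means(responses))
```

Validating every response against the codebook before aggregation means that a malformed or incomplete model output fails loudly instead of silently skewing group averages.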
2025
Keywords: large language model; social media; large-scale analysis
Files in this item: dissertation.pdf (open access, Adobe PDF, 3.45 MB)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108228