Scalable Content Analysis of Social Media Videos Using Multimodal Large Language Models: A Video-to-Text Pipeline for Large-Scale Analysis

The analysis of large-scale video content remains a significant challenge in social science research due to the high cost and complexity of manual annotation. This thesis investigates the use of Multimodal Large Language Models (MLLMs) as a scalable solution for the analysis of video data, with a specific focus on the study of sexualization in social media content. A dataset of TikTok videos from Italy, the United States, and South Korea was constructed and analyzed using a structured codebook derived from prior literature on sexual objectification. A multimodal model was employed to generate both content coding annotations and textual descriptions of video content, enabling a unified video-to-text analytical pipeline. Results indicate that MLLMs can support large-scale analysis of video content, capturing consistent patterns aligned with theoretical expectations. In particular, systematic differences in sexualization were observed across gender, while cross-national variations were also identified. Complementary analyses of the generated textual descriptions provide additional evidence that the extracted signal reflects meaningful characteristics of the underlying content.

GORNI SILVESTRINI, MATTEO

2025/2026

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				DATA SCIENCE, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2025
			
	Keywords
	
				large language model
				social media
				large scale analysis
			
	Supervisor
	
				ERSEGHE, TOMASO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
dissertation.pdf (open access)	3.45 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108228