Rhetorical Analysis of Memes to Detect Persuasion Techniques

TOLEGEN, AKERKE

2024/2025

Abstract

In the digital age, memes have evolved from humorous internet content into powerful instruments of persuasion, capable of shaping public opinion and reinforcing political ideologies. This thesis addresses the computational challenge of automatically detecting persuasion techniques in memes, focusing on the design and optimization of multimodal fusion architectures under realistic computational constraints.

Building on the framework established by Dimitrov et al. in SemEval-2024 Task 4, this research investigates how different multimodal fusion strategies perform within the resource limits typical of academic settings. The work systematically compares early fusion approaches with late fusion architectures enhanced by attention mechanisms, and draws practical lessons for deploying multimodal systems.

The thesis follows two main research directions. The first is a comprehensive replication and analysis of state-of-the-art methods from recent competitions, including ensemble approaches with data augmentation and hierarchical prompt tuning strategies. The second is the development of new multimodal architectures that combine Gated Attention Fusion Networks (GAFN) with parameter-efficient Low-Rank Adaptation (LoRA) of large language models, together with Technique-Aware Negative Sampling, a strategy for selecting negative pairs during contrastive pretraining of multimodal architectures.

A key finding is that, under realistic computational constraints, a TinyLLaMA-based late fusion approach (1.1B parameters) with LoRA adaptation outperforms larger early fusion models, reaching an F1 score of 0.704. A second contribution is a technique-aware cross-modal alignment strategy, namely the sampling strategy for contrastive learning with early fusion models mentioned above. It improved the model's overall performance despite the limitations of the available environment, and is expected to be even more useful when computational constraints are looser.
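To make the idea of parameter-efficient adaptation concrete, the following is a minimal sketch of a generic LoRA layer in plain Python, not the thesis's implementation. LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so the adapted layer computes h = Wx + (alpha / r) * B(Ax), with A of shape r x d_in and B of shape d_out x r. The class and helper names (`LoRALinear`, `matvec`, `full_params`, `lora_params`), the rank, the scaling alpha, and the toy matrices are hypothetical choices for illustration; a real model would use a deep-learning framework.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A starts with small random values; B starts at zero, so at
        # initialization the adapted layer behaves exactly like W.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x (frozen path)
        update = matvec(self.b, matvec(self.a, x))  # B (A x) (trainable path)
        return [h + self.scale * u for h, u in zip(base, update)]

    def trainable_params(self):
        # Only A and B are trained; W stays frozen.
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def full_params(d_out, d_in):
    """Values trained when fine-tuning the whole d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Values trained by a rank-r LoRA update: r * (d_out + d_in)."""
    return rank * (d_out + d_in)
```

For example, `LoRALinear([[1.0, 0.0], [0.0, 2.0]], rank=1, alpha=1.0).forward([3.0, 4.0])` returns `[3.0, 8.0]`, the same as the frozen layer, because B starts at zero. For an illustrative 2048 x 2048 projection with rank 8, `lora_params` gives 32,768 trainable values against 4,194,304 for full fine-tuning, i.e. under 1%, which is what makes adapting a 1.1B-parameter model feasible on limited hardware.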
The best-performing architecture combines three components: in-domain pretraining on augmented text datasets, a two-stage gated attention mechanism, and a 5-layer classification head.

More broadly, the thesis provides empirical evidence that limits on batch size, memory, and training time create optimization challenges that favor simpler, more modular architectures over theoretically more expressive but computationally demanding alternatives. This has direct implications for deploying multimodal systems in practice and calls into question the trend toward ever-increasing architectural complexity.

Overall, the thesis offers a comprehensive investigation of constraint-aware multimodal architecture design for persuasion technique detection, contributing both methodological innovations and practical insights for the broader multimodal learning community.
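Gating is a common way for a fusion model to decide, feature by feature, how much to rely on each modality. The sketch below shows a generic single-stage gated fusion of a text embedding t and an image embedding v, g = sigmoid(W_g [t; v] + b_g) and fused = g * t + (1 - g) * v, followed by an independent sigmoid per technique for multi-label prediction. It is not the thesis's two-stage GAFN: the function names (`gated_fusion`, `predict_techniques`), the toy dimensions and weights, the single linear output layer standing in for a 5-layer head, and the 0.5 threshold are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(t, v, gate_w, gate_b):
    """Fuse equal-size text (t) and image (v) vectors with a learned per-feature gate."""
    tv = t + v  # list concatenation gives [t; v]
    gates = [sigmoid(sum(w * x for w, x in zip(row, tv)) + b)
             for row, b in zip(gate_w, gate_b)]
    # A gate near 1 keeps the text feature; a gate near 0 keeps the image feature.
    return [g * ti + (1.0 - g) * vi for g, ti, vi in zip(gates, t, v)]


def predict_techniques(fused, out_w, out_b, labels, threshold=0.5):
    """Multi-label output: each technique gets its own sigmoid probability."""
    probs = [sigmoid(sum(w * x for w, x in zip(row, fused)) + b)
             for row, b in zip(out_w, out_b)]
    predicted = [label for label, p in zip(labels, probs) if p >= threshold]
    return predicted, probs
```

With all-zero gate weights and biases every gate equals 0.5, so `gated_fusion([1.0, 0.0], [0.0, 1.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])` averages the two modalities into `[0.5, 0.5]`; a large positive bias pushes the gates toward 1 and the output toward the text vector. Sigmoid outputs, rather than a softmax, are used because a single meme can contain several persuasion techniques at once.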

Item record

Faculty/Department: Department of Mathematics "Tullio Levi-Civita" (DM)
Degree programme: Computer Science, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year: 2024
English title: Rhetorical Analysis of Memes to Detect Persuasion Techniques
Keywords: Classification; Persuasion technique; Multi-modality
Supervisor: DA SAN MARTINO, GIOVANNI
Appears in collections: Master's degree theses (Lauree magistrali)

Files in this item:

File	Size	Format
CS_MsC_Thesis___UniPD.pdf (restricted access)	2.17 MB	Adobe PDF

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89973