Unsupervised Extractive Summarization Using Hierarchical Transformers
REYHANI KIVI, RAMTIN
2024/2025
Abstract
This thesis investigates unsupervised extractive summarization, motivated by the lack of large-scale labeled datasets, by combining complementary ranking criteria derived from a hierarchical Transformer encoder with explicit redundancy control. Building on a pre-trained HIBERT model for contextualized sentence representations, we score sentences using a weighted combination of four criteria: a masked sentence prediction (MSP) probability score and an attention-based centrality score adapted from STAS, complemented by two pointwise mutual information (PMI)-inspired criteria that promote relevance and penalize redundancy. The methodology, designed for simplicity and reproducibility on the CNN/DailyMail dataset, caches sentence-level scores so that hyperparameters (criteria weights, summary length, and a minimum sentence-length threshold) can be tuned via grid search on the validation set. Evaluation on the standard test set using ROUGE metrics shows that the proposed method is competitive with state-of-the-art unsupervised baselines, particularly on ROUGE-1, with the largest gains coming from the explicit redundancy penalty. While supervised methods and oracle upper bounds remain stronger, this work confirms that combining multiple unsupervised criteria from a hierarchical encoder, with a strong emphasis on redundancy control, is an effective strategy for extractive summarization.
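As a rough illustration of the scoring scheme the abstract describes, the following minimal Python sketch combines four per-sentence criterion scores into one ranking score and then selects sentences subject to a minimum-length threshold. All names here (the weight tuple, `min_tokens`, `k`, and both function names) are hypothetical illustrations, not the thesis's actual implementation, and the exact functional form of the combination is assumed to be a simple weighted sum.

```python
import numpy as np

def combined_score(msp, centrality, pmi_relevance, pmi_redundancy, weights):
    # Hypothetical weighted combination of the four criteria named in the
    # abstract: MSP probability, attention-based centrality, and two
    # PMI-inspired relevance/redundancy scores. Weight names and the sign
    # convention (redundancy subtracted) are assumptions for illustration.
    w_msp, w_cent, w_rel, w_red = weights
    return (w_msp * np.asarray(msp)
            + w_cent * np.asarray(centrality)
            + w_rel * np.asarray(pmi_relevance)
            - w_red * np.asarray(pmi_redundancy))

def extract_summary(sentences, scores, min_tokens=5, k=3):
    # Rank sentences by score, drop those shorter than the minimum-length
    # threshold, keep the top k, and restore original document order.
    ranked = np.argsort(scores)[::-1]
    kept = [i for i in ranked if len(sentences[i].split()) >= min_tokens][:k]
    return [sentences[i] for i in sorted(kept)]
```

Because the four per-sentence scores are computed once and cached, a grid search over the weights, the summary length `k`, and the length threshold only re-runs the cheap combination and selection steps, which is presumably what makes the validation-set tuning described above tractable.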
| File | Size | Format | Access |
|---|---|---|---|
| Ramtin-Reyhani-Kivi.pdf | 779.79 kB | Adobe PDF | open access |
https://hdl.handle.net/20.500.12608/91839