Unsupervised Extractive Summarization Using Hierarchical Transformers

REYHANI KIVI, RAMTIN
2024/2025

Abstract

This thesis investigates unsupervised extractive summarization, motivated by the scarcity of large-scale labeled datasets, by combining complementary ranking criteria derived from a hierarchical Transformer encoder with explicit redundancy control. Building on a pre-trained HIBERT model for contextualized sentence representations, we score sentences with a weighted combination of four criteria: a masked sentence prediction (MSP) probability score and an attention-based centrality score adapted from STAS, complemented by two pointwise mutual information (PMI)-inspired criteria that promote relevance and penalize redundancy. The methodology, designed for simplicity and reproducibility on the CNN/DailyMail dataset, caches sentence-level scores so that the hyperparameters (criteria weights, summary length, and a minimum sentence-length threshold) can be tuned by grid search on the validation set. Evaluation on the standard test set with ROUGE shows that the proposed method is competitive with state-of-the-art unsupervised baselines, particularly on ROUGE-1, with the largest gains coming from the explicit redundancy penalty. While supervised methods and oracle upper bounds remain stronger, this work shows that combining multiple unsupervised criteria from a hierarchical encoder, with a strong emphasis on redundancy control, is an effective strategy for extractive summarization.
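As a rough illustration of the scoring scheme the abstract describes, the Python sketch below combines cached per-sentence criterion scores into a weighted score and greedily selects sentences while penalizing redundancy against the partial summary. All concrete names here (the weights, the token-overlap redundancy stand-in, the greedy selection loop) are assumptions for illustration, not the thesis's exact formulation.

```python
# Minimal, hypothetical sketch of multi-criteria sentence scoring with a
# redundancy penalty. The weight names, the combination rule, and the greedy
# selection are illustrative assumptions, not taken from the thesis itself.

from dataclasses import dataclass

@dataclass
class SentenceScores:
    msp: float         # masked sentence prediction probability (HIBERT)
    centrality: float  # attention-based centrality (STAS-style)
    relevance: float   # PMI-inspired relevance to the document
    text: str
    length: int        # number of tokens

def combined_score(s: SentenceScores,
                   w_msp: float, w_cent: float, w_rel: float) -> float:
    """Weighted sum of the three 'positive' criteria; the redundancy
    penalty is applied at selection time against already-chosen sentences."""
    return w_msp * s.msp + w_cent * s.centrality + w_rel * s.relevance

def redundancy(candidate: SentenceScores,
               selected: list) -> float:
    """Placeholder redundancy measure: token-overlap ratio with the current
    summary. The thesis uses a PMI-inspired criterion; plain token overlap
    stands in here only to keep the sketch self-contained."""
    if not selected:
        return 0.0
    cand = set(candidate.text.lower().split())
    chosen = {w for s in selected for w in s.text.lower().split()}
    return len(cand & chosen) / max(len(cand), 1)

def extract_summary(sentences, w_msp=1.0, w_cent=1.0, w_rel=0.5,
                    w_red=1.0, k=3, min_len=5):
    """Greedily pick k sentences, skipping ones below the minimum length
    and penalizing overlap with sentences already in the summary."""
    selected = []
    candidates = [s for s in sentences if s.length >= min_len]
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda s: combined_score(s, w_msp, w_cent, w_rel)
                                 - w_red * redundancy(s, selected))
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because the per-criterion scores are computed once and cached, a grid search over (w_msp, w_cent, w_rel, w_red, k, min_len) only re-runs the cheap combination and selection step above, which is what makes validation-set tuning tractable.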
Keywords: Deep Learning, NLP, LLMs, Transformers, Text Summarization


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91839