Unsupervised Extractive Summarization Using Hierarchical Transformers
REYHANI KIVI, RAMTIN
2024/2025
Abstract
This thesis investigates unsupervised extractive summarization, motivated by the lack of large-scale labeled datasets, by combining complementary ranking criteria derived from a hierarchical Transformer encoder with explicit redundancy control. Building on a pre-trained HIBERT model for contextualized sentence representations, we score sentences using a weighted combination of four criteria: a masked sentence prediction (MSP) probability score and an attention-based centrality score adapted from STAS, complemented by two pointwise mutual information (PMI)-inspired criteria that promote relevance and penalize redundancy. The methodology, designed for simplicity and reproducibility on the CNN/DailyMail dataset, caches sentence-level scores so that hyperparameters (criteria weights, summary length, and a minimum sentence-length threshold) can be tuned via grid search on the validation set. Evaluation on the standard test set using ROUGE metrics shows that the proposed method is competitive with state-of-the-art unsupervised baselines, particularly on ROUGE-1, with the largest gains coming from the explicit redundancy penalty. While supervised methods and oracle upper bounds remain stronger, this work confirms that combining multiple unsupervised criteria from a hierarchical encoder, with a strong emphasis on redundancy control, is an effective strategy for extractive summarization.
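As a rough illustration of the scoring scheme the abstract describes, the following minimal Python sketch combines four per-sentence criterion scores into one ranking score and then selects sentences subject to a minimum-length threshold. All names here (the weight tuple, `min_tokens`, `k`, and both function names) are hypothetical illustrations, not the thesis's actual implementation, and the exact functional form of the combination is assumed to be a simple weighted sum.

```python
import numpy as np

def combined_score(msp, centrality, pmi_relevance, pmi_redundancy, weights):
    # Hypothetical weighted combination of the four criteria named in the
    # abstract: MSP probability, attention-based centrality, and two
    # PMI-inspired relevance/redundancy scores. Weight names and the sign
    # convention (redundancy subtracted) are assumptions for illustration.
    w_msp, w_cent, w_rel, w_red = weights
    return (w_msp * np.asarray(msp)
            + w_cent * np.asarray(centrality)
            + w_rel * np.asarray(pmi_relevance)
            - w_red * np.asarray(pmi_redundancy))

def extract_summary(sentences, scores, min_tokens=5, k=3):
    # Rank sentences by score, drop those shorter than the minimum-length
    # threshold, keep the top k, and restore original document order.
    ranked = np.argsort(scores)[::-1]
    kept = [i for i in ranked if len(sentences[i].split()) >= min_tokens][:k]
    return [sentences[i] for i in sorted(kept)]
```

Because the four per-sentence scores are computed once and cached, a grid search over the weights, the summary length `k`, and the length threshold only re-runs the cheap combination and selection steps, which is presumably what makes the validation-set tuning described above tractable.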
| File | Size | Format | Access |
|---|---|---|---|
| Ramtin-Reyhani-Kivi.pdf | 779.79 kB | Adobe PDF | open access |
https://hdl.handle.net/20.500.12608/91839