Information Retrieval (IR) aims to identify and return documents that satisfy a user’s information need. Traditional retrieval models rely on exact term matching, whereas modern approaches can exploit semantic relationships through vector representations. In this study, we investigate two vector-space operators, Meet and Join, designed to capture the intersections and unions of latent themes extracted from the documents retrieved for a given query. We integrate these operators into a two-stage retrieval pipeline: a standard term-based ranker produces an initial ranking, and the top documents are then re-ranked according to semantic scores derived from the Meet and Join outputs. Finally, we evaluate this approach on the Robust04 dataset using standard metrics such as MAP, NDCG@20, and Recall@20 to determine whether re-ranking improves effectiveness over the initial ranking.
Information Retrieval (IR) is concerned with identifying and returning the documents that satisfy a user's information needs. Traditional retrieval models are based on exact term matching, while more modern approaches try to exploit semantic relationships through vector representations. In this thesis we analyze two vector operators, Meet and Join, designed to represent concepts close to intersection and union, respectively, in the vector spaces generated by the themes extracted from the documents retrieved for a query. These operators have been integrated into a system that first applies a term-based search system and then re-orders the documents according to the semantic scores derived from the Meet and Join results. Finally, an experimental evaluation is carried out on the Robust04 test collection with standard metrics such as MAP, NDCG@20, and Recall@20, in order to assess whether effectiveness improves with respect to the initial search.
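As an illustration of the metrics named in the abstract, the sketch below computes Recall@k, average precision (the per-query quantity that MAP averages over queries), and NDCG@k for one ranked list. It is a minimal sketch, not the evaluation code used in the thesis: the function names, the document identifiers, and the choice of linear gains with a log2 rank discount are assumptions made for this example.

```python
import math


def recall_at_k(ranked, relevant, k=20):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision(ranked, relevant):
    """Average of precision values taken at the rank of each relevant hit."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP: mean of average precision over all queries in `runs`.

    `runs` maps a query id to its ranked list, `qrels` maps a query id
    to its set of relevant documents (both hypothetical structures).
    """
    if not runs:
        return 0.0
    scores = [average_precision(r, qrels.get(q, set())) for q, r in runs.items()]
    return sum(scores) / len(scores)


def ndcg_at_k(ranked, gains, k=20):
    """NDCG@k with linear gains and a log2(rank + 1) discount (an assumption)."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical usage: one query, four retrieved documents, two relevant ones.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(recall_at_k(ranked, relevant, k=20))
print(average_precision(ranked, relevant))
print(ndcg_at_k(ranked, {"d1": 1, "d3": 1}, k=20))
```

Comparing these values for the initial term-based ranking and for the re-ranked list, query by query, is one way to check whether re-ranking helps. Recall@20 can only change if the re-ranking moves documents across the rank-20 cutoff.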
Evaluation of Theme-Based Re-Ranking with Meet and Join Operators
BRUTTOMESSO, ANDREA
2024/2025
| File | Size | Format |
|---|---|---|
| Bruttomesso_Andrea.pdf (open access) | 748.7 kB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/96058