Evaluating Prefix-Driven Contrastive Learning for Semantic Search in High-Impact Societal Domains
YILMAZ, CEREN
2024/2025
Abstract
Semantic search is a fundamental component of information retrieval, particularly in high-impact domains where extracting precise and contextually relevant information is crucial. This research investigates the effectiveness of Prefix-Driven Contrastive Learning in improving retrieval performance for semantic search by fine-tuning pretrained models. Specifically, the study focuses on BGE (BAAI General Embedding) and E5 (EmbEddings from bidirEctional Encoder rEpresentations), leveraging contrastive learning techniques to enhance text representations. The experiments are conducted on the FiQA dataset, a benchmark for financial question answering and retrieval, ensuring the evaluation is domain-specific and aligned with real-world applications.

The study systematically applies prefix-based training, in which corpus and query prefixes are explicitly incorporated to improve the alignment of embeddings. The impact of this approach is assessed across multiple retrieval models: BGE-small, BGE-base, and E5-small. To ensure consistency, all models are fine-tuned with the same hyperparameters, with adjustments made only for BGE-base due to computational constraints. The effectiveness of prefix-driven contrastive learning is evaluated with standard retrieval metrics within a robust evaluation framework, providing a comprehensive assessment of its impact on financial semantic search.

The findings of this study contribute to advancing semantic search methodologies by analyzing the role of prefix-driven training in dense retrieval models. The results offer insights into the benefits and limitations of prefix-based contrastive learning, particularly for financial and other high-impact domains. By identifying effective strategies for enhancing representation learning in retrieval systems, this research aims to inform future developments in domain-specific search optimization.
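The thesis's training code is not reproduced on this page. As a rough, self-contained illustration of the two ideas the abstract names, the following sketch shows (a) role prefixes prepended to queries and passages before encoding, assuming the "query: "/"passage: " strings used by the E5 family (the exact prefixes used in the thesis may differ), and (b) the in-batch contrastive (InfoNCE) objective commonly used to fine-tune such bi-encoders, here over toy embedding vectors rather than real model outputs.

```python
import math

# E5-style role prefixes; the exact strings used in the thesis are an
# assumption here (BGE, for instance, uses an instruction prefix on queries).
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "

def with_prefix(text: str, is_query: bool) -> str:
    """Prepend the role prefix the encoder is trained to expect."""
    return (QUERY_PREFIX if is_query else PASSAGE_PREFIX) + text

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query_embs, passage_embs, temperature=0.05):
    """In-batch contrastive loss: passage i is the positive for query i,
    and every other passage in the batch acts as a negative."""
    losses = []
    for i, q in enumerate(query_embs):
        logits = [cosine(q, p) / temperature for p in passage_embs]
        log_sum = math.log(sum(math.exp(s) for s in logits))
        losses.append(log_sum - logits[i])  # -log softmax at the positive
    return sum(losses) / len(losses)

# Toy batch: aligned query/passage pairs give a near-zero loss,
# while mismatched pairs are penalized.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
print(with_prefix("What is an ETF expense ratio?", is_query=True))
print(info_nce(queries, passages))
```

During fine-tuning the loss gradient is backpropagated into the encoder so that prefixed queries move closer to their relevant passages in embedding space; the toy vectors above only illustrate the objective's behavior.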
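The abstract refers to "standard retrieval metrics" without listing them; for FiQA (a BEIR benchmark) the customary headline metric is nDCG@10, alongside Recall@k and MRR. Assuming nDCG@10 is among the metrics used, a minimal sketch of its computation for a single query:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document ids in the order the retriever returned them.
    qrels: dict mapping doc_id -> graded relevance judgment for this query.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if ideal else 0.0

# A ranking that places the most relevant document first scores higher.
qrels = {"d1": 2, "d3": 1}          # hypothetical judgments
print(ndcg_at_k(["d1", "d3", "d2"], qrels))  # ideal order -> 1.0
print(ndcg_at_k(["d3", "d1", "d2"], qrels))  # swapped -> below 1.0
```

The corpus-level score is simply this value averaged over all evaluation queries; benchmark harnesses such as pytrec_eval implement the same definition.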
https://hdl.handle.net/20.500.12608/84361