Semantic Aware Image Search with Scene Knowledge Graphs
LOREGGIA, GIACOMO
2021/2022
Abstract
Scene Graphs (SGs) are Knowledge Graphs that represent the contents of an image in terms of its elements (e.g., people, objects, and attributes) and the relationships among them. They can therefore capture the scene’s structural and semantic organization and express it in a machine-readable model. Thanks to their expressive power, SGs have been applied to important Image Processing tasks such as Image Captioning, Visual Question Answering, and Image Search. In this work, we focus on Image Search, the task of retrieving the images that best match a textual query given as input. Determining which images best match a query becomes increasingly challenging as the detail and complexity of the query increase. While several approaches tackle Image Search by adopting pre-trained language models, the opportunity to use a Scene Graph together with a language model has not yet been studied. We therefore employ an SG representation and a pre-trained language model with the aim of improving Image Search performance on complex textual queries.

File | Size | Format
---|---|---
Loreggia_Giacomo.pdf (open access) | 6.19 MB | Adobe PDF
https://hdl.handle.net/20.500.12608/36544