
Word2box: analysis and exploration of a geometric word embedding algorithm

BIGARELLA, CHIARA
2023/2024

Abstract

Vector word embeddings are powerful tools, but they do not naturally express uncertainty about the target concept. They also do not naturally model asymmetric relations, because they are compared using symmetric similarity or distance functions such as dot product, cosine similarity, or Euclidean distance. This has led to the exploration of new ways of representing words based on geometric representations. Word2Box is a geometric extension of Word2Vec that learns region-based word representations, which support set-theoretic operations between words. Specifically, it is a fuzzy-set interpretation of box embeddings, in which each word is represented by an n-dimensional hyperrectangle, also called a "box". This approach makes it possible to model uncertainty and asymmetric relationships between words, which traditional vector embeddings do not naturally express. By leveraging the properties of fuzzy sets and geometric shapes, Word2Box provides a more nuanced and flexible representation of word meanings, capturing linguistic relationships that point-based embeddings cannot represent. This thesis presents an analysis of the Word2Box algorithm and a comprehensive explanation of its codebase, along with some considerations on its effectiveness in capturing semantic relationships and its potential to enhance natural language processing tasks.
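To illustrate why a box representation can express asymmetric relations, here is a minimal sketch using plain "hard" boxes. It is not taken from the thesis or from the Word2Box codebase. The `Box` class, the `containment` function, and the toy coordinates for `animal` and `dog` are all hypothetical, and Word2Box itself relies on a fuzzy-set formulation rather than the crisp intersection shown here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned hyperrectangle given by its lower and upper corners (illustrative only)."""
    lower: Tuple[float, ...]
    upper: Tuple[float, ...]

    def volume(self) -> float:
        # Product of side lengths; an empty box (upper < lower) has volume 0.
        vol = 1.0
        for lo, hi in zip(self.lower, self.upper):
            vol *= max(hi - lo, 0.0)
        return vol

    def intersection(self, other: "Box") -> "Box":
        # Set intersection of two boxes is again a box (possibly empty).
        return Box(
            tuple(max(a, b) for a, b in zip(self.lower, other.lower)),
            tuple(min(a, b) for a, b in zip(self.upper, other.upper)),
        )


def containment(a: Box, b: Box) -> float:
    """Fraction of b's volume that lies inside a: vol(a ∩ b) / vol(b)."""
    vb = b.volume()
    return a.intersection(b).volume() / vb if vb > 0 else 0.0


# Toy 2-dimensional boxes (assumed values, not learned embeddings).
animal = Box((0.0, 0.0), (4.0, 4.0))
dog = Box((1.0, 1.0), (2.0, 3.0))

print(containment(animal, dog))  # 1.0: the "dog" box lies entirely inside "animal"
print(containment(dog, animal))  # 0.125: only a small part of "animal" lies inside "dog"
```

The two scores differ because the ratio is normalized by the volume of the second box. A symmetric comparison such as cosine similarity between two point vectors has no equivalent of this directional score.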
2023
word2box
word embeddings
geometric word embed
box embeddings
Files in this item:
dissertation.pdf (open access, 2.35 MB, Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/68382