Transforming clickstream session data into general-purpose user embeddings for privacy-preserving online advertising
DICHOSKI, DEJAN
2023/2024
Abstract
This thesis presents a novel approach to generating and evaluating general-purpose user embeddings for online advertising from clickstream session data. It extends the BERT4Rec model in two ways: token embeddings are taken from a pre-trained SentenceBERT model and kept fixed, and training uses a multi-task learning objective. Several loss weighting techniques are compared for balancing the losses across tasks. The self-supervised model processes sequences of URLs to capture both immediate and long-term user preferences; its tasks are predicting subsequent interactions, estimating time spent on websites, and classifying user attributes such as country, device type, and cohort IDs. Performance is evaluated with standard recommender system metrics and custom intrinsic evaluation metrics; the model shows strong predictive capabilities and outperforms a strong baseline model. t-SNE visualization of the embeddings reveals distinct clusters corresponding to different user attributes, supporting the effectiveness of the approach. Users that are close in the embedding space share similar interests and cohort memberships, which indicates that the embeddings capture meaningful user relationships. The thesis contributes to the field of user representation a method for creating compact yet informative user profiles that balances effectiveness with privacy concerns in online advertising and personalized user experiences.

File | Size | Format | Access
---|---|---|---
Dichoski_Dejan.pdf | 3.91 MB | Adobe PDF | Restricted access
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/71027