Transforming clickstream session data into general-purpose user embeddings for privacy-preserving online advertising

DICHOSKI, DEJAN
2023/2024

Abstract

This thesis presents a novel approach to generating and evaluating general-purpose user embeddings for online advertising from clickstream session data. It extends the BERT4Rec model by incorporating fixed token embeddings from a pre-trained SentenceBERT model and by training with a multi-task learning objective; several loss-weighting techniques are compared for balancing the losses across tasks. The self-supervised model processes sequences of URLs to capture both immediate and long-term user preferences: it predicts subsequent interactions, estimates time spent on websites, and classifies user attributes such as country, device type, and cohort ID. Performance is evaluated with standard recommender-system metrics and custom intrinsic evaluation metrics; the model shows strong predictive performance and outperforms a strong baseline. t-SNE visualizations of the embeddings reveal distinct clusters corresponding to different user attributes, and users who are close in the embedding space share similar interests and cohort memberships, which indicates that the embeddings capture meaningful relationships between users. The thesis contributes to user representation learning a method for building compact yet informative user profiles that balances effectiveness with privacy concerns in online advertising and personalized user experiences.
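As a rough illustration of what balancing losses across tasks can mean, the sketch below combines per-task losses in two common ways: a static weighted sum, and an uncertainty-style weighting in which each task has a scalar that would normally be learned. The abstract does not say which techniques the thesis compares, so both functions, the task names (`next_url`, `dwell_time`, `country`) and the loss values are assumptions made purely for illustration.

```python
import math
from typing import Dict


def fixed_weight_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Static weighting: a plain weighted sum of the per-task losses."""
    return sum(weights[name] * value for name, value in losses.items())


def uncertainty_weight_loss(losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    """Uncertainty-style weighting (illustrative): each task loss is scaled by
    exp(-s) and s is added as a penalty, where s is a per-task scalar that
    would be learned jointly with the model in a real training loop."""
    return sum(math.exp(-log_vars[name]) * value + log_vars[name]
               for name, value in losses.items())


# Hypothetical per-task losses for one training batch.
task_losses = {"next_url": 2.0, "dwell_time": 0.5, "country": 1.0}

equal_weights = {name: 1.0 for name in task_losses}
zero_log_vars = {name: 0.0 for name in task_losses}

total_fixed = fixed_weight_loss(task_losses, equal_weights)
total_uncertainty = uncertainty_weight_loss(task_losses, zero_log_vars)
```

With all weights set to 1 and all log-variance terms set to 0, both schemes reduce to the unweighted sum of the task losses; they differ once the weights are tuned by hand or learned.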
2023
Keywords: Embeddings, Clickstream data, Recommender systems, BERT, Online advertising
Files in this item:
Dichoski_Dejan.pdf (restricted access), 3.91 MB, Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/71027