
Predicting Art Auction Prices using Metadata

JAHANIANARANGE, NAHID
2024/2025

Abstract

This thesis examines how to predict artwork auction prices by analyzing a large dataset. Understanding the challenges and opportunities in this domain is crucial to improving prediction accuracy. The research uses 1,006,949 auction records that combine textual, categorical, and numerical data. I combine standard machine learning methods with recent language models and test whether deep contextual signals from BERT improve predictions. The dataset is enriched with artist details from Wikidata and with cleaned text extracted from artwork titles and descriptions. Some categorical variables have a very large number of unique values; to handle them, I use a compact encoding based on category frequency and the target variable, which retains the important information while keeping the number of features small.

I test three types of models: regularized linear regression, Random Forests, and a fully connected neural network. Each model is trained both with and without BERT embeddings, so that comparing the two settings measures how much the text features help. The BERT representations are reduced with PCA, which preserves most of their meaning while limiting the number of features. Performance is measured with the Mean Absolute Error (MAE), the coefficient of determination (R²), and the Zeta score. The Zeta score is especially useful for the skewed price distributions found in art markets.

The results show that BERT embeddings add some information, but their effect is modest compared with that of well-designed numerical and categorical features. Random Forests perform best, giving lower errors and more stable results than the linear models or the neural network. However, the auction estimates made by human experts still outperform every machine learning model by a wide margin. This shows how difficult it is to reproduce expert intuition, knowledge of individual sales, and the broader market context using only past data. The thesis concludes with a discussion of key limitations: uneven text quality, missing or incomplete artist information, large price outliers, and the unpredictable nature of auction results. For future research, I suggest exploring multimodal transformer architectures that incorporate images, better ways to model artist careers, and dynamic representations that account for market trends evolving over time.
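The abstract does not spell out the compact frequency- and target-based encoding for high-cardinality categories. The Python sketch below assumes one common variant: each category is replaced by two numbers, its relative frequency in the training data and its mean target value smoothed toward the global mean (an m-estimate), so that rare artists do not receive extreme values. The helper names `encode_category` and `transform`, the smoothing weight `m`, and the use of log prices as the target are all hypothetical.

```python
from collections import Counter, defaultdict


def encode_category(train_values, train_targets, m=10.0):
    """Map each category to (relative frequency, smoothed target mean).

    Hypothetical helper: frequency encoding plus an m-estimate (smoothed)
    target encoding. The thesis's exact formula may differ.
    """
    n = len(train_values)
    counts = Counter(train_values)
    target_sums = defaultdict(float)
    for value, target in zip(train_values, train_targets):
        target_sums[value] += target
    global_mean = sum(train_targets) / n

    mapping = {}
    for value, count in counts.items():
        frequency = count / n
        # Rare categories are pulled toward the global mean; frequent ones
        # stay close to their own mean.
        smoothed_mean = (target_sums[value] + m * global_mean) / (count + m)
        mapping[value] = (frequency, smoothed_mean)
    # Categories never seen in training get zero frequency and the global mean.
    fallback = (0.0, global_mean)
    return mapping, fallback


def transform(values, mapping, fallback):
    """Replace each raw category with its two-number encoding."""
    return [mapping.get(value, fallback) for value in values]


# Toy example: artist names with (assumed) log-price targets.
artists = ["Picasso", "Picasso", "Unknown", "Miro"]
log_prices = [12.0, 11.0, 7.0, 10.0]
mapping, fallback = encode_category(artists, log_prices, m=2.0)
print(transform(["Picasso", "Dali"], mapping, fallback))
# [(0.5, 10.75), (0.0, 10.0)]
```

The mapping should be fitted on the training split only (or out of fold) and then applied to validation and test data; otherwise each row's own price leaks into its features. scikit-learn's `TargetEncoder` implements a cross-fitted version of the target part.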
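The BERT representations are reduced with PCA before modeling. The abstract does not say how many components are kept; the NumPy sketch below assumes the common rule of keeping the smallest number of components whose cumulative explained variance reaches a threshold. The helper names `pca_fit` and `pca_transform` and the threshold values are hypothetical, and random low-rank data stands in for real 768-dimensional BERT-base vectors.

```python
import numpy as np


def pca_fit(embeddings, variance_to_keep=0.9):
    """Fit PCA on training embeddings via SVD.

    Hypothetical helper: keeps the smallest number of components whose
    cumulative explained variance reaches `variance_to_keep`.
    Assumes the embeddings are not all identical (non-zero variance).
    """
    X = np.asarray(embeddings, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    return mean, vt[:k]


def pca_transform(embeddings, mean, components):
    """Project embeddings (train or test) onto the fitted components."""
    return (np.asarray(embeddings, dtype=float) - mean) @ components.T


# Toy stand-in for 768-dimensional BERT vectors: 200 rows in a 2-D subspace.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 768))
mean, components = pca_fit(train, variance_to_keep=0.99)
reduced = pca_transform(train, mean, components)
print(reduced.shape)
# (200, 2)
```

As with the encoding, the mean and components are fitted on training embeddings only and then reused for the test set. scikit-learn's `PCA` offers the same variance-based choice when `n_components` is a float between 0 and 1 (for example with `svd_solver="full"`).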
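Training each model with and without BERT embeddings amounts to an ablation: the same learner is fitted on the base features alone and on the base features concatenated with the reduced text features, and the held-out errors are compared. The sketch below uses closed-form ridge regression in NumPy as a stand-in for the regularized linear model; `fit_ridge`, `ablation`, and `alpha` are hypothetical, the same loop would wrap a Random Forest or neural network, and the synthetic data is constructed so that the text features carry signal, so the printed numbers say nothing about the thesis's actual results.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Stand-in for the regularized linear model; alpha is an assumed value.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T (y - y_mean).
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean


def ablation(base_tr, bert_tr, y_tr, base_te, bert_te, y_te, alpha=1.0):
    """Held-out MAE without and with the (PCA-reduced) BERT features."""
    settings = {
        "without_bert": (base_tr, base_te),
        "with_bert": (np.hstack([base_tr, bert_tr]), np.hstack([base_te, bert_te])),
    }
    results = {}
    for name, (train_X, test_X) in settings.items():
        predict = fit_ridge(train_X, y_tr, alpha)
        results[name] = float(np.mean(np.abs(predict(test_X) - y_te)))
    return results


# Synthetic data in which the text features genuinely carry signal.
rng = np.random.default_rng(1)
base = rng.normal(size=(400, 3))
bert = rng.normal(size=(400, 4))
y = base @ np.array([1.0, 2.0, -1.0]) + bert @ np.array([1.5, 0.0, 0.0, -1.0])
y += 0.1 * rng.normal(size=400)
res = ablation(base[:300], bert[:300], y[:300], base[300:], bert[300:], y[300:])
print(res)
```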
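MAE and R² follow their standard definitions, but the abstract does not define the Zeta score. The sketch below assumes it is the median symmetric accuracy, ζ = 100 · (exp(median |ln(ŷ / y)|) - 1), a ratio-based measure that treats over-predicting a price by a factor of two the same as under-predicting it by a factor of two, which is one reason such a measure suits skewed price distributions. If the thesis uses a different definition, only `zeta_score` (a hypothetical name) would change; prices must be strictly positive for the logarithm.

```python
import math
from statistics import median


def mae(y_true, y_pred):
    """Mean Absolute Error, in the units of the prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def zeta_score(y_true, y_pred):
    """ASSUMED definition: median symmetric accuracy, in percent.

    zeta = 100 * (exp(median(|ln(pred / true)|)) - 1); requires positive prices.
    """
    abs_log_ratios = [abs(math.log(p / t)) for t, p in zip(y_true, y_pred)]
    return 100.0 * (math.exp(median(abs_log_ratios)) - 1.0)


# A 2x over-prediction and a 2x under-prediction are penalized equally by zeta.
y_true = [1000.0, 2000.0, 4000.0]
y_pred = [2000.0, 1000.0, 4000.0]
print(mae(y_true, y_pred), r2(y_true, y_pred), zeta_score(y_true, y_pred))
```

Here MAE is about 666.7, R² is 4/7 (about 0.571), and ζ is about 100, meaning the typical prediction is off by a factor of two in either direction.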
2024
Art Market Prediction
Machine Learning
CLIP embeddings

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/102116