User Privacy on Spotify: Predicting Personal Data from Music Preferences

The way we listen to music has changed drastically in the past decade. We can now play music by artists from around the world on our smart devices. Many, if not most, music streaming providers include systems that track users’ music preferences and suggest new content. The music we listen to reveals a great deal about who we are. People commonly share their playlists and their favorite artists’ songs on music platforms, and they find and connect with others who like the same genres. Making friends with strangers is not always easy, but music is a good way to do it. However, we must also look at the other side of the coin from a security perspective: is it a good idea to share our music interests with others, or does doing so compromise our privacy? According to privacy experts and developers, there is no purposeless data. Anything can be used to infer private information. Even a single like on social media, which seems meaningless at first sight, can reveal more than it appears to. If our musical tastes reveal information about us, we may be profiled for targeted advertising or by surveillance agencies, or more generally become potential victims of malicious activity. Since music is part of our daily lives and many providers let us listen to it, we are at even greater risk of being profiled and having our data sold. In this research, we demonstrate the feasibility of inferring personal data from the playlists and songs people publicly share on Spotify. Through an online survey, we collected a new dataset containing the private information of 750 Spotify users, and we downloaded around 402,999 songs from a total of 8,777 playlists. Our statistical analysis shows significant correlations between users’ music preferences (e.g., music genre) and private information (e.g., age, gender, economic status).
Given these significant correlations, we built several machine-learning models to infer private information, and our results show that such inference is possible, posing a real privacy threat to all music listeners. In particular, we predicted gender with a 71.7% F1-score, as well as several other private attributes, such as whether a person drinks (62.8% F1-score) or smokes (60.2% F1-score) regularly. The purpose of this project is to raise awareness of how seemingly purposeless data can reveal personal information and to educate users on how to better protect their privacy.
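The results above are reported as F1-scores, the harmonic mean of precision and recall. The abstract does not state whether the scores are computed for the positive class only or averaged over classes, so the minimal sketch below shows both the per-class F1 and a macro average. The function names `f1_score` and `macro_f1`, and the toy labels (e.g. 1 = "smokes regularly", 0 = "does not"), are hypothetical illustrations, not the authors' evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Per-class F1: harmonic mean of precision and recall for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, positive=l) for l in labels) / len(labels)


# Hypothetical toy labels: 1 = attribute present (e.g. smokes regularly), 0 = absent.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.583
```

Because F1 balances false positives against false negatives, it is a more informative summary than plain accuracy when an attribute such as regular smoking is present in only a minority of users.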