
Detecting fake accounts on TikTok

HEYDARI, MAHSA
2024/2025

Abstract

Online social networks (OSNs) are essential for modern communication, enabling users to connect, share content, access news, and engage with trending topics. However, the proliferation of fake accounts on these platforms raises concerns about misinformation, fraud, and malicious activities. While prior research has extensively explored fake account detection on platforms such as Twitter, Facebook, and Instagram using supervised and semi-supervised machine learning techniques, TikTok remains underexplored. This study bridges that gap by adapting a semi-supervised self-training framework, originally developed for Twitter, to detect fake accounts on TikTok, leveraging the platform's unique feature set and evaluating performance with metrics such as AUC, Recall, and G-Mean. We use a labeled dataset of 1,699 TikTok accounts (758 fake, 941 real) sourced from GitHub, enriched with features extracted via the TikTok API and Apify.com. Robust preprocessing, including Z-Score and Min-Max normalization, ensures consistent feature scaling, while resampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE) and the Cluster-Based Undersampling Technique (CBUTE) address class imbalance to enhance model performance. The proposed Self-Training Semi-Supervised Resampling (SSSTR) framework, evaluated with six classifiers (Random Forest, CART, Gradient Boosting, AdaBoost, K-Nearest Neighbors, and Naïve Bayes), demonstrates strong results. Among the classifiers evaluated, Random Forest consistently performed best when paired with Z-Score normalization and a carefully selected set of features; CART and Gradient Boosting also yielded strong outcomes under similar conditions. A reduced feature set of only four attributes maintained competitive performance, showing that efficient detection is possible even with a compact model. Balancing the class distribution with SMOTE improved performance across classifiers, particularly compared to CBUTE.
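To illustrate the "resampling inside self-training" idea that SSSTR describes, the sketch below pairs a minimal, hand-rolled SMOTE step with a Random Forest self-training loop. This is an assumed reconstruction for illustration only, not the thesis's implementation: the `simple_smote` helper, the 0.9 confidence threshold, and the iteration count are all hypothetical choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def simple_smote(X, y, minority_label, rng):
    """Minimal SMOTE: synthesize minority points by interpolating between
    each sampled minority example and one of its nearest minority neighbors."""
    X_min = X[y == minority_label]
    n_needed = int((y != minority_label).sum() - len(X_min))
    if n_needed <= 0 or len(X_min) < 2:
        return X, y
    k = min(5, len(X_min) - 1)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synth = []
    for _ in range(n_needed):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]  # position 0 is the point itself
        gap = rng.random()
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    X_new = np.vstack([X, np.array(synth)])
    y_new = np.concatenate([y, np.full(n_needed, minority_label)])
    return X_new, y_new

def sssr_self_training(X_lab, y_lab, X_unlab, n_iter=5, conf=0.9, seed=0):
    """Self-training loop with resampling applied inside each iteration
    (SSSTR-style sketch). Assumes binary labels 0/1 with 1 the minority."""
    rng = np.random.default_rng(seed)
    X_unlab = X_unlab.copy()
    for _ in range(n_iter):
        # Rebalance the (growing) labeled pool before each retraining step.
        X_bal, y_bal = simple_smote(X_lab, y_lab, minority_label=1, rng=rng)
        clf = RandomForestClassifier(random_state=seed).fit(X_bal, y_bal)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        confident = proba.max(axis=1) >= conf
        if not confident.any():
            break
        # Promote confidently pseudo-labeled accounts into the labeled pool.
        X_lab = np.vstack([X_lab, X_unlab[confident]])
        y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
        X_unlab = X_unlab[~confident]
    return clf
```

Resampling each round (rather than once up front) keeps the labeled pool balanced even as pseudo-labeled majority-class accounts accumulate, which is the motivation for embedding SMOTE in the loop.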
By integrating resampling within the self-training process and analyzing feature subsets, this research provides novel insights into fake account detection on TikTok. The methodology and findings offer a foundation for future studies on TikTok feature extraction and platform security.
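The evaluation metrics named above can be computed with scikit-learn; G-Mean is the geometric mean of sensitivity (recall on the positive class) and specificity (recall on the negative class), which remains informative under class imbalance. The toy labels and scores below are illustrative only, not results from the thesis.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, recall_score

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens = recall_score(y_true, y_pred, pos_label=1)
    spec = recall_score(y_true, y_pred, pos_label=0)
    return np.sqrt(sens * spec)

# Toy ground truth (1 = fake account) and classifier scores.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.35, 0.3, 0.2, 0.6, 0.1])
y_pred = (y_score >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_score)   # threshold-free ranking quality
rec = recall_score(y_true, y_pred)     # fraction of fakes caught
gm = g_mean(y_true, y_pred)            # balance of both error types
```

A high Recall with a low G-Mean would signal that fakes are caught only at the cost of misclassifying many real accounts, which is why the three metrics are reported together.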
social media
fake accounts
machine learning algorithms
Files in this item:
Detecting_fake_accounts_on_TikTok_Mahsa.heydari.pdf (open access, 4.84 MB, Adobe PDF)

The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/91984