
Exact Robustness Certification of k-Nearest Neighbor Classifiers

SHAKEEL, AHMAD
2024/2025

Abstract

Machine learning (ML) models are increasingly being applied in a variety of fields, including healthcare, finance, and autonomous transportation. As ML systems are deployed in safety-critical domains, their robustness against adversarial attacks becomes crucial, especially when errors can result in life-threatening consequences. Adversarial examples, which are inputs altered by small perturbations that lead to incorrect outcomes in ML models, pose significant security risks. This paper focuses on certifying the robustness and stability of k-nearest neighbors (k-NN) classifiers against such adversarial attacks. We propose a novel method that models adversarial attacks as a perturbation region around a test sample and analyzes the labels assigned by k-NN to samples within this region. Our approach constructs a directed graph where nodes represent training samples, and edges indicate the relative proximity of these samples to the perturbation region. We perform a graph traversal to collect the most frequent labels from paths consisting of k samples, ensuring that the proximity conditions are satisfied. If only one label is found, we guarantee the stability of k-NN w.r.t. the input sample. Furthermore, if this label matches the ground truth, we conclude that the classifier is also robust. Otherwise, if multiple labels are found, k-NN is neither stable nor robust. We implemented the algorithm in Python and evaluated it on seven datasets commonly used for robustness verification and four datasets for fairness evaluation. Our experimental results demonstrate that the proposed certifier can successfully verify the robustness of more than 90% of the samples under adversarial perturbations, showcasing k-NN's potential as a robust classification algorithm.
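To illustrate the certification idea sketched in the abstract, the following Python snippet performs a conservative (sound but incomplete) stability check for a k-NN classifier under an L-infinity perturbation of radius eps around a test sample. It is not the thesis's exact graph-traversal certifier: the function name, the distance-bounding strategy, and the single-label shortcut are illustrative assumptions, and the inconclusive case is exactly where the exact method would explore neighbour sets via the proximity graph.

```python
import numpy as np

def knn_stability_sketch(X_train, y_train, x, eps, k=3):
    """Conservative stability check for k-NN under an L-infinity
    perturbation ball of radius `eps` around the test sample `x`.

    Illustrative sketch only -- not the exact graph-based certifier
    from the thesis. It certifies stability in the easy case where
    every training point that could possibly enter the k-nearest
    set already carries the same label.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Bound the Euclidean distance between any perturbed input x'
    # (with ||x' - x||_inf <= eps) and each training point.
    diff = np.abs(X_train - x)
    d_min = np.linalg.norm(np.maximum(diff - eps, 0.0), axis=1)  # closest x' can get
    d_max = np.linalg.norm(diff + eps, axis=1)                   # farthest x' can get

    # For any fixed x', the k-th nearest distance is at most the
    # k-th smallest d_max, so a point whose d_min exceeds that bound
    # can never be among the k nearest neighbours.
    kth_dmax = np.partition(d_max, k - 1)[k - 1]
    possible = d_min <= kth_dmax

    candidate_labels = np.unique(y_train[possible])
    if candidate_labels.size == 1:
        # Every admissible neighbour set votes for the same label,
        # so the prediction cannot change inside the perturbation region.
        return True, candidate_labels[0]
    # Inconclusive: the exact certifier would now enumerate admissible
    # neighbour sets (paths of k samples in the proximity graph).
    return False, None
```

If the check succeeds and the returned label equals the ground truth, the sample is also robust in the sense used above; when it returns False the sketch simply gives up, whereas the exact certifier decides the case either way.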
Keywords: k-Nearest Neighbors; Machine Learning; Robustness analysis; Stability analysis
Files in this item: dissertation.pdf (open access, 735.61 kB, Adobe PDF)


Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/84823