SubStAnce: Efficient Mining of Statistically Significant Subgraphs with Few-Shot Resampling

CARLESSO, DANIEL
2023/2024

Abstract

Pattern mining is a fundamental problem in data mining because of its central role in many application areas. In this work, we focus on discovering subgraphs that are associated with the value of a specific feature, called the target. The goal is to find patterns that are associated with the target not only in the analyzed dataset but also in the underlying process that generates it, since the dataset is only a sample of that process. We achieve this by performing statistical significance tests efficiently and returning significant patterns with rigorous guarantees on the probability of false discoveries. To do so, we exploit the Few-Shot Resampling framework, recently introduced for other types of patterns. We present SubStAnce, a tool for efficiently mining statistically significant subgraphs, and carry out an extensive experimental evaluation that shows the validity of the tool and how different hyperparameter values affect its performance.
2023
Pattern Mining
Hypothesis Testing
Subgraph Mining
Files in this item:
Carlesso_Daniel.pdf (open access, 964.86 kB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/77002