Triangle counting is one of the major graph problems with several applications in social network analysis, anomaly detection, link recommendation, query optimization, pattern classification, and other implementations in data mining and big data fields. One of the most popular computational models considered is edge streaming. However, as the size of the graph grows and involves millions of edges and nodes, several challenges arise, such as memory space limitations and long running time, and therefore sampling techniques are needed to estimate the number of triangles appearing in our graph. In this work we are going to implement a sampling algorithm using Waiting Room to store most recent edges, and making use of a Machine Learning oracle model that provides predictions about the heaviness of edges to output more accurate estimates, while still maintaining rigorous guarantees on the approximation of our result.
Triangle counting is one of the major graph problems with several applications in social network analysis, anomaly detection, link recommendation, query optimization, pattern classification, and other implementations in data mining and big data fields. One of the most popular computational models considered is edge streaming. However, as the size of the graph grows and involves millions of edges and nodes, several challenges arise, such as memory space limitations and long running time, and therefore sampling techniques are needed to estimate the number of triangles appearing in our graph. In this work we are going to implement a sampling algorithm using Waiting Room to store most recent edges, and making use of a Machine Learning oracle model that provides predictions about the heaviness of edges to output more accurate estimates, while still maintaining rigorous guarantees on the approximation of our result.
Counting Triangles in Graph Streams using Machine Learning Predictions
BOLDRIN, CRISTIAN
2022/2023
Abstract
Triangle counting is one of the major graph problems with several applications in social network analysis, anomaly detection, link recommendation, query optimization, pattern classification, and other implementations in data mining and big data fields. One of the most popular computational models considered is edge streaming. However, as the size of the graph grows and involves millions of edges and nodes, several challenges arise, such as memory space limitations and long running time, and therefore sampling techniques are needed to estimate the number of triangles appearing in our graph. In this work we are going to implement a sampling algorithm using Waiting Room to store most recent edges, and making use of a Machine Learning oracle model that provides predictions about the heaviness of edges to output more accurate estimates, while still maintaining rigorous guarantees on the approximation of our result.File | Dimensione | Formato | |
---|---|---|---|
Boldrin_Cristian.pdf
embargo fino al 08/03/2025
Dimensione
5.2 MB
Formato
Adobe PDF
|
5.2 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License
https://hdl.handle.net/20.500.12608/50905