How Good Is a Web Page? Data Collection for Experimental Evaluation of Link Analysis Algorithms
Secco, Alessandro
2014/2015
Abstract
This thesis describes the motivations, techniques, and results of a large-scale crawl designed to obtain a suitable snapshot of the web graph. This goal requires a properly designed crawling system able to explore the whole .it domain. The result is a fast and stable crawling system which, in a preliminary test, collected more than 308 million distinct web pages in 28 days at an average rate of 204 pages per second, using a single high-end PC-class machine.
Files in this item:
File | Size | Format
---|---|---
Alessandro_Secco_WebQual_Thesis.pdf (restricted access) | 1.31 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
Use this identifier to cite or link to this document:
https://hdl.handle.net/20.500.12608/18707