How Good Is a Web Page? Data Collection for Experimental Evaluation of Link Analysis Algorithms

This thesis describes the motivations, techniques, and results of a large-scale crawl designed to obtain a suitable snapshot of the web graph. This goal requires a carefully designed crawling system capable of exploring the entire .it domain. The result is a fast and stable crawling system which, in a preliminary test, collected more than 308 million distinct web pages in 28 days at an average rate of 204 pages per second, using a single high-end PC-class machine.

Secco, Alessandro

2014/2015

Academic year: 2014-10-14
Keywords: web, crawl, crawling, heritrix, link analysis, information retrieval, algorithms
Supervisor: Peserico Stecchini Negri De Salvi, Enoch
Co-supervisors: Bressan, Marco; Pretto, Luca
Appears in collections: Master's theses

Files in this item:

File	Size	Format
Alessandro_Secco_WebQual_Thesis.pdf (restricted access)	1.31 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/18707