Web Scraping e Tennis: estrazione, salvataggio ed analisi di dati statistici dal sito dell'ATP

The Internet is today the largest existing collection of data, and the web is increasingly becoming the main source of information. Extracting data from the web, however, is often too costly: manual copy-and-paste is not a practicable approach when the volume of data is in the order of millions, because the time and effort required would be enormous. Fortunately, there are techniques that make the process of extracting data and information from websites more practical and less costly. Web Scraping, the main topic of this thesis, is a set of computing techniques by which data is extracted from the web. This thesis introduces, presents and describes the topic in all its nuances: it traces its origins, shows how to approach it, and describes its tools, techniques and libraries. It then presents and describes the project that was developed, in which Web Scraping was applied to the analysis of statistical data in sports, specifically in tennis.

GHIOTTO, ANDREA

2022/2023

	Faculty/Department
	
				Department of Information Engineering (Dipartimento di Ingegneria dell'Informazione, DEI)
			
	Degree programme
	
				Computer Engineering (INGEGNERIA INFORMATICA), first-level degree (D.M. 270/2004)
			
	Academic year
	
				2022
			
	English title
	
				Web Scraping and Tennis: statistic data extraction, saving and analysis from ATP web site
			
	Abstract (translated from the Italian)
	
				The Internet is, to date, the largest existing collection of data, and the web is therefore increasingly the main source from which to obtain information. Extracting data from the web, however, is often too costly an operation: manual copy-and-paste is not a viable route with a volume of data in the order of millions, and the time and effort involved would be enormous.
Fortunately, there are techniques in this area that make the process of extracting data and information from websites more practical and less costly. Web Scraping, the main topic of this work, is a set of computing techniques by which data is extracted from the web.
This thesis aims to introduce, present and describe the topic in all its nuances, tracing its origins, showing how to approach it, and describing its tools, techniques and libraries. It then presents and describes the project that was developed, in which Web Scraping was applied to the analysis of statistical data in sports, specifically in tennis.
			
	Keywords
	
				Web Scraping
				Data extraction
				Websites
			
	Supervisor
	
				DI NUNZIO, GIORGIO MARIA
			
	Appears in document types:
	
				Bachelor's theses (Lauree triennali)

Files in this item:

File	Access	Size	Format
Ghiotto_Andrea.pdf	Open access	3.99 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/43578