Web Scraping: An Analysis of the Automatic Retrieval of Information on Retrocomputing

As technology evolves rapidly, preserving and understanding the computing heritage of the past becomes increasingly important. This thesis explores web scraping as a methodology for the automatic retrieval of information related to Retrocomputing. Retrocomputing focuses on the analysis of past computer systems and therefore requires a systematic way of obtaining historical data, documents, and other relevant information. Web scraping, with its ability to extract data from web pages efficiently, is a well-suited tool for this purpose. The thesis examines the technical challenges of extracting Retrocomputing-related data from the web, including changes in website structure over time, the analysis of that structure, and the sites' limitations. It also examines the ethical aspects of automatic online data retrieval, considering privacy management and compliance with current regulations. To provide a comprehensive picture, case studies and practical examples are presented that illustrate the effectiveness of web scraping in systematically collecting information about historical computer systems, emphasizing its role not only as a research tool but also as a means of preserving and enhancing our computing heritage. The goal is to describe the practice of web scraping in the context of Retrocomputing, highlighting its implications, challenges, and future prospects.
