Artificial intelligence (AI) is now used in many different fields. Despite their undoubted usefulness, however, AI models have a significant limitation: to act as effectively as possible, they must be trained for the specific task in which they will be used. This training requires enormous amounts of data, which are not always available, either because no datasets exist on the desired topic, because there is not enough money to purchase them, or because it is impossible to generate them. In these situations, the solution comes from data augmentation (DA) techniques: algorithms that generate artificial data to compensate for or complete incomplete or unbalanced datasets, making it possible to train AI models starting from smaller datasets. This thesis explains the implementation of various DA techniques, the pros and cons of each model, and the steps to follow when choosing which technique to implement. It then analyses the real-case scenario studied by Sunniva in a master's thesis at the Norwegian University of Science and Technology. For this scenario, we examine the known issues and attempt to find the most appropriate machine learning (ML) model to solve them. Finally, we present some code extracts useful for implementing the TimeGAN model and some precision metrics for GANs dealing with time series data.
Data augmentation for medical purposes
PIETROGRANDE, SIMONE
2023/2024
| File | Size | Format |
|---|---|---|
| Pietrogrande_Simone.pdf (open access) | 4.27 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/76851