Model-Based Reinforcement Learning for industrial robotics applications

In industrial robotics settings, pick-and-place tasks are among the most common activities delegated to robotic manipulators, because robots are cost-effective when performing highly repetitive work. Assigning these tasks to a robot rather than to a human reduces costs and frees human working hours for more intellectually demanding tasks. An ideal goal would be a robotic system capable of handling arbitrary objects. The main challenges in this context are: (i) guaranteeing the time efficiency of the task, which is crucial to keep handling costs low; (ii) programming the robot, since optimally picking and placing arbitrary objects is not a trivial task. When allowed, throwing objects instead of placing them can improve time efficiency and extend the physical reach of a robotic arm by exploiting extrinsic dexterity. We refer to this as pick-and-throw. Unfortunately, when dealing with arbitrary objects, this approach adds new problems on top of the existing ones, mainly: (i) how to pick the object in a way that is optimal for throwing; (ii) how to toss an object grasped in a given configuration. A recent work, Tossingbot, implemented a complete pick-and-throw task in unstructured settings using a trial-and-error approach based on self-supervised learning. This approach achieves excellent time efficiency and accuracy, but at the cost of hundreds to thousands of training throws to learn the task, even with the simplest object, a ball. The objective of this thesis is to explore how Reinforcement Learning techniques can be applied to this pick-and-throw task with a simple ball, so that the robot reaches the same performance as Tossingbot with as few training trials as possible.
Specifically, we applied a recent Model-Based RL algorithm, MC-PILCO, to train a tossing policy in a simulated environment built from scratch, and we validated part of the algorithm on data collected in laboratory sessions. We report positive and encouraging results, both for policy training in simulation and with real data; moreover, we used this particular task to highlight some interesting and crucial aspects of the algorithm itself.
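To make the idea of particle-based policy evaluation concrete for ball throwing, the following is a minimal toy sketch in Python with NumPy. It is not the MC-PILCO implementation used in this work. It assumes a known drag-free ballistic model in place of a learned dynamics model (MC-PILCO, as published, learns Gaussian process models), Gaussian noise on the release speed, a fixed 45° release angle, a 0.5 m release height, and a hypothetical linear policy mapping target distance to release speed; all names and numbers are illustrative. The cost is a Monte Carlo average of the squared landing error over simulated particles, and the policy parameters are tuned with a swapped-in gradient-free cross-entropy method rather than MC-PILCO's gradient-based optimizer.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]


def landing_distance(speed, angle, height):
    """Drag-free ballistic model (assumption): horizontal distance [m] at which
    a ball released at `height` [m] with `speed` [m/s] and elevation `angle`
    [rad] reaches the ground. Works element-wise on NumPy arrays."""
    vx = speed * np.cos(angle)
    vy = speed * np.sin(angle)
    # Positive root of height + vy*t - G*t**2/2 = 0, i.e. the time of flight.
    t_flight = (vy + np.sqrt(vy**2 + 2.0 * G * height)) / G
    return vx * t_flight


def policy(theta, target, center=1.5):
    """Hypothetical linear policy: release speed [m/s] from target distance [m].
    Centering on `center` roughly decouples the two parameters."""
    return theta[0] + theta[1] * (target - center)


def expected_cost(theta, targets, eps, noise_std=0.05, angle=np.pi / 4, height=0.5):
    """Monte Carlo estimate of the mean squared landing error [m^2].
    Each entry of `eps` is one particle (one simulated throw with a perturbed
    release speed); reusing the same `eps` for every `theta` (common random
    numbers) makes the estimate deterministic in `theta`."""
    total = 0.0
    for target in targets:
        speeds = policy(theta, target) + noise_std * eps
        errors = landing_distance(speeds, angle, height) - target
        total += np.mean(errors**2)
    return total / len(targets)


def cross_entropy_search(cost_fn, mean, std, rng, n_iters=40, pop=64, n_elite=8):
    """Gradient-free policy search (cross-entropy method), used here in place
    of the gradient-based optimizer of MC-PILCO."""
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        samples = mean + std * rng.standard_normal((pop, mean.size))
        costs = np.array([cost_fn(s) for s in samples])
        elite = samples[np.argsort(costs)[:n_elite]]  # lowest-cost candidates
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-3  # small floor keeps some exploration
    return mean


rng = np.random.default_rng(0)
targets = np.linspace(1.0, 2.0, 5)  # hypothetical target distances [m]
eps = rng.standard_normal(200)      # 200 particles, shared by all candidates


def cost(theta):
    return expected_cost(theta, targets, eps)


theta0 = np.array([3.0, 1.0])  # arbitrary initial guess
theta_opt = cross_entropy_search(cost, theta0, [1.0, 1.0], rng)
print("initial cost:", cost(theta0))
print("optimized cost:", cost(theta_opt), "theta:", theta_opt)
```

Fixing the noise samples makes candidate policies comparable on the same simulated throws; in a genuinely model-based setting, `landing_distance` would be replaced by rollouts of a model learned from the collected throws.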

In the context of industrial robotics, pick-and-place tasks are among the most common to be delegated to robotic manipulators, since these are very efficient and well suited to repetitive work. In such cases, delegating tasks to robots brings several advantages, including savings in cost and in human working time. Moreover, freeing people from the heaviest and most repetitive tasks allows them to focus on more intellectually relevant work. An obstacle not yet fully overcome in this field is the handling of arbitrary objects, where the most important challenges are: (i) guaranteeing the time efficiency of the task; (ii) programming the robot. When possible, one way to increase the time efficiency of pick-and-place is to throw objects instead of placing them; in this case we speak of pick-and-throw. This also extends the boundary of the robot's reachable workspace by exploiting its so-called extrinsic dexterity. Unfortunately, when the manipulated objects are arbitrary, this approach adds new problems to the existing ones, in particular: (i) how to pick up an object with the correct orientation for throwing it; (ii) how to throw an object grasped in a given configuration. A recently published paper presents the implementation of a complete pick-and-throw task in an unstructured environment, using a trial-and-error approach based on self-supervised learning; this system is called Tossingbot. Tossingbot achieves excellent performance in terms of both time efficiency and accuracy. These results, however, come at the cost of performing up to thousands of trials to learn the task, even when the manipulated object is the simplest one, a ball.
The objective of this thesis is to investigate how Reinforcement Learning can be applied to efficiently teach a pick-and-throw task to a robotic manipulator. In particular, we want to verify whether it is possible to reach the same performance as Tossingbot in throwing a ball while using a minimal number of training throws. Specifically, a recent Model-Based RL algorithm, MC-PILCO, was used to train a throwing policy in a simulated environment; in addition, part of the algorithm was validated with data collected in laboratory sessions. Positive and encouraging results were obtained both in the simulated environment and with real data; furthermore, the particular geometry of the task made it possible to highlight some peculiar aspects of the algorithm itself.