Accelerating Deep Learning Workloads: a Comparative Study of GPUs, TPUs, FPGAs, and ASICs

MAGHSOUD, AMIRKHASHAYAR
2024/2025

Abstract

Deep learning has revolutionized fields such as computer vision, natural language processing, and autonomous systems. These powerful models, however, especially convolutional neural networks, demand substantial computational resources to perform effectively. This thesis explores the role of hardware accelerators, namely GPUs, TPUs, FPGAs, and ASICs, in enhancing the performance of deep learning models. GPUs offer excellent parallel computing power and are widely used across many applications, but they suffer from high power consumption and cost. FPGAs strike a balance between performance and power consumption, and their reconfigurability makes them suitable for many deep learning applications. TPUs, developed specifically for tensor operations, deliver significant performance improvements on deep learning workloads but lack the flexibility that many applications require. ASICs, designed for a single task, provide unrivaled performance and energy efficiency but are the most limited in flexibility. After surveying these accelerators, with a focus on their architectural features, benefits, and performance metrics across different applications, the manuscript presents a critical analysis of the various solutions.
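The comparison above rests on performance metrics such as achieved arithmetic throughput. As a minimal, hardware-agnostic sketch of how such a metric can be measured (using NumPy on the CPU for illustration; a GPU or TPU baseline would use the corresponding vendor library), one can time a dense matrix multiply and convert the runtime into GFLOP/s. The function name and matrix size here are illustrative choices, not part of the thesis methodology.

```python
import time
import numpy as np

def matmul_gflops(n: int = 1024, repeats: int = 5) -> float:
    """Measure achieved throughput (GFLOP/s) of an n x n matmul.

    A dense matrix multiply performs roughly 2 * n**3 floating-point
    operations, so timing it yields a simple, device-agnostic metric
    that can be compared across CPUs, GPUs, TPUs, FPGAs, and ASICs.
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = (time.perf_counter() - start) / repeats
    return (2 * n**3) / elapsed / 1e9

print(f"{matmul_gflops():.1f} GFLOP/s")
```

The same measurement, repeated per device, gives a first-order throughput comparison; energy efficiency (FLOP/s per watt) additionally requires power instrumentation.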
Keywords: machine learning, hardware accelerator, deep learning
Files in this item: Maghsoud_Amirkhashayar.pdf (restricted access), 3.77 MB, Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/82742