Accelerating Deep Learning Workloads: a Comparative Study of GPUs, TPUs, FPGAs, and ASICs
MAGHSOUD, AMIRKHASHAYAR
2024/2025
Abstract
Deep learning has revolutionized fields such as computer vision, natural language processing, and autonomous systems. However, these powerful models, especially convolutional neural networks, require substantial computing power to work effectively. This thesis explores the role of hardware accelerators, specifically GPUs, TPUs, FPGAs, and ASICs, in enhancing the performance of deep learning models. GPUs are known for their excellent parallel computing power and are widely used across many applications, but they suffer from high power consumption and high cost. FPGAs strike a balance between performance and power consumption, with the added advantage of reconfigurability, making them suitable for many deep learning applications. TPUs, developed specifically for tensor operations, offer significant performance improvements on deep learning tasks, but lack the flexibility needed in many applications. ASICs, designed for specific tasks, provide unrivaled performance and energy efficiency, but are highly limited in flexibility. After providing an overview of these accelerators, focusing on their architectural features, benefits, and performance metrics in different applications, the manuscript presents a critical analysis of the various solutions.
File: Maghsoud_Amirkhashayar.pdf (restricted access) | 3.77 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License
https://hdl.handle.net/20.500.12608/82742