Dimensionality Reduction with Nvidia Tensor Cores

BALZAN, PIETRO
2021/2022

Abstract

The popularity and effectiveness of machine learning have pushed its computational requirements beyond the limits of conventional devices. For this reason, a new hardware accelerator, the tensor core unit (TCU), was introduced in recent years to speed up the training and inference of deep neural networks by reducing the computation time of matrix multiplication operations. The aim of this work is to exploit the capabilities of tensor cores to speed up a different problem: dimensionality reduction. By making use of TCUs and the properties of certain matrices, first established by William B. Johnson and Joram Lindenstrauss, we are able to embed a set of high-dimensional points into a lower-dimensional space with high-quality results. Throughout this paper, we introduce the basic concepts of dimensionality reduction and explain in detail the construction of the Johnson-Lindenstrauss matrices used in our reduction method, together with the underlying theory. After describing Nvidia tensor cores and the Volta architecture, we develop a number of dimensionality reduction algorithms. We then test their CUDA implementations on the Nvidia Tesla V100 GPU and extensively study their effectiveness and performance, both in terms of computation time and quality of results.
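As a concrete illustration of the approach the abstract describes, the following is a minimal sketch (not the thesis code) of how a Johnson-Lindenstrauss projection reduces to a tensor-core matrix multiplication through CUDA's WMMA API. The kernel name jl_project, the half-precision storage of both matrices, and the assumption that the dimensions n, d and k are padded to multiples of 16 are our own illustrative choices, not taken from the thesis.

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // Project the n x d data matrix A onto k dimensions with the d x k
    // Johnson-Lindenstrauss matrix R, computing P = A * R on tensor cores.
    // A and R are row-major half precision; P accumulates in single precision.
    // Assumes n, d, k are padded to multiples of 16.
    __global__ void jl_project(const half *A, const half *R, float *P,
                               int n, int d, int k) {
        // Each warp owns one 16x16 tile of the output.
        int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
        int warpN = blockIdx.y * blockDim.y + threadIdx.y;

        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;
        wmma::fill_fragment(acc_frag, 0.0f);

        // Walk along the shared dimension d in steps of 16.
        for (int i = 0; i < d; i += 16) {
            int aRow = warpM * 16, aCol = i;
            int bRow = i,          bCol = warpN * 16;
            if (aRow < n && bCol < k) {
                wmma::load_matrix_sync(a_frag, A + aRow * d + aCol, d);
                wmma::load_matrix_sync(b_frag, R + bRow * k + bCol, k);
                wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
            }
        }
        // Write back the 16x16 tile of projected points.
        int pRow = warpM * 16, pCol = warpN * 16;
        if (pRow < n && pCol < k)
            wmma::store_matrix_sync(P + pRow * k + pCol, acc_frag, k,
                                    wmma::mem_row_major);
    }

Each warp accumulates one 16x16 output tile in single precision, so the quality of the embedding is limited mainly by the half-precision inputs rather than by the accumulation. The kernel requires compute capability 7.0 or higher (e.g. nvcc -arch=sm_70), which the Tesla V100 used in the thesis provides.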
Keywords: dimensionality reduction, Nvidia, tensor core, GPU
Files in this record: Balzan_Pietro.pdf (Adobe PDF, 4.36 MB), open access.


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/29239