Instruction set optimizations for Edge AI computing on a low-area microprocessor
PAFFI, LEONARDO
2021/2022
Abstract
In recent years, the field of AI has exploded, leading many institutions, companies, and researchers to invest vast amounts of money in models that predict human behavior, imitate it, or help users extract information from a dataset that would otherwise remain hidden. Today, the computation that powers AI algorithms is shifting from huge mainframes to tiny embedded devices, "on the edge" with respect to data centers, raising problems in terms of power consumption and speed. This thesis describes and implements various hardware (and software) optimizations, both adapted from the literature and original, with the aim of enabling a proprietary core, intended to operate close to sensors and therefore constrained to a small form factor, to run neural networks efficiently and to improve the inference time of the most common layers. The result of this research is a set of optimized hardware accelerators that use quantized values to shrink the amount of data and parallelize computation as much as possible, complemented by firmware implementations of approximated versions of some well-known functions that avoid heavy computations during inference.
File: Paffi_Leonardo.pdf (Open Access since 06/06/2024), 3.52 MB, Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/39697