Evaluation of the Performance of the Alternating Direction Method of Multipliers in Artificial Neural Networks

MEDA, ERGYS
2024/2025

Abstract

This thesis presents a comprehensive evaluation of the Alternating Direction Method of Multipliers (ADMM) for neural network optimization, comparing its performance against the traditional gradient-based methods Gradient Descent (GD) and Stochastic Gradient Descent (SGD). While gradient-based methods dominate deep learning optimization, their limitations in handling complex loss landscapes and distributed computing motivate the exploration of alternative approaches. ADMM, originally developed for convex optimization problems, offers a promising alternative by breaking the training problem into simpler subproblems that can often be solved in closed form. Through systematic experimentation on the MNIST dataset, this study evaluates ADMM in three key areas: computational efficiency, convergence behavior, and scalability and sensitivity with respect to its parameters. Results show that ADMM achieves significantly faster per-epoch processing times, approximately three times faster than GD and twenty times faster than SGD, while maintaining reasonable, though lower, final accuracy (82% compared to 96% for the gradient-based methods). Further analysis shows that ADMM's performance depends strongly on its hyperparameters, with the best results obtained for a penalty parameter β = 1 and an activation constraint weight γ between 1 and 5. The choice of initialization method also plays a crucial role, with He and Xavier initialization yielding the best results. ADMM's scaling behavior improves when training is distributed across four parallel processes, demonstrating its potential for large-scale distributed training. The trade-off between computational efficiency and final accuracy makes ADMM a valuable alternative optimization approach in scenarios where processing speed is prioritized over state-of-the-art accuracy, or where distributed computing resources are available but communication overhead must be minimized. This thesis contributes to a broader understanding of neural network optimization beyond gradient-based methods and provides practical guidelines for implementing ADMM in neural network training. It also opens avenues for hybrid approaches that could combine ADMM's computational advantages with the accuracy of gradient-based methods.
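
For context, the penalty parameter β and the activation constraint weight γ mentioned in the abstract match the penalty weights of the standard gradient-free ADMM splitting for feed-forward networks (Taylor et al., 2016); whether the thesis uses exactly this objective is an assumption, but it illustrates the kind of decomposition being evaluated. Writing z_l for the pre-activations, a_l for the activations (with a_0 = x the input), h_l for the activation functions, and λ for the Lagrange multiplier on the output constraint, the relaxed training objective in that formulation is

\min_{\{W_l\},\{a_l\},\{z_l\}} \; \ell(z_L, y) + \langle z_L, \lambda \rangle
  + \beta \,\lVert z_L - W_L a_{L-1} \rVert_2^2
  + \sum_{l=1}^{L-1} \Big( \gamma \,\lVert a_l - h_l(z_l) \rVert_2^2 + \beta \,\lVert z_l - W_l a_{l-1} \rVert_2^2 \Big)

ADMM then cycles through updates of W_l, a_l, z_l and the multiplier; each subproblem is simple enough to be solved exactly, which is what keeps the per-epoch cost low and makes the layer-wise updates easy to distribute. As a minimal sketch of the "closed-form solution" aspect under this assumed formulation, the weight update for a layer reduces to a least-squares problem (the function name and array shapes below are illustrative, not taken from the thesis):

import numpy as np

def admm_weight_update(z_l, a_prev):
    # Solves argmin_W || z_l - W @ a_prev ||_F^2 in closed form.
    # z_l    : (n_l, N) pre-activation variables for layer l
    # a_prev : (n_{l-1}, N) activations of the previous layer (a_0 = inputs)
    # The Moore-Penrose pseudoinverse gives the minimum-norm least-squares
    # solution; a small ridge term could be added for numerical stability.
    return z_l @ np.linalg.pinv(a_prev)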
Keywords: ADMM; Optimization Methods; Neural Networks; Closed-form solution

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/83034