Evaluation of the Performance of the Alternating Direction Method of Multipliers in Artificial Neural Networks

MEDA, ERGYS
2024/2025

Abstract

This thesis presents a comprehensive evaluation of the Alternating Direction Method of Multipliers (ADMM) for neural network optimization, comparing its performance against the traditional gradient-based methods Gradient Descent (GD) and Stochastic Gradient Descent (SGD). While gradient-based methods dominate deep learning optimization, their limitations in handling complex loss landscapes and distributed computing motivate the exploration of alternative approaches. ADMM, originally developed for convex optimization problems, offers a promising alternative by breaking the training problem into simpler subproblems that can often be solved in closed form. Through systematic experimentation on the MNIST dataset, this study evaluates ADMM in three key areas: computational efficiency, convergence behavior, and scalability and sensitivity with respect to its parameters. Results show that ADMM achieves significantly faster per-epoch processing times, approximately three times faster than GD and twenty times faster than SGD, while maintaining reasonable, though lower, final accuracy (82% compared to 96% for the gradient-based methods). Further analysis shows that ADMM's performance depends strongly on its hyperparameters, with the best results obtained for a penalty parameter β = 1 and an activation constraint weight γ between 1 and 5. The choice of initialization method also plays a crucial role, with He and Xavier initialization yielding the best results. ADMM's scaling behavior improves when training is distributed across four parallel processes, demonstrating its potential for large-scale distributed training. The trade-off between computational efficiency and final accuracy makes ADMM a valuable alternative optimization approach in scenarios where processing speed is prioritized over state-of-the-art accuracy, or where distributed computing resources are available but communication overhead must be minimized. This thesis contributes to a broader understanding of neural network optimization beyond gradient-based methods and provides practical guidelines for implementing ADMM in neural network training. It also opens avenues for hybrid approaches that could combine ADMM's computational advantages with the accuracy of gradient-based methods.
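
For context, the penalty parameter β and the activation constraint weight γ mentioned in the abstract match the penalty weights of the standard gradient-free ADMM splitting for feed-forward networks (Taylor et al., 2016); whether the thesis uses exactly this objective is an assumption, but it illustrates the kind of decomposition being evaluated. Writing z_l for the pre-activations, a_l for the activations (with a_0 = x the input), h_l for the activation functions, and λ for the Lagrange multiplier on the output constraint, the relaxed training objective in that formulation is

\min_{\{W_l\},\{a_l\},\{z_l\}} \; \ell(z_L, y) + \langle z_L, \lambda \rangle
  + \beta \,\lVert z_L - W_L a_{L-1} \rVert_2^2
  + \sum_{l=1}^{L-1} \Big( \gamma \,\lVert a_l - h_l(z_l) \rVert_2^2 + \beta \,\lVert z_l - W_l a_{l-1} \rVert_2^2 \Big)

ADMM then cycles through updates of W_l, a_l, z_l and the multiplier; each subproblem is simple enough to be solved exactly, which is what keeps the per-epoch cost low and makes the layer-wise updates easy to distribute. As a minimal sketch of the "closed-form solution" aspect under this assumed formulation, the weight update for a layer reduces to a least-squares problem (the function name and array shapes below are illustrative, not taken from the thesis):

import numpy as np

def admm_weight_update(z_l, a_prev):
    # Solves argmin_W || z_l - W @ a_prev ||_F^2 in closed form.
    # z_l    : (n_l, N) pre-activation variables for layer l
    # a_prev : (n_{l-1}, N) activations of the previous layer (a_0 = inputs)
    # The Moore-Penrose pseudoinverse gives the minimum-norm least-squares
    # solution; a small ridge term could be added for numerical stability.
    return z_l @ np.linalg.pinv(a_prev)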
Keywords: ADMM; Optimization Methods; Neural Networks; Closed-form solution

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/83034