Bilevel optimization presents significant challenges, particularly in large-scale problems, and binary hyperparameter optimization introduces additional complexities due to the combinatorial nature of the problem. For example, the binary constraint makes the objective function non-differentiable with respect to these binary variables, which means gradient-based methods cannot be applied directly. Additionally, the exponential increase in the number of potential configurations makes the problem computationally expensive to solve. Traditional relaxation and rounding methods often lead to inconsistent results. In this thesis, we propose a novel algorithm that combines the Relax and Penalize framework with the Zeroth-Order Frank-Wolfe algorithm, effectively handling binary variables with theoretical guarantees while maintaining computational efficiency. Our approach combines zeroth-order hypergradient estimates with a penalty approach for handling the combinatorial part, thus reducing the heavy computation typically required in gradient-based bilevel optimization methods. In numerical experiments, we evaluate the effectiveness of our approach through tests on a data hyper-cleaning task using synthetic and MNIST datasets under varying noise conditions. In particular, we study how the maximum number of iterations in the lower-level problem affects the algorithm's efficiency and effectiveness. Looking ahead, future work will focus on exploring the impact of additional algorithmic parameters, such as the penalty parameter, decay factors, and the number of random directions used for gradient estimation. These investigations will provide further insights into how our method can be implemented more efficiently in practical applications.

Bi-level Optimization with Penalty Method for Binary Hyperparameter Tuning in Data Hyper-cleaning

ZHAO, DANDAN
2023/2024

Abstract

Bilevel optimization presents significant challenges, particularly in large-scale problems, and binary hyperparameter optimization introduces additional complexities due to the combinatorial nature of the problem. For example, the binary constraint makes the objective function non-differentiable with respect to these binary variables, which means gradient-based methods cannot be applied directly. Additionally, the exponential increase in the number of potential configurations makes the problem computationally expensive to solve. Traditional relaxation and rounding methods often lead to inconsistent results. In this thesis, we propose a novel algorithm that combines the Relax and Penalize framework with the Zeroth-Order Frank-Wolfe algorithm, effectively handling binary variables with theoretical guarantees while maintaining computational efficiency. Our approach combines zeroth-order hypergradient estimates with a penalty approach for handling the combinatorial part, thus reducing the heavy computation typically required in gradient-based bilevel optimization methods. In numerical experiments, we evaluate the effectiveness of our approach through tests on a data hyper-cleaning task using synthetic and MNIST datasets under varying noise conditions. In particular, we study how the maximum number of iterations in the lower-level problem affects the algorithm's efficiency and effectiveness. Looking ahead, future work will focus on exploring the impact of additional algorithmic parameters, such as the penalty parameter, decay factors, and the number of random directions used for gradient estimation. These investigations will provide further insights into how our method can be implemented more efficiently in practical applications.
2023
Bi-level Optimization with Penalty Method for Binary Hyperparameter Tuning in Data Hyper-cleaning
Bilevel Optimization
Penalty Method
Data Hyper-cleaning
Zeroth Order
File in questo prodotto:
File Dimensione Formato  
Zhao_Dandan.pdf

accesso aperto

Dimensione 793.63 kB
Formato Adobe PDF
793.63 kB Adobe PDF Visualizza/Apri

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12608/71041