Exploiting the conservation of probability in simple machine learning models
SINGH, MANJODH
2024/2025
Abstract
This thesis investigates the development of simple unsupervised machine learning models with two layers: a low-dimensional hidden layer, which encodes the data, connected to the visible data layer. The model maps the hidden layer to the visible one while conserving probability, and is therefore simple and easy to understand. The proposed models, referred to as Markov Chain Machines (MCMs), build upon and generalize the structure of Restricted Boltzmann Machines (RBMs) by decoupling the latent-state distribution from the conditional distribution over visible states. This decoupling improves interpretability, makes fuller use of the latent space, and yields more stable training dynamics; the model is essentially a rediscovery of Sigmoid Belief Networks. We provide a mathematical formulation of MCMs, their training procedure, and initialization strategies, and explore two novel extensions: entropy-based merging and recursive splitting. Numerical experiments on the MNIST dataset show that MCMs outperform RBMs in convergence speed, likelihood maximization, and hidden-unit utilization, while producing smoother likelihood landscapes and more interpretable weights. These findings suggest promising applications of MCMs in clustering, generative modeling, and structured representation learning.
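The abstract's claim that the model "jumps" from the hidden layer to the visible one while conserving probability corresponds to the standard two-layer factorization with a normalized conditional. A minimal sketch of the presumed form, assuming a discrete latent code h and visible configuration v (notation ours, not taken from the thesis):

```latex
% Presumed generative factorization, inferred from the abstract:
% the latent prior p(h) is decoupled from the conditional p(v|h).
\[
  p(v) = \sum_{h} p(h)\, p(v \mid h),
  \qquad
  \sum_{v} p(v \mid h) = 1 \ \text{for every } h,
\]
% so probability mass is conserved in the hidden-to-visible map,
% and p(h) and p(v|h) can be parameterized and trained separately.
% In an RBM, by contrast, both are coupled through a single joint
% energy, $p(v,h) \propto e^{-E(v,h)}$.
```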
| File | Size | Format |
|---|---|---|
| Singh_Manjodh.pdf (open access) | 1.61 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/91176