Exploiting the conservation of probability in simple machine learning models

SINGH, MANJODH
2024/2025

Abstract

This thesis investigates the development of simple unsupervised machine learning models with two layers, in which a low-dimensional hidden layer (the coding) is connected to the data layer. The model passes from the hidden layer to the visible one while preserving probability, and is therefore simple and easy to understand. The proposed models, referred to as Markov Chain Machines (MCMs), build upon and generalize the structure of Restricted Boltzmann Machines (RBMs) by decoupling the latent-state distribution from the conditional distribution over visible states. This disentanglement improves interpretability and latent-space utilization and leads to more stable training dynamics. The model is essentially a rediscovery of Sigmoid Belief Networks. We provide a mathematical formulation of MCMs, their training procedure, and initialization strategies, and explore two novel extensions: entropy-based merging and recursive splitting. Numerical experiments on the MNIST dataset show that MCMs outperform RBMs in convergence speed, likelihood maximization, and hidden-unit utilization, while producing smoother likelihood landscapes and more interpretable weights. The findings suggest promising applications of MCMs in clustering, generative modeling, and structured representation learning.
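The abstract does not state the model's equations. As a minimal sketch, assuming binary visible units and a categorical hidden code (the notation below is illustrative, not the thesis's own formulation), the decoupling it describes amounts to a separately normalized mixture factorization:

% Illustrative factorization suggested by the abstract (assumed notation):
% h ranges over K hidden states, v is a D-dimensional binary visible vector
% (e.g. a binarized MNIST image), and sigma denotes the logistic function.
\[
  p(v) \;=\; \sum_{h=1}^{K} p(h)\, p(v \mid h),
  \qquad
  p(v \mid h) \;=\; \prod_{i=1}^{D} \sigma(w_{ih})^{\,v_i}\,\bigl(1 - \sigma(w_{ih})\bigr)^{1 - v_i}.
\]

Because p(h) and each p(v | h) are normalized on their own, p(v) sums to one by construction (the "conservation of probability" in the title), so the likelihood can be evaluated directly, without the intractable partition function required by an RBM's energy-based joint distribution.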
Keywords: Machine Learning, Statistical Physics, Markov chain, Boltzmann machines
Files in this item:
Singh_Manjodh.pdf (open access, Adobe PDF, 1.61 MB)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91176