Exploiting the conservation of probability in simple machine learning models

SINGH, MANJODH
2024/2025

Abstract

This thesis investigates the development of simple unsupervised machine learning models with two layers, in which a low-dimensional hidden layer (the coding) is connected to the data layer. The model passes from the hidden layer to the visible one while preserving probability, and is therefore simple and easy to understand. The proposed models, referred to as Markov Chain Machines (MCMs), build upon and generalize the structure of Restricted Boltzmann Machines (RBMs) by decoupling the latent-state distribution from the conditional distribution over visible states. This disentanglement improves interpretability and latent-space utilization and leads to more stable training dynamics. The model is essentially a rediscovery of Sigmoid Belief Networks. We provide a mathematical formulation of MCMs, their training procedure, and initialization strategies, and explore two novel extensions: entropy-based merging and recursive splitting. Numerical experiments on the MNIST dataset show that MCMs outperform RBMs in convergence speed, likelihood maximization, and hidden-unit utilization, while producing smoother likelihood landscapes and more interpretable weights. The findings suggest promising applications of MCMs in clustering, generative modeling, and structured representation learning.
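The abstract does not state the model's equations. As a minimal sketch, assuming binary visible units and a categorical hidden code (the notation below is illustrative, not the thesis's own formulation), the decoupling it describes amounts to a separately normalized mixture factorization:

% Illustrative factorization suggested by the abstract (assumed notation):
% h ranges over K hidden states, v is a D-dimensional binary visible vector
% (e.g. a binarized MNIST image), and sigma denotes the logistic function.
\[
  p(v) \;=\; \sum_{h=1}^{K} p(h)\, p(v \mid h),
  \qquad
  p(v \mid h) \;=\; \prod_{i=1}^{D} \sigma(w_{ih})^{\,v_i}\,\bigl(1 - \sigma(w_{ih})\bigr)^{1 - v_i}.
\]

Because p(h) and each p(v | h) are normalized on their own, p(v) sums to one by construction (the "conservation of probability" in the title), so the likelihood can be evaluated directly, without the intractable partition function required by an RBM's energy-based joint distribution.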
Keywords: Machine Learning, Statistical Physics, Markov chain, Boltzmann machines
Files in this item:
Singh_Manjodh.pdf (open access, Adobe PDF, 1.61 MB)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91176