Multimodal iDBN for numerical cognition tasks
RENNA, PIETRO
2023/2024
Abstract
Relationships between the biological brain and computational models have emerged with the advent of neural networks. Hierarchical generative models like the Iterative Deep Belief Network (iDBN) enable the study of how daily-life tasks are processed and represented in neural networks trained with biologically plausible, Hebbian-like learning. This work proposes experiments on numerical cognition datasets and assesses the capabilities of the iDBN in a multimodal setting. The multimodal framework allows the network to process additional information compared to the unimodal case, which takes only images as sensory input. The multimodal approach includes bi-modal label-image, bi-modal text-image, and tri-modal text-label-image configurations. The architectures are tested on reconstruction, generation, and the investigation of the joint hidden latent representation, first on the classic MNIST dataset and then on the Stoianov-Zorzi numerosity dataset. Results show that multimodal iDBNs can efficiently reconstruct labels and text, though image reconstruction is more challenging, especially with the numerosity dataset. Generation tasks are more effective when a complex modality is generated from a simpler one, with label generation being the most challenging task. Furthermore, image generation exhibits limited pattern variability due to attractor states in which the model becomes stuck. Analyses of the joint hidden layer demonstrate that multimodal architectures effectively encode latent numerical information, with t-SNE analyses showing the distinctiveness of numerosity information and Poisson regression analyses indicating superior performance of the multimodal approaches compared to unimodal ones.
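As an illustration of the Poisson regression readout mentioned above, the sketch below shows one way to decode numerosity from joint hidden-layer activations. This is not the thesis code: the synthetic data, layer size, and variable names are placeholder assumptions, and scikit-learn's PoissonRegressor stands in for whatever implementation was actually used.

```python
# Minimal, illustrative sketch: Poisson regression readout of numerosity
# from (synthetic) joint hidden-layer activations. All data and names here
# are hypothetical placeholders, not the thesis implementation.
import numpy as np
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for joint hidden-layer activations of a multimodal iDBN:
# 1000 samples x 256 hidden units, with a weak dependence on numerosity.
numerosity = rng.integers(1, 33, size=1000)          # count targets (1..32)
readout_weights = rng.normal(size=256)
hidden = rng.random((1000, 256)) + 0.02 * np.outer(np.log(numerosity), readout_weights)

X_train, X_test, y_train, y_test = train_test_split(
    hidden, numerosity, test_size=0.2, random_state=0
)

# Poisson regression is a natural readout for count-valued targets such as numerosity.
model = PoissonRegressor(alpha=1e-3, max_iter=1000)
model.fit(X_train, y_train)

# Held-out D^2 (fraction of Poisson deviance explained); scores like this
# could be compared across unimodal and multimodal hidden representations.
print(f"held-out D^2 = {model.score(X_test, y_test):.3f}")
```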
File | Size | Format
---|---|---
Renna_Pietro.pdf (open access) | 7.85 MB | Adobe PDF
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/68388