Autoencoder-based characterization of QCD multijet background at the LHC

We present a proof of principle for using autoencoders to encode high-dimensional multijet data. The autoencoder is trained on simulated events containing four b-quark QCD jets, and we attempt to reconstruct these events after compressing them into a reduced-dimension representation. We also demonstrate that the autoencoder can generate new artificial events.

Characterizing QCD multijet events is crucial for estimating the background in processes with four-b-jet final states, including Higgs boson pair production. This process is accessible at the LHC and may be observed in a combined search by the end of the high-luminosity LHC run. The 4b channel is expected to contribute significantly to that combined search, and the goal of this work is to show that autoencoders can model such multijet events in a reduced-dimension space. This approach could yield embedded metrics for effective discrimination between signal and background. A deep learning architecture, largely inspired by an ongoing analysis within the CMS collaboration, serves as the backbone of the autoencoder. After describing the architecture, loss function, and training schedule, we show that events are successfully reconstructed.

To quantify the accuracy of this reconstruction, we compute the Wasserstein distance between the original and reconstructed distributions of several kinematic variables of interest, together with a figure of merit measuring the similarity of the reconstructed dijet and quadjet invariant masses. We further show that the approach can generate an arbitrarily large number of events from the encoded space, in promising agreement with the reconstructed samples. This study highlights the applicability of autoencoders in high-energy physics and offers insights into their potential contributions to future experimental analyses in the field.
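The abstract measures reconstruction quality with the Wasserstein distance between kinematic distributions, including dijet and quadjet invariant masses. The following is a minimal NumPy sketch of that kind of comparison, not the thesis implementation. It assumes each event is stored as four jets with (p_T, η, φ, m) in GeV. The function names, the toy data, the choice of jets 0 and 1 as the "dijet", and the 5% p_T smearing standing in for an autoencoder's output are all hypothetical. The thesis may use a different jet representation, dijet pairing, and code.

```python
import numpy as np


def to_four_vectors(pt, eta, phi, mass):
    """Convert per-jet (pt, eta, phi, m) arrays of shape (n_events, n_jets) into (E, px, py, pz)."""
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    return energy, px, py, pz


def invariant_mass(pt, eta, phi, mass, jet_indices):
    """Per-event invariant mass of the system formed by the jets listed in jet_indices."""
    energy, px, py, pz = to_four_vectors(pt, eta, phi, mass)
    idx = list(jet_indices)
    e = energy[:, idx].sum(axis=1)
    p2 = px[:, idx].sum(axis=1) ** 2 + py[:, idx].sum(axis=1) ** 2 + pz[:, idx].sum(axis=1) ** 2
    # Clip small negative values of m^2 caused by floating-point rounding.
    return np.sqrt(np.clip(e**2 - p2, 0.0, None))


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two 1D samples: integral of |F_u(x) - F_v(x)| dx."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    support = np.sort(np.concatenate([u, v]))
    widths = np.diff(support)
    # Empirical CDFs of both samples, evaluated on each interval of the merged support.
    cdf_u = np.searchsorted(u, support[:-1], side="right") / u.size
    cdf_v = np.searchsorted(v, support[:-1], side="right") / v.size
    return float(np.sum(np.abs(cdf_u - cdf_v) * widths))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    shape = (10_000, 4)  # toy sample: 10k events with four jets each
    pt = 40.0 + rng.exponential(60.0, size=shape)  # GeV
    eta = rng.uniform(-2.5, 2.5, size=shape)
    phi = rng.uniform(-np.pi, np.pi, size=shape)
    mass = rng.uniform(5.0, 15.0, size=shape)  # GeV

    # Hypothetical stand-in for autoencoder output: smear jet pt by 5%.
    pt_reco = pt * rng.normal(1.0, 0.05, size=shape)

    for label, jets in [("m_4j", [0, 1, 2, 3]), ("m_jj (jets 0, 1)", [0, 1])]:
        m_orig = invariant_mass(pt, eta, phi, mass, jets)
        m_reco = invariant_mass(pt_reco, eta, phi, mass, jets)
        print(f"W1({label}) = {wasserstein_1d(m_orig, m_reco):.2f} GeV")
```

Here `wasserstein_1d` uses the same CDF-difference formula as `scipy.stats.wasserstein_distance`. It is written out only to keep the sketch free of a SciPy dependency. A smaller value means the reconstructed distribution lies closer to the original, in the same units as the variable (GeV for the masses).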


Author: Javier Mariño Villadamigo

Academic year: 2022/2023



Department: Dipartimento di Fisica e Astronomia "Galileo Galilei" - DFA
Degree programme: Physics, Laurea Magistrale (Master's degree, D.M. 270/2004)
Academic year (as registered): 2022
Keywords: Autoencoder, QCD background, machine learning, LHC, jets
Supervisor: Tommaso Dorigo
Appears in collection: Master's theses (Lauree magistrali)

Files in this record:
MarinoVilladamigo_Javier.pdf (open access, Adobe PDF, 14.24 MB)

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/52998