Autoencoder-based characterization of QCD multijet background at the LHC
MARIÑO VILLADAMIGO, JAVIER
2022/2023
Abstract
A proof of principle for the application of autoencoders to encoding high-dimensional multijet data is presented. Simulated QCD events containing four b-quark jets are used to train the autoencoder, and the events are then reconstructed from a reduced-dimension representation. We also demonstrate the capability of this autoencoder to generate new artificial events. The characterization of QCD multijet events plays a crucial role in estimating the background for processes with four-b-jet final states, including the production of Higgs boson pairs. This process is accessible at the LHC and may be observed in a combined search by the end of the high-luminosity run of the LHC. The 4b channel is expected to contribute significantly to this combined search, and the goal of this work is to show that autoencoders can model such multijet events in a reduced-dimension space. This approach could provide embedded metrics for effective discrimination between signal and background. A deep learning architecture, largely inspired by an ongoing analysis within the CMS collaboration, serves as the skeleton of the autoencoder. After describing the architecture, loss function, and training schedule, we show that events are successfully reconstructed. To quantify the accuracy of this reconstruction, we compute the Wasserstein distance for several kinematic variables of interest, along with a figure of merit that measures the similarity between the reconstructed dijet and quadjet invariant masses. Furthermore, our approach can generate an arbitrarily large number of events from the encoded space, showing promising agreement with the reconstructed samples. This study not only underscores the applicability of autoencoders in high-energy physics but also offers insight into their potential contributions to future experimental analyses in the field.
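The two ingredients of the reconstruction metrics mentioned above, the invariant mass of a set of jets and the Wasserstein distance between two distributions of a kinematic variable, can be illustrated with a minimal sketch. It assumes that each jet is given as a hypothetical `(pt, eta, phi, m)` tuple in GeV and that the metric is the one-dimensional 1-Wasserstein distance between two equal-size empirical samples. The function names are illustrative and this is not the code used in the thesis.

```python
import math
from itertools import combinations

def four_vector(pt, eta, phi, m):
    """Convert a jet given as (pt, eta, phi, m) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    e = math.sqrt(px**2 + py**2 + pz**2 + m**2)
    return (e, px, py, pz)

def invariant_mass(jets):
    """Invariant mass of the system obtained by summing the four-vectors of the jets."""
    e, px, py, pz = (sum(c) for c in zip(*(four_vector(*j) for j in jets)))
    m2 = e**2 - px**2 - py**2 - pz**2
    return math.sqrt(max(m2, 0.0))  # clamp tiny negative values from rounding

def dijet_masses(event):
    """Masses of all six jet pairs in a four-jet event."""
    return [invariant_mass(pair) for pair in combinations(event, 2)]

def quadjet_mass(event):
    """Mass of the system formed by all four jets."""
    return invariant_mass(event)

def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between two equal-size samples.

    For equal sizes and uniform weights, W1 reduces to the mean absolute
    difference between the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)
```

In use, one would fill two lists, for example the quadjet masses of the input events and of their reconstructions, and pass them to `wasserstein_1d`; a value closer to zero indicates closer agreement. The sorted-difference shortcut only holds for equal-size, equally weighted samples; for unequal sizes or weighted events, `scipy.stats.wasserstein_distance` handles the general one-dimensional case.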
Full text: MarinoVilladamigo_Javier.pdf (open access, Adobe PDF, 14.24 MB)
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/52998