Constrained Molecular Graph Generation with Diffusion Models
ISLEK, RANA
2024/2025
Abstract
Generative models for molecular graphs have progressed quickly, yet most cannot guarantee structural compliance during sampling, which limits their reliability in scientific applications. This thesis investigates hard constraints in discrete diffusion for molecular graph generation by adapting and extending the ConStruct framework to QM9. I introduce sampling-time projectors that impose symbolic graph constraints, namely upper bounds on the total ring count and on the maximum ring length, both defined over all simple cycles. The projectors act only during reverse diffusion, enforce edge-deletion invariant properties, and block exactly those edges that would cause a violation, leaving the learned score network unchanged. Across all settings, the method satisfies the targeted constraints in essentially 100% of samples while retaining competitive generative quality: RDKit validity, uniqueness, and novelty remain high, and the Fréchet ChemNet Distance stays close to that of the unconstrained baseline. When the constraints are non-binding, behavior matches the baseline, as expected. To make these effects measurable, I separate structural satisfaction at sampling time from chemical metrics computed post hoc, and provide diagnostics that track shifts in ring spectra and connectivity across constraint levels. The analysis also characterizes the trade-offs introduced by tight constraints, such as mild distributional bias and occasional loss of novelty. Overall, the results show that strict, interpretable constraints can be integrated into discrete diffusion without retraining, enabling controllable molecular generation. The same mechanism is intended to extend naturally to additional rules (for example, drug-likeness), suggesting a general template for constraint-aware graph generative models beyond molecules.
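The projection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that each reverse step only proposes edge insertions on top of the previous graph, and that the proposed edges are inserted one at a time in random order, with any insertion that would push the graph past either bound being rejected. Bond types are ignored, so only connectivity matters. Following the abstract, rings are counted over all simple cycles, not over a minimal ring basis. The names `simple_cycle_lengths`, `satisfies`, and `project` are hypothetical.

```python
import random
from typing import Iterable

Edge = tuple[int, int]


def norm(u: int, v: int) -> Edge:
    """Store an undirected edge with its smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def simple_cycle_lengths(n: int, edges: Iterable[Edge]) -> list[int]:
    """Length of every simple cycle (all cycles, not a minimal ring basis).

    Exhaustive DFS: exponential in general, but cheap for QM9-sized graphs.
    """
    adj: list[set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    lengths: list[int] = []

    def extend(path: list[int], on_path: set[int]) -> None:
        start, last = path[0], path[-1]
        for nxt in adj[last]:
            if nxt == start:
                # Closing edge: need >= 3 atoms, and keep only one of the
                # two traversal directions of the same cycle.
                if len(path) >= 3 and path[1] < last:
                    lengths.append(len(path))
            elif nxt > start and nxt not in on_path:
                # Visit only atoms above `start`, so each cycle is rooted
                # at its smallest atom.
                path.append(nxt)
                on_path.add(nxt)
                extend(path, on_path)
                path.pop()
                on_path.remove(nxt)

    for s in range(n):
        extend([s], {s})
    return lengths


def satisfies(n: int, edges: Iterable[Edge], max_rings: int, max_ring_len: int) -> bool:
    """Check both upper bounds: total ring count and longest ring."""
    lengths = simple_cycle_lengths(n, edges)
    return len(lengths) <= max_rings and max(lengths, default=0) <= max_ring_len


def project(n: int, prev_edges: Iterable[Edge], proposed_edges: Iterable[Edge],
            max_rings: int, max_ring_len: int, rng: random.Random) -> set[Edge]:
    """Keep the previous graph; insert new edges only if the bounds still hold."""
    kept = {norm(u, v) for u, v in prev_edges}
    candidates = sorted({norm(u, v) for u, v in proposed_edges} - kept)
    rng.shuffle(candidates)  # random insertion order
    for e in candidates:
        if satisfies(n, kept | {e}, max_rings, max_ring_len):
            kept.add(e)
    return kept


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3)]
    # The 4-ring edge (0, 3) is blocked; the triangle edge (0, 2) is kept.
    print(sorted(project(4, chain, [(0, 3), (0, 2)], 5, 3, random.Random(0))))
```

Removing an edge can never create a cycle or lengthen one, so both bounds are edge-deletion invariant. As a result, the edgeless starting graph is always feasible, and each accepted insertion keeps the graph feasible. When the bounds are loose enough, every proposed edge passes the check, which matches the non-binding behavior reported above.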
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/102115