Reasons Why You Are Still An Amateur At Design: Evolutionary design of molecules based on deep learning and a genetic algorithm Scientific Reports

Table of Contents

1. Components of Computational Autonomous Molecular Design Workflow
Highly accurate protein structure prediction with AlphaFold
Could a single synthetic molecule outsmart a variety of drug-resistant bacteria?
A pharmacophore-guided deep learning approach for bioactive molecular generation
Molecular property prediction

The property prediction function f(∙) predicts the molecular property t using the ECFP vector as input. The decoding and property prediction functions are realized by an RNN and a DNN, respectively, and together they make up the overall workflow shown in the figure. Gómez-Bombarelli et al. [21] first proposed the character VAE, composed of an encoder, a decoder, and a predictor (see Figure 2.1). First, kernel density estimation was used to capture the relevant features of the molecules. A continuous latent space was then learned, which allowed powerful gradient-based optimization to guide the search efficiently toward molecules with specific properties. Jointly training a multi-layer perceptron with the encoder ensured that the model could predict molecular properties.
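The joint training described above can be written as a single objective. The form below is a common way to state it, not a formula taken from the cited work: the predictor symbol $g_\psi$, the weight $\lambda$, and the squared-error property term are assumptions made for illustration.

```latex
\mathcal{L}(\theta, \phi, \psi)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\bigl[-\log p_\theta(x \mid z)\bigr]}_{\text{reconstruction}}
  + \underbrace{D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)}_{\text{latent regularization}}
  + \lambda \, \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\bigl[(g_\psi(z) - t)^2\bigr]}_{\text{property prediction}}
```

Here $x$ is the input molecule string, $z$ its latent code, and $t$ the measured property. Because the property term is differentiable in $z$, gradient-based search in the latent space becomes possible once training is done.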

1. Components of Computational Autonomous Molecular Design Workflow

The results show that the data-driven machine intelligence acquires implicit chemical knowledge and generates novel molecules with bespoke properties and structural diversity. The method is available as an open-access tool for medicinal and bioorganic chemistry. Evolutionary design has gained significant attention as a useful tool to accelerate the design process by automatically modifying molecular structures to obtain molecules with the target properties. However, its methodology presents a practical challenge—devising a way in which to rapidly evolve molecules while maintaining their chemical validity.
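The evolutionary loop can be sketched in a few lines. This is a toy illustration under loud assumptions, not the method of the cited paper: molecules are stood in for by bit vectors, the "property" is simply the number of set bits, and `is_valid` is a placeholder for a real chemical validity check (for example, decoding to a structure and parsing it). All function names are hypothetical.

```python
import random


def fitness(bits, target):
    """Toy fitness: closeness of a stand-in property (count of set bits) to the target."""
    return -abs(sum(bits) - target)


def is_valid(bits):
    """Placeholder validity check; a real workflow would test chemical validity here."""
    return any(bits)


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def crossover(a, b, rng):
    """Single-point crossover between two parents of equal length."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(seed, target, generations=30, pop_size=20, rate=0.05, rng=None):
    """Evolve a valid seed toward the target; returns (best individual, best fitness per generation)."""
    rng = rng or random.Random(0)
    population = [mutate(seed, rate, rng) for _ in range(pop_size)]
    # Fall back to the (assumed valid) seed whenever a mutant is invalid.
    population = [p if is_valid(p) else list(seed) for p in population]
    history = []
    for _ in range(generations):
        population.sort(key=lambda b: fitness(b, target), reverse=True)
        parents = population[: pop_size // 2]  # elitist selection keeps the best half
        history.append(fitness(parents[0], target))
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), rate, rng)
            if is_valid(child):  # discard invalid offspring instead of repairing them
                children.append(child)
        population = parents + children
    population.sort(key=lambda b: fitness(b, target), reverse=True)
    return population[0], history
```

Rejecting invalid offspring at creation time is the simplest way to keep the population valid. It trades some speed for simplicity, which is exactly the tension the paragraph above describes.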

Highly accurate protein structure prediction with AlphaFold

Extensive analysis of the molecule's physical and optical properties, including refractive index, extinction coefficient, dielectric constant, and conductivity, was conducted. The molecule exhibited intriguing behavior in terms of extinction coefficient and refractive index, showing an initial increase followed by a decrease with increasing photon energy. The molecule demonstrated surfactant properties, indicated by its increasing dielectric constant and optical conductivity, suggesting enhanced charge transfer and energy storage potential. Additionally, the study investigated the impact of flexible spacer length on organic compounds, revealing that shorter spacers improve refractive index, extinction coefficient, and optical conductivity, indicating enhanced light absorption and bending capabilities. The observed differences in optical values between compounds with different spacer lengths can be attributed to factors such as molecular packing, dipole moment, molecular orientation, and light interactions.

Could a single synthetic molecule outsmart a variety of drug-resistant bacteria?

Herein we introduce GaUDI, a guided diffusion model for inverse molecular design that combines an equivariant graph neural net for property prediction and a generative diffusion model. We demonstrate GaUDI’s effectiveness in designing molecules for organic electronic applications by using single- and multiple-objective tasks applied to a generated dataset of 475,000 polycyclic aromatic systems. GaUDI shows improved conditional design, generating molecules with optimal properties and even going beyond the original distribution to suggest better molecules than those in the dataset. In addition to point-wise targets, GaUDI can also be guided toward open-ended targets (for example, a minimum or maximum) and in all cases achieves close to 100% validity of generated molecules.

Over the past 5 years, case studies using GANs to generate novel molecules with specific desired properties have made milestone progress, especially those combining GANs with reinforcement learning [51]. A GAN consists of a generator that imitates real samples and a discriminator that tries to distinguish the generator's output from real samples as well as possible, while the generator in turn tries to fool the discriminator. The ultimate goal of a GAN is to make the discriminator unable to tell whether a given sample is real or generated.
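The adversarial game between generator $G$ and discriminator $D$ is usually stated as the standard minimax objective below. The symbols $p_{\text{data}}$ (distribution of real molecules) and $p_z$ (noise prior) are notation introduced here, not taken from the text.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

At the ideal equilibrium the discriminator outputs $D(x) = 1/2$ everywhere, which is the formal version of "unable to tell real from generated".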

A pharmacophore-guided deep learning approach for bioactive molecular generation

Over the past three years, many surprisingly effective works have appeared in the field of molecular graph generation. Following the success of VAE models on SMILES, VAE-based architectures for molecular graph design were later developed. Gómez-Bombarelli et al. [21] believed that graph-based representation methods should be explored further. Moreover, with the popularity of graph neural networks, graph-based models now play a dominant role in de novo molecular design.

Our article serves as a guide for medicinal, computational chemistry and biology, analytical chemistry, and the ML community to practice autonomous molecular design in precision medicine and drug discovery.

Data. De novo molecular design faces failings common to artificial intelligence, including the representation, quality, and scarcity of data. Training deep neural networks is data-driven and always relies on sufficient data. Constructing better datasets for molecular generation is therefore also a thorny problem. To address it, some models [61] are pre-trained on a large dataset and then fine-tuned to generate molecules for specific targets. We take the view that incorporating multi-omics data can compensate for data scarcity in the future.

Molecular property prediction

In contrast, our molecular design framework is designed to generate molecules of a given composition that exhibit target property values within specific ranges. The validity of the molecules generated by the different molecular design frameworks is verified along with their properties, and the distributions of the properties are then plotted for the corresponding property targets in the figure. Distributions for the training data are also plotted to validate the generalization capabilities of the molecular design techniques. As is evident from these density plots, the CVAE architecture does not guarantee constrained sampling of molecular candidates.

Graph-based methods for end-to-end feature learning and predictive modeling have been used successfully on small molecules consisting of lighter atoms. For larger molecules, robust representation learning and molecule generation must include non-local interactions, such as van der Waals forces and hydrogen bonding, when building predictive and generative models. Building on DTNN, Schütt et al. [58] also proposed the SchNet model, in which the interactions between atoms are encoded using continuous-filter convolution layers whose filters are produced by filter-generating neural networks. Whereas DTNN predicted only the total energy, their model was extended to electronic, optical, and thermodynamic properties of molecules in the QM9 dataset, achieving state-of-the-art chemical accuracy in 8 out of 12 properties. It improved on a related approach by Gilmer et al. [37], known as the message passing neural network (MPNN), on a number of properties, with the exception of polarizability and electronic spatial extent. It is important to note that MPNN is more accurate for the intensive properties (α, ⟨R²⟩), where decomposition into individual atomic contributions is not required.
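A continuous-filter convolution can be sketched as follows. This is a simplified illustration, not the SchNet implementation: the filter generator is reduced to a single linear map over Gaussian radial basis features of the interatomic distance, and the names `rbf_expand`, `filter_from_distance`, `cfconv`, and the value of `gamma` are hypothetical choices for this example.

```python
import math


def rbf_expand(distance, centers, gamma=10.0):
    """Expand a scalar distance into Gaussian radial basis features."""
    return [math.exp(-gamma * (distance - mu) ** 2) for mu in centers]


def filter_from_distance(distance, centers, weights):
    """Hypothetical one-layer filter generator: a linear map from RBF features
    to one filter value per atom-feature channel (weights has one row per channel)."""
    rbf = rbf_expand(distance, centers)
    return [sum(w * r for w, r in zip(row, rbf)) for row in weights]


def cfconv(features, positions, centers, weights):
    """Update each atom by summing neighbor features, each multiplied
    elementwise by a filter that depends only on the interatomic distance."""
    updated = []
    for i, pos_i in enumerate(positions):
        acc = [0.0] * len(features[i])
        for j, pos_j in enumerate(positions):
            if i == j:
                continue
            dist = math.dist(pos_i, pos_j)
            filt = filter_from_distance(dist, centers, weights)
            acc = [a + x * f for a, x, f in zip(acc, features[j], filt)]
        updated.append(acc)
    return updated
```

Because the filter depends only on distances, the output is unchanged when the whole molecule is translated or rotated, which is the property that makes this kind of layer attractive for 3D molecular data.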

Further, designing an information-rich representation for molecules is also a challenge. Sequence-based representations are undoubtedly simpler, but they ignore structural information to some extent. Graph-based methods have been widely used; nevertheless, the incorporation of 3D information into graph-based models is still lacking. Combining 3D information with appropriate structure-based models in a simple manner remains the Achilles' heel of the field, and it will be an interesting avenue for future work [27]. Last but not least, learning molecules from image representations may be a feasible direction, given the maturity of computer vision. To evaluate how well the energy-based model captures the structure-property relationship, we benchmark its predictive performance against different neural network models that adopt various learning strategies and input types.

The average fitness improves with the number of generations whether S1 is being increased or decreased, indicating that the proposed workflow has successfully evolved the seed molecules toward those with the required target properties. In the early stage S1 changes quickly, after which the change is relatively slow. Moreover, a larger amount of training data results in a higher rate of S1 change. The approach also takes into account properties obtained from quantum mechanics-based simulation or from experimental data to generate features, in addition to the standard process used in benchmark models (e.g., the message passing neural network (MPNN)).

Figure caption: Percentage of molecules generated with evolutionary design vs. the density of the training dataset (a) and the number of new molecules generated in repeated phases (b).

For example, molecular descriptor values were incorporated into RNN-based models [62], which were more focused than the traditional methods. Deep generative models have developed rapidly as a way of generating new synthetic data from given samples, including images [17], text [18], and video [19]. In silico representations of molecules are similar to texts in natural language processing and graphs in social networks.

In addition, Zheng et al. [60] built a quasi-biogenic compound library that includes stereochemical properties. Linking bioactive synthetic compounds with natural products provides a source of inspiration for drug discovery, and the result expands the application scope of CLM in the small-data regime. Notably, conditional generative models, which use additional information to guide the molecular design, have been recommended.

We compared the proposed method with previous generative models by testing it on goal-directed tasks defined in the GuacaMol [8, 12] scoring suite. The comparison involved using the proposed method to generate novel molecules with the desired properties. The two main benchmark types in the GuacaMol test are rediscovery and property satisfaction. Specifically, the rediscovery task is defined as maximizing the similarity between the ECFP fingerprints of the generated molecules and that of the target. We used celecoxib, troglitazone, and thiothixene as target molecules for the rediscovery tasks.
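The rediscovery score can be illustrated with a small fingerprint similarity function. The text does not name the similarity measure, so Tanimoto similarity is assumed here. Fingerprints are represented as sets of on-bit indices rather than computed from real structures, and the function names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rediscovery_score(candidate_bits, target_bits):
    """Score in [0, 1]; 1.0 means the candidate fingerprint matches the target exactly."""
    return tanimoto(candidate_bits, target_bits)


def best_candidate(candidates, target_bits):
    """Return the candidate fingerprint that is most similar to the target."""
    return max(candidates, key=lambda bits: rediscovery_score(bits, target_bits))
```

In a goal-directed run, a score like this would serve as the fitness that the generator tries to maximize. A fingerprint match does not strictly guarantee an identical structure, since different molecules can share a fingerprint.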

Note that in some cases the resolved 3D model is only an approximation of the real molecule, so you have to run an energy minimization before taking reliable measurements. These functions let you perform advanced searches of the PubChem database using the structural formula from the sketcher. You can load molecules from large databases such as PubChem and RCSB using the search form on the left side of the menu bar. Just type what you are looking for and a list of available molecules will appear.

Monday, April 29, 2024