Project Details

Machine learning approaches for faster discovery and adaptation of enzymes for difficult chemical reactions (MacBioSyn). Part I: providing solutions for regioselective oxygenations by 2OGD oxidases

Subject Area Biological and Biomimetic Chemistry
Organic Molecular Chemistry - Synthesis and Characterisation
Term since 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 497207454
 
Biocatalytic synthesis of chemicals is considered a keystone of future green and sustainable chemistry. The European Commission particularly highlights it in combination with digital transformation (Green Deal). However, its power is far from being realized in industry today, mainly because of the limited activity or diversity of accessible enzymes. 2-oxoglutarate-dependent (2OGD) proteins are an under-researched family of enzymes that catalyze “tricky” oxidative reactions (e.g., oxyfunctionalization of non-activated carbons, demethylations) that are challenging or impossible to perform using traditional chemosynthesis. Thus, 2OGD proteins have high potential to revolutionize the industry as a regio- and product-specific “alternative to chemistry”. Identifying new representatives of this large family with, for example, increased substrate scope can open up a new range of biocatalytic routes, for instance to natural products. However, a common challenge in enzyme development is predicting activity when exploring the enormous biodiversity through genome mining. Machine learning (ML) can capitalize on large and diverse enzyme datasets to predict function and activity, and to explore biodiversity to identify advanced biocatalysts. Additionally, ML methods can optimize multiple protein traits simultaneously and navigate sequence and chemical space efficiently.
In the MacBioSyn project, we aim to develop a general, high-throughput (HT), ML-based framework (deep learning, active learning, reinforcement learning) that predicts the activity of enzymes and their substrate / reaction scope. We will implement a new in silico framework for the analysis of enzyme sequence/substrate pairs based on ML models trained on screening results. This synergistic approach combines interdisciplinary expertise in computational design / modeling (Davari) with HT enzyme characterization (Dippe/Wessjohann).
To establish this platform, we will focus on 2OGD enzymes as proof-of-concept biocatalysts. A critical challenge in machine learning projects is the size of the dataset available for training, which must be large enough to support consistent and reliable statistical modeling. Therefore, MacBioSyn aims to generate a large dataset by HT screening (> 1500 enzymes) of the superfamily’s biodiversity. Conversion of 30 structurally diverse substrates will generate representative data to train our algorithms in an iterative process. In essence, our framework will provide a solution for activity / substrate scope prediction for biocatalyst discovery in general. The synergistic approach will provide methodologies that harness the power of ML methods to accelerate the discovery of improved enzymes, i.e., to change how biocatalytic reactions (here oxyfunctionalizations) are developed. The new fundamental design principles learned for 2OGD enzymes will broaden their applications in the biocatalytic production of valuable natural products and beyond.
DFG Programme Priority Programmes
International Connection France
Cooperation Partner Professor Dr. Jean Loup Faulon
 
 
