Project Details

Learning structures in the CRISPR-Cas system using deep learning architectures

Subject Area Bioinformatics and Theoretical Biology
Term since 2018
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 405892038
 
In recent years, deep neural networks, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), have become a central and remarkably effective modeling tool for classification tasks such as speech and image recognition as well as text classification. They outperform classical machine learning approaches, and even humans, in video and image recognition. However, RNNs have rarely been tested on or applied to genetic datasets. In this application, we propose using RNNs to model CRISPR regions and their associated genomes, as well as their targets. By visualizing the hidden states of the trained network, we will gain insights into structural properties that are shared by CRISPR loci and their associated genomic and target sequences, such as the Protospacer Adjacent Motif (PAM). Because the nucleotide-level models will be trained without supervision, the method is capable of detecting as yet unknown structural properties of the CRISPR system. We first aim to catalog all CRISPR structures recovered from a large collection of metagenomes (Objective 1). With this data, together with 2509 already identified CRISPR loci from complete genomes, we will employ RNNs to uncover hidden structures (Objective 2). The trained model will also be used to validate putative CRISPR loci, which make up the majority of current CRISPR databases, and to refine CRISPR subtype classification (Objective 3).
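As a rough illustration of what a nucleotide-level recurrent model with inspectable hidden states could look like, the following minimal sketch runs the forward pass of a small Elman-style RNN over a DNA string and records one hidden-state vector per position. It is not the project's method: the class name TinyRNN, the hidden size, the random untrained weights, and the toy input sequence are all assumptions made for illustration, and a real model would first be trained on next-nucleotide prediction before its hidden states were visualized.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def one_hot(base):
    """Encode a single nucleotide as a 4-dimensional one-hot vector."""
    return [1.0 if base == n else 0.0 for n in NUCLEOTIDES]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Hypothetical Elman-style RNN (forward pass only, untrained)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.w_in = init_matrix(hidden_size, len(NUCLEOTIDES), rng)
        self.w_rec = init_matrix(hidden_size, hidden_size, rng)
        self.w_out = init_matrix(len(NUCLEOTIDES), hidden_size, rng)

    def forward(self, sequence):
        """Return per-position hidden states and next-base probabilities."""
        h = [0.0] * self.hidden_size
        states, predictions = [], []
        for base in sequence:
            from_input = matvec(self.w_in, one_hot(base))
            from_state = matvec(self.w_rec, h)
            h = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
            states.append(h)
            predictions.append(softmax(matvec(self.w_out, h)))
        return states, predictions


def next_base_loss(predictions, sequence):
    """Average cross-entropy of predicting base t+1 from positions up to t.

    No labels are needed: the sequence itself supplies the targets.
    """
    total = 0.0
    for t in range(len(sequence) - 1):
        target = NUCLEOTIDES.index(sequence[t + 1])
        total -= math.log(predictions[t][target])
    return total / (len(sequence) - 1)


toy_sequence = "ACGTTGCAACGT"  # arbitrary toy string, not a real CRISPR repeat
model = TinyRNN()
states, predictions = model.forward(toy_sequence)
loss = next_base_loss(predictions, toy_sequence)
```

The list states holds one vector per nucleotide, which is the kind of object one would plot as a heatmap along the sequence to look for positions where hidden units respond consistently, for example near a motif. The loss uses only the sequence itself as its target, which is what makes this kind of training unsupervised.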
DFG Programme Priority Programmes
 
 

