Project Details
Learning structures in the CRISPR-Cas system using deep learning architectures
Applicant
Professorin Dr. Alice C. McHardy
Subject Area
Bioinformatics and Theoretical Biology
Term
since 2018
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 405892038
In recent years, deep neural networks, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), have become a central and remarkably effective modeling tool for classification tasks such as speech and image recognition, as well as text classification. In video and image recognition they outperform classical machine learning approaches, and even humans. However, RNNs have rarely been tested on or applied to genetic datasets. In this application, we propose using RNNs to model CRISPR regions, their associated genomes, and their targets. By visualizing the hidden states of the trained network, we will gain insight into structural properties shared by CRISPR loci and their associated genomic and target sequences, such as the Protospacer Adjacent Motif (PAM). Because the nucleotide-level models will be trained without supervision, the method can detect as yet unknown structural properties of the CRISPR system. We first aim to catalog all CRISPR structures recovered from a large collection of metagenomes ('Objective 1'). With these data, together with 2509 already identified CRISPR loci from complete genomes, we will employ RNNs to uncover hidden structures ('Objective 2'). The trained model will also be used to validate putative CRISPR loci, which make up the majority of current CRISPR databases, and to refine CRISPR subtype classification ('Objective 3').
DFG Programme
Priority Programmes