The DNA from a Coding Perspective
Final Report Abstract
DNA was investigated from the perspective of information theory and communications in general to gain deeper insight into the properties of DNA sequences and to link these engineering perspectives to biological aspects. Several new aspects of DNA could be elucidated, and known properties could be understood and formally proven. Treating mutations and their probabilities as a communications channel and computing the mutual information over time, realized as a channel matrix exponent, allowed us to prove the mapping between codons and amino acids; in particular, synonymous mappings and the wobble rule became directly visible. Moreover, we had to realize that at some point the information content of a base pair, which is a quaternary representation, offers only a single bit of information, possibly only allowing a distinction between purines and pyrimidines. We could show that Shannon entropy is a good general indicator of biologically relevant DNA features. With a screening based on Shannon entropy, we identified promoter features that are sensitive to DNA 3D structure and determine temporal gene expression. With the same approach, we identified repetitive sequence features that are able to modulate basal levels of gene expression. With this information about promoter design, custom-made promoters with desired features are within reach for synthetic biology and allow for more predictable engineering approaches in biology and biotechnology. Furthermore, we were able to quantify rearrangements in the order of genes that form patterns of gene migration during the evolution of bacterial chromosomes. This work revealed a fundamental driving force of bacterial chromosome evolution, which also paves the way to a more reliable construction of stable synthetic chromosomes.
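As a rough illustration of the mutual-information-over-time idea described above, the following Python sketch applies a per-step mutation matrix t times and evaluates the mutual information between the original and the mutated base. The Kimura-style two-parameter matrix, the rates alpha and beta, the uniform input distribution, and all function names are assumptions made for this example and are not taken from the project; the "channel matrix exponent" is read here as the t-th matrix power.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, t):
    """Return p raised to the non-negative integer power t (square-and-multiply)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = p
    while t > 0:
        if t & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        t >>= 1
    return result


def mutual_information(px, channel):
    """Mutual information in bits for input distribution px and a row-stochastic channel."""
    n = len(px)
    py = [sum(px[i] * channel[i][j] for i in range(n)) for j in range(n)]
    mi = 0.0
    for i in range(n):
        for j in range(n):
            pxy = px[i] * channel[i][j]
            if pxy > 0.0:
                mi += pxy * math.log2(channel[i][j] / py[j])
    return mi


def kimura_channel(alpha, beta):
    """Hypothetical one-step mutation channel, base order A, G, C, T.

    alpha: probability of a transition (A<->G or C<->T)
    beta:  probability of each of the two possible transversions
    """
    stay = 1.0 - alpha - 2.0 * beta
    return [
        [stay, alpha, beta, beta],
        [alpha, stay, beta, beta],
        [beta, beta, stay, alpha],
        [beta, beta, alpha, stay],
    ]


def information_after(t, alpha=0.01, beta=0.00001):
    """Mutual information between a base and its copy after t mutation steps."""
    uniform = [0.25, 0.25, 0.25, 0.25]  # assumed input distribution
    return mutual_information(uniform, mat_pow(kimura_channel(alpha, beta), t))


if __name__ == "__main__":
    for t in (0, 10, 100, 500, 10000, 100000):
        print(t, round(information_after(t), 4))
```

Under these assumed rates, with transitions far more likely than transversions, the value starts at 2 bits, stays close to 1 bit over a long intermediate range in which mainly the purine/pyrimidine class survives, and finally decays toward 0.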
Based on information-theoretic features, such as entropies (Shannon and Gibbs), mutual information, conditional mutual information, Kullback-Leibler divergence, and Markov models, we performed intra-organism and cross-organism prediction of essential genes in bacteria, an archaeon, and eukaryotes. With such a simple sequence-based approach, we obtained AUC performances comparable to those of much more elaborate methods, e.g., CRISPR-based ones. For the Markov modeling, we of course had to estimate a suitable order. Our studies on essentiality led to a cooperation with colleagues in Israel and to joint publications. They hence grew into a bigger share of our project than originally anticipated. As a consequence, gene regulatory networks, presumably the highest layer of protection in a kind of graph-related error correction mechanism, were only touched upon, mainly by looking into synthetic lethality networks. Synthetic lethality can be regarded as a repetition code, but it is not limited to gene duplications or simple functional replacements. There are pathways that provide redundancy, and hence a more complicated multi-level “code” graph appears appropriate. Jointly examining co-regulatory networks has so far not led to an understanding of the connection degrees of certain genes, especially of hubs with many connections. More studies will certainly be needed on this aspect. The project not only led to international cooperations but also to a central contribution to an NSF workshop to which we were invited. NSF is trying to start a cooperative initiative between engineering and the life sciences, similar to DFG’s earlier framework project. NSF had also realized that such a cooperation can significantly enhance the understanding of genetic structure and function, which we have indeed experienced as well.
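To make the notion of sequence-based information-theoretic features more concrete, the following Python sketch computes two simple quantities for a DNA string: the Shannon entropy of its k-mer distribution and the mutual information between adjacent bases. The function names, the choice of features, and the toy sequence are assumptions for illustration only; they are not the feature definitions used in the project.

```python
import math
from collections import Counter


def kmer_entropy(seq, k=1):
    """Shannon entropy in bits of the empirical distribution of overlapping k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    if total == 0:
        return 0.0
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacent_mutual_information(seq):
    """Mutual information in bits between a base and its immediate successor."""
    pairs = Counter(zip(seq, seq[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    first = Counter()   # marginal counts of the first base in each pair
    second = Counter()  # marginal counts of the second base in each pair
    for (a, b), c in pairs.items():
        first[a] += c
        second[b] += c
    mi = 0.0
    for (a, b), c in pairs.items():
        # p(a,b) / (p(a) p(b)) written with raw counts
        ratio = c * total / (first[a] * second[b])
        mi += (c / total) * math.log2(ratio)
    return mi


def sequence_features(seq):
    """Collect a small, hypothetical feature vector for one sequence."""
    return {
        "H1": kmer_entropy(seq, 1),
        "H2": kmer_entropy(seq, 2),
        "H3": kmer_entropy(seq, 3),
        "MI_adjacent": adjacent_mutual_information(seq),
    }


if __name__ == "__main__":
    toy_gene = "ATGGCGTACGCTTAGATGGCGTACGCTTAG"  # made-up sequence
    for name, value in sequence_features(toy_gene).items():
        print(name, round(value, 4))
```

A feature vector of this kind could then be passed to any standard classifier; which features and which classifier are appropriate is left open here.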
Publications
- “Categorization of species based on their microRNAs employing sequence motifs, information-theoretic sequence feature extraction, and k-mers.” EURASIP Journal on Advances in Signal Processing 2017.1, no. 70, 2017
Malik Yousef, Dawit Nigatu, Dalit Levy, Jens Allmer, and Werner Henkel
(See online at https://doi.org/10.1186/s13634-017-0506-8)
- “Computational identification of essential genes in prokaryotes and eukaryotes,” Peixoto N., Silveira M., Ali H., Maciel C., van den Broek E. (eds) Biomedical Engineering Systems and Technologies, Communications in Computer and Information Science, vol. 881, Springer, Cham., 2017
Dawit Nigatu and Werner Henkel
(See online at https://doi.org/10.1007/978-3-319-94806-5_13)
- “Prediction of essential genes based on machine learning and information theoretic features.” Proceedings of BIOSTEC 2017 - BIOINFORMATICS, pp. 81–92, 2017
Dawit Nigatu and Werner Henkel
(See online at https://doi.org/10.5220/0006165700810092)
- “Sequence-based information-theoretic features for gene essentiality prediction,” BMC bioinformatics, vol. 18(1), no. 473, 2017
Dawit Nigatu, Patrick Sobetzko, Malik Yousef, and Werner Henkel
(See online at https://doi.org/10.1186/s12859-017-1884-5)
- “The DNA from a coding perspective,” Information- and Communication Theory in Molecular Biology, Springer International Publishing, 2018
Werner Henkel, Georgi Muskhelishvili, Dawit Nigatu, and Patrick Sobetzko
(See online at https://doi.org/10.1007/978-3-319-54729-9_12)
- “Multilevel capacities for the codon mutation channel,” 2018 10th International Symposium on Turbo Codes and Iterative Information Processing (ISTC), Hong Kong, 2018
Dawit Nigatu and Werner Henkel
(See online at https://doi.org/10.1109/ISTC.2018.8625354)
- “MoCloFlex: a modular yet flexible cloning system,” Front Bioeng Biotechnol, 7:271, 2019
Carlo A Klein, Marc Teufel, Carl J Weile, and Patrick Sobetzko
(See online at https://doi.org/10.3389/fbioe.2019.00271)
- “ICCT in biology at the molecular and cellular level - some steps in unveiling the protection and prioritization in the DNA,” BioTICC NSF workshop (Biology through Information Communication & Coding Theory), Alexandria, VA, 2020
Werner Henkel
- “The bacterial promoter spacer modulates promoter strength and timing by length, TG-motifs and DNA supercoiling sensitivity,” Nature Scientific Reports, 2021
Carlo A Klein, Marc Teufel, Carl J Weile, and Patrick Sobetzko
(See online at https://doi.org/10.1038/s41598-021-03817-4)
- “The role of replication-induced chromosomal copy numbers in spatio-temporal gene regulation and evolutionary chromosome plasticity,” bioRxiv
Marc Teufel, Werner Henkel, and Patrick Sobetzko
(See online at https://doi.org/10.1101/2022.03.30.486354)