Project Details
Finding new overlapping genes and their theory
Applicants
Professor Dr.-Ing. Martin Bossert; Professor Dr. Daniel Keim; Professor Dr. Siegfried Scherer
Subject Area
Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term
from 2010 to 2019
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 150058393
New overlapping protein-coding DNA sequences in prokaryotes are to be found and verified. The underlying mechanisms are examined using models from information and communication theory. Based on earlier project results, we hypothesize that (i) overlapping genes (OLG) result from overprinting and (ii) orphans are descendants of newly arisen OLG. These hypotheses shall be tested as follows:
1. Bacterial genomes will be modelled statistically and compared to the natural genomes: What roles do the genetic code, the GC content and codon usage play in the de novo evolution of protein-coding OLG? We will examine whether there is a continuum from non-coding overlapping proto-genes to 'true' OLG. These studies will reveal details about the stochastic processes from which OLG potentially originate.
2. We expect competing selection pressures to act on OLG pairs, causing their Ka/Ks ratios to deviate from those of single genes. Current methods for measuring Ka/Ks ratios need large amounts of data to be sufficiently accurate. Using information-theoretic insights, we will develop a model which uses fewer parameters and thus needs less data to determine Ka/Ks.
3. Sequenced organisms provide a large database from which to determine the phylogeny of OLG. To understand and interpret these data for many OLG, visualizations and visual analytics are needed. Using such tools, the above-mentioned orphan hypothesis can be tested. To this end, related genes are identified using BLAST and the phylogeny shall be visualized such that the uncoupling of former OLG can be shown. To assess the ongoing evolution of overlaps, selection pressures and sequence similarities shall be visualized as well. These tools will help to answer the question of how OLG potentially evolve further functions.
4. Hypothetical evolutionary histories of OLG shall be reconstructed (see above) and tested experimentally. To this end, ancestral and intermediate sequences of OLG shall be reconstructed with phylogenetic tools, synthesized and inserted into appropriate knock-out mutants, so that their influence on fitness can be measured.
5. Ribosomal footprints of mRNA are an efficient means of finding OLG in different bacteria. Such experiments shall be used to determine how the prediction of OLG depends on GC content and codon usage. Footprint data make it possible to determine the reading frame used. This approach will be improved using information-theoretic methods such that the reading frame of single genes can be detected. Furthermore, relatives of EHEC, which was examined in the former project, shall be tested. This will corroborate or refute the hypothesis that orphans are important for niche adaptation.
6. In the previous project phases, OLG were predicted or found experimentally. Some of these have been mutated strand-specifically to test their fitness. This work shall be continued, and a functional characterization of some further OLG shall be conducted using methods of molecular biology.
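The notion of overlapping protein-coding sequences used in this abstract can be illustrated with a naive six-frame scan. The sketch below is not the project's prediction method: it assumes that an open reading frame (ORF) starts at ATG only, ends at the first in-frame stop codon, and counts as overlapping when it shares at least one nucleotide with an ORF in a different frame or on the other strand. The function names `find_orfs` and `overlapping_pairs` and the `min_codons` and `min_overlap` thresholds are hypothetical choices made for illustration.

```python
from itertools import combinations

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Scan all six frames for ATG...stop ORFs (illustrative assumption).

    Returns tuples (strand, frame, start, end) in forward-strand
    coordinates, with end exclusive. min_codons counts the stop codon.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    end = i + 3
                    if (end - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, frame, start, end))
                        else:
                            # Map reverse-strand positions back to the forward strand.
                            orfs.append((strand, frame, n - end, n - start))
                    start = None
    return orfs


def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, overlap_length) for ORFs in different frames or strands."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if (a[0], a[1]) == (b[0], b[1]):
            continue  # same strand and frame: not an overlapping gene pair
        overlap = min(a[3], b[3]) - max(a[2], b[2])
        if overlap >= min_overlap:
            pairs.append((a, b, overlap))
    return pairs


if __name__ == "__main__":
    # Toy sequence with two forward-strand ORFs in different frames.
    toy = "ATGCATGCCTAAGTGA"
    for a, b, length in overlapping_pairs(find_orfs(toy)):
        print(a, b, length)
```

A real analysis would additionally need alternative start codons, realistic length thresholds and a statistical null model of the kind described in item 1, since long ORFs in alternative frames also arise by chance, particularly in GC-rich genomes.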
DFG Programme
Priority Programmes
Participating Persons
Privatdozent Dr. Klaus Neuhaus; Professor Dr.-Ing. Steffen Schober