Project Details
Evolutionary constraints on de novo emergence of new protein coding genes
Applicant
Dr. Bharat Ravi
Subject Area
Bioinformatics and Theoretical Biology
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 556095308
New protein-coding genes usually emerge from existing protein-coding genes. Challenging the long-held belief that genes cannot emerge from scratch on evolutionarily feasible timescales, recent studies have shown that new protein-coding genes can indeed emerge from DNA sequences that do not yet encode any gene. This phenomenon is called de novo gene emergence, and the genes that evolve in this way are called proto-genes or de novo genes. Although several de novo genes have been discovered, and some have been shown to be beneficial to the host organism, very little is known about the rate at which they emerge and are lost, how their protein products interact with cellular physiology, and how their protein sequences change during the early stages of their evolution. In recently published work, I used mathematical models to address some open questions on the topic. Specifically, I showed that (1) de novo genes can be lost 2-3 times independently within the timescale of one gene gain event, (2) the emergence of transcription usually precedes the evolution of an open reading frame (ORF) during de novo gene emergence, (3) loss of an ORF is more likely than loss of transcription, and (4) mutation bias can provide a direction to random protein evolution. Overall, my earlier work explored how de novo emergence would occur if evolution were not constrained by selection, that is, if the genes and their products did not affect the fitness of the host organism. This neutral evolutionary model provides a null hypothesis. As a next step in this direction, the proposed project will explore how selection shapes the evolution of genes immediately after their emergence. I focus on early evolution because at this stage de novo protein-coding genes are clearly distinguishable from existing protein-coding genes, and may thus evolve via mechanisms that differ from those acting on existing protein-coding genes (with or without duplication).
To address open questions on the early evolution of de novo genes, I plan to use a multi-disciplinary approach that combines theoretical modeling, computational protein structure prediction, analysis of genome and transcriptome sequencing data, and laboratory protein evolution. Broadly, I plan to address the following questions: (1) how rapidly de novo genes are fixed in a population, (2) how proteins expressed from newly emerged genes affect cellular health, and (3) how quickly these proteins can evolve a 3D structure. My earlier work was well received by reviewers for its theoretical rigor and for addressing important open questions, and has been published in highly visible journals. I believe that the planned study will also make a significant contribution to the field.
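As a rough illustration of question (1), the contrast between a neutral null model and evolution under selection can be sketched with Kimura's classical diffusion approximation for the fixation probability of a single new mutant allele. This is a textbook result and not the model developed in this project. The function name, the parameter names, the assumption that the effective population size equals the census size, and the example values are all assumptions made for illustration only.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's diffusion approximation for one new mutant copy.

    Assumptions (illustrative, not taken from the project):
    - diploid population of constant size n, effective size equal to n
    - additive (genic) selection with coefficient s
    - the mutant starts as a single copy, i.e. at frequency 1 / (2n)
    """
    if n <= 0:
        raise ValueError("population size must be positive")

    p0 = 1.0 / (2 * n)  # initial frequency of the new allele
    a = 4.0 * n * s     # scaled selection strength

    # Neutral limit: the fixation probability equals the initial frequency.
    if abs(a) < 1e-12:
        return p0

    # Strongly deleterious alleles: the denominator would overflow,
    # and the fixation probability is effectively zero.
    if a < -700.0:
        return 0.0

    # u(p0) = (1 - exp(-a * p0)) / (1 - exp(-a)); expm1 keeps precision
    # when the exponents are small.
    return math.expm1(-a * p0) / math.expm1(-a)


if __name__ == "__main__":
    n = 1000  # hypothetical population size
    for s in (-0.01, 0.0, 0.01):  # hypothetical selection coefficients
        print(f"s = {s:+.2f}: P(fix) = {fixation_probability(s, n):.3e}")
```

Under these assumptions a neutral allele fixes with probability 1/(2n), a weakly beneficial allele with probability close to 2s, and a deleterious allele far less often than a neutral one, which is the sense in which the neutral model acts as a baseline against which selection can be measured.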
DFG Programme
Research Grants