GRK 1906: Computational Methods for the Analysis of Genome Diversity and Dynamics
Computer Science
Mathematics
Summary of project results
Enabled by modern high-throughput analytical biotechnologies, genomic research has moved from studying single genomes to the concurrent analysis of multiple genomes. In this International Research Training Group (IRTG), we have developed new computational approaches targeting both (i) genome diversity, i.e., the variation between different samples, species, strains, individuals, cells, etc., and (ii) genome dynamics originating from random mutation, recombination, evolutionary pressure and selection. We therefore subdivided our research program into areas addressing different methodological needs. In Area 1, “Scale-up call: Enhancing computational capacity”, the method of choice has been to develop new tools within modern distributed IT environments. This makes high-performance computing affordable and keeps the algorithms close to the data. Within the IRTG, several approaches to scaling up have been pursued. Containerisation of applications (e.g., via Docker) eases deployment in distributed computing infrastructures, integration into workflow systems, and reproducible analyses. Integrating existing tools and “dockerized” applications into the MapReduce streaming framework allows robust distribution in cloud environments. For other applications, algorithms have been implemented natively in the MapReduce framework. These approaches have been used successfully to run metagenomics workflows and publish reproducible results, and to scale up both metagenomics and comparative genome analyses.
Research in Area 2, “Data management: Basic storage and retrieval”, has focused on novel data structures that efficiently store sequences together with high-level metadata. In particular, data structures for indexing and compressing pangenomes have been developed, together with algorithms for their functional analysis. Furthermore, a data-warehouse-driven online tool for metadata-based studies of metagenomes has been developed.
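The summary does not spell out how a pan-genome index represents sequence that is shared between genomes and sequence that is strain-specific. The Python below is a minimal sketch that assumes the common coloured k-mer view of a pan-genome. In that view, every k-mer is annotated with the set of genomes ("colours") in which it occurs. Here the mapping is kept in a plain dictionary with one integer bitmask per k-mer. This is not the Bloom Filter Trie from the publication list, which stores coloured k-mer information far more compactly. The function names and toy sequences are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def kmers(seq: str, k: int):
    """Yield every overlapping substring of length k (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_index(genomes: dict[str, str], k: int) -> tuple[list[str], dict[str, int]]:
    """Map each k-mer to a bitmask of the genomes ("colours") that contain it.

    Bit i is set when the i-th genome of `genomes` (insertion order) contains the k-mer.
    """
    names = list(genomes)
    index: defaultdict[str, int] = defaultdict(int)
    for color, name in enumerate(names):
        for km in kmers(genomes[name], k):
            index[km] |= 1 << color
    return names, dict(index)


def genomes_containing(index: dict[str, int], names: list[str], kmer: str) -> list[str]:
    """Decode the colour bitmask of `kmer` back into genome names."""
    mask = index.get(kmer, 0)
    return [name for i, name in enumerate(names) if (mask >> i) & 1]


def core_kmers(index: dict[str, int], n_genomes: int) -> set[str]:
    """Return the k-mers present in every genome, a simple proxy for the core pan-genome."""
    all_colors = (1 << n_genomes) - 1
    return {km for km, mask in index.items() if mask == all_colors}


# Hypothetical toy genomes, for illustration only.
genomes = {"strainA": "ACGTACGT", "strainB": "ACGTTTGT", "strainC": "GGACGTAC"}
names, index = build_colored_index(genomes, k=4)
print(genomes_containing(index, names, "CGTA"))  # ['strainA', 'strainC']
print(core_kmers(index, len(names)))             # {'ACGT'}
```

Because colours are bitmasks, set queries such as "k-mers present in every genome" reduce to integer comparisons. A compressed pan-genome structure aims to answer the same kind of query with much less memory than a hash map.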
For the development of new algorithms and methods (Areas 3–5), different fields of application were addressed. Most notably, researchers of the IRTG developed algorithms for the computational determination of functional RNAs; for the efficient grouping and clustering of NGS data; for reconstructing ancestral genomes, including from ancient DNA data; for simulating the mutation process along the ancestral line of populations under selection; for the prediction and visualization of 3D protein-protein networks to identify and analyse drug-drug interactions; for the analysis and visualization of time-lapse images from microfluidics experiments; and for the visualization of molecular dynamics and co-location in mass spectrometry imaging (MSI) and polyomics data. The methods used range from the design of models, algorithms and data structures to machine learning.
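The clustering work is only named above. One of the listed papers, GeFaST, generalises Swarm's fastidious clustering. The Python below is a simplified, hypothetical sketch of Swarm-style clustering of amplicon sequences, meant to illustrate the basic idea such methods build on. Seeds are taken in decreasing abundance. Each cluster then grows breadth-first by absorbing every unclustered sequence within a local edit-distance threshold `d` of a current member. The sketch deliberately omits Swarm's abundance-based cluster breaking, the fastidious refinement and any indexing, and it compares candidates naively. It should not be read as the GeFaST or Swarm algorithm.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def swarm_like_clusters(abundances: dict[str, int], d: int = 1) -> list[list[str]]:
    """Greedy single-linkage clustering with a local distance threshold d.

    Seeds are taken in decreasing abundance (ties broken by sequence); each cluster
    grows breadth-first by absorbing any unclustered sequence within d edits of a
    current member.
    """
    def by_abundance(s: str):
        return (-abundances[s], s)

    order = sorted(abundances, key=by_abundance)
    unclustered = set(order)
    clusters = []
    for seed in order:
        if seed not in unclustered:
            continue
        unclustered.discard(seed)
        cluster, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # sorted() copies the set, so discarding inside the loop is safe.
            for cand in sorted(unclustered, key=by_abundance):
                if levenshtein(current, cand) <= d:
                    unclustered.discard(cand)
                    cluster.append(cand)
                    queue.append(cand)
        clusters.append(cluster)
    return clusters


# Hypothetical amplicon abundances, for illustration only.
amplicons = {"ACGTACGT": 50, "ACGTACGA": 10, "ACGTACCA": 3, "TTTTGGGG": 20, "TTTTGGGC": 2}
print(swarm_like_clusters(amplicons, d=1))
# [['ACGTACGT', 'ACGTACGA', 'ACGTACCA'], ['TTTTGGGG', 'TTTTGGGC']]
```

With `d = 1`, a chain of single-edit variants ends up in one cluster even if its endpoints differ by more than `d`. In the example, `ACGTACGT` and `ACGTACCA` are two edits apart. This single-linkage behaviour distinguishes the approach from fixed-radius clustering around a centroid.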
Project-related publications (selection)
- Mycoplasma salivarium as a dominant coloniser of Fanconi anaemia associated oral carcinoma. PLoS One, 9(3), e92297, 2014
B. Henrich, M. Rumming, A. Sczyrba, E. Velleuer, R. Dietrich, W. Gerlach, M. Gombert, S. Rahn, J. Stoye, A. Borkhardt, and U. Fischer
(See online at https://doi.org/10.1371/journal.pone.0092297)
- Scaffolding of ancient contigs and ancestral reconstruction in a phylogenetic framework. In: Proc. of BSB 2014, 135–143, 2014
N. Luhmann, C. Chauve, J. Stoye, and R. Wittler
(See online at https://doi.org/10.1007/978-3-319-12418-6_17)
- Automatic discovery of metagenomic structure. In: Proc. of IJCNN 2015. 2015
M. Lux, A. Sczyrba, and B. Hammer
(See online at https://doi.org/10.1109/IJCNN.2015.7280500)
- Bloom Filter Trie – a data structure for pan-genome storage. In: Proc. of WABI 2015, 217–230, 2015
G. Holley, R. Wittler, and J. Stoye
(See online at https://doi.org/10.1007/978-3-662-48221-6_16)
- CellWhere: graphical display of interaction networks organized on subcellular localizations. Nucleic Acids Res. 43(W1), W571–W575, 2015
L. Zhu, A. Malatras, M. Thorley, I. Aghoghogbe, A. Mer, S. Duguez, G. Butler-Browne, T. Voit, and W. Duddy
(See online at https://doi.org/10.1093/nar/gkv354)
- The SCJ small parsimony problem for weighted gene adjacencies. In: Proc. of ISBRA 2016, 200–210, 2016
N. Luhmann, A. Thévenin, A. Ouangraoua, R. Wittler, and C. Chauve
(See online at https://doi.org/10.1007/978-3-319-38782-6_17)
- acdc – automated contamination detection and confidence estimation for single-cell genome data. BMC Bioinformatics, 17. 2016
M. Lux, J. Krüger, C. Rinke, I. Maus, A. Schlüter, T. Woyke, A. Sczyrba, and B. Hammer
(See online at https://doi.org/10.1186/s12859-016-1397-7)
- Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage. Algorithms Mol. Biol. 11. 2016
G. Holley, R. Wittler, and J. Stoye
(See online at https://doi.org/10.1186/s13015-016-0066-8)
- Identification and genome reconstruction of abundant distinct taxa in microbiomes from one thermophilic and three mesophilic production-scale biogas plants. Biotechnol. Biofuels, 9. 2016
Y. Stolze, A. Bremges, M. Rumming, C. Henkel, I. Maus, A. Pühler, A. Sczyrba, and A. Schlüter
(See online at https://doi.org/10.1186/s13068-016-0565-3)
- Omics Fusion – a platform for integrative analysis of omics data. J. Integr. Bioinform. 13(4), 296, 2016
B. Brink, A. Seidel, N. Kleinbölting, T. W. Nattkemper, and S. Albaum
(See online at https://doi.org/10.1515/jib-2016-296)
- A review of bioinformatics platforms for comparative genomics. Recent developments of the EDGAR 2.0 platform and its utility for taxonomic and phylogenetic studies. J. Biotechnol. 261, 2–9, 2017
J. Yu, J. Blom, S. Glaeser, S. Jaenicke, T. Juhre, O. Rupp, O. Schwengers, S. Spänig, and A. Goesmann
(See online at https://doi.org/10.1016/j.jbiotec.2017.07.010)
- Bayesian collective Markov random fields for subcellular localization prediction of human proteins. In: Proc. of ACM BCB 2017, 321–329, 2017
L. Zhu and M. Ester
(See online at https://doi.org/10.1145/3107411.3107412)
- Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microbial Genomics, 3(9). 2017
N. Luhmann, D. Doerr, and C. Chauve
(See online at https://doi.org/10.1099/mgen.0.000123)
- Dynamic alignment-free and reference-free read compression. In: Proc. of RECOMB 2017. LNCS, 50–65, 2017
G. Holley, F. Hach, R. Wittler, and J. Stoye
(See online at https://doi.org/10.1007/978-3-319-56970-3_4)
- Feature relevance bounds for linear classification. In: Proc. of ESANN 2017, Special Session on Biomedical data analysis in translational research: integration of expert knowledge and interpretable models. 2017
C. Göpfert, L. Pfannschmidt, and B. Hammer
- Methods for the identification of common RNA motifs. Universität Bielefeld. PhD thesis. 2017, 140
B. Löwes
- Phylogenetic assembly of paleogenomes integrating ancient DNA data. Universität Bielefeld. PhD thesis. 2017
N. Luhmann
- Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism. J. Biotechnol. 257, 58–60, 2017
J. Yu, J. Blom, A. Sczyrba, and A. Goesmann
(See online at https://doi.org/10.1016/j.jbiotec.2017.02.020)
- The SCJ small parsimony problem for weighted gene adjacencies. IEEE-ACM Trans. Comput. Biol. Bioinform. 16. 2019. Epub 2017
N. Luhmann, M. Lafond, A. Thévenin, A. Ouangraoua, R. Wittler, and C. Chauve
(See online at https://doi.org/10.1109/TCBB.2017.2661761)
- ViCAR: an adaptive and landmark-free registration of time lapse image data from microfluidics experiments. Front. Genetics, 8, 69, 2017
G. Hattab, J.-P. Schluter, A. Becker, and T. W. Nattkemper
(See online at https://doi.org/10.3389/fgene.2017.00069)
- A novel methodology for characterizing cell subpopulations in automated time-lapse microscopy. Front. Bioeng. Biotechnol. 6, 17, 2018
G. Hattab, V. Wiesmann, A. Becker, T. Munzner, and T. W. Nattkemper
(See online at https://doi.org/10.3389/fbioe.2018.00017)
- Analyzing colony dynamics and visualizing cell diversity in spatiotemporal experiments. Universität Bielefeld. PhD thesis. 2018
G. Hattab
- Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics, 34(9), 1457–1465, 2018
L. Huang, J. Krüger, and A. Sczyrba
(See online at https://doi.org/10.1093/bioinformatics/btx808)
- Comparative methods for reconstructing ancient genome organization. In: Comparative Genomics, 343–362. Springer, 2018
Y. Anselmetti, N. Luhmann, S. Bérard, E. Tannier, and C. Chauve
(See online at https://doi.org/10.1007/978-1-4939-7463-4_13)
- Context-specific subcellular localization prediction: Leveraging protein interaction networks and scientific texts. Universität Bielefeld. PhD thesis. 2018
L. Zhu
(See online at https://doi.org/10.4119/unibi/2931387)
- ddPCRclust: an R package and Shiny app for automated analysis of multiplexed ddPCR data. Bioinformatics, 34(15), 2687–2689, 2018
B. Brink, J. Meskas, and R. R. Brinkman
(See online at https://doi.org/10.1093/bioinformatics/bty136)
- Dynamic alignment-free and reference-free read compression. J. Comp. Biol. 25(7), 825–836, 2018
G. Holley, R. Wittler, J. Stoye, and F. Hach
(See online at https://doi.org/10.1089/cmb.2018.0068)
- Efficient grouping methods for the annotation and sorting of single cells. Universität Bielefeld. PhD thesis. 2018
M. Lux
- GeFaST: An improved method for OTU assignment by generalising Swarm’s fastidious clustering approach. BMC Bioinformatics, 19(1), 321, 2018
R. Müller and M. E. Nebel
(See online at https://doi.org/10.1186/s12859-018-2349-1)
- GenCoNet – a graph database for the analysis of comorbidities by gene networks. J. Integr. Bioinform. 15(4). 2018
A. Shoshi, R. Hofestädt, O. Zolotareva, M. Friedrichs, A. Maier, V. A. Ivanisenko, V. E. Dosenko, and E. Y. Bragina
(See online at https://doi.org/10.1515/jib-2018-0049)
- Interpretation of linear classifiers by means of feature relevance bounds. Neurocomputing, 298, 69–79, 2018
C. Göpfert, L. Pfannschmidt, J. P. Göpfert, and B. Hammer
(See online at https://doi.org/10.1016/j.neucom.2017.11.074)
- Metadata-driven computational (meta)genomics. A practical machine learning approach. Universität Bielefeld. PhD thesis. 2018
M. Rumming
- Molecular relationships between bronchial asthma and hypertension as comorbid diseases. J. Integr. Bioinform. 15(4). 2018
E. Y. Bragina, I. A. Goncharova, A. F. Garaeva, E. V. Nemerov, A. A. Babovskaya, A. B. Karpov, Y. V. Semenova, I. Z. Zhalsanova, D. E. Gomboeva, O. V. Saik, O. I. Zolotareva, V. A. Ivanisenko, V. E. Dosenko, R. Hofestädt, and M. B. Freidin
(See online at https://doi.org/10.1515/jib-2018-0052)
- Novel candidate genes important for asthma and hypertension comorbidity revealed from associative gene networks. BMC Med. Genomics, 11(1), 15, 2018
O. V. Saik, P. S. Demenkov, T. V. Ivanisenko, E. Y. Bragina, M. B. Freidin, I. A. Goncharova, V. E. Dosenko, O. I. Zolotareva, R. Hofestädt, I. N. Lavrik, E. I. Rogaev, and V. A. Ivanisenko
(See online at https://doi.org/10.1186/s12920-018-0331-4)
- Omics visualization and its application to presymptomatic diagnosis of oral cancer. Universität Bielefeld. PhD thesis. 2018
B. Brink
(See online at https://doi.org/10.4119/unibi/2930495)
- Pan-genome search and storage. Universität Bielefeld. PhD thesis. 2018
G. Holley
- Pan-genome storage and analysis techniques. In: Comparative Genomics, 29–53. Springer, 2018
T. Zekic, G. Holley, and J. Stoye
(See online at https://doi.org/10.1007/978-1-4939-7463-4_2)
- Scaffolding of ancient contigs and ancestral reconstruction in a phylogenetic framework. IEEE-ACM Trans. Comput. Biol. Bioinform. 15(6), 2094–2100, 2018
N. Luhmann, C. Chauve, J. Stoye, and R. Wittler
(See online at https://doi.org/10.1109/TCBB.2018.2816034)
- Search for new candidate genes involved in the comorbidity of asthma and hypertension based on automatic analysis of scientific literature. J. Integr. Bioinform. 15(4). 2018
O. V. Saik, P. S. Demenkov, T. V. Ivanisenko, E. Y. Bragina, M. B. Freidin, V. E. Dosenko, O. I. Zolotareva, E. L. Choynzonov, R. Hofestädt, and V. A. Ivanisenko
(See online at https://doi.org/10.1515/jib-2018-0054)
- flowLearn: fast and precise identification and quality checking of cell populations in flow cytometry. Bioinformatics, 34(13), 2245–2253, 2018
M. Lux, R. R. Brinkman, C. Chauve, A. Laing, A. Lorenc, L. Abeler-Dörner, and B. Hammer
(See online at https://doi.org/10.1093/bioinformatics/bty082)
- A survey of gene prioritization tools for Mendelian and complex human diseases. J. Integr. Bioinform. 16(4). 2019
O. Zolotareva and M. Kleine
(See online at https://doi.org/10.1515/jib-2018-0069)
- Cloud-based bioinformatics framework for next-generation sequencing data. Universität Bielefeld. PhD thesis. 2019
L. Huang
(See online at https://doi.org/10.4119/unibi/2936599)
- Comorbidity of asthma and hypertension may be mediated by shared genetic dysregulation and drug side effects. Scientific Reports, 9(1), 1–11, 2019
O. Zolotareva, O. V. Saik, C. Königs, E. Y. Bragina, I. A. Goncharova, M. B. Freidin, V. E. Dosenko, V. A. Ivanisenko, and R. Hofestädt
(See online at https://doi.org/10.1038/s41598-019-52762-w)
- Detection and visualization of communities in mass spectrometry imaging data. BMC Bioinformatics, 20(1), 303, 2019
K. Wullems, J. Kölling, H. Bednarz, K. Niehaus, V. H. Hans, and T. W. Nattkemper
(See online at https://doi.org/10.1186/s12859-019-2890-6)
- Feature relevance bounds for ordinal regression. In: Proc. of ESANN 2019. 2019
L. Pfannschmidt, J. Jakob, M. Biehl, P. Tino, and B. Hammer
(See online at https://doi.org/10.48550/arXiv.1902.07662)
- FRI – Feature relevance intervals for interpretable and interactive data exploration. In: Proc. of CIBCB 2019, 1–10, 2019
L. Pfannschmidt, C. Göpfert, U. Neumann, D. Heider, and B. Hammer
(See online at https://doi.org/10.1109/CIBCB.2019.8791489)
- HyAsP, a greedy tool for plasmids identification. Bioinformatics, 35(21), 4436–4439, 2019
R. Müller and C. Chauve
(See online at https://doi.org/10.1093/bioinformatics/btz413)
- Identification of the genetic factors underlying comorbidity between bronchial asthma and hypertension. Eur. J. Hum. Genet. 27(Suppl. 1), 1035–1036, 2019
E. Bragina, M. Freidin, O. Saik, O. Zolotareva, I. Goncharova, V. Ivanisenko, V. Dosenko, and R. Hofestädt
(See online at https://doi.org/10.1038/s41431-019-0408-3)
- SeeVis – 3D space-time cube rendering for visualization of microfluidics image data. Bioinformatics, 35(10), 1802–1804, 2019
G. Hattab and T. W. Nattkemper
(See online at https://doi.org/10.1093/bioinformatics/bty889)
- Tissue-specific subcellular localization prediction using multi-label Markov random fields. IEEE-ACM Trans. Comput. Biol. Bioinform. 16(5), 1471–1482, 2019
L. Zhu, R. Hofestädt, and M. Ester
(See online at https://doi.org/10.1109/TCBB.2019.2897683)