Project Details
Microgastropod Taxon-Omics: Towards a probabilistic and automated species-discovery system
Applicant
Professor Dr. Thomas Wilke
Subject Area
Systematics and Morphology (Zoology)
Term
from 2017 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 351198199
Over the past 250 years, taxonomy has yielded ca. 1.5 million described species. However, the actual number of species is estimated to range broadly from 3 to 100 million. Some scientists are deeply concerned that many species will disappear before being named. Yet significantly reducing the share of undescribed species is not straightforward because there is an inherent conflict between two main interests of taxonomy: quality and speed of delimitation/description. A possible solution to this problem could be the integration of available museum information with omics data and the application of novel tools for (semi-)automated species delimitation. With the increasing amount of available genomic data, several methods for inferring species boundaries and determining species have been developed based on phylogenetic or barcoding information. However, non-genetic information shown to be of diagnostic value, such as morphology and biogeography, is rarely integrated in automated species-delimitation systems. Problems with such integrative approaches comprise i) the need for a robust a priori phylogeny, ii) an increase in computational burden, iii) difficulties with the usage of mixed data types, and iv) a restriction to species-poor taxa.
Given these problems with phylogeny- or barcode-based species determination systems, we here propose the development of a novel probabilistic and semi-automated Species Discovery System (proSDS). It is based on an integrated reference dataset (comprising anatomical, 3D-morphological, genetic, ecological, and biogeographical information) and uses supervised machine-learning approaches for dynamically delimiting species.
The project comprises five specific objectives: 1) to establish a curated reference database, 2) to test the power of candidate approaches for coding the information, 3) to test the power of candidate discrimination methods for delimiting species, 4) to test candidate procedures that address the problem of missing data, and 5) to implement the approach in an R-based interface (proSDS), which will be made publicly available.
Our proSDS is innovative in several regards. It builds on integrated data, can handle missing data, is applicable to fossil data, allows for the usage of mixed data types, retrieves some of the trait information automatically from public databases, uses novel trait codings such as the fractal dimension D, dynamically updates trait weights and classification rules, and provides the user with probabilities that a queried specimen belongs to a known or novel species as well as with information on the individual contribution of the underlying traits. Though this novel approach might be applicable to taxa across kingdoms, we here use the species-rich and taxonomically 'notoriously difficult' microgastropod family Hydrobiidae as a model taxon for developing and testing the approach.
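The project description names the fractal dimension D as a trait coding but does not say how it is estimated. As an illustration only, the sketch below assumes the common box-counting estimator: a shape (for example, a digitised shell outline) is covered with grids of decreasing box size s, the occupied boxes N(s) are counted, and D is taken as the slope of log N(s) against log(1/s). The function names, the box sizes, and the synthetic test shapes are hypothetical and are not taken from proSDS.

```python
import math


def box_count(points, size):
    """Number of grid boxes of edge length `size` that contain at least one point."""
    return len({(math.floor(x / size), math.floor(y / size)) for x, y in points})


def fractal_dimension(points, sizes):
    """Estimate D as the least-squares slope of log N(s) against log(1/s).

    Assumption: box-counting is used here for illustration; the project text
    does not specify the estimator.
    """
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    n = len(sizes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic sanity checks on shapes with a known dimension (not real specimen data).
line = [(i + 0.5, 0.5) for i in range(256)]                        # straight line
square = [(i + 0.5, j + 0.5) for i in range(64) for j in range(64)]  # filled square
sizes = [1, 2, 4, 8, 16]

d_line = fractal_dimension(line, sizes)      # close to 1
d_square = fractal_dimension(square, sizes)  # close to 2
```

A straight line and a filled square serve as checks because their dimensions are known; a real outline would be expected to fall between these two values, and the estimate depends on the chosen range of box sizes.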
DFG Programme
Priority Programmes
International Connection
Brazil
Co-Investigator
Professor Dr. André Backes