Project Details
Similarity Search for Richly Annotated Structured Patient Cases
Applicant
Dr. Johannes Starlinger
Subject Area
Security and Dependability, Operating-, Communication- and Distributed Systems
Term
from 2016 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 314301216
Similarity search is becoming an ever more important tool for managing and analyzing the increasing amounts of data available in digital form across disciplines. Complex analyses such as grouping similar data objects, classifying new objects, or detecting unusual (e.g., erroneous) objects can only be performed when specific similarity measures and effective algorithms are available. Many applications today deal with data that stem from several different sources and include structured, semi-structured, and unstructured data. Similarity search for these applications poses two important research questions. First, methods have to be found to integrate data from unstructured text into the similarity assessment, e.g., by transforming this information into a structured representation that preserves relevant process information. Second, similarity measures and search algorithms have to be defined that are able to compare the resulting annotation-enriched process representations. One particular area of application where such process-aware similarity search is highly relevant is clinical medicine: during a hospital stay, a patient typically follows an (often implicit) process along which he or she undergoes several diagnostic examinations to determine the exact diagnosis of the patient's disease or diseases. The results of each such examination are typically recorded in separate, unstructured or semi-structured documents and added to the patient's personal (electronic) health record (EHR). Individual patients' records also contain personal information, such as date of birth, and, increasingly often, laboratory measurements and genetic information.
The current lack of similarity search methods over this data not only hinders clinical studies and trials, but also prevents the rich knowledge collected in hospitals from being used directly in everyday clinical healthcare, e.g., for ad hoc discovery of patients with similar sets of symptoms, disease histories, or genetic profiles. It is still unclear how the different pieces of data from EHRs should be included, weighted, and combined to best support process-aware similarity search over medical patient cases for clinical decision support. To the best of our knowledge, neither the process-oriented extraction of information from EHRs nor similarity search over such processes, richly annotated with extracted clinically relevant data, has been previously investigated. Our project will target these clinical instantiations of the two research questions formulated above. We will apply existing information extraction methods to clinical documents to construct richly annotated, process-structured representations of individual patients' EHRs. Based on these representations, we will research similarity measures that allow for effective comparison of patient cases, and evaluate all methods on the task of clinical decision support.
DFG Programme
Research Grants