Project Details
Hyperbolic Stochastic Neighbor Embeddings
Applicant
Dr. Martin Skrodzki
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Mathematics
Term
from 2021 to 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 455095046
Summary
The goal of the proposed project is to develop new methods for the visualization of high-dimensional data. Such data consist of elements represented by vectors in a high-dimensional space. They are used in a wide range of research and application areas. Examples range from measured characteristics of cell probes to collections of medical imaging data and global climate patterns in atmospheric sciences. Displaying these data supports their analysis by making them visually accessible to humans.
The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm constructs an embedding of the data into a two- or three-dimensional space in such a way that structures in the data are preserved. By visually inspecting the embedded data, domain experts can gain insight into the structure of the data set. This approach has proven successful in various application areas. For example, rare cell populations in the immune system have been identified.
In the proposed project, we aim to further develop the t-SNE approach in three work packages. The first package concerns the scalability of the t-SNE algorithm to large data sets. This is important because the data sets domain experts need to work with are continuously growing. A first goal of our project is to devise a hierarchical optimization approach for t-SNE that we expect to be able to handle large data sets more efficiently than the single-scale algorithms currently in use.
The second work package is concerned with the geometry of the embedding space. Instead of the usual Euclidean geometry, we will investigate embeddings into hyperbolic space. Hyperbolic and Euclidean geometry differ fundamentally: for example, the area contained in a circle in the hyperbolic plane grows exponentially with the radius of the circle, while in the Euclidean plane it grows only polynomially. Network analysis research has found that many real-world networks, such as the Internet or large social networks, follow a hyperbolic pattern. Recent results on multidimensional scaling indicate that high-dimensional data also has a hyperbolic structure. Therefore, we will devise a novel t-SNE-like embedding algorithm that embeds into hyperbolic space. The challenges lie in modeling the optimization problem that defines the hyperbolic t-SNE and in designing an algorithm that efficiently computes the resulting embeddings.
In the third work package, we will create techniques for interacting with hyperbolic t-SNE. Our goal is to develop a Focus+Context approach that allows users to navigate a currently inspected subset of the data (focus) while keeping the surrounding data (context) available in a single visualization.
A common target for all three work packages is to integrate all findings, algorithms, and improvements into the Cytosplore software. This will allow us to receive feedback from domain experts currently using the software. This feedback will immediately be incorporated into our research.
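The area comparison made for the second work package can be checked numerically. The following is a minimal sketch, assuming a hyperbolic plane of constant curvature -1, in which a circle of radius r encloses the area 2π(cosh r − 1), compared with πr² in the Euclidean plane. The function names are illustrative and are not taken from the project or from Cytosplore.

```python
import math


def euclidean_disc_area(r: float) -> float:
    """Area enclosed by a circle of radius r in the Euclidean plane."""
    return math.pi * r ** 2


def hyperbolic_disc_area(r: float) -> float:
    """Area enclosed by a circle of radius r in the hyperbolic plane.

    Assumes constant curvature -1 (an assumption of this sketch).
    """
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


if __name__ == "__main__":
    # Print both areas and their ratio for a few radii.
    for r in (0.5, 1.0, 2.0, 5.0, 10.0):
        e = euclidean_disc_area(r)
        h = hyperbolic_disc_area(r)
        print(f"r={r:5.1f}  euclidean={e:12.3f}  hyperbolic={h:14.3f}  ratio={h / e:10.3f}")
```

For small radii the two areas nearly coincide, while for larger radii the hyperbolic area dominates by a rapidly growing factor. This extra room far from the origin is one common motivation for embedding hierarchical or tree-like data into hyperbolic space.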
DFG Programme
WBP Fellowship
International Connection
Netherlands