Project Details
Programming Support for Fault-Tolerant Distributed Live Applications
Applicant
Professorin Dr.-Ing. Mira Mezini
Subject Area
Software Engineering and Programming Languages
Term
from 2019 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 415626024
Decentralized distributed computing platforms, comprising a mixture of interconnected web servers, decentralized clouds, mobile and IoT devices, have encouraged the emergence of distributed data-driven and/or interactive applications, which continuously observe and correlate sensor data, logging activities, other event flows, user actions, etc., and in response update their state in real time. We call these applications distributed live applications. Such applications are inherently complex due to asynchrony and the inverted control of user interactions and event/data flow, which encourages programming in some form of continuation-passing style, an error-prone style leading to the so-called Callback Hell. The issues are amplified in a context where computations are distributed across interconnected machines/devices that are not under the control of a single unit. Our specific focus is on complexity due to disconnects and crashes, which are ubiquitous in such a setting (mobile devices may have poor connectivity and shut down when batteries run low; cloud servers are rebooted without prior notice; and a failed network switch results in lost connections between servers).

Our hypothesis is that a good fraction of this complexity is due to poor abstractions offered by the existing programming languages and frameworks for distributed applications, which force developers to program the systems and to reason about them in terms of callbacks/continuations. Reactive programming (RP), originally proposed for enabling direct-style programming of interactive desktop applications, has the potential to address such complexity. However, RP languages/frameworks lack proper support for fault tolerance. In comparison, actor languages, which are often the programming model of choice in the context of distributed live applications, feature less declarative message-passing abstractions, while cloud languages and programming platforms for big-data processing are not designed with liveliness in mind.

The goal of this proposal is to develop a programming model and language for fault-tolerant distributed live applications that brings the benefits of the declarative direct style of the RP model to this complex domain, to enable a higher level of (automated) reasoning about such systems in the presence of faults. In particular, we aim to leverage unique features of the RP paradigm to generalize the automated fault handling that frameworks like Spark and Flink provide in a controlled environment to arbitrary live applications deployed on decentralized distributed systems. Furthermore, we will extend RP abstractions with support for error propagation to enable application developers to explicitly handle faults whenever automated handling is not possible or meaningful. We will formally model our language to prove its properties, and we will implement it on top of an existing reactive language.
DFG Programme
Research Grants