Project Details
ScalableMine: Scalable Hierarchical Process Mining in Event-Stream Systems
Applicant
Professor Dr. Wilhelm Hasselbring
Subject Area
Software Engineering and Programming Languages
Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term
since 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 496119880
Challenges in process mining include scalability, i.e., dealing with the volume, velocity and variability of input data, especially in online settings that use event streams. In this context, online means that events are processed immediately as they arrive in a continuous stream. Scaling process mining is required to establish it as a continuous, company-wide activity rather than a single project. Thus, scalable processing of continuous event streams and process fragments on cloud infrastructures is required for efficient and effective streaming process mining. Scalability is the ability of a software system to sustain increasing workloads with adequate performance, provided that hardware resources are added.
When considering continuous streams of events, an integration of multiple such streams is often required. With the traditional approach to process mining, in which all events are first written to a (relational) database and this integrated database is then queried, processing the events is straightforward. Processing continuous event streams raises new challenges, in particular when scalability requirements have to be considered, as we intend to address with ScalableMine.
With ScalableMine, we contribute to the seminal work on streaming process mining by designing and benchmarking scalable event processing algorithms and architectures that aggregate events of multiple event streams online for process mining in (near) real time. Streaming process mining has to be scalable to cope with the high volume and velocity of events from distributed sources. To this end, new algorithms and architectures for scalable streaming process mining will be designed and investigated. Representative, specific benchmarks for systematically evaluating the scalability of approaches to streaming process mining are missing. With ScalableMine, such benchmarks will be designed, implemented, evaluated and published for the community.
The synergies within the SOURCED Research Unit make it possible to address the scalability challenges of process mining appropriately. For example, abstraction models for distributed streaming process mining will support benchmarking of online data aggregation. Resource-aware process mining algorithms on edge and cloud infrastructures will support edge vs. cloud scalability benchmarking. The TinyWorkPlace House will provide realistic load profiles for benchmarking.
DFG Programme
Research Units