Project Details
Projekt Print View

DSI2: Learning Data Structure Behaviour from Executions of Pointer Programs

Subject Area Software Engineering and Programming Languages
Term from 2014 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 242347041
 
Final Report Year 2024

Final Report Abstract

Software development in industry focuses on evolution and maintenance rather than creating new software. While information about a software component's behaviour is usually available and enough for developers to reuse the component, it is often insufficient for conducting maintenance tasks: developers need a detailed understanding of the component's internals, including how data is organized. Indeed, the choice of data structures is a key implementation aspect and significantly contributes to software complexity. For example, when non-functional properties like low memory usage are crucial, developers tend to implement new data structures. Understanding these (non-standard) structures is vital for comprehending the software as a whole. This project focused on analyzing dynamic data structures, with an emphasis on their state, i.e., how data is organized in memory. State information falls into two categories: Structural aspects concern the shape created by linkage structure, e.g., whether that structure forms a binary tree. Semantic aspects impose constraints, e.g., whether a tree meets the binary search tree criteria. We were especially interested in how a data structure's formal specification can be derived automatically from a program and enhance program comprehension. Program heap visualization can also benefit from formal characterizations, and software verification requires a formal specification to prove correctness, too. The results of the project’s first half demonstrated via our novel dynamic analyzer for extracting, identifying and aggregating pointer linkages, so called strands, that dynamic data structures can be identified reliably from source code executions, even when nonstandard implementations are involved. To some extent, it is also possible to obtain data structure information from compiled binaries, e.g., from malware. However, analysis quality heavily depends on tools, which are not publicly available, for extracting type information from binaries. This led us to focus in the project’s second half on programs in byte code, like those compiled from Java, where type information is explicit. We first developed a state-level analysis that generates formal shape predicates to describe dynamic data structures. To enhance program comprehension, we showed how this shape information can be used to automatically create data structure state abstractions for visualization. We also designed a lightweight language for encoding similar abstractions when no formal predicate is available. In addition, we devised a new analysis that considers invalid data structure states, too, which is needed to obtain precise constraints describing also a structure’s semantic aspects. When dealing with classes exhibiting complex linkage structures, our obtained constraints effectively provide a shape predicate specification that can serve as a starting point for program verification.

Publications

  • dsOli: data structure operation location and identification. Proceedings of the 22nd International Conference on Program Comprehension, 6617(2014, 6, 2), 48-52. ACM.
    White, David H.
  • dsOli2: Discovery and Comprehension of Interconnected Lists in C Programs. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS), Bericht2015-IX-1, pp.686-700. TU Vienna, 2015
    White, David H.; Rupprecht, Thomas & Lüttgen, G.
  • Learning Assertions to Verify Linked-List Programs. Lecture Notes in Computer Science (2015), 37-52. Springer International Publishing.
    Mühlberg, Jan Tobias; White, David H.; Dodds, Mike; Lüttgen, Gerald & Piessens, Frank
  • DSI: an evidence-based approach to identify dynamic data structures in C programs. Proceedings of the 25th International Symposium on Software Testing and Analysis, 2008(2016, 7, 18), 259-269. ACM.
    White, David H.; Rupprecht, Thomas & Lüttgen, Gerald
  • POSTER. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016, 10, 24), 1772-1774. ACM.
    Rupprecht, Thomas; Chen, Xi; White, David H.; Mühlberg, Jan Tobias; Bos, Herbert & Lüttgen, Gerald
  • DSI: Automated Detection of Dynamic Data Structures in C Programs and Binary Code. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS), vol. Math/Inf/02/2017 of Jenaer Schriften zur Mathematik und Informatik, pp. 134-147. Universität Jena, 2017
    Rupprecht, Thomas; Boockmann, Jan H.; White, David H. & Lüttgen, Gerald
  • DSIbin: Identifying dynamic data structures in C/C++ binaries. 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2304(2017, 10), 331-341. IEEE.
    Rupprecht, Thomas; Chen, Xi; White, David H.; Boockmann, Jan H.; Luttgen, Gerald & Bos, Herbert
  • Generating Inductive Shape Predicates for Runtime Checking and Formal Verification. Lecture Notes in Computer Science (2018), 64-74. Springer International Publishing.
    Boockmann, Jan H.; Lüttgen, Gerald & Mühlberg, Jan Tobias
  • Shape Inference from Memory Graphs (Extended Abstract). Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS), pp. 69-71. Shaker Verlag, 2019. ISBN: 978-3-8440-6933-4
    Boockmann, Jan H. & Lüttgen, Gerald
  • Data Structure Identification from Executions of Pointer Programs. University of Bamberg, Germany, Germany, 2020. ISBN: 978-3-86309-717-2
    Rupprecht, Thomas
  • Learning Data Structure Shapes from Memory Graphs. EPiC Series in Computing, 73(NA), 151-132. EasyChair.
    Boockmann, Jan H. & Luettgen, Gerald
  • Tagungsband zum 21. Kolloquium Programmiersprachen und Grundlagen der Programmierung. Kiel Computer Science Series, KCSS (2021, 9, 30). Department of Computer Science of the Faculty of Engineering, Kiel University.
    Boockmann, Jan H.; Jacob, Kerstin & Lüttgen, Gerald
  • Heap Patterns for Memory Graph Visualization. 2022 Working Conference on Software Visualization (VISSOFT) (2022, 10), 162-166. IEEE.
    Boockmann, Jan H. & Luttgen, Gerald
  • Model-driven Engineering for Dynamic Data Structures. Softwaretechnik-Trends 42(4):2-7 (2022)
    Boockmann, Jan H.; Jacob, Kerstin & Lüttgen, Gerald
  • Shape-analysis driven memory graph visualization. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, 168(2022, 5, 16), 298-308. ACM.
    Boockmann, Jan H. & Lüttgen, Gerald
  • Breaking Class Invariants Through Program Mutation-based Object State Space Exploration. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS), vol. AIB-2023-03 of Aachener Informatik-Berichte, pp. 3-4. RWTH Aachen University, 2023
    Boockmann, Jan H. & Lüttgen, Gerald
  • Comprehending Object State via Dynamic Class Invariant Learning. Lecture Notes in Computer Science (2024), 143-164. Springer Nature Switzerland.
    Boockmann, Jan H. & Lüttgen, Gerald
  • On the Hunt for Invalid Objects: Exploring the Object State Space with Program Mutants. 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 31(2024, 3, 12), 711-716. IEEE.
    Boockmann, Jan H. & Lüttgen, Gerald
 
 

Additional Information

Textvergrößerung und Kontrastanpassung