Quantitative Aspekte Grammatik-basierter Kompression
Zusammenfassung der Projektergebnisse
In the QUANT-KOMP project we made progress on several intriguing problems in grammar-based compression. For grammar-based string compression, our main achievements are the following: • We solved several open problems from the seminal paper of Charikar et al. concerning the approximation ratio of grammar-based string compressors. For LZ78 and BiSection we precisely determined the approximation ratio. For the global grammar-based string compressors RePair and Greedy we improved the existing lower bounds on the approximation ratio. For our conference paper, we received the best paper award at SPIRE 2016. The journal version was published at IEEE Transactions on Information Theory. • We solved a long standing open problem concerning the depth of string straight-line programs by showing that every string-straight line program of size n that produces a string of length N can be transformed (in linear time) into a string straight-line program of size O(n) and depth O(log N ). In fact, a more general version of this result that covers also grammarbased tree compression can be shown. The result has many applications for algorithms on grammar-based strings and trees. The major part of our results were obtained in the area of grammar-based string compression. • A classical special case of grammar-based tree compression is compression with DAGs (directed acyclic graphs). For a large variety of different classes of random tree models we proved upper and lower bounds on the average size of the DAG. These results generalize older results of Devroye, Flajolet, Gourdon, Martinez, Sipala, and Steyaert. We used these results in order to obtain upper bounds of o(n) on the average case redundancy for a large class of random tree models. • We also studied intensively grammar-based tree compression using tree/forest straight-line programs. We proved an upper bound of O(n/ log n) on the size of a smallest tree/forest straight-line program for a ranked/unranked tree. Using this bound we developed a grammarbased tree compressor that achieves worst-case redundancy o(n) for several natural random tree models. • Based on tree coverings we came up with a tree compressor that achieves all redundancy bounds that are achieved by the tree compressors from the two previous points. This tree compressor has the advantage that it allows to execute a large variety of query operations on compressed trees in constant time. • Finally, we defined a new version of tree entropy, compared it to other notions of tree entropy and proved that in terms of compression ratio this tree entropy can be achieved by grammarbased tree compressors.
Projektbezogene Publikationen (Auswahl)
- Approximation of smallest linear tree grammars, Information and Computation 251, pp. 215–251, 2016
Artur Jeż and Markus Lohrey
(Siehe online unter https://doi.org/10.1016/j.ic.2016.09.007) - Tree compression using string grammars, Algorithmica 80(3), pp. 885–917, 2018 (special issue for LATIN 2016)
Moses Ganardi, Danny Hucke, Markus Lohrey and Eric Nöth
(Siehe online unter https://doi.org/10.1007/s00453-017-0279-3) - Constructing small tree grammars and small circuits for formulas, Journal of Computer and System Sciences 86, pp. 136–158, 2017
Danny Hucke, Artur Jeż , Moses Ganardi, Markus Lohrey and Eric Nöth
(Siehe online unter https://doi.org/10.1016/j.jcss.2016.12.007) - A universal tree balancing theorem, ACM Transactions on Computation Theory 11(1), Article No. 1, 2018
Moses Ganardi and Markus Lohrey
(Siehe online unter https://doi.org/10.1145/3278158) - Universal tree source coding using grammar-based compression, IEEE Transactions on Information Theory 65(10), pp. 6399–6413, 2019
Moses Ganardi, Danny Hucke, Markus Lohrey and Louisa Seelbach
(Siehe online unter https://doi.org/10.1109/TIT.2019.2919829) - Grammar-based compression of unranked trees, Theory of Computing Systems 61(1), pp. 141–176, 2020 (special issue for CSR 2018)
Adria Gascon, Markus Lohrey, Sebastian Maneth, Carl Philipp Reh and Kurt Sieber
(Siehe online unter https://doi.org/10.1007/s00224-019-09942-y) - On the collection of fringe subtrees in random binary trees, Proceedings of LATIN 2020, LNCS 12118, pp. 546–558, Springer 2020
Louisa Seelbach and Stephan Wagner
(Siehe online unter https://doi.org/10.1007/978-3-030-61792-9_43) - The smallest grammar problem revisited, IEEE Transactions on Information Theory 67(1), pp. 317–328, 2020
Hideo Bannai, Momoko Hirayama, Danny Hucke, Shunsuke Inenaga, Artur Jeż, Markus Lohrey and Carl Philipp Reh
(Siehe online unter https://doi.org/10.1109/TIT.2020.3038147) - Balancing straight-line programs, Journal of the ACM 68(4), Article No. 27, pp. 1–40
Moses Ganardi, Artur Jeż , Markus Lohrey
(Siehe online unter https://doi.org/10.1145/3457389) - Hypersuccinct trees – new universal tree source codes for optimal compressed tree data structures and range minima, ESA 2021
Ian Munro, Patrick K. Nicholson, Louisa Seelbach Benkner and Sebastian Wild
(Siehe online unter https://doi.org/10.4230/LIPIcs.ESA.2021.70)