^TIS fly* Faculty of Informatics Masaryk University Czech Republic From Infinite-State Systems to Translation of LTL to Automata Habilitation Thesis (Collection of Articles) Jan Strejček April 2012 Abstract As computer systems are becoming ubiquitous and their importance and complexity are constantly increasing, the importance of verification tools applicable on these systems grows as well. Development of verification tools is usually preceded by studying decidability and complexity of various verification problems for suitable models of computer systems. While some computer systems or their parts can be appropriately modeled by finite-state models, the other systems need to be described by infinite-state models. Formalisms for description of infinite-state models can be divided into two basic categories: formalisms with Turing power and weaker formalisms. The formalisms of the first category have enough expressive power, but nearly all important verification problems are undecidable for them. Weaker formalisms can provide a good compromise between expressiveness and decidability of interesting problems. Many classical formalisms from the second category, e.g. Petri nets or pushdown systems, are covered by the formalism of Process Rewrite Systems (PRS). This thesis maps a part of author's research in the area of infinite-state systems and verification in general. We deal mainly with classes of infinite-state systems subsumed in the PRS formalism and its extensions. We study expressive power of these classes and decidability of reachability problem, model checking problem, and equivalence checking problem for these classes. The thesis contain also several results from related areas. In particular, we present decidability of a bounded reachability problem for a Turing-powerful formalism called Asynchronous Dynamic Pushdown Networks. Our results on model checking of PRS systems against formulae of Linear Temporal Logic (LTL) lead to interesting observations about a certain LTL fragment. These observations allow us to significantly improve translation of LTL to Biichi automata, which is a standard part of many LTL model checking algorithms for both finite and infinite-state systems. The thesis has the form of a collection of articles accompanied by a commentary. The collection contains four journal papers and seven papers published in the proceedings of international conferences or workshops. One paper was written solely by the author of the thesis. In case of all other papers, the contribution is at least proportional to the number of co-authors. The author contributed to these papers in various ways: often by suggesting the topic, partly by bringing the ideas of solution, almost always by discussing and improving solutions and writing down significant parts of texts. iii Abstrakt Jak se počítačové systémy stávají všudypřítomnými a jak roste jejich význam i složitost, roste i význam nástrojů schopných tyto systémy verifikovat. Vývoji verifikačního nástroje zpravidla předchází výzkum rozhodnutelnosti a složitosti různých verifikačních problémů na vhodných modelech počítačových systémů. Zatímco některé systémy lze adekvátně modelovat pomocí konečně stavových modelů, jiné systémy je třeba modelovat jako nekonečně stavové. Formalismy pro popis nekonečně stavových modelů lze rozdělit do dvou základních kategorií: formalismy s Turingovskou silou a slabší formalismy. Formalismy první kategorie mají velkou popisnou sílu, ale téměř všechny důležité verifikační problémy jsou pro ně nerozhod-nutelné. Slabší formalismy mohou nabídnout dobrý kompromis mezi vyjadřovací sílou a rozhodnutelnosti zajímavých problémů. Mnoho běžných formalismů z druhé kategorie (jako Petriho sítě nebo zásobníkové systémy) je zastřešeno formalismem procesových přepisovacích systémů (PRS). Tato habilitační práce mapuje část autorova výzkumu v oblasti nekonečně stavových systémů a verifikace obecně. Zabýváme se především třídami nekonečně stavových systémů pokrytých formalismem PRS a jeho rozšířeními. Zkoumáme vyjadřovací sílu těchto tříd a rozhodnutelnost problému dosažitelnosti, problému ověřování modelu a problému ověření ekvivalence pro tyto třídy. Habilitační práce dále obsahuje několik výsledků ze souvisejících oblastí. Zejména prezentujeme rozhodnutelnost omezené dosažitelnosti pro formalismus asynchronních sítí dynamických zásobníkových systémů, který má Turingovskou sílu. Naše výsledky o ověřování modelu pro PRS systémy a formule lineární temporální logiky (LTL) vedly k zajímavým poznatkům o jistém fragmentu LTL. Tyto poznatky nám dovolují významně zlepšit překlad LTL na Bůchiho automaty, který je standardní součástí mnoha algoritmů pro ověřování modelu LTL vlastností pro konečně i nekonečně stavové systémy. Tato habilitační práce je koncipována jako soubor uveřejněných vědeckých prací doplněný komentářem (§72 odst. 3 písmena b zákona č. 111/1998 Sb., o vysokých školách). Soubor obsahuje čtyři časopisecké články a sedm článků ze sborníků mezinárodních konferencí a seminářů. U jednoho článku je uchazeč jediným autorem. U ostatních článků je jeho autorský podíl přinejmenším úměrný počtu autorů. Uchazeč se podílel na vzniku těchto článků různými způsoby: často navrhl téma, někdy přišel s myšlenkami vedoucími k řešení, téměř vždy diskutoval a vylepšoval řešení a napsal podstatné části textů. v Acknowledgments First of all, I would like to thank Mojmír Křetínský and Vojtěch Řehák for fruitful and long-lasting collaboration and wide-ranging discussions. I thank Mojmír Křetínský also for reading a draft of this thesis. Many thanks go to other co-authors of papers included in this collection, namely to Tomáš Babiak, Ahmed Bouajjani, Laura Bozzelli, Javier Esparza, Stefan Schwoon, and Tayssir Touili. Last but not least, I thank my wife Adriana and our children Zorka and Andy for their constant support, infinite patience, and love. vii Contents I Commentary 1 1 Introduction 3 1.1 Motivation.................................... 3 1.2 Focus....................................... 5 2 State of the Art 7 2.1 Process Rewrite Systems............................ 7 2.1.1 Reachability Problem.......................... 9 2.1.2 Model Checking Problem....................... 9 2.1.3 Equivalence Checking Problem.................... 10 2.2 Translation of LTL to Buchi Automata.................... 11 3 Thesis Contribution 13 3.1 Process Rewrite Systems............................ 13 3.1.1 Reachability Problem.......................... 13 3.1.2 Model Checking Problem....................... 15 3.1.3 Equivalence Checking Problem.................... 17 3.2 Translation of LTL to Biichi Automata.................... 17 4 Papers in Collection 21 4.1 Journal Papers.................................. 21 4.2 Proceedings Papers............................... 22 5 Bibliography 23 ix Parti Commentary Chapter 1 Introduction 1.1 Motivation The significance of computer systems for our society is constantly increasing. A failure of these systems can have an enormous negative impact on an individual, a company, or even a country. Hence, there is a great demand for methods and technologies to increase reliability of computer systems. The traditional techniques like simulation, testing, and error-preventing methodologies for system design are clearly not sufficient. In particular, these methods aim to eliminate errors, but they do not demonstrate an absence of errors, i.e. they do not prove correctness of the systems. Proving correctness of computer systems is the original motivation for development of formal methods, which started in the late sixties. The area of formal methods now subsume formalisms for specification and modeling of systems as well as techniques and algorithms for analysis and verification of systems or their models. The modeling formalisms can be roughly divided into the following three categories according to their expressive power. Formalisms describing finite-state systems This category contains formalism which can describe only systems with a finite number of states. The category contains some low-level formalisms as finite labeled transition systems or finite Kripke structures as well as many high-level formalisms. High-level formalisms are typically designed for compact description of large finite-state systems. Finite-state systems are also called finite systems. Formalisms with Turing power This category contains formalisms that can accurately model all Turing machines. As a Turing machine can in one computation successively enter infinitely-many different states, the formalism can model systems with an infinite number of states. Formalisms describing infinite-state systems without Turing power This category contains formalisms that can describe systems with an infinite number of states, but they cannot accurately model all Turing machines. 3 Formalisms of each category have some benefits and drawbacks. Finite systems can faithfully model all hardware systems and thus also all self-contained software/hardware systems. Moreover, all verification problems are decidable for finite systems. Unfortunately, finite-state models of real systems are often extremely large even if their description in a suitable formalism can be relatively small. Thus, they cannot be effectively analyzed by standard algorithms designed for finite-state systems. It may therefore be more convenient to model them in formalisms for description of infinite-state systems and analyze them by the corresponding algorithms. Modeling extremely large finite systems is not the only motivation for formalisms describing infinite-state systems. For example, if we want to analyze algorithms rather then programs, we also need to model them as infinite-state systems. The reason is that algorithms typically work with unbounded data domains, while all data domains in real programs are finite. Formalisms with Turing power have an obvious drawback: even the very basic verification problems like reachability of a given state are not decidable. In practice, models in these formalisms are analyzed using algorithms with some imprecision. For example, there are bounded analysis approaches examining only a part of a state space. Formalisms describing infinite-state systems without Turing power represent a compromise between expressive power and decidability of verification problems. Formal methods cover several verification problems. The following three are the most traditional and the most prominent. Reachability problem is the problem to decide whether a given state of a given system is reachable from the initial state or not. A typical application of reachability problem is verification of a basic safety property: a system is considered to be safe if a given error state is not reachable. Model checking problem is the problem to decide whether each run of a given system satisfies a given specification. The specification is usually described by a formula of some temporal logic. Equivalence checking problem is the problem to decide whether two given systems, typically a specification and its implementation, are equivalent with respect to a given behavioral equivalence. All the mentioned problems are decidable for finite systems regardless the temporal logic used for specification in the case of model checking or behavioral equivalence in case of equivalence checking. On the contrary, none of the problems is decidable for formalisms with Turing power (unless we consider some trivial temporal logic or behavioral equivalence). The decidability of these problems for formalisms describing infinite-state systems without Turing power depends on concrete formalism or, in case of model checking or equivalence checking, on a combination of a concrete formalism and temporal logic or behavioral equivalence. 4 1.2 Focus In this thesis, we focus mainly on the expressiveness and decidability issues of formalisms covered by Process Rewrite Systems (PRS) [69, 71]. PRS is a term rewriting formalism. Its popularity stems from two facts. First, many classical and well-studied formalisms for description of infinite-state systems can be uniformly obtained from PRS by imposing simple restrictions on syntax of rewrite rules. For example, Petri nets or pushdown systems can be seen as syntactical subclasses of PRS. Second, PRS has not Turing power and some interesting instances of verification problems are decidable for PRS. In the following chapter, we recapitulate basic decidability results for the mentioned verification problems and formalisms subsumed in PRS. In the model checking problem, we focus our attention to action-based semantics only. For the equivalence checking problem, out of dozens studied equivalences surveyed in [96], we consider only two most prominent equivalences, namely strong and weak bisimulation equivalences [73, 79, 74]. The second topic of this thesis is a translation of Linear Temporal Logic (LTD [81] to Biichi automata (BA). LTL is probably the most frequently used formalism for specification of system properties for model checking purposes. A translation of LTL to BA is the first step of many LTL model checking algorithms for both finite and infinite-state systems. The relation between the two topics is seemingly feeble. However, our results in LTL to automata translation are directly motivated by our previous research of LTL model checking problem for PRS classes. Moreover, the results on LTL to automata translation have again direct consequences to decidability of LTL model checking for various classes of infinite-state systems (not only subclasses of PRS). This relationship is explained in Section 3.2. 5 6 Chapter 2 State of the Art 2.1 Process Rewrite Systems The term rewriting formalism called Process Rewrite Systems (PRS) is introduced in [69]. PRS combines both sequential and parallel operators. Hence, it can be seen as an extension of a similar, but purely sequential formalism studied in [20] and a purely sequential and purely parallel formalisms studied in [76]. By imposing various restrictions on syntax of PRS rewrite rules we get several standard classes of infinite-state systems, namely Basic Process Algebras (BPA) [4], Basic Parallel Processes (BPP) [21], Process Algebras (PA) [4], Pushdown Processes (PDA), and Petri Nets (PN). Another syntax restrictions of PRS define the class of all Finite Systems (FS) and classes called PAD [69] (common generalization of PA and PDA) and PAN [68] (common generalization of PA and PN). PRS PAD PDA BPA PAN BPP Figure 2.1: The PRS-hierarchy The mentioned classes form so-called PRS-hierarchy depicted in Figure 2.1. The hierarchy is in fact a Hasse diagram where the classes are partially ordered according to their expressive power as follows: each class represents the set of corresponding labeled transition systems and the partial order compares these sets according to the set 7 sePRS sePAD PRS sePAN PAD sePDA=PDA=seBPA se PA sePN=PN PAN seBPP=MSA BPA BPP seFS=FS Figure 2.2: The state-extended PRS-hierarchy. The dotted lines represent the fact, that strict increase of expressive power is only conjectured. inclusion modulo strong bisimulation equivalence [79, 74]. The shape of the hierarchy copies relations between syntactical restrictions defining the classes. Strictness of the hierarchy follows from [18, 76, 69]. A disadvantage of PRS is its limited expressive power. As PRS subsumes pushdown processes, it can accurately model sequential programs with unlimited recursion and both local and global variables over finite domains. As PRS contains Petri nets, it can model also synchronously communicating parallel systems with dynamic creation of new processes, but the processes cannot have unbounded recursion. PRS cannot model even two asynchronous sequential programs with unbounded recursion where one program sends one message to the other program. The connection between PRS classes programming features is nicely explained in [33]. The modeling power can be improved by enriching PRS with an additional finite-state control unit. The resulting formalism is called state-extended PRS (sePRS) [51,44]. Using the syntactic restriction applied to PRS, we get analogous subclasses of sePRS, denoted by se- prefix. Addition of the finite control unit does not change expressive power of classes FS, PDA, and PN. The expressive power of other PRS subclasses strictly increases: seBPA class has the same expressive power as PDA [20], while seBPP coincides with a previously studied class called Parallel Pushdown Processes (PPDA) [76] or Multiset Automata (MSA) [77]. The strict increase of power is proven also for sePA, sePAD, and sePAN. In fact, we conjecture that the classes sePA, sePAD, sePAN, and sePRS form fresh classes of infinite-state systems. Figure 2.2 depicts the original hierarchy extended with the state-extended classes. The figure also 8 marks relations that are not rigorously proven, but only conjectured. The shape of the hierarchy follows from the definition of sePRS, shape of the PRS hierarchy, results of [20, 71,44], and our result of [55,59]. The mentioned conjectures are discussed in [97]. 2.1.1 Reachability Problem The reachability problem, i.e. the problem to decide whether a given state is reachable from a given initial state, is decidable for PRS [71]. The proof employs the previously known results on decidability of reachability for Petri nets [67] and for pushdown processes [6]. Four out of the five new classes obtained by state-extension have Turing power. Indeed, sePA and its superclasses sePAD, sePAN, and sePRS can encode any Minsky 2-counter machine [75] and thus also any Turing machine. As a result, the reachability problem is undecidable for these classes [5]. The remaining class seBPP is a subclass of Petri nets and hence the reachability problem is also decidable for seBPP. Other versions of reachability are studied as well. For example, reachable property problem, i.e. the problem to decide whether there exists a reachable state where all actions of one set are enabled and all actions of another set are disabled. The decidability of reachable property problem is the same as for the original reachability problem: it is decidable for PRS [71] and undecidable for sePA [5]. A lot of attention has been also devoted to symbolic reachability on various PRS subclasses, where the goal is to compute a finite representations of the set (or its under-or over-approximation) of all states reachable (or backward reachable) from a given set of states. The main results in this area can be found in [64, 36, 34,11, 8,12]. 2.1.2 Model Checking Problem For the model checking problem, we consider branching-time logics of Figure 2.3 and linear-time logic LTL. The decidability boundary of the model checking problem heavily depends on the considered logic. The logic EG and all stronger logics For the logic EG, the problem is decidable for PDA and its subclasses [78, 20]. The same result holds for all stronger logics, as the problem is decidable for PDA even for specification formulated in /x-calculus [19, 98]. On the contrary, the problem is undecidable for the logic EG (and thus also for all stronger logics) and all classes subsuming BPP [35]. In fact, an inspection of this undecidability proof exposes that the undecidability result for BPP holds already for a fragment of EG containing only formulae of the form EG