SimpleCAR: An Efficient Bug-Finding Tool Based on Approximate Reachability Jianwen Li1(B) , Rohit Dureja1 , Geguang Pu2 , Kristin Yvonne Rozier1 , and Moshe Y. Vardi3 1 Iowa State University, Ames, IA, USA lijwen2748@gmail.com 2 East China Normal University, Shanghai, China 3 Rice University, Houston, TX, USA Abstract. We present a new safety hardware model checker SimpleCAR that serves as a reference implementation for evaluating Complementary Approximate Reachability (CAR), a new SAT-based model checking framework inspired by classical reachability analysis. The tool gives a “bottom-line” performance measure for comparing future extensions to the framework. We demonstrate the performance of SimpleCAR on challenging benchmarks from the Hardware Model Checking Competition. Our experiments indicate that SimpleCAR is particularly suited for unsafety checking, or bug-finding; it is able to solve 7 unsafe instances within 1 h that are not solvable by any other state-of-the-art techniques, including BMC and IC3/PDR, within 8 h. We also identify a bug (reports safe instead of unsafe) and 48 counterexample generation errors in the tools compared in our analysis. 1 Introduction Model checking techniques are widely used in proving design correctness, and have received unprecedented attention in the hardware design community [9,16]. Given a system model M and a property P, model checking proves whether or not P holds for M. A model checking algorithm exhaustively checks all behaviors of M, and returns a counterexample as evidence if any behavior violates the property P. The counterexample gives the execution of the system that leads to property failure, i.e., a bug. Particularly, if P is a safety property, model checking reduces to reachability analysis, and the provided counterexample has a finite length. Popular safety checking techniques include Bounded Model Checking (BMC) [10], Interpolation Model Checking (IMC) [21], and IC3/PDR [12,14]. It is well known that there is no “universal” algorithm in model checking; different algorithms perform differently on different problem instances [7]. BMC outperforms IMC on checking unsafe instances, while IC3/PDR can solve instances that BMC cannot and vice-versa. [19]. Therefore, BMC and IC3/PDR are the most popular algorithms in the portfolio for unsafety checking, or bug-finding. c The Author(s) 2018 H. Chockler and G. Weissenbacher (Eds.): CAV 2018, LNCS 10982, pp. 37–44, 2018. https://doi.org/10.1007/978-3-319-96142-2_5 38 J. Li et al. Complementary Approximate Reachability (CAR) [19] is a SAT-based model checking framework for reachability analysis. Contrary to reachability analysis via IC3/PDR, CAR maintains two sequences of over- and under- approximate reachable state-sets. The over-approximate sequence is used for safety checking, and the under-approximate sequence for unsafety checking. CAR does not require the over-approximate sequence to be monotone, unlike IC3/PDR. Both forward (Forward-CAR) and backward (Backward-CAR) reachability analysis are permissible in the CAR framework. Preliminary results show that Forward-CAR complements IC3/PDR on safe instances [19]. We present, SimpleCAR, a tool specifically developed for evaluating and extending the CAR framework. The new tool is a complete rewrite of CARChecker [19] with several improvements and added capabilities. SimpleCAR has a lighter and cleaner implementation than CARChecker. Several heuristics that aid Forward-CAR to complement IC3/PDR are integrated in CARChecker. Although useful, these heuristics make it difficult to understand and extend the core functionalities of CAR. Like IC3/PDR, the performance of CAR varies significantly by using heuristics [17]. Therefore, it is necessary to provide a basic implementation of CAR (without code-bloating heuristics) that serves as a “bottom-line” performance measure for all extensions in the future. To that end, SimpleCAR differs from CARChecker in the following aspects: – Eliminates all heuristics integrated in CARChecker except a configuration option to enable a IC3/PDR-like clause “propagation” heuristic. – Uses UNSAT cores from the SAT solver directly instead of the expensive minimal UNSAT core (MUC) computation in CARChecker. – Poses incremental queries to the SAT solver using assumptions; – While CARChecker contributes to safety checking [19], SimpleCAR shows a clear advantage on unsafety checking. We apply SimpleCAR to 748 benchmarks from the Hardware Model Checking Competition (HWMCC) 2015 [2] and 2017 [3], and compare its performance to reachability analysis algorithms (BMC, IMC, 4 × IC3/PDR, Avy [22], Quip [18]) in state-of-the-art model checking tools (ABC, nuXmv, IIMC, IC3Ref). Our extensive experiments reveal that Backward-CAR is particularly suited for unsafety checking: it can solve 8 instances within a 1-h time limit, and 7 instances within a 8-h time limit not solvable by BMC and IC3/PDR. We conclude that, along with BMC and IC3/PDR, CAR is an important candidate in the portfolio of unsafety checking algorithms, and SimpleCAR provides an easy and efficient way to evaluate, experiment with, and add enhancements to the CAR framework. We identify 1 major bug and 48 errors in counterexample generation in our evaluated tool set; all have been reported to the tool developers. 2 Algorithms and Implementation We present a very high-level overview of the CAR framework (refer [19] for details). CAR is a SAT-based framework for reachability analysis. It maintains SimpleCAR: An Efficient Bug-Finding Tool 39 two over- and under- approximate reachable state sequences for safety and unsafety checking, respectively. CAR can be symmetrically implemented either in the forward (Forward-CAR) or backward (Backward-CAR) mode. In the forward mode, the F-sequence (F0, F1, . . . , Fi) is the over-approximated sequence, while the B-sequence (B0, B1, . . . , Bi) is under-approximated. The roles of the F- and B- sequence are reversed in the backward mode. We focus here on the backward mode of CAR, or Backward-CAR (refer [19] for Forward-CAR) 2.1 High-Level Description of Backward-CAR Table 1. Sequences in Backward-CAR. F-sequence (under) B-sequence (over) Init F0 = I B0 = ¬P Constraint Fi+1 ⊆ R(Fi) Bi+1 ⊇ R−1 (Bi) Safety check - ∃i · Bi+1 ⊆ 0≤j≤i Bj Unsafety check ∃i · Fi ∩ ¬P = ∅ A frame Fi in the F-sequence denotes the set of states that are reachable from the initial states (I) in i steps. Similarly, a frame Bi in the B-sequence denotes the set of states that can reach the bad states (¬P) in i steps. Let R(Fi) represent the set of successor states of Fi, and R−1 (Bi) represent the set of predecessor states of Bi. Table 1 shows the constraints on the sequences and their usage in Backward-CAR for safety and unsafety checking. Alg. 1. High-level description of Backward CAR 1: F0 = I, B0 = ¬P, k = 0; 2: while true do 3: while S(B) ∧ R(S(F)) = ∅ do 4: update F- and B- sequences. 5: if ∃i · Fi ∩ ¬P = ∅ then return unsafe; 6: perform propagation on B-sequence (optional); 7: if ∃i · Bi+1 ⊆ 0≤j≤i Bj then return safe; 8: k = k + 1 and Bk = ¬P; Let S(F) = Fi and S(B) = Bi. Algorithm 1 gives a description of Backward-CAR. The B-sequence is extended exactly once in every iteration of the loop in lines 2–8, but the Fsequence may be extended multiple times in each loop iteration in lines 3–5. As a result, CAR normally returns counterexamples with longer depth compared to the length of the B-sequence. Due to this inherent feature of the framework, CAR is able to complement BMC and IC3/PDR on unsafety checking. 2.2 Tool Implementation SimpleCAR is publicly available [5,6] under the GNU GPLv3 license. The tool implementation is as follows: – Language: C++11 compilable under gcc 4.4.7 or above. – Input: Hardware circuit models expressed as and-inverter graphs in the aiger 1.9 format [11] containing a single safety property. – Output: “1” (unsafe) to report the system violates the property, or “0” (safe) to confirm that the system satisfies the property. A counterexample in the aiger format is generated if run with the -e configuration flag. 40 J. Li et al. – Algorithms: Forward-CAR and Backward-CAR with and without the propagation heuristic (enabled using the -p configuration flag). – External Tools: Glucose 3.0 [8] (based on MiniSAT [15]) is used as the underlying SAT solver. Aiger tools [1] are used for parsing the input aiger files to extract the model and property information, and error checking. – Differences with CARChecker [19]: The Minimal Unsat Core (MUC) and Partial Assignment (PA) techniques are not utilized in SimpleCAR, which allows the implementation to harness the power of incremental SAT solving. 3 Experimental Analysis 3.1 Strategies Tools. We consider five model checking tools in our evaluation: ABC 1.01 [13], IIMC 2.01 , Simplic3 [17] (IC3 algorithms used by nuXmv for finite-state systems2 ), IC3Ref [4], CARChecker [19], and SimpleCAR. For ABC, we evaluate BMC (bmc2), IMC (int), and PDR (pdr). There are three different versions of BMC in ABC: bmc, bmc2, and bmc3. We choose bmc2 based on our preliminary analysis since it outperforms other versions. Simplic3 proposes different configuration options for IC3. We use the three best candidate configurations for IC3 reported in [17], and the Avy algorithm [22] in Simplic3. We consider CARChecker as the original implementation of the CAR framework and use it as a reference implementation for SimpleCAR. A summary of the tools and their arguments used for experiments is shown in Table 2. Overall, we consider four categories of algorithms implemented in the tools: BMC, IMC, IC3/PDR, and CAR. Benchmarks. We evaluate all tools against 748 benchmarks in the aiger format [11] from the SINGLE safety property track of the HWMCC in 2015 and 2017. Error Checking. We check correctness of results from the tools in two ways: 1. We use the aigsim [1] tool to check whether the counterexample generated for unsafe instances is a real counterexample by simulation. 2. For inconsistent results (safe and unsafe for the same benchmark by at least two different tools) we attempt to simulate the unsafe counterexample, and if successful, report an error for the tool that returns safe (surprisingly, we do not encounter cases when the simulation check fails). Platform. Experiments were performed on Rice University’s DavinCI cluster, which comprises of 192 nodes running at 2.83 GHz, 48 GB of memory and running RedHat 6.0. We set the memory limit to 8 GB with a wall-time limit of an hour. Each model checking run has exclusive access to a node. A time penalty of one hour is set for benchmarks that cannot be solved within the time/memory limits. 1 We use version 2.0 available at https://ryanmb.bitbucket.io/truss/ – similar to the version available at https://github.com/mgudemann/iimc with addition of Quip [18]. 2 Personal communication with Alberto Griggio. SimpleCAR: An Efficient Bug-Finding Tool 41 Table 2. Tools and algorithms (with category) evaluated in the experiments. Tool Algorithm Configuration Flags ABC BMC (abc-bmc) -c ‘bmc2’ IMC (abc-int) -c ‘int’ PDR (abc-pdr) -c ‘pdr’ IIMC IC3 (iimc-ic3) -t ic3 --ic3 stats --print cex --cex aiger Quip [18] (iimc-quip) -t quip --quip stats --print cex --cex aiger IC3Ref IC3 (ic3-ref) -b Simplic3 IC3 (simplic3-best1) -s minisat -m 1 -u 4 -I 0 -O 1 -c 1 -p 1 -d 2 -G 1 -P 1 -A 100 IC3 (simplic3-best2) -s minisat -m 1 -u 4 -I 1 -D 0 -g 1 -X 0 -O 1 -c 0 -p 1 -d 2 -G 1 -P 1 -A 100 IC3 (simplic3-best3) -s minisat -m 1 -u 4 -I 0 -O 1 -c 0 -p 1 -d 2 -G 1 -P 1 -A 100 -a aic3 Avy [22] (simplic3-avy) -a avy CARChecker Forward CAR (carchk-f) -f Backward CAR (carchk-b) -b SimpleCAR Forward CAR† (simpcar-f) -f -e Backward CAR† (simpcar-b) -b -e Forward CAR‡ (simpcar-fp) -f -p -e Backward CAR‡ (simpcar-bp) -b -p -e IC3/ PDR CAR with heuristics for minimal unsat core (MUC) [20], partial assignment [23], and propagation. † no heuristics ‡ with heuristic for PDR-like clause propagation 3.2 Results Error Report. We identify one bug in simplic3-best3: reports safe instead of unsafe, and 48 errors with respect to counterexample generation in iimc-quip algorithm (26) and all algorithms in the Simplic3 tool (22). At the time of writing, the bug report sent to the developers of Simplic3 has been confirmed. In our analysis, we assume the results from these tools to be correct. BMC IMC IC3/PDR CAR Algorithm Category 0 50 100 150 NumberofUnsafeBenchmarks 147 131 136 120 9 156 9 145 8 128 solved uniquely solved Fig. 1. Number of benchmarks solved by each algorithm category (run as a portfolio). Uniquely solved benchmarks are not solved by any other category. Coarse Analysis. We focus our analysis to unsafety checking. Figure 1 shows the total number of unsafe benchmarks solved by each category (assuming portfolio-run of all algorithms in a category). CAR complements BMC and IC3/PDR by solving 128 benchmarks of which 8 are not solved by any other category. Although CAR solves the least amount of total benchmarks, the count of the uniquely solved benchmarks is comparable to other categories. When the walltime limit (memory limit does not change) is increased to 8 h, BMC and IC3/PDR can only solve one of the 8 uniquely solved 42 J. Li et al. abc-pdr simplic3-best1 simplic3-best2 simplic3-best3 simplic3-avy iimc-ic3 iimc-quip ic3-ref 0 25 50 75 100 125 NumberofUnsafeBenchmarks 102 124 127 130 101 91 74 109 6 108 6 130 6 133 4 134 4 105 4 95 6 115 solved distinctly solved (a) Algorithms in IC3/PDR category carchk-f carchk-b simpcar-f simpcar-b simpcar-fp simpcar-bp 0 25 50 75 100 125 NumberofUnsafeBenchmarks 64 105 11 115 11 113 64 1 106 5 120 8 121 solved distinctly solved (b) Algorithms in CAR category Fig. 2. Number of benchmarks solved by every algorithm in a category. Distinctly solved benchmarks by an algorithm are not solved by any algorithm in other categories. The set union of distinctly solved benchmarks for all algorithms in a category equals the count of uniquely solved for that category in Fig. 1. benchmarks by CAR. The analysis supports our claim that CAR complements BMC/IC3/PDR on unsafety checking. Granular Analysis. Figure 2 shows how each algorithm in the IC3/PDR (Fig. 2a) and CAR (Fig. 2b) categories performs on the benchmarks. simpcar-bp distinctly solves all 8 benchmarks uniquely solved by the CAR category (Fig. 1), while no single IC3/PDR algorithm distinctly solves all uniquely solved benchmarks in the IC3/PDR category. In fact, a portfolio including at least abc-pdr, simplic3-best1, and simplic3-best2 solves all 8 instances uniquely solved by the IC3/PDR category. It is important to note that SimpleCAR is a very basic implementation of the CAR framework compared to the highly optimized implementations of IC3/PDR in other tools. Even then simpcar-b outperforms four IC3/PDR implementations. Our results show that Backward-CAR is a favorable algorithm for unsafety checking. Analysis Conclusions. Backward-CAR presents a more promising research direction than Forward-CAR for unsafety checking. We conjecture that the performance of Forward- and Backward- CAR varies with the structure of the aiger model. Heuristics and performance-gain present a trade-off. simpcar-bp has a better performance compared to the heuristic-heavy carchk-b. On the other hand, simpcar-bp solves the most unsafe benchmarks in the CAR category, however, adding the “propagation” heuristic effects its performance: there are several benchmarks solved by simpcar-b but not by simpcar-bp. 4 Summary We present SimpleCAR, a safety model checker based on the CAR framework for reachability analysis. Our tool is a lightweight and extensible implementation SimpleCAR: An Efficient Bug-Finding Tool 43 of CAR with comparable performance to other state-of-the-art tool implementations of highly-optimized unsafety checking algorithms, and complements existing algorithm portfolios. Our empirical evaluation reveals that adding heuristics does not always improve performance. We conclude that Backward-CAR is a more promising research direction than Forward-CAR for unsafety checking, and our tool serves as the “bottom-line” for all future extensions to the CAR framework. Acknowledgments. This work is supported by NSF CAREER Award CNS- 1552934, NASA ECF NNX16AR57G, NSF CCF-1319459, and NSFC 61572197 and 61632005 grants. Geguang Pu is also partially supported by MOST NKTSP Project 2015BAG19B02 and STCSM Project No. 16DZ1100600. References 1. AIGER Tools. http://fmv.jku.at/aiger/aiger-1.9.9.tar.gz 2. HWMCC 2015. http://fmv.jku.at/hwmcc15/ 3. HWMCC 2017. http://fmv.jku.at/hwmcc17/ 4. IC3Ref. https://github.com/arbrad/IC3ref 5. SimpleCAR Source. https://github.com/lijwen2748/simplecar/releases/tag/v0.1 6. SimpleCAR Website. http://temporallogic.org/research/CAV18/ 7. Amla, N., Du, X., Kuehlmann, A., Kurshan, R.P., McMillan, K.L.: An analysis of SAT-based model checking techniques in an industrial environment. In: Borrione, D., Paul, W. (eds.) CHARME 2005. LNCS, vol. 3725, pp. 254–268. Springer, Heidelberg (2005). https://doi.org/10.1007/11560548 20 8. Audemard, G., Simon, L.: Predicting learnt clauses quality in modern sat solvers. In: IJCAI (2009) 9. Bernardini, A., Ecker, W., Schlichtmann, U.: Where formal verification can help in functional safety analysis. In: ICCAD (2016) 10. Biere, A., Cimatti, A., Clarke, E.M., Fujita, M., Zhu, Y.: Symbolic model checking using SAT procedures instead of BDDs (1999) 11. Biere, A.: AIGER Format. http://fmv.jku.at/aiger/FORMAT 12. Bradley, A.R.: SAT-based model checking without unrolling. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 70–87. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4 7 13. Brayton, R., Mishchenko, A.: ABC: an academic industrial-strength verification tool. In: Touili, T., Cook, B., Jackson, P. (eds.) CAV 2010. LNCS, vol. 6174, pp. 24–40. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14295-6 5 14. Een, N., Mishchenko, A., Brayton, R.: Efficient implementation of property directed reachability. In: FMCAD (2011) 15. E´en, N., S¨orensson, N.: An extensible SAT-solver. In: Giunchiglia, E., Tacchella, A. (eds.) SAT 2003. LNCS, vol. 2919, pp. 502–518. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-24605-3 37 16. Golnari, A., Vizel, Y., Malik, S.: Error-tolerant processors: formal specification and verification. In: ICCAD (2015) 17. Griggio, A., Roveri, M.: Comparing different variants of the IC3 algorithm for hardware model checking. IEEE Trans. Comput-Aided Des. Integr. Circuits Syst. 35(6), 1026–1039 (2016) 18. Ivrii, A., Gurfinkel, A.: Pushing to the top. In: FMCAD (2015) 44 J. Li et al. 19. Li, J., Zhu, S., Zhang, Y., Pu, G., Vardi, M.Y.: Safety model checking with complementary approximations. In: ICCAD (2017) 20. Marques-Silva, J., Lynce, I.: On improving MUS extraction algorithms. In: Sakallah, K.A., Simon, L. (eds.) SAT 2011. LNCS, vol. 6695, pp. 159–173. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21581-0 14 21. McMillan, K.L.: Interpolation and SAT-based model checking. In: Hunt, W.A., Somenzi, F. (eds.) CAV 2003. LNCS, vol. 2725, pp. 1–13. Springer, Heidelberg (2003). https://doi.org/10.1007/978-3-540-45069-6 1 22. Vizel, Y., Gurfinkel, A.: Interpolating property directed reachability. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 260–276. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08867-9 17 23. Yu, Y., Subramanyan, P., Tsiskaridze, N., Malik, S.: All-SAT using minimal blocking clauses. In: VLSID (2014) Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.