Faculty of Informatics Masaryk University
Formal Methods for Analysis
of Biological Systems under Parameter Uncertainty
David Šafránek Habilitation Thesis
Brno, November 12, 2018
Abstract
In this habilitation thesis, we discuss a set of novel formal methods for computer-aided analysis of mathematical and computational models of biological systems. The presented techniques target two problems that have recently appeared in the field of computational systems biology — namely, the problem of parameter synthesis and the problem of robustness analysis. The techniques described in the thesis target both of the problems. Besides systems biology, the techniques presented in the thesis are applicable in the emerging domains of cyber-physical and cyber-biological systems.
The shared attribute of all techniques presented in the thesis is the fact they are based on rigorous formal methods, namely, on model checking algorithms developed for formal verification of computer systems. This attribute signifies the uniqueness of the techniques and positions them primarily in the state-of-the-art in silico toolchains of synthetic biology workflows where engineering and design of synthetically re-programmed living cells are performed.
Both discussed problems address the challenging need for efficient global analysis of computational models frequently used in systems and synthetic biology. Model classes targeted by the techniques described in the thesis represent the majority of models that can be encoded in SBML level 3, the community standard for model-based systems biology. In particular, we address methods for deterministic continuous-time models based on differential equations that for historical reasons still represent the most frequently used formalism. Next, we discuss methods working with continuous-time stochastic models that are used to describe biological systems and molecular mechanisms observed at the detailed level of individual discrete events of molecular interactions. Finally, we discuss methods working with abstract discrete models (Boolean or Thomas' Networks) representing the logic of positive and negative influences among molecules. These formalisms are used to describe biological systems at a high level of abstraction and they make an important tool for inference of new hypotheses and design of targeted wet-lab experiments. Most of the discussed methods are demonstrated in several case studies.
The thesis is based on extended versions of multiple conference and journal papers joint into a unified framework and accompanied with a significantly extended overview of other existing approaches.
Acknowledgement
First of all, I would like to thank master's and doctoral students in close cooperation with whom I obtained the results presented in this thesis, namely, Martin Demko, Petr Dluhoš, Sven Dražan, Jana Dražanová, Juraj Kolčák, Matěj Hajnal, Samuel Pastva, Adam Streck, Matěj Troják, Martin Vejnár, and Tomáš Vejpustek.
I would like to thank very much prof. Luboš Brim, the head of the Systems Biology Laboratory which I am a member of at Faculty of Informatics, Masaryk University, for his continuous moral and professional support.
I would like to thank my colleagues in close cooperation with whom I obtained the results presented in this thesis. Namely, the colleagues from Masaryk University - Jiří Barnat, Nikola Beneš, Milan Češka, Ivana Černá, Jiří Damborský, Pavel Krejčí, the colleagues from Free University Berlin -Hannes Klarner and Heike Siebert, and the colleagues from Cachan and Universitě Paris-Saclay - Loi'c Paulevé and Stefan Haar.
Additionally, I appreciate very much the discussions and support coming from prof. Marta Kwiatkowska (University of Oxford), prof. Jan Van Schuppen (University of Delft), prof. Jean-Marie Jacquet (University of Namur), Alessandro Abate (University of Oxford), Ezio Bartocci (TU Vienna), Luca Bortolussi (University of Trieste), Luca Cardelli (Microsoft Research), Jan Červený (CzechGlobe), and Ralf Steuer (Humboldt University Berlin). I would like to thank Oded Maler (who unfortunately passed away in September 2018) for great discussions forming my motivation and approach to interdisciplinary research.
Finally, I am deeply indebted to my family for their great moral support and patience without which I would have never finished this thesis.
The work presented in this thesis has been supported by a number of grant projects — in particular, European 6th Framework STREP NEST project EC-MOAN Reg. No. 043235, the Czech Grant Agency projects GA15-11089S and GA18-00178S, the EU OP project CyanoTeam (Reg. No. CZ. 1.07/2.3.00 /20.0256), and the MEYS project of the National infrastructure C4SYS - Centre for Systems Biology (Reg. No. LM2015055).
Contents
1 Introduction 1
1.1 Motivation............................. 2
1.2 Lessons Learned in Systems Biology Projects......... 3
1.3 Focus of the Thesis........................ 4
1.4 Thesis Structure.......................... 5
2 Background 6
2.1 Computational Systems Biology ................ 6
2.1.1 Biological Systems.................... 6
2.1.2 Mathematical Approach................. 7
2.1.3 Computational Approach................ 7
2.2 Biological Networks ....................... 8
2.2.1 Chemical Reaction Networks.............. 9
2.2.2 Influence Networks ................... 11
2.2.3 Reaction-Influence Networks.............. 12
2.3 Models of Biological Networks................. 13
2.3.1 Continuous Models ................... 16
2.3.2 Stochastic Models..................... 21
2.3.3 Discrete Models...................... 23
2.3.4 Parameterised Models.................. 24
2.3.5 Relations Among Models................ 27
2.3.6 Other Types of Models.................. 29
2.4 Model Encoding and Abstraction................ 32
2.4.1 Model Variables...................... 32
2.4.2 Model Parameters .................... 33
2.4.3 Parameter Perturbations................. 33
2.4.4 Parameterised Kripke Structure............. 34
2.4.5 Parameterised CTMC .................. 35
2.4.6 Discrete Abstraction of Continuous Models...... 35
2.5 Temporal Logics for Biological Systems............ 43
2.5.1 Linear Time Temporal Logic - LTL........... 45
2.5.2 Computation Tree Temporal Logic - CTL....... 46
2.5.3 Signal Temporal Logic - STL.............. 47
i
2.5.4 Value-Freezing Signal Temporal Logic - STL*..... 49
2.5.5 Continuous Stochastic Logic - CSL........... 52
2.5.6 Interpretation on Biological Network Models..... 53
2.6 Model Analysis.......................... 57
2.6.1 Restriction to Bounded Time.............. 57
2.6.2 Reachability Analysis .................. 58
2.6.3 Model Checking ..................... 61
2.6.4 Software for Model Checking of Biological Models . 64
2.6.5 Simulation and Monitoring............... 66
2.6.6 Parameter Synthesis................... 68
2.6.7 Robustness Degree.................... 70
2.7 Summary.............................. 72
3 Solved Problems 73
3.1 Parameter Synthesis ....................... 73
3.1.1 Problem Formulation .................. 73
3.1.2 Significance........................ 73
3.1.3 Existing Solutions..................... 74
3.1.4 Our Contribution..................... 75
3.2 Parameter Exploration...................... 75
3.2.1 Problem Formulation .................. 75
3.2.2 Significance........................ 76
3.2.3 Existing Solutions..................... 76
3.2.4 Our Contribution..................... 77
3.3 Robustness Analysis....................... 77
3.3.1 Problem Formulation .................. 77
3.3.2 Significance........................ 78
3.3.3 Existing Solutions..................... 79
3.3.4 Our Contribution..................... 79
4 Coloured Model Checking 81
4.1 Overview.............................. 81
4.2 Coloured LTL Model Checking................. 82
4.2.1 Algorithms ........................ 83
4.2.2 Interpretation of Parameter Synthesis Results..... 84
4.2.3 Performance Evaluation................. 86
4.2.4 Publications Summary.................. 87
4.3 Coloured CTL Model Checking................. 88
4.3.1 Algorithms ........................ 88
4.3.2 Interpretation of Parameter Synthesis Results..... 93
4.3.3 Performance Evaluation................. 93
4.3.4 Publications Summary.................. 94
4.4 Software Tools........................... 95
4.5 Applications............................ 96
4.6 Discussion............................. 96
4.7 Future Work............................ 97
5 Parameterised Uniformisation 99
5.1 Overview.............................. 99
5.2 Method Principles......................... 102
5.2.1 Local Transient Analysis................. 102
5.2.2 Global Transient Analysis................ 102
5.2.3 Model Checking ..................... 102
5.2.4 Numerical Errors of Parameterised Uniformisation . 103
5.2.5 Parameter Space Decomposition............ 104
5.3 Performance Evaluation..................... 105
5.4 Applications............................ 106
5.5 Publications Summary...................... 106
5.6 Discussion............................. 106
5.7 Future Work............................ 108
6 Robustness Analysis 109
6.1 Overview.............................. 109
6.2 Deterministic Robustness for STL*............... 110
6.2.1 Method Principles.................... Ill
6.2.2 Performance Evaluation................. 113
6.2.3 Applications........................ 114
6.2.4 Publications Summary.................. 114
6.2.5 Discussion......................... 115
6.2.6 Future Work........................ 115
6.3 Probabilistic Robustness for CSL................ 116
6.3.1 Method Principles.................... 117
6.3.2 Applications........................ 119
6.3.3 Performance Evaluation................. 120
6.3.4 Publications Summary.................. 121
6.3.5 Discussion......................... 121
6.3.6 Future Work........................ 122
6.4 Software Tools........................... 122
7 Case Studies 124
7.1 Overview.............................. 124
7.2 Biodegradation of 1,2,3-trichloropropane in E. coli...... 125
7.2.1 Problem Description................... 125
7.2.2 Model Encoding ..................... 126
7.2.3 Analysis Procedure and Results ............ 126
7.2.4 Performance........................ 129
7.3 Regulation of Gi/S cell cycle transition............ 131
7.3.1 Problem Description................... 131
7.3.2 Model Encoding ..................... 131
7.3.3 CTL-based Analysis................... 132
7.3.4 CSL-based Analysis ................... 135
7.4 Sustained vs Transient Modes of Cell Signalling ....... 139
7.4.1 Problem Description................... 139
7.4.2 Model Encoding ..................... 139
7.4.3 Analysis Procedure and Results ............ 140
7.5 Robustness of Elemental Signalling Pathways......... 144
7.5.1 Model Reconstruction.................. 144
7.5.2 Problem Description................... 145
7.5.3 Post-Processing Functions................ 147
7.5.4 Analysis Procedure and Results ............ 147
7.5.5 Performance........................ 151
7.6 Robustness of Population Dynamics.............. 154
7.6.1 SIR Model......................... 154
7.6.2 Predator-Prey Model................... 155
7.6.3 Performance........................ 157
7.7 Publications Summary...................... 159
7.8 Discussion............................. 160
Conclusion 162
7.9 Summary.............................. 162
7.10 Future Work............................ 163
iv
Chapter 1
Introduction
All biological systems, from single pathways to multicellular organisms, can be seen as complex systems of interacting components. Biological systems can also be seen as reactive systems, as they continuously interact with their environment. Systems biology thus studies complex interactions in biological systems, with the aim to understand better the processes that happen in such a system, as well as to grasp the emergent properties of such a system as a whole [241].
The paradigm of systems biology is based on the model-based approach to study the complex phenomena. The model approximately represents a certain phenomenon under given biological and physical assumptions. While the biological knowledge is always uncertain and incomplete, models represent rigorous structures that can be analysed by computational methods. The emphasis on computational methods and models refines the general paradigm of systems biology to a more concrete scheme of computational systems biology [240].
Computational systems biology can, by drawing upon mathematical approaches developed in the context of computer science and engineering [173, 309], contribute to the creation of powerful simulation, analysis and reasoning tools for working biologists. These tools can be used in designing and devising new experiments and ultimately, for understanding functional properties of a genome, proteome, cells, and organisms.
In last years we are continuously experiencing growing collaboration between biologists and computer scientists in many areas of systems and synthetic biology. This is because it has turned out that formal mathematical approaches to modelling and analysis that have been developed for parallel and distributed computer systems and are referred to as formal methods are applicable to biological systems as well as both kinds of systems have a lot in common. In particular, techniques developed in computer science for automated formal verification have the potential to be exploited in computational systems biology. Of special interest is model checking that gives a feasible methodology to verify/refute interesting biological hypotheses [84].
1
1.1 Motivation
The computational analysis of the precise dynamics of biochemical systems involves the construction of appropriate computational models. Building suitable sound dynamic models can be seen as a key step toward the development of predictive models for cells or whole organisms. While the structure of such models is usually available, some of the quantitative features of models cannot be easily determined. These quantitative attributes, which significantly affect the system dynamics, are usually reflected in the model as parameters.
In order to obtain reliable models, parameters, such as reaction rates or concentration values, need to be specified exactly. For a typical model, some of the parameters values can be determined from the literature or experimental data, many parameters values are uncertain or unknown. The reason is that measurement of many parameters in vitro or even in vivo is hardly possible. In particular, the level of detail at which parameter values can be measured is insufficient in comparison to the level of detail needed for a precise model. An example is measuring protein production rates, which vary with different regulation modes arising during protein transcription. Another problem is that the models have to be specified at a certain level of abstraction approximately representing the modelled phenomenon. This means that a parameter in a model can abstractly represent a value that is composed in a non-trivial way from several physical parameters describing detailed physical or biophysical characteristics of a modelled phenomenon.
The algorithmic analysis of dynamics generated by models with unknown parameter values is one of the main challenges in computational systems biology [284, 293]. The problem is addressed from several perspectives. On the one end, there exist data-oriented approaches of parameter estimation based on finding parameter values that optimally fit the given time-series data [288] typically employing randomised algorithms and techniques of artificial intelligence and machine learning. In systems control theory, the problem is known as parameter identification that includes the rigorous problem of systems identification from data [189] or the problem augmenting parameter estimation with the inference of statistical guarantees [314]. On the other end, there are behaviour-oriented and property-oriented approaches that target the global problem of finding the parameter values exhibiting a specified behaviour. This problem is targeted by methods of parameter synthesis or parameter scanning [147, 38] based on finding parameter values reproducing a given temporal phenomenon, or methods of bifurcation analysis [104, 58, 230], focused on finding parameter values that significantly change the behaviour of the system.
A related problem to parameter synthesis has been raised by Kitano in [242]. It addresses the so-called robustness analysis taking into account
2
the fact that various biological systems can display a certain phenomenon for parameter values sets of significantly different cardinality. Intuitively, the larger is the set of parameter values exhibiting a given feature the more robust the feature is. The metrics provided to measure robustness thus give an instrument that allows us to compare models with respect to a given property and a given range of parameter uncertainty. Since the notion is relatively novel, there are not yet efficient techniques providing a fully automatised algorithmic support for robustness analysis.
1.2 Lessons Learned in Systems Biology Projects
In the application-oriented part of our research, we have found a strong evidence for the significance of the problems mentioned above. In the following we describe our insights taken from the research projects in which we have been involved.
The project EC-MOAN1 targeted several gaps in understanding interactions between the metabolic and genetic layer of Escherichia Coli. The formal methods were utilised to help biologists to find rigorous explanation of several considered hypotheses. Model checking techniques have been pioneered to be used for analysis of biological models by employing methods known from control theory. Gaps between biologists and mathematicians/ computer scientists were narrowed but not yet sufficiently. The need for methods working under parameter uncertainty has been formulated and pioneered by two groups involved in the project [51, 38].
In the project CyanoTeam/CyanoNetwork2 we targeted new challenges of systems biology of cyanobacteria and photosynthesis. Owing to the fact cyanobacteria is not that well studied as E. coli, several new challenges have appeared. In particular, we have formulated a general framework for modelling and analysis of biological processes making the logical back-end for the comprehensive modelling web tools targeting photosynthesis [361] and cyanobacteria [245]. The platforms utilise well-established analysis techniques based on simulation. An important challenge is to bring techniques working with uncertain parameters to these platforms. A necessary step is to develop techniques that can be in future deployed to the platforms in the form of a push-button technology.
Lessons that are continuously being learned from ongoing national collaborations3 with local laboratories show the crucial importance of model-based workflow in systems and synthetic biology. Especially, development of methods targeting parameter synthesis with formal guarantees can sig-
'http://www.ec-moan.org, 6th Framework Programme FP6-2005-NEST-PATH-COM 2http://www.cyanoteam.org, EC OP project Reg. No. CZ.1.07/2.3.00/20.0256 3http:/ /c4sys.cz, The national infrastructure C4SYS - Centre for Systems Biology, MEYS project No. LM2015055
3
nificantly reduce the efforts needed to design synthetically modified cells (e.g., the research reported in [139] and described in Section 7.2).
1.3 Focus of the Thesis
The research goal targeted in this thesis is the development of efficient computer-scientific techniques that allow automatised analysis of models of biochemical systems under parameter uncertainty. Our main contribution is in narrowing the existing gaps in the current state-of-the-art technology. In particular, this is realised by introducing new original methods and techniques that utilise model checking as the central method allowing to analyse and explore models with unknown parameters with respect to a given global property (hypothesis). In particular, we target the problems of parameter synthesis and robustness analysis from the perspective of formal methods based on model checking. The results computed by these methods are parameter values that evaluate the global property with some formal guarantees.
We formulate a concrete framework supporting a specific set of model classes frequently used in systems biology. Our study is then focused to these classes. Existing state-of-the-art techniques targeting the research goal are briefly described. Under this setting, we identify existing critical gaps. Next, we formulate the problems to be investigated and solved formally and we describe technical solutions to these problems. The solutions described in the thesis are based on approaches that represent our own contribution to the field. The impact of the contributed methods is demonstrated on several biological problems with various levels of complexity.
Owing to the fact that the individual approaches have been published in separate research articles, we give their comprehensive overview unified in an abstract framework. The description of our methods is accompanied with a significantly extended overview of related approaches that are discussed and compared with our methods.
It is worth noting that this thesis focuses primarily on the problems of parameter synthesis and robustness analysis. Methods practically used in a wider context of inverse problems [160] for fitting the models to experimental data are not discussed here. A comprehensive review of those methods is provided, e.g., in [314]. A critical aspect of those methods is that the models are typically over-parameterised resulting in situations when parameters are poorly constrained by experimental data [207] thus making a good estimation of a single reliable parameter valuation impossible. Our approach avoids such situations by reformulating the inverse problem in an abstract way. In particular, parameter synthesis is a method that a priori returns a set of parameter valuations satisfying a general property instead of searching for a single optimal value fitting a given measurement.
4
1.4 Thesis Structure
The structure of the thesis is the following. Chapter 2 fixes the modelling framework and discusses state-of-the-art techniques available for this framework. Additionally, a family of temporal properties used to capture biological phenomena is discussed with a special emphasis on logics used in our methods. Chapter 3 states formally the problems considered in the remaining part of the thesis. Most important relations to related work directly applicable to the stated problems are discussed here. Additionally, our contribution to the stated problems is briefly summarised. Chapters 4-6 are devoted to a detailed description of the techniques used to solve the problems given in Chapter 3. Finally, case studies performed on several biological problems are described in detail in Chapter 7.
5
Chapter 2
Background
2.1 Computational Systems Biology
Systems biology can be characterised as an approach to the understanding of life through the study of how the properties of biological systems arise through interactions between the system components [283]. In 2002, the specialised field of computational systems biology has been introduced by Kitano [240]. In general, it is a scientific paradigm that addresses questions fundamental to our understanding of life by means of rigorous methods. Progress in this field has direct impact to practical innovations in medicine, drug discovery and engineering. An important attribute is that processes in living matter are studied as dynamical systems and systems engineering methodology is applied. The methodology is based on mathematical models used to rigorously represent biological phenomena in abstract forms.
2.1.1 Biological Systems
Biological systems are complex systems of interacting biological components, which may be molecules, cells, organisms or entire species, that change their properties with time in response to external and internal stimuli. Studying the dynamic behaviour of these systems is the basis for understanding of cellular functions or disease mechanisms.
The formalism used to model a biological system is essential, since it not only dictates the possible behaviour that may or may not be captured, but also determines the computational means of detecting and analysing that. Fisher and Henzinger [173] distinguish two kinds of models relevant for computational systems biology: executable versus denotational (or, as they phrase it, computational versus mathematical).
6
2.1.2 Mathematical Approach
Mathematical models in systems biology are typically represented by means of ordinary differential equations (ODEs) that have a long tradition in chemistry since 1864 [206, 285]. They are used to approximate the behaviour of well-mixed (bio)chemical masses deterministically by continuous functions of time. They are frequently used by biophysicists and therefore make the main formal language of computational systems biology. To share this type of models, the format SBML [228] has been developed. It is derived from XML and provides a way to prepare the models for computational processing. Commonly known repositories support this approach, probably the best known of these is Biomodels.org [265].
The most common method for analysis of mathematical models of dynamical systems is the numerical simulation. There are several tools developed especially for systems biology, e.g., Copasi [224], CellDesigner [182], CellML [271], etc. They support simulation and other numerical tasks. These tasks require detailed knowledge of the biological system, i.e., quantitative parameters identifying the physical aspects of system interactions (e.g., kinetic coefficients of chemical reactions). Therefore, of special interest is parameter estimation [288], a method that combines simulation with optimisation to find values of unknown parameters with respect to experimental data. It is a key issue in systems biology, as it represents the crucial step to obtaining better models of biological systems that give more precise predictions. In biological models, the control parameters used to define the behaviour of models are kinetic or rate parameters. Some of these parameters usually cannot be experimentally determined which leads to the need to estimate these parameters by computational methods. Due to the non-linear character of mathematical models in biology, these methods are criticised by mathematicians because of inexactness and ill-formed settings lacking precise discussions of identifiability. Some of these drawbacks can be avoided by using advanced statistical methods [314].
2.1.3 Computational Approach
Computational models form the new promising basis for studying various biological phenomena by drawing upon formal approaches developed in the context of computer science and engineering [173, 309, 229, 172, 362]. This is because it has turned out that formal approaches to modelling and analysis, referred to as formal methods, are applicable to biological systems as well, as both kinds of systems have a lot in common, especially, discrete event dynamics, concurrency and reactiveness. Moreover, in both kinds of systems the interaction of components is a source of various emergent system properties that are not explicitly encoded in individual system parts. There are two main properties of computational models that cannot
7
be, in general, easily obtained in mathematical models. First, operational concurrent discrete-event semantics of computational models gives an adequate description of reactive events occurring among molecules and other particles. Moreover, operational view allows to study how the complex behaviour emerges from individual interactions. Second, computational models can represent the system at arbitrary level of abstraction (from detailed reactive event view to higher-level events abstractly representing complex processes). Moreover, models can be written in modular and compositional form that allows possibility to integrate modules at different level of abstraction. Integration of partial models is one of the crucial tasks of modern systems biology that aims to explain the behaviour in the global context. This is difficult to be achieved with mathematical models.
In last years, many techniques and tools employing computational models and formal methods have been developed resulting with strong case studies. Examples of widely used formalisms are Boolean networks [349, 106, 244], Petri Nets [215, 70, 105, 307], timed automata [53, 333,199], compact process algebraic representations such as BioPEPA [114], Kappa [130], BNGL [161] or suitable adaptations of 7r-calculus [304, 316]. Overview of methods is given, e.g., in [45,174].
2.2 Biological Networks
The theory and practise of systems biology are inherently based on the concept of studying biological systems as networks of interacting species. Biological knowledge is reconstructed and organised in the form of biological networks. Biological networks are built from biological knowledge databases, experimental data and biophysical principles employing many simplifying assumptions. There are two fundamental types of biological networks - reaction networks and influence networks. Recent complex network reconstructions typically mix the two, we call such kinds of biological networks as reaction-influence networks.
Example 2.1 An example of a general biological network is given in Figure 2.1. The network is described in the standard visual notation SBGN (Systems Biology Graphical Notation) [264] in terms of a so-called process diagram. The network describes a simplified case of transcription of a gene (geneX) controlled by a particular protein TF (a so-called transcription factor). The outcome of the transcription process is a molecule of messenger mRNA (rnaX).
The auxiliary species denoted 0 represents the general matter out of the border of the modelled system.
In [99] the two kinds of networks (reaction and influence networks) are studied and compared by using the utility of graph morphisms. It is shown that under given fixed assumptions the reaction networks make a more
8
( 1 mt:prot I s I | ctgene | |
tf x
f \ mt:prot |—* ct:gene
tf x
\ /
-ir I-1 ct:mRNA I-
I-□-► x
Figure 2.1: An example of a biological network having the form of a reaction-influence network. The example is shown in the standard graphical format SBGN.
concrete forms (implementations) of influence networks. In particular, influence networks can be, from the graph-theoretic perspective, understood as abstractions of reaction networks. The abstraction relation is formalised by means of morphisms between networks. These morphisms reflect (and preserve) the kinetic interpretation of biological systems described by the networks.
The information encoded in biological networks can be used to study relations between individual networks topologies existing in nature. E.g., in [191, 190] a fully structural approach was proposed to measure the distance of biological networks obtained by changing the graph structure. The natural development of biological networks structure is hypothesised to follow several optimisation goals set with respect to the environment in which the living cell resides in long-time horizon [235].
2.2.1 Chemical Reaction Networks
Chemical reaction networks (reaction networks for short) provide a detailed mechanistic view of biochemical systems - nodes are (bio)chemical species and stoichiometry-labelled (multi-)edges represent elementary (bio)chemical reactions. Chemical reaction networks are used as the basic formalism in chemistry. On the theoretical side, they were studied mainly from algebraic or graph-theoretic perspectives [169, 354, 3, 233, 121, 321] allowing to infer properties of network dynamics from network structure.
For the purpose of using reaction networks in this thesis, we modify and extend the notation and definition introduced in [100]. In general,
9
chemical reaction networks make the basic formalism for describing biological systems at the elementary mechanistic level. They also form the basis of SBGN process diagrams.
Notation 2.2 Denote & the universe of all chemical species. We assume & to be lexicographically ordered.
Definition 2.3 Chemical reaction network (CRN) (or reaction network for
short) is a pair (A, 1Z) where A c & is a finite set of species and 7Z is a finite set of reactions such that q g TZ is a pair q = (re,pe) where rQ e N0A' is the reactant
complex and pe e N0A' is the product complex representing the stoichiometry of reactants and products, respectively. In both cases, we assume the components of a complex follow the lexicographical order.
For any I e A, notation re(l) is used to denote the stoichiometry coefficient corresponding to I in the reactant complex component re.
Example 2.4 An example of a simple reaction network with interesting nonlinear behaviour is a so-called Schloegel's reaction system representing a set of ordinary reactions acting on a single chemical species [326]:
qi : 2A ->• 3A q2 : 3A ->• 2A g3 : 0 ->• A g4 : A ->• 0
Encoding in terms of Definition 2.3 is the following (since the complexes include at most a single component the respective brackets are omitted):
({A}, {(2, 3), (3, 2), (0,1), (1,0)})
Ql Q2 QZ QA
Example 2.5 The following network is a simple example of a reversible catalytic reaction system where Ef represents an enzyme catalysing the forward reaction, Eb is an enzyme catalysing the backward reaction, S is a substrate, and P a final product.
6l: S + Ef SEf Q2 : SEf -+S + Ef g3 : SEf P + Ef
g4: P + Eb^ PEb g5: PEb P + Eb g6 : PEb S + Eb
Encoding reflecting Definition 2.3 is the following (the set of species is considered to follow the displayed lexicographic order):
Ql Ql
({eb, ef,s, sef, p, peb}, {((0,1,1, 0,0, 0), (0, 0, 0,1, 0, 0)), ((0,0, 0,1,0, 0), (0,1,1,0, 0, 0)), ((0,0,0,1,0,0), (0,1,0,0,1,0)), ((1,0,0,0,1,0), (0,0,0,0,0,1)), ((0,0,0,0,0,1), (1,0,0,0,1,0)),
*-v-' *-v-' *-v-'
Q3 QA QS
((0,0,0,0,0,1), (1,0,1,0,0,0))})
10
| mt:prot 1—H ct:gene |——|
tf x
| ct:mRNA |—i
IS
I t| mt:proteJn | i I | mt:proteJn | ■ \
pRB E2F1
Figure 2.2: (left) The influence graph abstractly representing the biological network from Figure 2.1. (right) A two-node influence network representing a variant of a genetic switch.
2.2.2 Influence Networks
Influence networks describe systems at a higher level of abstraction than CRNs by focusing on influences among individual system components -nodes are biochemical species or abstract biological objects and edges represent positive or negative influences. Gene regulatory networks [134] or signalling pathways [239] make typical examples used in systems biology. Influence networks are frequently used in systems biology to describe processes without giving mechanistic details on the individual (bio) chemical interactions. They make also a part of the graphical format SBGN in the form of so-called activity flow diagrams. We present the following compact definition of influence networks that we have introduced in [248].
Definition 2.6 Influence network (IN) is a pair (A, Ta) where A c & is a finite set of species and is a finite set of species influences such that C A x A. For each I e A we denote the set of its influencers as n~(l) = {x e A | (x, I) e
rA}-
Example 2.7 A simple example of an influence network is depicted in Figure 2.2 (left). It represents an influence abstraction of the biological network shown in Figure 2.1. It is presented in the SBGN format. The presence of protein TF and gene X has a positive influence on the presence of the respective mRNA molecule encoding a protein X. Note that in our framework oflNs we do not capture the information about the type of the influence, only the topology is represented.
It is worth noting that we have decided to omit specification of the influence type at the level of INs because there are many different types of individual influences that have various semantics which is typically not clearly fixed (i.e., the specification of SBGN gives a pallet of arrows representing an influence at a very abstract level without specifying the respective mechanistic function underlying the influence). Detailed specification of influence functions is moved to the level of semantics.
11
Example 2.8 Another influence network is depicted in Figure 2.2 (right). The network consists of two protein species found in human cells. These proteins are mutually influencing each other in a negative and positive way indirectly through controlling the transcription of the influenced (regulated) protein. Each of the proteins additionally self-regulates transcription of its own. This is caused by a negative (pRB) and a positive (E2F1) feedback. This genetic circuit (called a genetic switch) is important in controlling the irreversible decision of a cell to proceed with initiating the doubling procedure [345]. In Section 7.3 we provide a detailed study of this circuit.
2.2.3 Reaction-Influence Networks
CRNs and INs introduced in previous sections make a basis for formal specification of biological systems. CRNs provide a framework for detailed mechanistic description of biochemistry underlying biological processes whereas INs allow describing complex processes at an abstract level avoiding mechanistic details. As it has been discussed in Example 2.1, a mixture of both kinds of networks is typically used in practice. In typical modelling scenarios, not all modelled processes can be decomposed down to the mechanistic level. The reasons are the enormous complexity and interconnection of biological processes and the large extent of uncertainty in biological knowledge. A feasible solution is to use influences in cases where simplifying descriptions are needed (or sufficient) and to combine them with reactions to describe well-known parts of modelled processes. To that end, reaction-influence networks (RINs) are formally established in the following definition.
Definition 2.9 Let (A, 1Z) be a reaction network. We define a reaction-influence network as a triple (A, 1Z, T-ji) where Tji C AxTZisa set of reaction influences. For each g e 7Z we define a set of its influencers as n~(g) = {x e A | (x, g) e T^}. Additionally, we denote the set of influenced reactions 7Z~ = {g e 7Z \ n~ {q) / 0} and the set of uninfluenced reactions 7& = 1Z \ 1Z~.
Remark 2.10 Definition 2.9 is compatible with the notion of biological model structure as understood in the standard format Systems Biology Markup Language (SBML) level 2 [228]. In SBML, the set of species is represented as a list of species and the set of reactions as a list of reactions. Reactions influencers are called modifiers.
Example 2.11 An example of a biological network that can be represented in terms of a RIN is given in Figure 2.1. The network has four species, A = {geneX, geneX-TF, rnaX, TF}. The basic CRN has the following two ordinary chemical reactions displayed with a respective encoding reflecting the lexicographic order of A:
q1 : geneX + TF —>■ geneX-TF ((1, 0, 0,1), (0,1, 0, 0)) e 11 g2 : 0 rnaX ((0, 0, 0, 0), (0, 0,1, 0)) g K
12
First, there is a second-order reaction gi that describes a complexation process (association). A molecule of the transcription factor (TF) binds to the particular segment ofDNA (the gene X). Second reaction q2, describes formation of an RNA molecule. This first-order reaction is described in a very abstract form without chemical details (i.e., small molecules such as ATP or transcription enhancers involved in the process are not specified). All elements making the matter from which RNA is built are set outside of the system.
There is a single influence in the network, Tji = {(geneX-TF, 02)}, representing a positive effect of the complex geneX-TF on the occurrence of the second reaction. The influence abstracts from many elementary reactions describing the details of chemical interactions among all involved molecules.
2.3 Models of Biological Networks
One of the main aims of computational systems biology is to study the dynamics of biological networks, in particular, how a population of components affected by network interactions evolves in time. To that end, a biological network is extended to a biological model that associates the network structure with a suitable semantics reflecting the dynamics of network components understood at a particular level of abstraction. The semantics fulfils the following tasks:
• Network components (species) are given a mathematical interpretation as variables (numbers of molecules, molar concentrations, or a suitable abstract interpretation of these quantities),
• Network interactions (reactions or influences) are given a mathematical interpretation as rules describing the dynamics - changes in affected variables occurring in discrete or continuous time.
Discrete-value semantics can capture either quantitative (microscopic or mesoscopic [317]) interpretation of chemical/biological entities (e.g., number of particles, atoms, molecules) or abstract qualitative interpretations of selected qualities of modelled components (e.g., absence/presence of a molecule, several levels/intervals abstracting quantitative values). Continuous-value semantics represents a so-called macroscopic view [231] where the modelled components are expected to appear in large quantities provided that it is inconvenient to distinguish small differences (e.g., the molar concentration of a species in a cell or in a certain organelle).
Rules are distinguished at several levels. The most basic form of a rule is a chemical reaction. It makes the central organisation-building element in reaction networks (Definition 2.3). Elementary reactions representing mass transfer (A —> B), dissociation (AB —> A + B), and compound formation (A + B —> AB) represent the atomic rules of chemistry/biology.
13
However, modelling of biological processes at that level requires very detailed knowledge that is often not completely known (e.g. detailed schemes of enzymatic reactions or reactions in gene regulation). Moreover, such processes suffer from a large extent of combinatorial explosion due to the fact the molecules and compounds can exist in many different concurrent states (e.g., phosporylation, methylation, etc.). To that end, rules in models are used with suitable levels of abstraction and compactness. One of such abstractions, in the form of general chemical reactions, has been already considered in Definition 2.3 (non-trivial stoichiometry coefficients allowed in reactant and product complexes). Another level of abstraction that avoids combinatorial explosion by context-free partial specification of reactants and products forms, is provided in rule-based [161,130, 223, 302] and to some extent also in process-algebraic languages [310, 64].
Quantities that can be associated with rules are time and probability. Since each interaction occurs in time with a specific rate, the respective rule is executed with this rate implying the inherent time aspect of the system dynamics. Naturally, time is considered as continuous, dense quantity. When the information on rate is unknown or abstracted out due to simplifications, discrete-time semantics is employed. It deals with the shortest (discrete) time step which can represent an arbitrary finite time horizon. Discrete-time abstraction allows to treat computational models as untimed, i.e., the exact duration of a single time step is left unspecified. It is worth noting that in the most of cases the occurrence of any rule is modelled as instantaneous and it occurs immediately after the conditions for occurrence are satisfied. There exist models that refine these aspects of semantics (e.g., delayed interactions [97] or non-instantaneous interactions [29,108]).
With respect to the execution of interactions, rules can be either deterministic or stochastic. Deterministic rules represent interaction events that occur each time all preconditions are satisfied (i.e., if the given system state contains at least the required amount of all reactants, the reaction is enabled and occurs). There is no noise affecting the interaction. Interaction events can occur concurrently - concurrency is an inherent feature of biological and biochemical systems. There are several ways of modelling concurrent events in computer science and most of them are applicable to biological models. In the case of instantaneous events, the so-called synchronous or asynchronous updating scheme is frequently used [106]. The former scheme results in purely deterministic behaviour in which in every step all events occurring concurrently are synchronised as occurring in the same time instant. It is apparent that some behaviour can be lost in such settings. On the contrary, the asynchronous updating scheme employs the so-called interleaving of concurrent events. In particular, the model includes all possible sequences of concurrent events. They can be selected from a given state by means of a non-deterministic choice.
14
Deterministic rules absolutely neglect the presence of noise that causes the occurrence of events to be a random phenomenon. A well-known physical model of molecule interactions reflecting stochasticity has been addressed by Gillespie in [195,196]. To that end, stochastic rules are considered in modelling languages based on this physical model. In the stochastic settings, the exact time between individual events reflecting occurrence of a given reaction is a random variable depending on a particular state of the system and on physical aspects of the reactant molecules.
In general, the choice of time representation, event atomicity and the way of dealing with concurrency significantly affect the expressive power of the model. For example, in the very recent work [108] it is shown that in case of discrete-time discrete-value models of influence networks a more expressive semantics not previously addressed in computer science is required to capture the biological influences correctly. It is also worth noting that a large variety of semantics addressing the issues of concurrency in the ways acceptable for biological networks is comprehensively captured in the formalism of Petri Nets [70,105, 321, 307].
Summary of individual model types with respect to the semantics of species and rules is given in Figure 2.3. The left half-space shows models that work with discrete (or qualitative) semantics of species. The right half-space shows models employing continuous semantics of species. The vertical categorisation shows models with discrete (or untimed) semantics of rule dynamics in the top half-space. In the bottom half-space, models describing continuous-time dynamics are displayed. In the following subsections, we describe classes of models that are supported by methods developed in our research and over-viewed in the next chapters. First, we consider the most commonly used continuous-time continuous-value models for all kinds of considered biological networks - cCRNMs (continuous models of CRNs), cINMs (continuous models of INs) and cRINMs (continuous models of RINs). Second, in the stochastic case, we limit ourselves to the most commonly used stochastic models well-defined for chemical reaction networks - sCRNMs (stochastic models of CRNs). Although there exist scenarios under which Gillespie's theory can be applied to Hill kinetics of influence networks [159] and Michaelis-Menten kinetics of metabolic networks [322] making a subclass of RINs, they cannot be employed universally but only in cases when several additional criteria are satisfied. Finally, we consider discrete-time discrete-value models of influence networks - dINMs (discrete models of INs). These models (known as Boolean networks or Thomas' networks [348, 347]) are becoming popular for qualitative modelling of gene regulatory networks where the detailed knowledge is very complex to be modelled mechanistically. Moreover, genetic regulatory mechanisms are not yet fully explored. It is worth noting that discrete-time discrete-value semantics has been also used in case of CRNs [93]. However, we do not consider these models in our frame-
15
Figure 2.3: Model types categorised with respect to different level of details captured in their semantics. The model types covered in this thesis are typeset in bold and placed to the appropriate quadrants of the scheme.
work due to the fact it is not clear what kind of (non-structural) parameters would make good sense for parameterisation of qualitative models of CRNs.
For the purpose of this thesis, we do not consider discrete-time mathematical models that make the top right quadrant of the scheme in Figure 2.3. It is worth noting that those models are relevant for mathematical biology [123] as they are used for modelling of some abstract processes at the population level, e.g., logistic growth.
2.3.1 Continuous Models
The following definition declares the most frequently used type of mathematical models that is used with CRNs. The model considers variables to evolve deterministically in continuous time in large concentrations (macroscopic view). Therefore every model variable captures a particular species concentration as a function of time. In every continuous time instant, reactions are quantified in terms of an instant flux describing the actual amount of transferred mass. To this end, the fundamental law of mass-action [206] is employed to represent kinetic rate function that characterises quantitatively the speed of mass transfer from reactants to products. In our case, we compile several notions into a single definition and modify the notation to avoid overlaps with other notions used throughout the thesis.
Definition 2.12 (Continuous CRN Model - cCRNM)
Let M = (A, 1Z) be a CRN. Continuous chemical reaction network model of
M, denoted cCRNM(Af), is defined as a tuple (A/", VcRN^c, xo) where
16
• VcRN '■ Tl —> M>o is a (deterministic) kinetic rate coefficients map,
• Xc = M^d is a (continuous) model state space with x(t) e Xc denoting a (continuous) state in time t e M>o,
• xq g Xc is an initial state.
Instead ofx(t) we write x when a concrete value of time t is irrelevant in the respective context. Further, we assume the components of x follow the lexicographical order of A.. For any I e A, the notation xi is used to denote the state component relative to I, the so-called model variable.
Additionally, every g e 7Z is assigned a mass-action kinetic function ne : Xc —> M>o in the following form:
Ke(x)=vcrn{q) ■ n^[e(z)
zeA
Dynamics of a cCRNM is defined as a continuous function x : M>o —> Xc satisfying x(0) = xq and the following system of differential equations (given in vector form):
^ = f(x) = J2(Pe~ re)Ke(x)
Example 2.13 The CRN from Example 2.4 can be assigned a continuous model defined on the state space Xc = M>o by the following differential equation:
= vcrn{qi)x2 - vcrn{Q2)x?' + vcrn(Q3) - vcrn{q±)x
where x represents the one-dimensional model variable describing the development of species A. The kinetic rate coefficients vcrn{qi)-, ^cRn{Q4) are supposed to be fixed to some concrete positive real values.
To describe the continuous semantics of influence networks, we follow the ideas presented in [135, 54]. In particular, every component is expected to have associated some degradation dynamics and every component with a non-empty set of influencers is assigned a production dynamics that mathematically reflects all the affecting influencers in terms of a set of so-called regulation functions that are expected to be monotonous.
Definition 2.14 (Continuous IN Model - cINM)
Let M = (A, Ta) be an influence network. Continuous influence network model of M, denoted cINM(Af), is defined as a tuple (A/", (vcin+ , Vcin-),^c,xq) where
• VciN+ '■ A x N —> M>o is a production rate coefficients map and vcjn_ : A —> M>o is a degradation rate coefficients map,
17
• Xc = M^d is a (continuous) model state space with x(t) e Xc denoting a (continuous) state in time t e M>o,
• xq g Xc is an initial state.
Additionally, every I e A is assigned a (possibly empty) set of regulation functions {aj,..., af1} such that k\ > 0 and each a\ : Xc —> M>o is a continuous and (non-strictly) monotonous function of xi satisfying I e A« where A« C n~(l), Aj / 0, is a subset of influencers of I such that for all i,j e {l,...,ki},i / j.Ai n Aj = 0 and U«e{i ^ = n{l)- Moreover, vcin+ is required to be defined for all (I, i) such that I e Aj.
Dynamics of a cINM is defined as a continuous function x : M>o —> Xc satisfying x(0) = xq and the following system of differential equations (containing an equation for every I e A):
dxi kl
= fi{x) = y~]vciN+(l,i)(rKx) ~ vcIN_(l)xi
i=i
Example 2.15 The influence network from Example 2.7 can be associated with a continuous model under the condition that the semantics of both influences of mRNA_X molecule is given including their mutual (joint) effect. Assuming that TF regulates transcription o/mRNA_X positively, it is apparent that it has the key role in controlling the transcription process. Since the molecule geneX has only the qualitative effect (transcription cannot be performed without a particular gene), we focus only on the subnetwork of the components TF and mRNA_X. A model of this subnetwork may have the following form:
dx
mRNA-X
dt
dt
vcin+ (mRNA_X, l)^lKNAji(x) - vcIN_ (mRNA_X)xmRNA.x
^--^w_(TF)xTF
where the state space has two dimensions corresponding to variables mRNA_X and a?TF/ the coefficient z/cjtv+(mRNA_X, 1) is set to a concrete value reflecting (theoretically) maximal rate ofmRNA production, z/cjtv_ (RNA_X) and ^c/iv_(TF) are replaced with some concrete values representing degradation coefficients, and °mRNA x(x) zs a regulation function ofx^f defined by employing Hill kinetics in the following way:
4
,1 („\ — XTF
°mRNAJXVxJ
This regulation function is a monotonous (increasing) S-shape function ranging in the domain [0,1) and satisfying o"1lnRNA x(x) = 0.5 for x such that x^f = 3.
The requirement on the monotonicity of regulation functions is presented due to practical reasons to simplify the semantics of regulations to either increasing or decreasing functions of the (combined) influencers.
18
Finally, we close the collection of continuous models by describing the most general class considered in this thesis - the class of reaction-influence models. The concept is the following: non-regulated reactions follow the law of mass action whereas regulated reactions have the kinetics defined by a custom monotonous function projecting the system state (including the state of influencers) into the regulated reaction rate.
Definition 2.16 (Continuous RIN Model - cRINM)
Let M = (A, 1Z, T-ji) be a RIN. Continuous reaction-influence network model
of M, denoted cRINM(Af), is defined as a tuple (A/", VcRiN^c, %o) where
• ^cRiN '■ Tl —> ^>o is a (deterministic) kinetic function coefficients map,
• Xc = M^q is a (continuous) model state space with x(t) e Xc denoting a (continuous) state in time t e M>o,
• xq £ Xc is an initial state.
Additionally, every q g 7& is assigned a mass-action kinetic function ne : Xc —> M>o in the following form:
KQ(x) = VcRIn{q) ■ Y{
xr/l)
leA
Every g e 7Z~ is assigned a regulated kinetic function re' : Xc —> M>o in the following form:
k'6(x) = (tq{x,vcrin(q))
where ae is an arbitrary continuous monotonous function of all variables x\ such that I & n~(g) which is parameterised by vcrin{q)-
Dynamics of a cRINM is defined as a continuous function x : M>o —> Xc satisfying x(0) = xq and the following system of differential equations (given in vector form):
= f{x) = ^2(Pe~ re)Ke(x) + ^2 (Pe ~ re)K'e(x)
Example 2.17 The network from Example 2.11 can be assigned the following continuous model:
--Jt- = -VRIN{Ql)XgeneXXTF
-Jj.- = VRlN{Ql)XgenexXTY
^ff^ = ^rnaxOc, Vrin\q2)) -^p = -^_R/Af(ft)^geneX^TF
where the state space is determined by the four-dimensional domain of non-negative reals capturing the concentration of the four components, the semantics of the uninfluenced reaction qi is described in terms of mass action kinetics and the
19
influenced reaction q2 is assigned a monotonous regulation function of the component Xgenex-TF that may be specified by employing Hill kinetics in the following form:
VRIN{q2) ■ ^geneX-TF
Vrnax{x,VRIN{q2)) = -4-—^4-
XgeneX-TF + 6
where vrin{Q2) represents some concrete value that sets the maximal possible transcription rate of the RNA molecule that is controlled (by scalling) employing the same S-shaped regulation function that has been used in the previous example.
The remaining value ofvRiNiQi) has to be set to reflect the physical properties of the TF-DNA complexation reaction.
An important aspect that can be seen in cRINMs is the possibility of a very general use of a kinetic rate coefficient anywhere in the regulation function (in the case of assigning the dynamics to influenced reactions). From the mathematical perspective, the class of non-linear models represented by cRINMs is not restricted at the place or regulation functions. The regular form is only dictated in case of non-influenced reactions where mass-action kinetics is employed.
Notation 2.18 We will use the notation cBNM to refer to any kind of continuous model defined above (cCRNM, cINM or cRINM).
The vector function f : Xc —> Xc defining the dynamics of a cBNM is called a vector field.
Note that our framework provides the most frequently used expressions describing biological networks in terms of ODEs. There exist different forms taking into account various physical aspects of the reaction (e.g., cooperative binding [337], general mass action [225], S-systems [208], etc.)
All types of continuous-time models mentioned in this section describe the dynamics by means of a continuous function of time. Owing to the fact that these systems are non-linear, it is not in general possible to obtain the analytical solution. In particular, numerical simulation methods are used to obtain approximations of the dynamics in a finite time horizon.
The general notion that formally captures the (exact or approximated) dynamics of a continuous-time dynamical system is called a continuous signal (a signal for short), defined in the following way.
Definition 2.19 (Signal) Let n e N and T = [0, r] where r G R>0. Then I : T —> 0 is a bounded-time continuous signal and T its time domain. If T = [0, oo) we speak about a unbounded-time continuous signal.
For a given cBNM M. of a biological network M, we refer the signal representing a behaviour of M. as a signal generated by Ai. More precisely, it is either an exact signal representing an exact (analytical) solution, or an approximate signal constructed with a selected numerical method with some error.
20
The semantics of any cBNM is defined either as the exact signal that represents the unbounded-time dynamics from the initial state or an approximate signal that gives a bounded-time numerical approximated prefix of the exact signal.
Definition 2.20 (cCBNMs semantics) Let M be a cBNM with initial state xq. Exact semantics of M, denoted [.A/í]]cBNM/ is defined as a unbounded-time continuous signal i given by the continuous function x(t) representing the solution of the respective system of differential equations satisfying Ví e M>o.{(i) = x(t) and {(0) = xq.
Approximate semantics of M, denoted [A^J^bnM' zs defined as a piece-wise constant signal f : [0, r] —> M>0 (an approximate signal) constructed with a selected numerical method with some error e and a given time bound r satisfying Ví G [0,r].x(i) = £{t) +e(t) andf(0) = x0.
The semantics defined above employs the "local" view (fixed by a given initial state) typically used by systems biologists. This design decision is considered to emphasise the contrast between the common usage of cB-NMs (simulation) and other model classes for which a "global" view of the state space and global analysis are usually employed.
2.3.2 Stochastic Models
Stochastic computational models allow to incorporate noise which causes fluctuations in microscopic or mesoscopic component quantities and that way affects the biological system dynamics [159, 239]. In physics, chemistry and related fields, the probabilistic time-evolution of a system with discrete component quantities is described by so-called master equations. In the case of biological phenomena, the chemical master equation (CME) provides an exact mathematical model for stochastic dynamics [196]. It is formalised as a set of differential equations, providing a denotational representation of component quantities distribution in continuous-time. Gillespie [195, 197] has made an important breakthrough in stochastic modelling by introducing techniques for exact simulation of CME. From the computer scientific viewpoint, the CME can be equivalently represented by continuous-time Markov chains (CTMC) [299] which provide operational semantics and allow us to consider continuous-time stochastic models as computational (executable) models [142].
The following definition fixes the syntax and semantics of the stochastic model as used in the thesis. The central notion is the hazard function describing the expected rate of a state transition. It is defined by stochastic interpretation of the law of mass action. It differs from the deterministic version by taking into account the combinatorics of selecting the set of particular molecules making the reactant complex from the current solution (state).
21
Definition 2.21 (Stochastic CRN Model - sCRNM)
Let m = (A, 1Z) be a CRN. Stochastic chemical reaction model of m, denoted sCRNM(m), is defined as a tuple (A/", VsRN,^, Xq) where
• X c Nqa' is a finite set of admissible states of the model, the so-called model state space,
• Xq £ X is an initial state,
• and vsrn : 1Z —> M>o is a stochastic rate coefficients map.
Additionally, every g e 7Z is assigned a stochastic rate function (also called a hazard function) he : X —> M>o in the following form:
he(X) = vsrn(q) ■ \\ zeA
where t(l) = 1, ifre(l) > 0, or t(l) = 0, otherwise.
Lor a reaction q = (re, pQ) and a state leX we define the state change from X to X' £ X induced by q, denoted X A X', and satisfying X' = X — re + pe.
Semantics of a sCRNM M, denoted [[.MJsCRNM/ zs defined as a CTMC IMJscknu = (X, Xq, Q) representing a continuous-time discrete-state Markov process {X(t)\t e M>o} where Q G Rlxlxlxl is the transition matrix defined as:
VXi,Xj G X,X,t^ Xj.QiX^Xj) =
where Q(Xi.Xj) = {g G TZ\Xi A- Xj}, and additionally, VXleX.Q(Xl,Xl) = - Q(Xi,Xj).
Xj y^Xi
Individual state components Xi are called (random) model variables. Lor a given state X e X the set of all paths in the state transition system underlying [.MJscrnm and starting at X is denoted Path^M^BCRNM(X).
In contrast to cCRNMs, the semantics of a sCRNM is defined as an entire structure representing all possible behaviours of a system - the so-called global view of systems dynamics. The selection of an initial state Xq can be understood as an additional information used by specific analysis tasks employing the local view on systems dynamics (e.g., simulation or local model checking).
The probability of a transition from state Xi to Xj occurring within t time units is 1 — e~^Xi,x^'1, if such a transition cannot occur then Q(Xi,Xj) = 0. The time before any transition from Xi occurs is exponentially distributed with an overall exit rate E(Xi) defined as E(Xi) = T^Xjex Q(Xi,Xj).
22
A path 7T of a CTMC is a non-empty sequence tt = Xoto^iA... where Vi,j > 0. Q(Xi,Xj) > 0 and £ M>o is the amount of time spent in the state Xi for all i > 0. There exists the unique probability measure on Path\M^°R™(X) defined, e.g., in [260]. Intuitively, any subset of PathlM^°*-™(X) has a unique probability that can be effectively computed. For the CTMC C the transient state distribution is denoted irc,x,t. It gives for all states I' e X the transient probability irc,x,t(X') defined as the probability, of being in state X' at the finite time t, having started in the state X.
From the modelling perspective specific to stochastic models, there is yet another notion of a quantity that can enhance the model semantics. In particular, interactions and even variable values can be assigned quantitative costs and rewards, e.g., time spent in particular concentration levels, energy consumed by particular reactions, etc. By adding this kind of information (if available), models can be adjusted to provide interesting and detailed quantitative predictions resulting from complex dynamics. The notion of rewards is formally introduced in Section 2.5.5.
2.3.3 Discrete Models
Computational models are inherently discrete provided that the dynamics (execution) occurs in terms of a series of discrete events. In the untimed setting, non-determinism allows capturing all possible "timings" (orderings) of concurrent events.
For the purpose of this thesis, we consider a class of discrete-time discrete-value models frequently used for modelling of influence networks. These models are known as Boolean networks [348] or multi-valued Thomas' networks [347, 62]. We use a general definition that has been stated in our work [248].
Definition 2.22 (Discrete Influence Network Model - dINM)
Let m = (A, Ta) be an influence network. Discrete influence network model of m, denoted dlNM(Af), is defined as a tuple (A/", z^inv, X, Xq, m) where
• m £ N'A' is a vector of maximal values,
• X = rizeAiO) •••) mz} zs the model state space,
• Xq C X is a set of initial states,
• and VdiN '■ Q —> {0,..., m/} is an influencing configurations mapping where Q = UzeA (W x ^i) zs the set of all influencing configurations with Qi = n«en-(z){0) • • • > m«} denoting the set of contexts affecting the state component X\. A context is assumed to reflect the lexicographic order of A. The meaning of VdiNQ,^) = k (for some k e {0, ...,m/}) is the
23
intention of setting the state component Xi to the target value k ifX meets the context u.
Semantics of a dINM M, denoted \M\dmu, is defined as a state-transition system [MfldiNM = (X, T, Xq) where T C XxXisa transition relation generated by the dINM in the following way:
X X' e T iff either udIN(l, uj^X)) > Xt A (Xt < mz A X' = Xv^Xl+1]) or vdIN{l,LOi{X)) < Xt A (Xt > 0 A X' = Xv^Xl-i])
where uji : X —>■ $1/ with ui(X) = X\uen-(i){Xu} is the projection of the state X to the context affecting the component Xi reflecting the lexicographic order of A and -X^fc] denotes X with an updated component Xi such that Xi = k.
Similarly to the case of sRNMs, the semantics of dINMs is represented globally by means of a state-transition system. Moreover, the notion of initial states is generalised to an arbitrary subset of the model state space. This is a model design decision reflecting the typical usage of dINMs for global analysis of systems dynamics, e.g., finding states from which certain attrac-tors (strongly connected components of the state-transition graph) can be reached.
Example 2.23 The influence network from Example 2.7 can be associated with a dINM under the condition that the semantics of both influences of mRNA_X molecule is given including their mutual (joint) effect. The gene geneX has a direct positive effect on the production o/mRNA_X. Assuming that TF regulates transcription o/mRNA_X positively, we have both influences affecting mRNA_X positive. The logic of the mutual effect of both influences has the form of the logical AND. The reason is that the presence of both the gene and the transcription factor in the cell is required to perform the transcription of mRNA_X. Assuming the lexicographically ordered set of species A = {geneX, mRNA_X, TF}, a model that considers boolean variables (mi = lfor all species I e A) can have the following form:
vdIN(mKNAJX, (0,0)) = 0 udIN(mRNA.X, (0,1)) = 0 i/dW(mRNA_X,(l,0)) = 0 iWmRNA_X,(l,l)) = l
2.3.4 Parameterised Models
We are interested in models with uncertain or unknown parameters. To that end, we extend models to become functions over parameter values that are assigned to model parameters. Parameters may take values from a given parameter domain P. In the case of cBNMs and sCRNMs, such domain
X = {(xi, x2, x3) | xi, x2, x3 G {0,1}} X0 = X
24
is the set of positive reals, P = M>o- In the case of dINMs, the parameter domain is a set of natural numbers, P = No- An element p 6 Pis called a parameter value.
In this section we provide a general concept of model parameterisation that has been, to the best of our knowledge, not done yet. The intuition behind our concept is the following. In the context of quantitative models, the parameterisation reflects the typical scenario in systems biology where quantities represented by kinetic rate coefficients are considered as parameters. In the case of discrete models, the parameters are qualitative and reflect the settings of the logic driving the model dynamics. More specifically, the parameterisations we consider here target the kinetic coefficients mappings iscRN, vsrn, vciN+, VtfN- , ^cRiN (in case of cBNMs and sRNMs) and VdiN (m case of dINMs). The domain of these mappings is extended with a new symbol _L representing an undefined parameter value. In particular, is.:K^R>0U {-L} for • e {cRN, cRIN, sRN}, vcIN+ : A x N M>0 U {J.}, vcIN_ :A^K>0U {_!_}, and vdIN : Vl N0 U {J.}.
A concrete parameterisation of a particular model is given by the set of objects for which the respective mapping is set to _L. In case of cRNMs, sRNMs and cRINMs the parameterised objects are reactions. In the case of cINMs and dINMs the situation is more complicated since the parameters we consider are set more deeply in the structures representing the model semantics. In dINMs, the parameterised object are the selected influencing configurations. In cINMs, the parameterisation comprises species with unknown degradation coefficients and production rate coefficients assigned to individual regulation functions. The formal definition of parameterised variants of the considered model classes is given in Definition 2.24.
Definition 2.24 (Parameterised Models) Let Fbe a parameter domain. Let M be a reaction-influence network M = {A,7Z, T-ji), a reaction network M = (A, TV), or an influence network M = (A, Ta) associated with a corresponding model (cRINM, cCRNM/sCRNM, or cINM/dlNM, respectively).
For an arbitrary model M we define its parameterisation xm (the set of unknown objects that are parameterised - represented by appropriate parameters), the respective parameter valuations assigning each model parameter a particular parameter value from P, producing the parameter space VXM (the set of all parameter valuations):
• If M. is a cCRNM, M. = (A/", vcrn-, Xc, xq), then xm zs defined as a set of reactions for which the kinetic rate coefficient is considerex undefined,
Xm — {q £ I vcRn(q) = -L}- The undefined values vcrn(q) suc^ that q e xm wake the model parameters.
• If M. is a cRINM or sCRNM then all the notions are defined analogously to the previous case (the only formal difference is in considering the kinetic
25
function coefficients map vcrin or the stochastic kinetic coefficients map vsrn instead ofi/crn, respectively).
• If M is a cINM, M = (TV, (vcin+ , vciN-), ^c, xq), then xm l's defined as
Xm = Xmuxm whereXm - *) G A x N I ucin+ {l,i) = ±m< h} and Xm — {I £ A | vi-{l) = -1} are arbitrary sets of respective coefficients considered undefined. The undefined values vciN+{l,i) such that e xm and vciN- (0 such that I e Xm ma^e the model parameters.
• If M. is a dINM, M. = (A/", VdiN> X, Xq, in), then the parameterisation xm is defined as a set of influencing configurations with undefined target value, Xm — £ ^ I VdiN{l,u) = _L}. The undefined values VdiN{I such that (I, uj) e xm make the model parameters.
The parameter space of M is defined as VXM = pxm. A mapping /x e VXM, I1 '• xm —y IP/zs called a parameter valuation.
Intuitively, the parameter space VXM is given by a set of valuations of model parameters. The parameterisation xm targets the interactions (or reactions) for which the corresponding quantities assigned in the respective coefficients map are considered unknown.
The following notation sets the notion of concretisation of a parame-terised model by replacing a given coefficients map with the selected parameter valuation.
Notation 2.25 For a model M of arbitrary type and a parameter space VXM considered for a given parameterisation xm we denote M-pxa corresponding param-eterised model.
For a given parameter valuation /x e VXM we denote M-px (/x) the concrete model in which the parameters are set with respect to /x.
In particular, if M.-p is a cCRNM then A4j> (fi) is a variant of M where vcrn{q) = fJ-ig) far every model parameter vcrn{q), q g Xm- Parameter valuations ofcRINMs and sCRNMs are treated analogously.
If M.-p is a cINM then M-px (/x) is a variant of M where vcin+ {I, i) = for every model parameter vcin+{1, i) such that (I, i) e Xm an^ vciN^ (0 = MO for every model parameter vciN- (0 such that I e xM-
If M.-p is a dINM then M-px (/x) is a variant of M. where VdiN{l,w) = w) for every model parameter VdiNih w) such that (I, uj) g xm-
Finally, we lift the notion of model semantics to the semantics of pa-rameterised models. In the case of infinite sets of parameter values (this is inherent for continuous parameters considered in cases of cBNMs and sRNMs), the lifted notion of semantics can result in infinite sets of structures.
26
Notation 2.26 The semantics of a parameterised model Mpx is denoted IM-p where • e {cRNM, cRINM, cINM, sCRNM, dINM}. It represents the set of structures representing semantics of individual unparameterised models:
lMvJ.^{lMVx^)j.\fi^VXM}.
The parameterisation framework presented in this section is universal in terms of the possibility to extend the basic parameterisation to more complex parameterisations of mathematical functions representing the model semantics. A way how this can be done is demonstrated in the case of cINMs. A similar scenario may be relevant in the case of cRINMs when more parameters can be necessary to be considered per a single regulated kinetic function. In our simplified case, we have fixed a single parameter per a regulated kinetic function.
In some cases, the concept of model parameterisation can be extended to include the choice of initial conditions as a parameter. This is often employed in cBNMs or sRNMs in order to globalise the view on systems dynamics. In this thesis, we do not consider this kind of parameterisation although it has been targeted in our work on cBNMs [86] and sRNMs [359]. In particular, in this thesis we clearly separate the global analysis wrt generalisation of initial conditions from the globalisation performed by param-eterising the models. Mixing of both forms of abstraction in a single notion would be counter-intuitive.
Bounded Parameters
It is worth noting that the parameter domains considered in Definition 2.24 are considered to be unbounded in general. However, in the case of dINMs it follows from the semantics (Definition 2.22) that the domains of individual influences settings mappings are always bounded by the maximal value that closes the range of the affected variable. In the case of cBNMs and sCRNMs the domain of parameters is, in general, an unbounded set of positive real numbers. In practical scenarios, the parameters are always considered bounded by (bio)physical limits (e.g., maximal physically possible rate of complexation given by the structural properties of a related enzyme, maximal physically possible rate of degradation of a species in a given physical environment, etc.) These limits can be found in biological databases (e.g., Brenda [327] or BioNumbers [287]).
2.3.5 Relations Among Models
From the computational point of view, there exist many well developed and efficient techniques for exhaustive analysis of models appearing in the top left quadrant of the scheme in Figure 2.3. In the case when boundedness
27
assumptions are applied, the models have a finite number of states. However, models in other quadrants incorporate continuous or dense quantities which significantly complicate or even disallow the exact analysis. Model reduction techniques are often employed to translate models into a simplified form in which analysis can be made more efficient, decidable or semi-decidable depending on the character of the model and the respective analysis problem.
In the non-stochastic part of the bottom left quadrant, the reduction of timed automata into untimed finite automata [7] enables exhaustive analysis for a certain class of continuous-time discrete-value models. In the stochastic part, there is a reduction procedure allowing to reduce a continuous-time stochastic model by a continuous-time Markov chain to a discrete-time Markov chain and a Poisson process (or a birth process) by uniformization techniques [339, 142]. All reductions at this level are exact, provided that no information is lost.
Concerning the bottom right quadrant, there are techniques to abstract (or approximate) continuous-value models by discrete-value models. Formal abstractions allow specific properties to be preserved by means of over-approximation (resp. under-approximation) of model behaviour. Over-approximating abstractions are conservative in the sense that each behaviour of the original model is also present in the abstract model but there can appear a new behaviour, not present in the original model. Under-approximating abstractions ensure that no execution is added to the abstract model but some can be lost. For certain classes of ordinary differential equations, there exist formal abstraction techniques providing a discrete-time discrete-value over-approximation in terms of non-deterministic finite automata [135, 55, 209, 120] or a continuous-time discrete-value over-approximation in terms of timed automata [276]. In the former case, the extent of falsely added executions is usually large whereas the latter case prevents the addition of any executions with non-realistic timing and therefore the number of false executions can be reduced.
Besides formal abstraction, there are approximations that distort the original behaviour. Such approximations do not guarantee the preservation of dynamics but the deviation of behaviour is ensured not to exceed a certain (specified) approximation error. In the case of continuous-time deterministic models, typically non-linear, the most widely used approximation is provided by numerical simulation (integration) methods.
Some classes of continuous-time stochastic models can be approximated by deterministic continuous-time models (ODEs) by means of fluid-approximation techniques [131]. Based on these techniques, more sophisticated analysis methods for stochastic models, combining fluid-approximation with CTMC analysis have recently been proposed [74]. The advantage of these techniques is that they avoid the state-explosion problem. Another approximation technique for stochastic models is called lin-
28
ear noise approximation [71]. It is based on approximating the discrete-value stochastic semantics of a sCRNM in terms of a continuous-value Gaussian process.
2.3.6 Other Types of Models Rule-Based Models
Models defined in previous sections have dealt with fixed (and finite) network structures. Such a setting is adequate for high-level modelling, but it can be limiting when a more detailed level of modelling is required. For example, let us consider a phosphorylation reaction that affects a structure of a certain protein P by attaching a phosphate group to one of its amino acid sites. In terms of reaction networks, we have to consider a new species P' that represents the protein under that particular modification. However, there might be many other sites that can be phoshorylated in a similar way. In particular, protein P can exist in several states that combine the presence of phosphorylated sites arbitrarily. When using reaction or influence networks, we have to model all those states explicitly as individual forms of P. The central feature of rule-based models is to avoid the combinatorial explosion by lifting a concrete species to a species type that assigns the species with a set of allowed modifications (states). Reactions are lifted to rules that act over species types instead of concrete species. Such an approach allows powerful compaction of model descriptions.
Rule-based languages that form a fundamental basis of the rule-based paradigm are re-calculus [130, 78] and BNGL [161, 63]. In addition to expressing species states the language has constructs to describe bonds among the species structures. In full re-calculus, rates are assigned to rules allowing to turn the model state space into a continuous-time stochastic model driven by a CTMC in a way similar to Definition 2.21 under the assumption that the set of solutions generated by the rules is finite. There are also ways to deal with unbounded models in terms of simulation [336]. Owing to the complex forms of rules targeting structural changes of biological objects, achieving a correct stochastic semantics is a non-trivial goal and there are several recent papers targeting those issues in [96,170].
The principle of rule-based modelling has been employed also in the tool BioCham addressing a comprehensive set of analysis methods [167, 94, 166]. In our research, we have developed an abstract rule-based language BCS allowing compact human-readable description of chemical reaction networks [136]. In [88] we have demonstrated on existing models of photosynthesis that rule-based description can significantly reduce the size of the model by compactly capturing its unambiguous core part.
Based on the idea of rule-based languages, there has been recently developed a new typed multi-scale language Chromar allowing to express
29
very general attributes of biological objects [223] while being based on rigorous structural operational semantics. The language makes an important contribution for application of rule-based methods in modelling of biological structures and processes at the multi-cellular level.
Process Algebraic Models
Process algebraic models provide an expressive framework that allows formal specification and modular (compositional) analysis of concurrent processes [310]. The processes are described in a compact form without any ambiguity about the interactions, communications, and synchronisations between a collection of concurrent processes (also called agents). Under this framework, biological species can be modelled as processes interacting with each other. The compositionality is an important aspect of process algebraic approaches, it offers the possibility of defining the whole (possibly multi-scale) system hierarchically, starting from the specification of its subcomponents. Examples of process algebraic approaches in computational systems biology are Beta-Binders/BlenX [138], SPiM [304], BioSPI [316], Bio-PEPA [114], sCCP [76], and BioShape [42]. Enhancing the process algebraic description with typed structures allows description of high-level processes such as translocation or movement between cellular parts [138]. Several approaches specifically focus on inter-cellular interaction [315, 98].
The relation of process algebraic and rule-based description and the framework of biological models used in the thesis is the following. Specifications encoded in process algebraic or rule-based languages are usually employed as intermediate models that are then translated in basic computational models using different semantics: deterministic (ODEs), stochastic (Continuous Time Markov Chains), or qualitative (transitions systems). That way, the rule-based and process-algebraic framework can be considered as a pre-processor for biological models defined in Section 2.3. Practically, the translation can be done under the assumption that the resulting models have a finite number of state components.
Petri Nets
The modelling framework of Petri nets was introduced by Carl Adam Petri in 1962 with the purpose of describing chemical processes [303]. Petri nets have been exploited by computational systems biologists as a formalism suitable for the description of biochemical reaction systems, where the tokens are interpreted as single molecules of the species involved. The Petri net formalism provides a natural framework in which both qualitative (given by the static structural topology of the Petri nets) and quantitative (given by the time evolution of the token distribution) analysis are tightly
30
integrated [215]. Important tools for Petri nets used in computational biology are Snoopy [216], MARCIE [328], GreatSPN [21], and MIRACH [246].
Advantages of Petri nets related to their application to biological models are two-fold. First, high-level extensions of Petri nets such as coloured Petri nets [194], Time or Hybrid Petri nets [307, 281, 144] allow describing processes at an abstract and even multi-scale level. Second, Petri nets provide a basis for specific analysis methods that are performed at the level of the formalism, e.g., [214, 321, 69,107, 202].
Discrete Time Stochastic Models
Discrete computational models employ non-determinism to model concurrency. To quantitatively differentiate among all possible executions in a particular state, rules can be assigned probabilities, resulting in discrete stochastic models [77, 367] most typically represented by discrete-time Markov chains (DTMC).
Hybrid Models
In addition to the basic model paradigms mentioned in previous sections, it is worth mentioning that there is a class of models that allows to combine the discrete and continuous semantics. Such models are called hybrid models. They are most typically represented by means of hybrid automata [217] or process algebraic techniques [185, 75]. Hybrid models allow to mix discrete-value components with continuous-value components and discrete-time dynamics with continuous-time dynamics. Such a complicated semantics limits the model analysis [219, 95]. Hybrid models can be satisfactorily used for modelling and simulation [144] and, when simplifying assumptions are employed (e.g., considering linear dynamics of continuous components), also for a more advanced analysis of biological processes [128, 181]. A hybrid model represented in terms of a hybrid automaton with uncertain parameters is employed in [28] targeting design of robust cardiac pacemakers. The work exploits the design problem of estimating optimal parameters of pacemakers.
To incorporate noise, stochastic hybrid models [335] (i.e., stochastic hybrid automata) allow both discrete-time and continuous-time dynamics to evolve randomly. Coupling of both kinds of dynamics while keeping their stochasticity complicates analysis even more. To this end, simulation-based (statistical) [133] or fluid-flow approximation techniques [75, 122] have been developed.
31
2.4 Model Encoding and Abstraction
In Section 2.3 we have presented several kinds of mathematical and computational models used in systems biology. The main focus in this thesis is to cBNMs, sCRNMs, and dINMs that cover the commonly used types of models. In this section, we address the problem of translating these models into low-level formalisms that enable us to perform computer-aided analysis of the models.
In particular, we start with discussing the issue of bounded model variable and parameter domains that make an important precursor of decidability of most of the analysis methods. Next, we present extended variants of several formal methods structures. Finally, methods that allow translating the models into these formalisms.
2.4.1 Model Variables
Firstly, it is important to discuss the number of variables that might appear in a model. Since this number gives the dimension of the underlying dynamical systems, for many methods it is crucial to have a finite number of dimensions. In the case of all network models, we consider in this thesis, the number of species is always assumed to be finite. However, in biological reality (e.g., from the perspective of evolutionary biology) the number of distinct species can be undetermined. The only computational models that are capable of dealing with this issue (e.g., expressing polymerisation) are provided by rule-based or process-algebraic formalisms. However, in the case of unbounded network models, the analysis tasks are limited to static analysis and simulation [336].
Secondly, it is important to note that component quantities in biological models most typically do not evolve unlimitedly under normal conditions. In particular, concentration (or a number of molecules) is always limited by physical degradation processes. However, it might not be easy to identify the exact bounds without a deeper analysis of the model. In any case, some physical assumptions on maximal (extreme) bounds on component quantities can be always considered. In our encoding of models, we always consider bounded variable domains.
Considering dINMs, bounds on variables are inherently included in the model. In particular, every species has a finite domain of discrete "concentration levels" provided that the maximal level can be understood as an "arbitrary value above any number from the lower level". The problem of mapping experimentally measured data into such a finite domain has been extensively studied and algorithms are available [332].
In the case of the continuous and stochastic models, no bounds are given and the dynamics of the model can theoretically evolve unboundedly. Introducing any bounds without the risk of losing some important
32
information on systems dynamics requires detailed analysis of the systems dynamics that can be very hard or even intractable in general (e.g., equilibria and their asymptotic properties). In practical cases, reasonable physical bounds are used together with assumption-based principles applied to reason about the dynamics at the system's boundary. In any case, the variable domains remain uncountable and must be treated accordingly in a computer.
2.4.2 Model Parameters
The parameter domains we consider are M>o and No, for quantitative models and discrete models, respectively. Apparently, these domains reflect all theoretically possible settings (valuations) of parameter values and are thus unbounded in general. In the case of dINMs, the set of parameter values is typically finite since the parameters representing unknown influences take values from some finite domain (combinations of values from the finite domain of variable values). In the case of quantitative models, the range of admissible parameter values is typically bounded by physical limitations of the particular reactions or processes (corresponding maximal numbers are usually publicly available). Nevertheless, the range remains infinite since the parameters domain in these models is uncountable.
2.4.3 Parameter Perturbations
In many practical cases it is useful to study the system dynamics in a smaller range of parameter values around a certain reference value. In that case, a lower bound and an upper bound are considered for model parameters. In other words, the meaning of _L covering all admissible values is changed to cover only values from a given interval. The (multidimensional) range of parameter valuations is called a perturbation space.
Let M be a cCRNM, a cRINM or a sCRNM with a parameterisation xm ■ We assign each g £ \M a so-called perturbation interval [mine, maxQ] C P where ming, maxQ £ P are the minimal and maximal allowed rate coefficients of the reaction g. The parameter space VXM can be set to exactly cover a given perturbation space, in particular, every parameter valuation /x G PXA1 then satisfies fi(g) £ [mine,maxe] for every g e \M- m general, the set of parameter valuations restricted to the product of perturbation intervals is called a perturbation space, denoted P, P C VXM, defined in the following way:
P = {/x | n{g) £ [mine,maxe),g G XM,V G ^xaJ-
Every parameter valuation having such a restricted form is called a perturbation. The concept of perturbation space is analogously extended to
33
remaining types of models (dINMs and cINMs). In general, every parame-terised object is assigned a min-max interval instead of _L.
2.4.4 Parameterised Kripke Structure
For non-stochastic models, we use the notion of Kripke structure as the underlying low-level formalism to encode the model semantics. The reason is that most of our methodology is based on model checking methods which require to associate state-transition semantics with temporal logic. Kripke structure [115] is a state-transition system extended with atomic propositions associated to states and thus allowing to reason over its states and paths with temporal logics. In particular, we extend the standard notion of Kripke structure to allow encoding of parameterised models. To that end, the notion of parameterised Kripke structure is defined in the following definition.
Definition 2.27 Let AP be a set of atomic propositions. A parameterised Kripke structure (PKS) over AP is a tuple JC = (V, S, Sq, —>, L) where V is a set of parameter values (a so-called parameter space of JC), S is a finite set of states, Sq C S is the set of initial states, L : S —> 2AP is a labelling of the states and -^CSxPxSisa transition relation labelled with the parameter valuations. We write s4t instead qf(s,p, t) e —>. We assume that the PKS is total, i.e., for all s, p there exists at least one t such that s4i.
Notation 2.28 We use the notation V(s, t) = {p e V \ s A t} to denote the transition-enabling set of all parameter values under which the transition from s to t is allowed.
Remark 2.29 For the specific analysis task based on LTL model checking we also use a slightly extended definition of (fair) PKS described by a tuple JC = (V,S,Sq,F,^-,L) where the set of fair states F C S is added.
Fixing a concrete parameter value p £ V reduces a parameterised Kripke structure JC to a standard Kripke structure JCP = (S*, So, A, L). A parameterised Kripke structure can be seen as a Kripke structure with labelled transitions, where the transition labels are the sets V(s,t).
Definition 2.30 Let JC = {V, S, S0, L) be a PKS. Define a p-path in the PKS
JC, denoted tvp, as a (possibly infinite) sequence of states tvp = sqsi ■ ■ ■ such that sq e So, Si A Si+ifor all i > 0. To denote the ith state ofirp we use the notation
Tfp{i), 7Tp(«) = Si.
For a PKS JC = (V, S, Sq, F, —>, L) with fair states F, a p-path tvp in JC is defined as a fair infinite-length p-path, i.e., a p-path such that there is a state 7 G F which appears in the sequence infinitely often.
34
It is worth noting that a notion similar to parameterised Kripke structure we employ here has been also mentioned in the context of hardware verification, in particular, to capture delay parameters in logical circuits [301, 211]. Our notion is conceptually different since we define parameterised Kripke structure as a general structure that encapsulates the family of Kripke structures for each possible valuation of the parameters. In our case, the parameterised object is the entire transition relation.
As depicted in Figure 2.4 and described in Section 2.4.6, any cBNM can be approximated and abstracted by means of a PKS. The relevant procedures are described in Section 2.4.6.
It is also worth noting that the notion of parameter space of a PKS does not necessarily coincide with the notion of parameter space VXM of a given model M.. In particular, this applies in cases of cINMs and cRINMs. Again, more on that is discussed in Section 2.4.6.
2.4.5 Parameterised CTMC
In the case of sCRNMs, we set the parameterised models semantics in terms of parameterised CTMCs. In particular, we assume an sCRNM M = (J\f,vSRN,X,Xo) with a parameterisation \M where every g e \M is assigned a perturbation interval of parameter valuation vsrn{q) £ [mirig, maxp] C M>0 forming the perturbation space P C VXM as defined in Section 2.4.3. For every /i 6 P, the dynamics of M-px (//) is represented by a CTMC CM = (X, Xq, Qm) according to Definition 2.21. The parameterised sCRNM M. is therefore represented by a family of CTMCs induced by the parameter perturbation space P, C = {CM | // £ P}. The individual CTMCs in C differ only in the transition matrix that is affected by changing the parameter values.
2.4.6 Discrete Abstraction of Continuous Models
In order to enable the application of automatised global analysis methods to cBNMs, a model is approximated and abstracted in terms of a state-transition system that represents a finite quotient of the continuous model. The workflow of model approximation and abstraction is shown schematically in Figure 2.4 (left).
A cBNM is approximated by an intermediate model also represented in the semantics domain of ODEs but having a more regular structure that is suitable for rigorous abstraction [10]. In the general case, a piece-wise multi-affine ODE (PWMA) is employed for the intermediate model. PWMA reflects a finite partition of the vector field provided that on every region of the partition the vector field is defined by a multi-affine function. For such kind of systems, rigorous abstraction methods have been developed, e.g., [135, 55, 49, 276, 120], allowing to formally over-approximate
35
(cCRNM) (cINM) (cRINM) [cIINM] [sCRNM]
Q_ approximation
(pwma)
I
C^^stractigrT^
parameterised Kripke structure
parameterised CTMC
Figure 2.4: General workflow of encoding a parameterised models employed in the thesis.
s W V\W
\ V\\\\\\\\
\ \ \\\\\\v\
\ wwwwn
w wnnnxw
\ wwnww
\\\\\\\\v\ \\\\\\\\\\
\ nn\'v\'\ M mm
Figure 2.5: A vector field of an approximated ODE system (left) and discretisation of the emphasised region (middle). Thresholds determining the rectangles were obtained by the algorithm in [205]. States and transitions of the Kripke structure corresponding to the discretisation (right).
the continuous system by a finite state-transition system. In this thesis we employ rectangular abstraction methods introduced in [55, 49, 120], previously implemented in software tools RoVerGeNe [49] and BioDiVinE [35], and applied e.g. in [205, 46]. The abstraction is presented by using an encoding of the resulting parameterised state-transition relation in terms of SMT formulae. This allows us a compact description of the abstract parameterised model that copes with continuous parameter space. This encoding has been first proposed in our work [56].
Model Approximation
Semantics of a cBNM A4 of any type as defined in Section 2.3 has the general form of a continuous dynamical system x = f(x,p) where x : M>o —> M>0 where n G N is the dimension of the system (determined by the cardi-
36
0 1 2 3 4 5 6
concentration ot [A]
Figure 2.6: Optimal approximation of sigmoid functions by piece-wise affine functions according to [205].
nality of the set of species of the respective biological network) and p £ Mm is a parameter vector where m £ N is the number of parameters (determined by the cardinality of the parameterisation xm)- The shape of the vector field / depends on the particular model type:
• If A7! is a cCRNM representing a network where all stoichiometry coefficients are 1 then for all i < n, f is multi-affine in x and affine in p. In case there is a stoichiometry coefficient higher than 1, the affected fi is a polynomial of degree equal to the maximal stoichiometry coefficient. Affinity in p is preserved.
• If A7! is a cINM or cRINM then for all i < n, fi is in general a nonlinear function typically having the form of a rational function of x which is affine in p. Nevertheless, there can be components of x in which some fi might be polynomial or even affine (e.g., a species in a reaction-influence network which does not take part in a complex affected by any influenced reaction).
The approximation is needed if the criterion that every fi is multi-affine in x is violated. Apparently, this applies in cases of cCRNMs with a stoichiometry coefficient higher than 1 and in case of any cINM or cRINM that contains a non-linear regulation function (or a reaction with a reactant with stoichiometry greater than 1 in case of cRINMs). To achieve that we employ the approach defined in [205]. In particular, each regulation function occur-
37
ring in fi is approximated with an optimal piece-wise multi-affine function as shown in Figure 2.6 (in case of a one-dimensional regulation function the approximate function is piece-wise affine). In this procedure, a finite number of segments is introduced for every component of x. Segments partition the vector field orthogonally into a finite set of (hypher)rectangular regions. Analogous procedure is applied to polynomial functions representing kinetics of reactions with the stoichiometry of reactants greater than 1.
Model Abstraction
We assume that we are given a set of thresholds {9\,..., 9%n.} c M>o for each variable x\ such that 9\ < 9\ < • • • < 9%n.. In case of a cINM or cRINM, the thresholds are given by exactly the points that separate individual segments of an approximated regulation function. However, in case of a cCRNM with stoichiometry coefficient at most 1 the selection of thresholds can be arbitrary. Further, we assume that the dynamical system satisfies for all i < n, fi is multi-affine on every re-dimensional rectangle , 9ji+1] x • • • x [6™n, • Each rectangle is uniquely identified via an re-tuple of numbers: R(JU ...,jn) = [9^, 0ji+1] x • • • x [0? , 0?n+1], where the range of each ji is {1,..., re« — 1}. We also define VR(j±,... ,jn) to be the set of all vertices of R(j±,..., jn).
Boundary Condition In order to establish a finite rectangular abstraction of the intermediate PWMA model, special care has to be given to boundary rectangles. A boundary rectangle is any rectangle R(ji, ...,jn) where for some i either ji = 1 or ji = re« — 1. Any dimension i satisfying that condition is called a boundary dimension of R(ji,jn)- We restrict ourselves to models where the dynamics is bounded in the range specified by lower and upper thresholds - trajectories cannot exit that range (note that this could occur only in boundary rectangles). Formally, all trajectories determined by the PWMA model are required to keep x\ £ [9\, 9%n.]. We restrict ourselves to parameter spaces where this requirement is satisfied for all parameter values. More precisely, we assume the so-called boundary condition to be satisfied: for every boundary rectangle R(ji, ...,jn) we assume that for all p G F,i G {l,...,n},x G R(ji,...,jn) it holds that (ji = 1 A xt = 9\) => fi(x,p) > 0 and (ji = n{ - 1 A x{ = 9ln.) =>- fi(x,p) < 0.
Abstraction-Encoding PKS There are two basic questions solved by the abstraction procedure. First, for a given rectangle we need to identify to which neighbouring rectangle the dynamics flows. Second, we need to check if there is a possibility to remain in the given rectangle for infinite time.
38
Technically, the rectangular abstraction results in an (abstraction-encoding) parameterised Kripke structure JCabst = (VXm,S,Sq,^>,L) with S = l(ii)---)in) | Vi : 1 < ji < rii} where each a g S represents the rectangle R(a) and 5*0 = {ao} such that xq g R(ao) (a rectangle corresponding to the initial condition xq). The atomic propositions representing concentration inequalities are assigned to adequate states by means of the labelling L. In particular, L : S ->• 2AP where AP = {xt 0 6j | 1 < i < n,l < j < rij},0 g {<,>}}• The parameter space of )Cabst is directly given by the parameter space of the abstracted model.
The parameterised transition relation —> is defined between any two states representing neighbouring rectangles. Each transition is associated with a subset V(a,a') of parameter values under which it is enabled. In particular, the set V(a,a') is encoded symbolically by a formula V(a, a') = {p g V | p |= $a,a'}, that expresses the condition that conservatively characterises continuous flows from R(a) to R(a') by investigating the vertices of the facet R(a) n R(a').
Formally, Let a = (ji,... ,jn) g S, 1 < i < n and d g {—1, +1}. We denote ahd = (ji,... ,ji + d,... ,jn) (if ji + d is in the valid range). Thus ahd describes all the neighbouring rectangles of a. We further denote vl,+1 (a) = VR(a)D{(...,ji + l,...)} and?/'^(a) = VR(a)n{(..., ji}...)}. Foreverypair of states a, a%'d e 5, 1 < i < n, 0
Additionally, the rectangular abstraction approximates the potential existence of a fixed point in any rectangle a g S. This is achieved by means of introducing a self-transition a —> a. In particular, a self-transition is enabled in a state a g S for all parameter valuations p g P satisfying 0 g hull{f(v,p) | u g VE(o;)} (the zero vector included in the convex hull of the rectangle vertices). This is symbolically encoded by the formula $>a:a defined in the following way:
3ci,...,cfc: ^/\cj>0^ A (^2ci = l)j A (^2* ■ f{Vi,p) = oj
(2.1)
where k = | VR(a) | is the number of vertices of the rectangle a.
The formula $a,a can be simplified with a coarser characterisation given by the following formula:
V (-^«,-1,0, A -'^a)a«,+i A A $a«,+i,a))
39
The formula $>'a a, is true just if there is either a pair of transitions ah 1 —> a —> al,+1 or a pair of transitions al,+1 —> a —> a1^1 provided that the respective two transitions are the only transitions allowed in ith dimension through the rectangle a. According to [49], this situation implies that the zero vector is not included in the convex hull of the vectors in rectangle vertices. That makes a necessary condition for non-existence of a fixed point inside the rectangle. In particular, $'aa ^a,a.
Interval-Based Encoding of Parameter Value Sets
In case every fi(x,p) depends on at most a single component in p (for every i < 0 there is at most one unknown parameter occurring in fi, we call the parameters to be mutually independent), encoding of parameter values can be significantly simplified. In such case the set V(a, a') can be represented as a Cartesian product of closed intervals describing ranges of individual parameters values that enable the transition a —» a'. In a bounded parameter space these sets form (hypher)rectangular regions. In consequence, the representation of parameter sets, as well as the overall parameter synthesis procedure, can be significantly simplified.
Relation between a cBNM and its Discrete Abstraction
In [49, 55] it is shown that for a given multi-affine or even piece-wise multi-affine continuous system, the rectangular abstraction mentioned above conservatively abstracts almost all signals of the continuous system. A comprehensive overview of the conservativeness results for piece-wise affine and multi-affine systems is given in [120].
Formally, we consider a piece-wise multi-affine system (PWMA) x = f(x,p) defined on the bounded state space X = niLit^I' ^nj c ^>o where n is the dimension of the system, 9\ and 0™. is the lower and upper bound of variable x-i, and the parameter p ranges in a parameter space P C M>0 where m is the dimension of the parameter space. We consider we have obtained a PKS ICabst representing a rectangular abstraction of the PWMA system. We assume )Cabst satisfies the boundary condition and is constructed using the steps mentioned above. Before we characterise the relation between )Cabst and the abstracted PWMA system, we state the definition of signal-path correspondence.
Definition 2.31 Let JCabst = {VXM, S, Sq, —>, L) be a parameterised Kripke structure encoding a rectangular abstraction of a PWMA system x = f(x,p) satisfying the boundary condition. We say a continuous signal { : T —> M™0 corresponds to a p-path tvp = olq, a±,... of K.abst iff all the following conditions are satisfied:
1. i satisfies {(0) e R(ao),
40
2. for any teT there is i e No such that {(£) e R(a,i),
3. i passes subsequently through the states ofirp.
The relation between the semantics of the PWMA system and K,ahst is characterised by the condition of global sufficiency [120] stated formally in the following claim.
Claim 2.32 Tor any initial condition xq g X and the respective continuous (exact) signal i generated by a PWMA system x = f(x, p), {(0) = xq, there exists a p-path Tip in JCabst such that { corresponds to ttp.
Remark 2.33 The dual condition called global necessity has been also discussed in [55, 120]. It requires for any path tt in JCabst there exist an initial condition xq and a corresponding signal { in the PWMA system such that {(0) = xq and { corresponds to tv. However, it is shown that even for piece-wise affine systems the condition is not satisfied by the rectangular abstraction.
The conservativeness of the abstraction guarantees the resulting Kripke structure provides an over-approximation of the original PWMA system. In particular, no behaviour is lost by the abstraction but false behaviour can be introduced.
Next, the relation between the original cBNM and the abstraction has to be discussed. Due to the fact the approximation step affects the shape of the signals, we cannot directly transfer the global sufficiency condition to the cBNM. Precise characterisation of the error produced by the approximation step is not yet available. However, it has been shown in [205] and also in our own work the vector field of the original cBNM and the approximated PWMA system can agree well if sufficient precision is used for local piece-wise linearisation of the regulation functions. The more significant problem is the extent of over-approximation - the number of false (so-called spurious) paths introduced in the abstraction is usually high. To that end, we have provided a variant of the procedure that combines the abstraction procedure with local numerical simulation [87]. However, the method is computationally demanding. Recently we have developed a method employing ^-decision SMT over formulae with reals [79] that embeds the c)-decision-based solver of ODEs [187] within the concept of rectangular abstraction.
Finally, we can summarise the overall procedure - a given parame-terised cBNM Mpx with an initial condition xq and a parameterisation \M is first approximated by a PWMA model and then abstracted in terms of a parameterised Kripke structure constructed as mentioned above and denoted K,ahst€ (Mpx ). The first step (approximation) means that for any given /x g PXA1 the semantics [A/(px(/i)]]cBNM represented by an exact signal is approximated with some (unspecified) numerical error e by a signal f
41
generated by a PWMA model. In the second step (abstraction), the signal £e is over-approximated by a set of all paths ir in K.abst€(A4-px)^ such that 7r(0) = ao and xq g -R(ao)- The resulting semantics of M-p obtained after the approximation and abstraction steps is denoted [A^^Jcbnm anc^ defined:
lMvxtMU = ^abst\MVx).
Other Approaches
The main characteristics of the abstraction presented above are represented by two properties: (i) the abstract structure represents the behaviour of the dynamical system globally (it covers any initial condition of the original continuous system), (ii) the abstract structure captures parameter uncertainty by encoding the parameter values in terms of first-order formulae (it covers any parameter value in the considered domain of uncertainty). An alternative way to represent global behaviour together with parameter uncertainty is using polytopes as implemented in RoVerGeNe [49] and further refined in SpaceRover [67] (the tools are performing on cINMs but can be easily extended to work with general cRINMs). The approach introduced in [51] targets qualitative abstractions that are specific for cINMs. It relies on representing the regulatory function by means of discontinuous step functions resulting in piece-wise affine dynamics defined on a partition that is given symbolically by qualitative (numerically unspecified) thresholds. The method is implemented in the GNA tool [135].
A universal global approach to the finite representation of dynamical (and even hybrid) systems under parameter uncertainty is to encode the models entirely in terms of first-order formulae over real numbers and solve them using a ^-decision SMT solver [156, 187, 252]. This provides an implicit (relational) abstraction of the continuous dynamics flow [323]. Such a relation gives a unique symbolic representation of the system dynamics in the given bounded domain. However, in the case of global analysis (e.g., large and complex initial sets and the need to explore the entire bounded domain of systems dynamics), these techniques quickly become computationally demanding. We believe that techniques that will combine discrete abstractions with symbolic approaches have high potential.
Abstractions of nonlinear systems based on interval analysis have been introduced in [344, 313]. Although they are guaranteed to give over-approximations to the discrete dynamics of the continuous system (sufficiency), the quality of the approximation can be (and typically is) rather poor.
A technique close to the approach employed here is presented in [65]. Rather than using a fixed polyhedral subdivision, a simplicial subdivision is computed based on the system itself, and the resulting flow induced multi-valued map provides a good abstraction of the continuous system.
42
However, a good polyhedral subdivision itself is extremely difficult to compute, and the method is likely to be useful only in low dimensions.
There is a group of approaches based on qualitative reasoning (theorem proving) with predicates over the reals [305, 350]. In that case, the construction is not fully automatised and it is challenging to find suitable abstraction predicates for non-linear systems.
Several techniques allowing hybridisation of the continuous dynamics have been introduced, e.g., [18, 129]. These techniques follow the dynamics with a significantly less extent of false behaviour but they are restricted to bounded-time reachability problems. In [66] the authors employ conic abstraction restricted to affine systems that allows unbounded reachability analysis. For multi-affine systems, conic abstractions have been combined with rectangular partitioning to reduce the extent of over-approximation [60].
The techniques presented in [101, 276, 364] employ timed automata in various forms for the abstraction. An abstraction method allowing to capture unbounded-time behaviour is addressed in [101].
For certain problems, polynomial boundaries (barrier certificates [308] or multivariate polynomial partitioning [250]) can be employed. Such methods give definitely more precise results than polyhedral partitioning but are computationally demanding and not yet applicable to models with uncertain parameters.
2.5 Temporal Logics for Biological Systems
In this subsection we give an overview of temporal logics that are considered relevant for expressing properties of biological systems. The mentioned set of properties is not complete but it rather focuses on the types of logics used in our research. Logics used in following chapters are described formally.
Qualitative Logics
With respect to the level of abstraction employed, there are two classes of temporal properties. Qualitative properties abstract away from any quantitative information like time aspects or energy costs of systems dynamics. Qualitative properties are in general interpretable on all types of models, especially on untimed discrete-value models (dINMs and discrete abstractions of cBNMs). Two basic logical formalisms allowing to express qualitative properties of systems dynamics are the linear-time temporal logic (LTL) [306], interpreted on individual model executions (paths), and computation tree logic (CTL) [117], interpreted on trees of (non-deterministically) branching model executions.
43
Quantitative properties are usually expressed in formalisms based on the aforementioned temporal logics. They can be roughly divided into deterministic (applicable to cRINMs, cCRNMs and cINMs) and stochastic logics (applicable to sCRNMs). Deterministic logics are mostly focusing on a quantitative notion of time. The time extension of CTL called Timed Computational Tree Logic (TCTL) has been introduced in [7].
Quantitative Logics
Metric Temporal Logic (MTL) [254] is general quantitative temporal logic that allows to quantify modalities with the time frame represented by a closed time interval. MTL possesses both discrete and continuous semantics, as it can be interpreted on infinite timed state sequences as well as continuous signals. The possibility of singular values occurring as time-quantifiers of the modalities causes the problem of formula satisfiability to be undecidable for MTL. To that end, Metric Interval Logic (MITL) has been introduced in [8] as a practical restriction of MTL that allows only non-singular intervals to quantify the modalities. Another time extension of LTL called Timed Propositional Temporal Logic (TPTL) [9] is based on freeze-quantification where extra clocks are used to specify temporal constraints. Motivated by the application of verification and monitoring techniques to continuous-value and hybrid systems, Signal Temporal Logic (STL) has been introduced [277]. It combines the dense time modalities of MITL with the numerical predicates over real numbers. Technically, the logic is interpreted over piece-wise linear continuous-time signals on which a qualitative and a quantitative semantics [151] is defined including an efficient algorithm [149].
A version of LTL with constraints over the reals, called LTL(M), has been proposed in [15] to express temporal properties of molecular concentrations and their derivatives. A quantifier-free fragment of the first-order extension of LTL(M), called QFLTL(M), has been considered in [163]. It allows to use free variables in the atomic propositions and, thus, it enables to analyse robustness of numerical data time series with temporal logic and to automatically compute LTL(M) specifications from experimental traces.
Stochastic logics express the probability and performance measures on Markov chains. To formalise properties of CTMCs, Continuous Stochastic Logic (CSL) [20] has been introduced. It is a probabilistic extension of CTL with continuous-time semantics. To further broaden the scope of possibly expressible behaviour, CSL have been extended to allow the specification over reward-based stochastic models, i.e., Markov chains with real-valued rewards/costs attached to states and transitions [260]. The extension enables to express properties such as the expected time a system spends in a specified set of states over a time interval or the expected number of times that a particular reaction occurs.
44
Extended Logics
Expressing biological phenomena can require extensions of existing logics. Biologically relevant temporal logic extensions target precise quantitative description of oscillations [143,43] or qualitative properties combining linear-time properties with branching-time [280]. In our work we have introduced two extensions of temporal logics. In the domain of linear-time logics we have addressed an extension of STL by means of introducing a value-freezing operator that allows to store a signal value at the time point where the operator is evaluated. The stored value can be then referred in predicates placed in the scope of the value-freezing operator. This allows STL* to specify and distinguish various dynamic aspects which occur in biological systems, in addition to the phenomena mentioned above, these can be, e.g., damped oscillations or local extremes in species concentration.
In the domain of branching-time logics, we have combined two known extensions of CTL - an extension HCTL adding hybrid operators including past operators and allowing to use of state variables that can be fixed in certain parts of the formula as well as quantified [16], and an extension UCTL adding event predicates over single-step system evolutions [346]. The resulting logic called HUCTL [58] allows to efficiently express global and local properties of phase spaces of dynamical systems that cannot be expressed in LTL/CTL, e.g., the presence of a given number of mutually exclusive stable attractors. The need for hybrid branching-time logics in the domain of biological systems has been also addressed in [17].
In the remaining part of this section, we describe in more detail the logics that are selected to be used in following chapters of the thesis.
2.5.1 Linear Time Temporal Logic - LTL
LTL captures the temporal properties of paths in discrete state-transition systems. In particular, LTL formulae are interpreted on infinite paths generated by a Kripke structure.
LTL formulae are defined by the following abstract syntax:
if ::= Q | -xp | tpi A (f2 | Xtp | ?iU?2
where Q ranges over atomic propositions taken from a set AP. We use the standard abbreviations like Ftp which stands for trueXJp or Gp which stands for -iF-np.
The semantics of an LTL formula tp is interpreted on infinite paths of a Kripke structure K. = (S, So, —>, L). For an infinite path ir = sqsi... and some i 6 No we use the notation -k% to denote the infinite path -k% = and the notation -k{%) to denote the state Sj.
TrhcQ iff Q g L(tt(0))
45
\=K _,' iff 7T ty=JC if
|=/C >l A ^2 iff 7T |=/C 2
7T |=£ X 5 iff 7T1 |=/C 5
^ |=£ ViU^2 iff 3i £ No such that 7r* |= p>2 and Vj < i.iri |= ?i
We say a Kripke structure AC satisfies ip, written K. |= ?, iff for all s & So it holds that every path tt in K, such that 7r(0) = s satisfies tt \=jc ip. This is called a universal interpretation of LTL.
Examples of some typical LTL formulae used for biological systems are [165]:
• F [ip] expresses a reachability of a state where the condition tp holds,
• G [ip] expresses a stabilisation with tp being continually true,
• [ipi F1P2] A [ip2 Fipi] expresses permanent oscillation between
ipi and p>2-
2.5.2 Computation Tree Temporal Logic - CTL
The key characteristics of CTL is it captures the branching behaviour of discrete state-transition systems. More precisely, CTL formulae are interpreted on states of a Kripke structure.
CTL formulae are defined by the following abstract syntax:
ip ::= Q I -up I tpi A p>2 I AXtp \ EXp | AU | EU
where Q ranges over atomic propositions taken from a set AP. We denote by cl(ip) the set of all subformulae of tp. We use the standard abbreviations like EF5 which stands for E(true U ip) or AG? which stands for -1EF -up.
The semantics of a CTL formula tp is interpreted on states of a Kripke structure K. = (S,Sq,^-,L). For every state s 6 S and a particular form of the formula the semantics is given in the following way:
--K Q iff Q £ L(s)
-K. ~y iff ^K*5
-K. P>\ A p>2 iff s \=jc ipi and s \=jc p>2
-fC AX tp iff for all tt in K, such that 7r(0) = s it holds that ir(l) \=jc ip -fC EX tp iff there exists tt in AC such that 7r(0) = s and ir(l) \=jc ip
-fC A((/?i U (^2) iff for every tt in AC such that 7r(0) = s there exists
i G No such that ir(i) |= ?2 arid Vj < i.ir(j) |= ?i =x; E(?i U (^2) iff there exists 7r in AC such that 7r(0) = s and
3i G No such that ir(i) |= p>2 and Vj < «.7r(j) |= p>\
46
We say a Kripke structure K. satisfies ip, written K. |= ip, iff for all s e 5*0,
Examples of some typical CTL formulae used for biological systems are [165]:
• EF [ip] expresses a reachability of a state where the condition ip holds,
• AG [ip] expresses a stabilisation with ip being continually true,
• EF[AG[?i]] A EF[AG[?2]] expresses a bistable switch (two different stable situations p\, p2 can be reached).
Most frequent types of temporal properties investigated for biochemical models have been collected in [291]. There are two important fragments of CTL relevant for biological models. A formula is said to be positive if it does not contain any negations. We say that a formula is existential (or in ECTL) if it is positive and only contains existential temporal operators. We say that a formula is universal (or in ACTL) if it is positive and only contains universal temporal operators.
2.5.3 Signal Temporal Logic - STL
STL is interpreted on continuous signals. It has been introduced in [277] to address the needs to specify the temporal properties of real-valued signals. Due to the deterministic nature of signals, STL is entirely based in the linear time framework. It is worth noting that LTL cannot be directly employed for continuous structures as it inherently targets discrete paths (e.g., there is no direct meaning of the operator X (next) in the continuous settings). To that end, STL employs principles of linear temporal logics quantitatively addressing time (MTL [254] and MITL [8]) and refines them with predicates evaluated over real values. Technically, STL is based on MITL where the time quantification is restricted to non-singular closed intervals due to decidability reasons. The notion specific for STL is a predicate defined in the following way.
Definition 2.34 Let i e (M>0)T be a signal for some n e N and r e T a given time point. A predicate p is defined as a generic constraint applied to a signal value in the time point t:
p({,t) > 0
It can be interpreted as a subset of the signal value domain M>0 satisfying the given constraint.
The syntax of STL is the following: Definition 2.35 Syntax of STL is defined by the following grammar:
p ::= p | true \ ~^
\ V ip2 | >iU/?2
47
where p is a predicate as of Definition 2.34 and I C M>0 a closed non-singular interval.
The time-quantified versions of operators F and G are defined in the standard way: Fjtp = trueXJjtp, Gjtp = -iFj-xp.
Qualitative semantics is defined with respect to the structure of the formula as follows.
Definition 2.36 Let { e (M>0)T be a signal for some n e N and r e T a time point. Qualitative satisfaction of an STL formula
• {(*) G P
({,*) 1= <=
({,*) \= >l V (f2 <= => ({, t) |= (f! V ({, t) |= 0)T be a signal for some n e N and r e T a time point. Quantitative satisfaction of an STL formula 0.7) F[3,s\[%2 > 0.7)] expresses that for each time point t £ [0, 300] it holds that if the value of x\ in t is greater than 0.7 then there exists time t' £ [t + 3, t + 5] such that the value of x2 in t' is also greater than 0.7,
• g[0,t][((x > maxx) F[piiP2](x < minx)) A {{x < minx) F[pi P2](x > maxx))] expresses a periodical oscillation of x in the time horizon r such that the minimal value minx and the maximal value maxx are interchangeably being reached with a period fluctuating in the interval [pi,p2]-
2.5.4 Value-Freezing Signal Temporal Logic - STL*
In STL it is not possible to express (and distinguish) the classes of signal oscillations such as damped oscillations or oscillations with increasing amplitude. The reason is the impossibility of globally referencing (and relatively comparing) concrete signal values occurring in time points in which some local property is satisfied. STL does not provide any constructs allowing to omit references to concrete values in predicates. E.g., the damped oscillation in Figure 2.7c can be expressed in STL as a sequence exactly reaching the 15 local extremes in the given order. In general, there can appear any number of local extremes in the observed time interval. Such a general property cannot be expressed in STL. To that end, we have introduced the logic STL* that allows referring signal values reached in the past.
As in the case of STL, a formula of STL* expresses a temporal property of finite-time continuous signals. The crucial phenomenon introduced in STL* is the concept of value-freezing. Signal value freezing is facilitated by the notion of frozen time vector defined in the following definition. The structure is used to store time values at various time points which then can be referred in predicates.
Definition 2.38 Let The a freezing operators index set. Frozen time vector t* is a function:
t* ~. X —y M>q
49
Figure 2.7: Various types of oscillations: (a) a permanent oscillation with constant amplitude, (b) oscillation with increasing amplitude, (c) damping oscillation.
The symbol t* = t*(i) is referred to as i-th frozen time.
Predicates comprise Boolean expressions over values of a signal £ at time t and each frozen time where Xj denotes the j-th component of the signal at time t, i.e. {(£) = (x±,..., Xj,..., xn), and x*f the j-th component at time t*. When \X\ = 1, we usually write x* instead of x*1.
In STL* only predicates given by linear inequalities are considered. This is an important assumption employed in Section 6.2 (analytical expressions of robustness can be efficiently computed for linear predicates). This restriction is sufficient in the majority of practical cases.
Definition 2.39 Letn e N, b e Randa^ G Rwherei e {0}UX, j G {1,... ,n}
and not all aij are zero. A predicate is defined as a subset ofRn x (Rn)x such
Freeze operator is used to store the time point into frozen time vector, thus facilitating signal value freezing. The following definition introduces an auxiliary concept of storing the current time t as the ith component of the frozen time vector.
Definition 2.40 Let t* be a frozen time vector, i,j e X and t e M>o- Freezing ith component of t* in t is denoted as t* [i iU/?2 | *i <-P
50
({,*,**) |= true
({,*,**) l=P
({,*,**) 1= ^
l V ^2
({,*,**) |=
({,*,**) |= *i V
where i e Z, true denotes the true constant, p is a predicate as of Definition 2.39 and I C M>0 a cZosed non-singular interval.
Note that all Boolean connectives and temporal operators F and G can be defined using the basic operators defined above in the same way as is done in STL. Similarly to predicates, when \T\ = 1, we usually omit the index of freeze operator, as in * Gi(x > x*) = *i Gi(x > x*1).
Definition 2.42 Let {• e (M>0)T be a signal, t e T a time point and t* e Tx a frozen time vector. Formula satisfaction is defined inductively:
=^ ({(*),{°t*) ep
=^ ({,*>**) |= Pi V ({,£>**) 1= =^ 3 t' E t © / : ({, t', t*) |= if2 A
Vt"e [t,f] :({,*",**) |=(^i =^ (t,t,t*[i *]) |=v
Operator o is wsaf to denote function composition, i.e. ({ o t*) e (M^q)1 and (f o t*)(i) = f(t*) and t © / stands for {t + u|ue/}.
We say a a signal { satisfies a formula ?, written £ |= ?, i^f ({, 0,0) |= ? wfere 0 denotes the zero frozen time vector, i.e., 0 = X x {0}.
Intuitively, interpretation of ? is the following: freeze operator stores signal values at the time of tp evaluation, which can then be referred to using index i in predicates of tp. An example property, "in the next five time units, x increases by 8" can be specified as:
*F[0i5](x > x* +8)
where x* refers to value of x at time 0.
Quantitative semantics of STL* is described in detail in Chapter 6 as a part of a mechanism for robustness analysis of cBNMs.
Representative examples of STL* formulae are the following:
• G[0j80_(4+5)] * [G[4_5j4+5] (y* = x)\, for some small 5 > 0, is an example of delayed correlation of two signals — the signal x copies the values of the signal y with a delay of 4 seconds,
• G[0,60] [F[o,io] * [G[i,io](^* > x + c)} A F[0)io] * [G[lil0](x* < x - c)] expresses a damping oscillation (such as depicted in Figure 2.7c) — there is always a time instant in near future which is a local maximum for some future interval and there is also another time instant in near future which is a local minimum for some future interval (the constant c sets the strength of damping).
51
2.5.5 Continuous Stochastic Logic - CSL
Let C = (X, Xq,Q,L) be a CTMC extended with a labelling function L which assigns to each state X g X the set L(X) of atomic propositions that are valid in state X. We consider a bounded time fragment of CSL with rewards.
The syntax of CSL is defined in the following way. A state formula ip is given as
ip ::= true \ q \ -up \ ip a ip \ P^p[4>] | R~r[C-*] | R^r[l=t]
where 0 is a path formula given as 4> '•'•= X? | ip U7 tp, q is an atomic proposition, ~g {<, <, >, >}, p g [0,1] is a probability, r g M>o is an expected reward and / = [a, b] is a bounded time interval such that a, b g M>o a a < b. Path operators G (always) and F (eventually) are derived in the standard way using the operator U. In order to specify properties containing rewards (R^r[C-*] is the cumulative reward acquired up to time t, R^r[l=t] is the instantaneous reward in time t) the CTMC C is enhanced with reward (cost) structures. Two types of reward structures are used. A state reward rew(X) defines the rate with which a reward is acquired in state X g X. A reward of t ■ rew(X) is acquired if a CTMC remains in state X for t time units. A transition reward trew(Xi, Xj) defines the reward acquired each time the transition Xi —> Xj occurs.
The formal semantics of the bounded fragment of CSL with rewards is defined similarly as the semantics of full CSL [20]. The key part of the semantics is given by the definition of the satisfaction relation |=. It specifies when a state X satisfies the state formula tp (denoted as X |= tp) and when a path 7r satisfies the path formula 4> (denoted as tt |= 4>). Let us recall the notation Pathc(X) denoting the set of all paths starting at X (Definition 2.21). The informal definition of |= is as follows:
• X |= P~p[0] iff the probability of all paths tt g Pathc(X) that satisfy the path formula 4> (denoted as Probc(X, 4>)) satisfies ~ p, where
- 7r satisfies X pi iff the second state on tt satisfies ip
- tt satisfies ip U7 ^ iff there exists a time instant i g I such that the state on tt occupied at t satisfies ^ and all states on tt occupied before t' g [0, t) satisfy p>
• X |= R^r[C-*] iff the sum of expected rewards over Pathc(X) cum-mulated until t time units (denoted as Expc(X, Xc} denotes the set of states that satisfy p>.
52
The syntax and semantics can be easily extended with "quantitative" formulae in the form ip ::= P=?[0] | R=?[C-*] | R=?[l=t], i.e., the topmost operator of the formula ip returns a quantitative result, as used, e.g., in PRISM [261]. In this case the result of a decision procedure is not in the form of a Boolean yes/no answer but is the actual numerical value of the probability Probc(X, 4>) or the expected reward Expc(X, X) for X e {X|=t, Xco.g[F[1,2l(A = 30)] expresses that the probability that the species A will be represented by 30 molecules between 1 and 2 time units is at least 0.9,
• P=?[G[500'100°](A < 30)] expresses the probability that the number of molecules of species A keeps below 30 within the time window between 500 and 1000 time units,
• R=?[C-1000](A > 20) expresses the fraction of time the system exhibits more than 20 molecules of A within the time horizon 1000 time units.
2.5.6 Interpretation on Biological Network Models
In this subsection, we relate semantics of temporal logics presented above with the model classes considered in Section 2.3. For each model class, we define the basic notion of formula satisfaction in a given model and we refine the notion to the settings of the parameterised variant of the particular model class.
dINM
Considering the logics described in detail in previous subsections, only basic temporal logics LTL and CTL can be directly interpreted on dINMs. Let us consider a dINM M. = (M, VdiN, ^> -^o> m) and an LTL or CTL formula ip. The transition system [LWfldiNM = (%-,T,Xo) is turned into a Kripke structure K,{M) = (S, So, —>, L) in the following way:
• S:=X,
• So := {X0},
53
• —> is obtained by totalising the transition relation T — every state X £ X with no outgoing transition in T is assigned a self-transition
X -> X,
• L is defined in such a way that atomic propositions AP = {Xi < 9 | 0 < 9 < rrij} U {Xi = 9 | 0 < 9 < rrij} are assigned to corresponding states.
We say M satisfies ip, written M |= ip, iff K.(M) |= ?. Parameterised dINM
The representation of a dINM in terms of a KS can be directly extended to parameterised models. This is done by encoding the semantics of a parameterised model in terms of a PKS. In particular, the explicit representation by means of a set of transition systems is turned into a compact form of a single PKS.
Formally, let Mpx = (A/, VdiN, X, Xq, m) be a parameterised dINM with a parameterisation \M- The semantics of M-p , fldiNM/ is turned
into a PKS K{MVx) = (VXM ,S,S0, L) where
• S, So and L are assigned in the same way as in the non-parameterised case described above,
• for any fi £ s A s' is included in the transition relation of IC(A4px) iff (s,s') £ T where T is the transition relation
Of [Xpx(/i)]dINM-
Note that the set of parameter valuations VXM makes directly the parameter values set of K,{M.-px).
Finally, we say that a parameterised model M-p satisfies ip wrt a set of parameter valuations P C VXM iff V/x £ P.K.(Mpx)^ |= The situation can be also rephrased in the following way: M-p satisfies tp robustly wrt P.
cBNM
For all kinds of cBNMs, there is natural interpretation of STL and STL*. In particular, we say that a cBNM M satisfies an STL or STL* formula )• The satisfaction can be understood also approximately, e.g., for signals obtained by numerical simulation: a cBNM M approximately satisfies an STL or STL* formula ip, written M \=e ip, iff an approximate signal f = lMfcBNU satisfies f \= ip (we can write directly [XlcBNM 1= >)• LTL and CTL cannot be interpreted on cBNMs directly but only on discrete abstractions of cBNMs. The relation can be specific for particular fragments of the logics and the particular form of differential equations un-
54
derlying the abstracted cBNM. The rectangular abstraction defined in Section 2.4.6 is conservative resulting with the following characterisation.
First, we need to clarify the approximate interpretation of LTL over signals. We say that M. approximately satisfies an LTL formula p, written [-MflcBNM \=e Ja6s*e is depicted by the polygonal region.
Owing to the fact K,ahst€ makes an over-approximation of the intermediate PWMA model, every fi e VXM such that [A^ (//)]] £bnm 1= > implies that an LTL or ACTL formula ip is (theoretically) satisfied also in the corresponding PWMA model. However, there might be // £ T^xm such that \M-px cBNM V= V but me formula is (theoretically) satisfied in the corresponding PWMA model.
Finally, since M. comes through the approximation procedure, we cannot tell anything exact about satisfaction of ip in the original cBNM M-p (//). Thus we can speak about "approximate" satisfaction as discussed above: if lMpx(fi)}cBNM l= > men M-px([i) \=e ip (\=e is understood up-to the distance between the signal [A/(px(/i)]]cBNM arid its approximated counterpart generated by the PWMA approximation of 73 (//)). The situation for a set of parameter valuations P wrt to an interpretation of an LTL or ACTL formula ip on the original cBNM, its approximate PWMA and the corresponding abstract Kripke structure is depicted schematically in Figure 2.8.
sCRNM
The semantics of a sCRNM model M. is given as a CTMC (X, Xq, Q). To enable interpretation of temporal logic over CTMC, the structure is extended with a labelling function L : X —> 2AP that assigns atomic propositions to states. Since the matrix Q represents a (transition) relation among states, the CTMC extended with L induces a Kripke structure (the transition rela-
56
tion has to be totalised in a similar way as mentioned above). Qualitative interpretation of LTL and CTL neglecting the stochasticity can be then realised in the same way as in the case of dINMs.
The logic naturally interpreted on sCRNMs is CSL. We say M. satisfies a formula p> of the form P^p[0], R^r[C-*], or R^r[l=t], written M. |= tp, iff X0 g SatlM}aCRNM(ip).
Parameterised sCRNM
For a parameterised sCRNM M.-px the semantics [A^^Jscrnm is given by the (continuous) set of individual CTMCs representing the semantics of M-px(fi) for particular /i g VXM.
We say a CSL formula ip satisfies Ai-p wrt a given set P c VXM iff for all fi g P it holds that M-p (/x) |= tp. This is a theoretical notion that cannot be in general case decided exactly. In Chapter 5 we provide an approximate solution.
2.6 Model Analysis
In this section we describe a group of formal methods that makes a basis for advanced techniques employed for analysis of biological models. We start with elemental methods of general formal verification and finally describe methods relevant for biological models.
2.6.1 Restriction to Bounded Time
From the context of an observed phenomenon and the time-scale of relevant model behaviour, it can be possible to identify the time-horizon (or the number of steps in the untimed case) for which it is guaranteed that the phenomenon occurs. Even periodically repeating phenomena, e.g., cir-cadian clock, can be approximately detected and analysed in finite time in the order of an appropriately selected time-scale. However, a non-trivial analysis has to be performed in some cases to estimate or over-approximate correctly the time horizon.
In the case of dINMs, the model dynamics is fully untimed. We consider the asynchronous semantics of concurrent updates that is conservative -no information on possible timing (ordering) of events is lost. Cycles in the dynamics correspond to possible equilibria (or attractors). However, it cannot be decided from the model what is the time horizon or periodicity of the real dynamics corresponding to a cycle.
In the case of continuous models, simulation-based method need to deal with correct reflecting of the time horizon. The techniques based on qualitative abstraction abstract the time-aspects conservatively in the lines similar to dINMs.
57
Stochastic models are based on CTMCs where the ergodicity allows to analyse the behaviour in infinite time horizon (steady-state analysis can be used to explore the long-time behaviour in ergodic models). However, when the transient analysis is performed, the considered time-horizon might be critical and needs to be treated with special care.
2.6.2 Reachability Analysis
One of the most basic problems of analysis of dynamical systems is the reachability analysis, a general problem well-studied in graph theory. The problem is to decide whether a given state (or a given set of states) can be reached in finite time from an initial state. The problem has been addressed in formal verification of concurrent systems. In the case of discrete models, many efficient algorithms that does not require explicit generation of the state space exist taking the advantage of a particular encoding of a specific class of models (see for example [69,171, 356,112] for untimed systems or [263,125, 370] for timed systems).
dINM
In the case of dINMs, many of the general algorithms can be employed that perform either on the explicit or implicit representation of the respective Kripke structure. Especially, techniques employing symbolic representation of the state-transition relation in terms of BDDs have proved to be efficient in this case [62, 106]. However, reachability analysis of parame-terised dINMs is still challenging. Several techniques exploring static analysis to characterise parameter valuations solving a given reachability problem have been recently addressed in [248].
cBNM
In the case of cBNMs the reachability analysis is undecidable and must be targeted approximately by employing symbolic ^-decision procedures [187, 252], numeric over-approximation methods [156], or by using finite abstractions (or hybridisations) of continuous dynamics. The global abstraction methods as discussed in Section 2.4.6 can be used for reachability analysis provided that the results have to be carefully interpreted with respect to the fact the abstraction is typically an over-approximation (in such case the property of not-reaching a certain state is guaranteed through the abstraction). Approaches that directly target reachability construct local abstractions of the flow of the dynamics by employing so-called flow-pipes [25,109, 251] that over-approximate the signals or use various sophisticated forms of geometric or symbolic representation of system
58
states sets [6, 178, 313]. In some work, zonotopal representations are employed [5,198]. Local methods address the flow of the dynamics quite well. Several methods target unbounded-time analysis in some form [25, 119]. The work presented in [110] combines local and global approaches by targeting abstractions of linear thresholds-driven hybrid systems. The authors use local flow-pipe construction to construct a quotient transition system that models the reachability from one cell to another. The tool Pro-bReach [331] (based on ^-decision procedures) addresses statistical sampling of initial conditions and is suitable for the global analysis where the initial condition is uncertain.
sCRNM
sCRNMs are modelled by means of CTMCs. The transitions between the states in CTMC are governed by the transition rate matrix. It assigns a rate r to each pair of states in the CTMC, which are used as parameters of the exponential distribution, i.e., the probability of the transition being triggered within t time-units equals 1 — e~r*. The reachability analysis problem is solved by exact techniques based on so-called transient analysis of the Markov process [23, 260] or by approximate techniques based on statistical sampling [232, 27] of simulations [195], employing moment-closures [73, 22, 13], or by methods employing correct deterministic approximation of the transient distribution [71, 334].
Transient analysis is based on the computation of transient probability - having started in state X, the probability of being in state X' at time instant t. Formally, given an initial distribution 7rC'X°'° (i.e. ttc,x°,0(X) = 1 if X0 = X, and 0, otherwise) at time 0 of a CTMC C = (X, X0, Q) what will the transient state distribution 7rc'Xo'* look like in some future yet finite time t g m>0.
A standard technique for computing transient probabilities is based on uniformization. The key idea is for a given CTMC to construct the uni-formised DTMC where all exponential delays in the CTMC are normalised with respect to the fastest transition rate q. Then each step of the uni-formised DTMC corresponds to a single exponentially distributed delay with the parameter q. The ith matrix power of the uniformised DTMC gives the probability of jumping between each pair of states in the DTMC in i steps. The transient probability in time t is computed as the sum of the matrix powers weighted by Poisson probabilities giving the probability of i such steps occurring in time t.
Formally, for the rate q satisfying q > max{E(X) \ X £ X} (E(X) is the exit rate of state X as described in Section 2.3.2) the uniformised DTMC
59
unif(C) is defined as unif(C) = (X, X0, Qunif(c)) where
( Q(X,X') .f y , y,
Qu«m{x,x') = \ "^~Qtf„ ****
{ 1~2~2x"^Xq—1 otherwise.
and the ith Poisson probability in time t is given as 7i)(?.£ = e~q'1 • The transient probability can be computed as follows:
oo ube
^,x0,t = Y^,q.t ■ 7vc^° ■ (Qu"]Wy « • 7rC,Xo'° • (Qunif(c)r.
i=0 i=lbe
Although the sum is in general infinite, for a given precision e the lower and upper bounds lbe, ube can be estimated by using techniques such as of Fox and Glynn [176] which also allow for efficient solutions of the Poisson process. In order to make the computation of uniformization feasible the matrix-matrix multiplication is reduced to a vector-matrix multiplication as follows:
^CXcO . (Qunif(C)y = ^C,X0,0 . £Qunif(C)y-l) . Qumf(C)_
Standard uniformisation can be intractable when the system under study is too complex, i.e., contains more than in order of 107 states and the upper estimate ube, denoting the number of vector-matrix multiplications as iterations, is high (more than in order of 106). Therefore, many approximation techniques have been studied in order to reduce the state space and to lower the number of iterations ube. State space reductions are based on the observation that in many cases (especially in biochemical systems) a significant amount of the probability mass in a given time is localised in a manageable set of states. Thus neglecting states with insignificant probability can dramatically reduce the state space while the resulting approximation of the transient probability is still sufficient. Methods allowing efficiently state-space reduction are based on finite projection techniques [292, 220] and dynamic state space truncation [142].
Since the number of iterations ube inherently depends on the uniformisation rate q that has to be greater than the maximal exit rate of all the states of the system, a variant of standard uniformisation, so-called adaptive uniformisation [357], has been proposed. It uses a uniformisation rate that adapts depending on the set of states the system can occupy at a given time, i.e, after a particular number of reactions. In many cases, a significantly smaller rate q can be used and thus the number of iterations ube can be significantly reduced during some parts of the computation. Moreover, adaptive uniformisation can be successfully combined with reduction techniques mentioned above [142]. The downside of adaptive uniformisation is that the Poisson process has to be replaced with a general birth process which is more expensive to solve. See, e.g [357], for more details.
60
2.6.3 Model Checking
Model checking is an exhaustive technique of checking whether all signals/paths generated by a given model Ai, satisfy the inspected property described as the formula p. In order to generate all signals or paths, the whole state-space has to be stored and evaluated. Owing to this need, model checking generally suffers from the state-space explosion problem. There exist several approaches to reduce this problem, e.g., efficient symbolic representation, state-space reductions or iterative abstraction refinement. Model checking algorithms differ depending on the temporal logic employed. An extensive review of model checking methods is given in [24]. In this section we restrict ourselves only methods relevant for the thesis.
LTL Model Checking
There are several approaches to model checking a finite state Kripke structure K, against an LTL formula p [24]. The goal is to check K, |= p, in particular, whether for every so £ ) and Expc(X, X) is reduced to the computation of the transient probability distribution 7TC,x,t, see [23, 260] for more details. Thus, for different initial states I 6 I we have to compute the corresponding transient probability distributions separately. The key idea of the global model checking method is to use backward transient analysis. The result of backward transient analysis is the vector alC'A'* such that for arbitrary set of states A c X, the value alc'A'*(X) is the probability that A is reached from X at the time t. Without going into details the vector alC'A'* can be computed in a very similar way using the uniformised DTMC unif(C) as in the case of vector iTC,x,t. Only vector-matrix multiplication is replaced by matrix-transposed-vector multiplication and alc'A'°(X) = 1 if X g A, and 0, otherwise.
The vector ProbC^ is computed using backward transient analysis. Since the definition of next operator X ip does not rely on any real time aspects of CTMCs, its evaluation stems from the probability of the next event that can be easily obtained from the transition matrix Q. The evaluation of the until operator ?iUJ' p>2 depends on the form of the interval / and is separately solved for the cases of / = [0, £1] and / = [£1, £2] where £1, £2 g m>o- It is based on a modification of the uniformised infinitesimal generator matrix Qumf where certain states are made absorbing. This means that all outgoing transitions are ignored in dependence on the validity of ipi and ip2 in these states.
For any CSL formula tp, let C[p] = (X, Xq, Q[ip], L), where Q[p](X,X') = Q(X,X'), if X |= p, and 0, otherwise. The formula 4> = ipi U'°'*l p>2 can be evaluated using the vector alC'A'* in the following way:
pJobC^ = ^Ch^a^] At where x g A iff X |= p>2.
63
For the formula 0 = can be evaluated using the vector al^'* that takes a vector v instead of a set A (i.e., alc'1''0 = v) in the following way:
pTotf^ = u?h2.
The backward transient analysis can be also used in the case of reward computation. Since operator R^p[l=t] expresses the expected reward at time t, the vector Exp ' can be computed as Exp ' = al^'* where v encodes the given state reward structure.
For evaluation of the operator R^P[C-*] we have to use mixed Poisson probabilities (see, e.g., [259, 260]) in the backward transient analysis. It means that during the uniformisation the Poisson probabilities 7i)(?.£ are replaced by the mixed Poisson probabilities 7^.4 that can be computed as:
Using the given state reward structure we can compute the vector Exp ' c-' as Exp ' c-' = alC'1''* where v encodes the state reward structure and the mixed Poisson probabilities yi:q.t are used.
To review the overall method of stochastic model checking of CTMCs over CSL formulae we summarise the procedures from an abstract perspective. The evaluation of a structured formula ip proceeds by bottom-up evaluation of a set of atomic propositions, probabilistic or expected reward inequalities and their Boolean combinations. This evaluation gives us a discrete set of states that are further used in the following computation. The process continues up the formula until the root is reached. The final verdict is reported either in the form of a Boolean yes/no answer or as the actual numerical value of the probability or the expected reward.
The robust implementation of the technique is available in the widely used tool PRISM [261] and recently also in the tool STORM [137].
2.6.4 Software for Model Checking of Biological Models
The main tools that allow to apply model checking methods to biological models interface the encoding specific for biological models to an internal language of some basic model checking tool.
cBNM
One of the first tools that has supported model checking of cCRNMs is BIOCHAM [93], it is a general framework for formal analysis of biological
64
networks and it has been continuously developed until now [166]. It has introduced verification of CTL properties on qualitative models of cCRNMs employing a unique asynchronous Boolean semantics working well with the standard symbolic model checker NuSMV [111]. A feature that is original to BIOCHAM is model checking of both LTL and CTL over finite discrete approximations of signals generated by cBNMs [162,163,164].
In [290], the authors identify CTL-based patterns of formulae relevant for cINMs. The tool GNA [135] adopts NuSMV and CADP model checkers [50,52] for piece-wise affine abstractions of cINMs [162,163,164]. In [49] the authors introduce an approach employing the abstraction method described in 2.4.6. It is used with LTL model checking to address liveness properties of cINMs, the method has been incorporated in a Matlab-based tool RoVerGeNe [54]. In our work [57], the technique has been transferred to general biological systems (cBNMs) [82]. Although in both cases there is a more complex technical framework working with parameterised models, the techniques can be directly used for models with fixed parameters.
dINM
Considering dINMs, CTL model checking has been widely employed targeting analysis of gene regulatory networks [62, 282, 352] and signalling networks [325,158, 203]. The tool GINsim [106] is continuously developed to address modelling and analysis of dINMs, it passes the models to the standard symbolic model checker NuSMV [111]. The tool Antelope [17] goes beyond CTL by introducing hybrid operators. The dedicated model checker is capable of verifying properties such as presence of an attractor in the system without the previous knowledge on its location. Extensions of CTL for biological models have been also targeted in [280].
LTL model checking of dINMs has been addressed in dINMs [341,184, 342] with a prototype tool support. The tool TREMPPI [340] provides a comprehensive collection of methods based in LTL model checking of dINMs. An interface PyBoolNet targeting analysis and visualisation of dINMs has been presented in [243]. A framework aimed at addressing biologists as direct users of dINMs is provided within the BMA tool [59] including a module for LTL model checking [2]. The recent valuable work has been invested in development of a Python-based interface utilising Jupyter Notebooks for interoperability of tools working with dINMs [294]. An overview of some tools dedicated to model checking of dINMs is given in [289].
sCRNM
The general tools existing for model checking CSL properties over CTMCs can be directly used for analysis of sCRNMs. Several versions of the tool
65
PRISM [261] have been successfully applied to analysis of sCRNMs, e.g., [258,253, 255,268]. The main advantage of the tool is it incorporates a large set of different techniques ranging from exact to approximate techniques and using symbolic and explicit representations or their combinations. The tool is also integrated within higher level frameworks. In [113] the PRISM is used as a model checker for models specified in BioPEPA process algebraic modelling framework for biological models. In [320] a web service for biological models analysis integrating several tools including PRISM is presented. Another probabilistic CSL model checker addressing biological models is MARCIE [328] based on its own symbolic techniques. In future we can also expect the usage and adaptation of the STORM [137] model checker that also supports CSL model checking.
Probabilistic model checking has also been employed to the analysis of stochastic models of influence networks where an automatised translation of models into a CTMC, based on quasi-steady-state approximation (QSSA), has been proposed [274, 363]. The related tool iBioSim gives a powerful technology for analysis of stochastic models of genetic networks.
As an alternative to exact probabilistic model checking which can be computationally inefficient for large models, statistical approaches can be employed for properties making a fragment of CSL. The recently developed tool U-Check [72] implements smooth model checking based on Gaussian noise approximation and is very promising for statistical analysis of cCRNMs. The tools developed in the past are [329, 270].
In [92], the authors consider signal transduction in the RKIP-inhibited ERK pathway. They overcome the state-space explosion problem of probabilistic model checking by rescaling model component quantities to lower numbers of population levels. The main problem with statistical model checking is caused by rare events, i.e., temporal formulae whose satisfaction probability is very small. When estimating the probability of such formulae, the number of simulations needed to ensure a good estimate becomes unfeasible. In [116], the authors show that the importance sampling, a variance-reduction technique for the Monte Carlo method, and the cross-entropy method, a general Monte Carlo approach to combinatorial and continuous multi-extremal optimisation and importance sampling, can efficiently address this problem. They use Bounded Linear Temporal Logic, a variant of LTL where the temporal operators are equipped with time bounds, to reason about biochemical reactions in systems biology.
2.6.5 Simulation and Monitoring
Monitoring techniques are based on the satisfaction test of a formula over an individual simulation trace (signal). The responsibility for exhaustive coverage is delegated to the procedure that generates the traces. The key observation behind efficiency of monitoring performed over simulations is that for large and complex systems, the simulation is generally easier and
66
faster than building a concise representation of global transition systems required for the exhaustive model checking approach. However, since a single simulation generates a single trajectory out of all the possible executions of a system, usually the average values among several simulations need to be considered to achieve the necessary level of confidence in the results obtained. Owing to the fact that the individual monitoring problem targets a single path, linear temporal logics are employed (they are typically extended with quantitative operators evaluating the time-aspects of the trace such as MTL, MITL or STL as discussed in Section 2.5).
A possible way to improve the accuracy of monitoring techniques is to employ the statistical model checking [116] that addresses general stochastic systems in terms of statistical inference. It samples the behaviours (simulations) of a model, verifies their conformance with respect to a temporal formula (i.e. performs the membership test), and finally applies a statistical estimation technique to compute an approximate value for the probability that the formula is satisfied. The accuracy of statistical model checking is affected by the accuracy of stochastic simulations techniques that are employed and also by the structure of the model or more precisely by the level of details (initial conditions, parameters, etc.) we have about the system under study.
A comprehensive survey of monitoring techniques is given in [44]. The techniques are divided to two main groups: offline monitoring (working on a generated trace) or online monitoring (working in real time). In the basic setting, the method computes Boolean satisfaction of a formula (e.g., computing the qualitative semantics of STL or STL* over a signal as stated in Sections 2.5.3 and 2.5.4). Additionally, the quantitative semantics (e.g., as of Definition 2.37) can be evaluated. In such a case, we speak about robust monitoring.
On the practical side, there exist several tools targeting monitoring, one of the first tools were Temporal Rover [154] (qualitative satisfaction of LTL/MTL) and JavaPath Explorer [213] (qualitative satisfaction of temporal properties over Java byte-code executions). The tool AMT [298, 297] provides algorithms for the qualitative satisfaction of STL. Several tools have been developed regarding robust monitoring with quantitative semantics: Breach [145] (Matlab-based, piece-wise constant signals wrt STL), S-TaLiRo [14] (Matlab-based, finite timed sequences wrt MTL), and U-check [72] (standalone, monitors simulations of CTMCs wrt STL). Our own contribution implemented in the tool Parasim (standalone) is described in Section 6.2. The tool BIOCHAM [166] mentioned in Section 2.6.3 also provides offline monitoring algorithms for finite timed sequences wrt LTL and even CTL. All of these tools with the exception of U-check are directly applicable to cBNMs as they work with the differential semantics of models. U-check is directly applicable to sCRNMs. There exist several interesting applications to biological problems conducted with Breach and BIOCHAM [148, 351, 279].
67
influencing reality
parameters
real-world reconstruction mathematical
system model
^observation.
admissible parameters
observed properties
2oi™alisation]!ľ>
specified properties
required properties
Figure 2.9: A general scheme of parameter synthesis methods based on system properties formalised in a temporal logic.
It is worth noting that the monitoring procedure can be also applied to experimentally measured time-series data. Monitoring then enables methods of automatised inference of a logical specification of real-world systems dynamics from experimental data [295].
2.6.6 Parameter Synthesis
The problem of parameter synthesis is for a given parameterised model to identify parameter valuations that guarantee the system satisfies the given global property formalised in some temporal logic. The technique provides an alternative way to more traditional method of fitting model parameters to experimental data (also called parametric systems identification [314] or parameter estimation [288]). In contrast to the traditional approaches to tackle the inverse problem (e.g., [179, 183, 193, 314, 288]) parameter synthesis methods typically focus on identifying reliable subsets of parameter space instead of finding singular parameter values.
The overall settings of property-driven parameter synthesis are depicted in Figure 2.9. Hypotheses mined from biological literature as well as time-series experiments from wet-labs can be considered as temporal properties restricting the admissible set of parameter valuations.
dINM
In the case of discrete models, usage of model checking for guiding synthesis of admissible parameters has been originally targeted for CTL in [62] supported with the tool SMBioNet [237] (employing NuSMV). The approach uses the naive algorithm of separate model checking tasks run for
68
every parameter valuation of the model. Decisions over parameter space of models with synchronous semantics can be done very efficiently using SAT, SMT [192] or answer-set programming [175, 300]. The problem becomes more challenging for semantics with asynchronous updates as is considered in our case for dINMs. Parameters can be synthesised itera-tively by using a model checker with an appropriate symbolic encoding of parameters as is done in [51] by employing NuSMV.
Parameter synthesis for LTL has been first addressed in our work presented in Section 4.2 where also a detailed discussion of related work is given.
cBNM
Several techniques have been developed for parameter synthesis of continuous-time models. In the case of linear-time TL, the dominating approach is based on monitoring simulated trajectories of the system [37,86,148,312, 319]. These techniques rely on numerical solvers which are well-developed for systems with fixed parameters or small parameter spaces (perturbations). An advantage of these techniques is that they consider the function defining the systems dynamics as a black box provided that there is basically no limitation on the form of parameterisation of the system. The algorithm implemented in Breach [145] utilises robust monitoring of STL. It is based on a hybrid approach built on sensitivity analysis [126]. The parameter space is traversed with respect to sensitivity of the quantitative semantics of STL wrt changes in parameters. The tool BIOCHAM [166] supports exploration of parameters with respect to LTL formulae with constraints over reals. It brings the methods known from traditional parameter estimation to the settings based on temporal logics. Parameter space is searched with a covariance matrix evolutionary strategy that attempts to identify a gradient in quantitative formula satisfaction wrt changes in parameters.
The main drawback of the simulation-based methods is the need to sample the parameter space and initial states while losing robust guarantees for the results. This drawback can be overcome by replacing numerical solvers with Satisfiability Modulo Theories (SMT) solvers that can cope with non-linear functions and real domains up to required precision [187]. However, these techniques are limited to reachability analysis (e.g., the tool BioPsy [275]) and their extension to work with general TL specifications is a non-trivial task yet to be explored.
Techniques based on finite abstractions of cBNMs employ an exact (symbolic) representation of the maximal set of parameterisations satisfying a given TL property. These techniques are based on model checking performed directly on a qualitative finite quotient or hybridisation of systems dynamics (e.g., [49, 51, 67] and our approaches [33, 82] described
69
in Chapter 4). Several approaches encode parameter sets symbolically in terms of polytopes [49, 205]. Another solution is to encode parameterisa-tion sets by means of predicate formulae with non-linear arithmetics over real numbers and use SMT to reason on them.
In the case of sCRNMs, parameter synthesis methods are based on probabilistic model checking or variants of statistical model checking. Most statistical approaches to parameter synthesis [318,12,124] rely on approximating the maximum likelihood wrt to a given set of data. The advantage is the possibility to analyse infinite state spaces [12] (employing dynamic state space truncation with numerically computed likelihood) or even models with no prior knowledge of parameter ranges [124] (using Monte-Carlo optimisation for computing the likelihood). The state-of-the-art moment closure approaches are able to cope with multi-modal distributions occurring in multi-stable systems [12, 273, 22]. The technique implemented in the tool U-check uses so-called smoothed model checking [73] that brings an improved Bayesian statistical algorithm which performs model checking over paths simultaneously for all values of the model parameters from observations of truth values of the formula over individual paths of the model obtained for isolated parameter valuations.
Approaches based on Markov Chain Monte-Carlo sampling and Bayesian inference [200,232,247] can be extended to sample-based approximation of parameter values, but at the price of undesired inaccuracy and high computational demands [61, 27].
2.6.7 Robustness Degree
The concept of robustness addresses aspect of the functional evaluation of a dynamical system by considering a weighted average of every behaviour across a space of perturbations affecting the model parameters (hence its behaviour) and in a particular way, having a certain probability of occurrence. The general definition of robustness [242] gives us robustness degree that quantitatively characterises to what extent is the evaluated system functionality preserved under considered perturbations:
where M is the system (represented by an appropriate model), A is the function under scrutiny, P is the set of perturbations, P C VXM, ip(n) is the probability of the perturbation // e P and D^(fi) is an evaluation function stating how much the function A is preserved under a perturbation //.
sCRNM
70
Kitano proposed that the evaluation function (//), stating how much the functionality A is preserved in perturbation //, should be defined using a subspace B of all perturbations, where the system's function is completely missing and the remaining P \ B where the function's viability is somehow altered. Formally, the definition is the following:
For perturbations // £ P \ B where the system maintains its function at least partially, Kitano proposes to express the evaluation function D^(^) = iU(/-0/./U(0) relatively to the ground (unperturbed) state /^(0). This is meaningful, e.g., for naturally living systems where the ground state is measurable and is considered as an optimal performance state. Such a definition enables the comparison of some common property of different species. On the other hand, in cases when no ground state is given, the absolute value can provide more adequate measure of robustness.
On the technical side, the concept of robustness has been widely studied in the deterministic modelling framework based on ordinary differential equations (ODEs). There exist several well-established analytic techniques based on static analysis as well as dynamic numerical methods for effective robustness analysis of ODE models.
Several studies have brought formal methods into this concept. The idea is to formalise the analysed property in terms of temporal logic formulae and to employ quantitative satisfiability to measure the robustness of formula satisfaction. The first such framework has been implemented on the top of the BIOCHAM tool [162,165] employing LTL extended with constraints (predicates) over reals to express properties of timed traces of real-valued states. The approach is "property-oriented" provided that the robustness is understood as a measure indicating how much the predicates in the formula can be altered in order to affect the validity of the formula wrt the given trace. The approach introduced in [168] and implemented in the TaLiRo tool [14] brings "behaviour-oriented" robustness degree measured against MTL formulae interpreted over discrete-time and continuous-time traces and discrete state space. The quantitative semantics of MTL for a given trace (signal) £ is defined as a distance of { from the set of signals violating the formula. The approach using STL [151] extends the concept of behaviour-oriented quantitative reasoning to work with real-valued signals and to consider a perturbation of traces in time (time robustness), state space (space robustness), and the mix of both. The methods are implemented in the tool Breach [145]. All of these methods are applicable to cCRNMs, cINMs and cRINMs.
In the case of stochastic models (sCRNMs), the concept of robustness is not yet as established as in the case of ODE models. Komorowski etal. [249]
0 fi e B C P
71
introduce the notion of robustness in the context of linear noise approximation of stochastic kinetics. To the best of our knowledge, our approach discussed in Chapter 6 is the first work addressing robustness in stochastic systems from the perspective of temporal properties (CSL). Alternative techniques employ statistical approaches and simulation working with STL specifications [40, 72].
2.7 Summary
In this chapter, we have set a unifying framework targeting the model classes most frequently used in computational systems biology. The framework can be easily extended to include other models used in biology (e.g., more complicated kinetics models of INs or RINs, discrete models of CRNs, etc.) Additionally, we have introduced the background on the semantics of the models with respect to a set of selected basic temporal logics and their extensions that have been already successfully used in systems biology.
Finally, we have given a brief overview of model analysis techniques based on formal methods. The selection is focused on the analysis of models under parameter uncertainty by means of formal methods based on model checking and monitoring procedures. These methods have been described in more detail as they are discussed in next chapters from the perspective of our original methods.
72
Chapter 3
Solved Problems
In this chapter we formulate three key problems that address the analysis of models included in the framework discussed in Chapter 2. These methods target analysis of parameterised models. In particular, we revisit the following problems from the perspective of formal methods: parameter synthesis and robustness analysis. Moreover, we introduce a novel problem of so-called parameter exploration that adapts the framework of parameter synthesis to the settings of stochastic models.
3.1 Parameter Synthesis
3.1.1 Problem Formulation
In general, the problem is, for a given parameterised biological network model and a given temporal property, to find all parameter values satisfying the given property. For non-stochastic models, the problem is precised as parameter synthesis problem formally stated in the following definition. In our case we target an instance of the problem that is stated from the perspective of the basic temporal logics.
Definition 3.1 (Parameter Synthesis Problem) Let M-px be a parameterised cCBNM or dINM. Let p be a formula ofLTL or CTL.
The parameter synthesis problem for M-px is to find the maximal set P C VXM of parameter valuations such that M-px (//) |= pfor all // e P.
3.1.2 Significance
The problem is highly relevant for systems biology because the models are usually built from (bio)physical first-principles that are mathematically represented by means of parameterised kinetic functions. Parameters typically reflect thermodynamics conditions in a particular cell and are very difficult to be obtained in vivo. Parameter synthesis gives the unique opportu-
73
nity to analyse the model globally with unknown parameters. In particular, by parameter synthesis it can be decided if a given parameterised model can or cannot display behaviour described by a given temporal property. In positive case, all satisfying parameter values settings are returned. If stochastic dynamics is considered, the parameter synthesis procedure gives all parameter values that satisfy the property with a probability greater or equal than a given probability threshold.
3.1.3 Existing Solutions
The naive solution to the problem is to enumerate the individual parameter valuations of the given parameters domain by running a separate model checking task for every parameter valuation. Such approach requires the set of admissible parameter valuations to be finite. This is obviously ensured in case of dINMs [62]. However, it is worth noting that in that case the number of admissible parameter valuations can grow combinatorially with respect to the number of influences affecting a certain species (in-degree of the influence network nodes).
In the case of cBNMs and sCRNMs, the parameter domain is uncountable. Apparently, the naive approach suffers from fundamental problems in such cases. In order to achieve exact approximation of the results with formal guarantees, a suitable finite partitioning of the parameter space is inevitable.
The very first algorithm for parameter synthesis of cBNMs employing model checking over discrete abstractions has been introduced by Batt et al. in [49,368]. The algorithm in [49] (implemented in the RoVerGeNe tool) is sequential and relies on execution of two model checking procedures per each class of valuations. In the average case, the number of analysed parameter classes can be reduced by a suitable BDD representation of the parameter space. In [51] the authors target the piece-wise affine framework [135] adapted to regulatory networks. The notion of uncertainty is lifted to a higher level of abstraction provided that qualitative (symbolic) valuation is considered. That allows to efficiently employ the symbolic NuSMV model checker [112] for the parameter synthesis procedure. However, the problem of such solution is that it does not construct the satisfying parameter valuations set. In particular, for a given model with a symbolically encoded parameter perturbation it either answers true provided that the perturbation entirely satisfies the formula, or false, provided that a single counter-example is returned by the model checker. The latter case first requires to check if the counter-example is not a false-positive path introduced by the abstraction. If the false-positiveness (spuriousness) is falsified, the procedure is further iterated with a new parameter perturbation excluding the non-satisfying parameter valuation inferred from the counter-example. As the counter-examples gives just a single such param-
74
eter valuation, the approach can be very close to the naive solution.
Alternatively, parameter synthesis for cBNMs can be done approximately by using sampling of the parameter space joint with numerical simulation of the individual (approximate) signals as is done in [166,145]. Such approach does not give formal guarantees but rather a result interpreted up-to some numerical error (the finite sampling of the parameter space gives an approximate under-approximation of the synthesised parameter valuations sets as the individual signals are simulated with the numerical error of the solver, as used, e.g., in the BIOCHAM tool [166]). To avoid the exhaustive (and unnecessary) sampling of the parameter space, sensitivity analysis of quantitative semantics of a temporal formula wrt changes in model parameters offers an efficient systematic way [150] (the approach is employed in the tool Breach [145]).
The problem of parameter synthesis has been also addressed in the settings of discrete time dynamical systems that are significantly different to models considered in this thesis. An example is technique based on the Bernstein polynomial representation that targets polynomial dynamical systems [127]. It is implemented in the tool SAPO [152].
3.1.4 Our Contribution
In our work, we have targeted the parameter synthesis problem with a fully automatised procedure working on common parameterised biological models. Firstly, we have developed coloured model checking - a method that adapts parallel model checking algorithms for several temporal logics and works with dINMs (directly) and cBNMs (by utilising the discrete abstraction procedures of ODEs). The method and its mapping to the problem formulated in Definition 3.1 is described in Chapter 4. The method expects encoding of the models by means of PKS and introduces symbolic representation of parameter values that overcomes the problems of large (or infinite) parameter spaces.
3.2 Parameter Exploration 3.2.1 Problem Formulation
In the case of stochastic models, the presence of a specified property in a given (parameter-uncertain) model is evaluated quantitatively with a given probability measure. To that end, the problem of parameter synthesis is reformulated in the quantitative context provided that the goal is to compute all parameter values that satisfy the property with probability greater or equal than the given threshold. Since it is necessary in this context to quantitatively evaluate parameter valuations with respect to a given formula,
75
we speak about a more fundamental problem of parameter exploration. This problem is formally stated in the following definition.
Definition 3.2 (Parameter Exploration Problem) Let M-px be a sCRNM pa-rameterised with a parameterisation xm- Let M>o that for each parameter valuation /i e PXA1 returns the numerical value of the probability or the expected reward for the formula
and P consists of the following parts:
1. to define and approximately compute the evaluation function D^1 : P —> M. such that for any // e P, D^i(fi) evaluates quantitative semantics of ip on the signal [Mp,(/i)]CBNM,
2. to approximately compute the robustness degree of p> in Mpx wrt P defined as:
Jp
In the case of stochastic systems, the situation is more difficult. Since every behaviour is distorted with noise and randomness of relevant events occurring in time, we have to deal with the shape of probabilistic distributions of system states and its development in time (transient analysis) or its shape in the steady state (steady state analysis). In our setting, we assume the behaviour is described by means of a probabilistic temporal property. The evaluation function reflects the probability of the property being satisfied under the given perturbation. The robustness is inversely proportional to the amount of noise that causes potential violation of the formula. In contrast to the deterministic case, in this case the robustness is understood as an inherent property of the systems stochasticity.
Definition 3.4 (Probabilistic Robustness for CSL) Let Mpx be a parame-terised sCRNM. Let and P is to approximately compute the robustness degree
rd^p d= Jp D^4(fi)dfi where the evaluation function D^4 : P —> M. is defined in a way that for every /i 6 P, D^i(fi) reflects the quantitative semantics of
in a suitable manner.
3.3.2 Significance
The problem of evaluating robustness is important to obtain deeper understanding of the role of parameters on the presence of the specified behaviour [343]. Robustness degree provides a tool that significantly helps
78
to compare several models presenting the same behaviour [48]. There are many examples in systems biology literature that have used robustness analysis to get mechanistic insights into a certain phenomenon. For example, in [4, 30] the concept of robustness is used to correctly reconstruct the mechanism of bacteria chemotaxis control. The analysis conducted in [338] provides a study of the organisation of a two-component signalling pathway in E. coli from the perspective of global concentration robustness of individual components in the pathway.
The robustness degree also provides a computable measure allowing targeted optimisation of cellular processes by means of tuning of known parameters [148, 319]. This is an inevitable precursor of successful design and engineering of synthetic cellular circuits [238,132].
3.3.3 Existing Solutions
In the context of deterministic models, robustness analysis with respect to functionality specified in terms of temporal formulae has been recently introduced [168, 319]. There exist two major approaches of defining and analysing robustness. If only the parameters of the model are perturbed, we speak of a behaviour-oriented approach to robustness. This approach has been explored by Fainekos and Pappas [168], further extended by A. Donze et al. [151] and implemented in the toolbox Breach [145]. Another way to look at perturbations is from the perspective of property uncertainty. If the system is considered fixed and all parameters exactly known, the uncertainty then lies in the property of interest. For a specific property it can be explored how much would have the atomic constraints in the formula to be altered in order to affect the property's validity in the given model. This approach has been adopted by Fages et al. [319] and implemented in the tool BIOCHAM [167]. When only the parameters of the property are perturbed, it is the case of a property-oriented approach to robustness.
The work presented in [40] is based on the idea of directly adapting the concept of behaviour-oriented robustness to stochastic models. Individual simulated trajectories of the CTMC are locally analysed with respect to an STL formula. For each simulated trajectory, the so-called satisfaction degree representing the distance from being (un)satisfied is computed, thus resulting into a randomly sampled distribution of the satisfaction degree. This distribution thus gives another source of information in addition to the probability of formula satisfaction (the fraction of valid trajectories in the sampled set).
3.3.4 Our Contribution
Our contribution to robustness analysis with formal methods is two fold. Firstly, we have extended the set of techniques used to analyse robustness
79
on timed traces (signals) generated from continuous-time models (cRINMs, cINMs and cCRNMs). In particular, we have defined an extension of signal temporal logic called STL* that allows to express freezing of values referred within temporal operators. The extension is important especially to express several aspects of signals that cannot be expressed in plain STL (e.g., presence of local extremes and their mutual relationships). In consequence, we have defined quantitative semantics of STL* that fits well in the existing robustness analysis framework developed for plain STL. The algorithm for computing the quantitative semantics is implemented within the PARASIM tool. The method and its mapping to the problem formulated in Definition 3.3 is described in Chapter 6. Our result makes a fundamental step towards fully automatised analysis of complex properties over timed sequences and is universally applicable in the wide domain of (cyber-)physical and (cyber-)biological systems. It is already received well by the community [44,102, 296].
Secondly, we have introduced one of the first property-based formulations of robustness degree in the context of continuous-time stochastic processes. To that end, we have adopted the concept of parameterised uni-formisation as the underlying computational machinery and have used the landscape function as the main tool for inferring robustness of stochastic dynamics. The method and its mapping to the problem formulated in Definition 3.4 is described in Chapter 6. In the community, our result is considered as one of the fundamental approaches to the problem [373].
80
Chapter 4
Coloured Model Checking
As described in Chapter 3, to solve the parameter synthesis problem, we have developed a method called coloured model checking (CMC) that is based on enumerative model checking. In this chapter, we summarise our results regarding the development, implementation, and evaluation of the CMC method. In particular, we give definitions of the variants of the CMC problem targeting different temporal logics (LTL and CTL).
4.1 Overview
In Figure 4.1, the general scheme of the coloured model checking is shown. The method assumes the model to be encoded as a parameterised Kripke structure. The method is used to solve the parameter synthesis problem I (see Definition 3.1). It works for cBNMs and dINMs (encoding of cBNMs and dINMs in terms of PKSs is described in Chapter 2).
As a general method, coloured model checking works on any parameterised Kripke structure for which the local parameterisation problem can be decided for every pair of states. The individual variants of the method differ in the temporal logic employed. The chosen logic significantly affects the algorithmics as well as the interpretation of the results. We employ enumerative algorithms for both kinds of logics. CTL characterises the temporal properties in individual states and that way allows to capture global properties of systems dynamics. Validity of a given formula is typically evaluated in all states of the system. On the other hand, LTL focuses on properties of paths and the analysis is typically localised into a given initial state (or a set of initial states) from where all the reachable paths are explored. In the context of the parameter synthesis problem, such difference affects the interpretation of the obtained results for the original biological network model.
Since the parameter space cardinality increases exponentially with the number of unknown parameters, our primary goal is to reduce the com-
81
dabstraction^
(pwma)
r
satisfying parameter valuations
Figure 4.1: General workflow of parameter synthesis by coloured model checking.
plexity for the expected case. To this end, the method is appropriate for models satisfying the following requirements:
1. The transition-enabling parameter values sets V(s, s') can be computed efficiently from the knowledge of the model and the state s. In particular, we suppose there exists an implicit representation of these sets.
2. A small change in the value of a single parameter causes only a local change in the transition relation. This implies that sets V(s,s') represent significant portions of the parameter space (transition relations of respective non-parameterised Kripke structures are, to a large extent, similar).
Note that cBNMs and dINMs exhibit such properties. The primary goal of CMC is to provide an algorithm which in practice performs parameter synthesis on suitable models reasonably fast and which is effectively par-allelisable.
The problem of coloured LTL model checking for a given PKS and a given formula is to explore every path starting in some initial state and to find an exact set of parameter values for which the formula is satisfied on the path.
4.2 Coloured LTL Model Checking
82
Formally, the problem is formulated using the automata-based LTL model checking theory. In the standard way, the given LTL formula is translated to a Buchi automaton (a so-called never-claim) that accepts the language covering exactly the Kripke structure paths not satisfied by the formula. The intuition behind the LTL CMC is given in Figure 4.2. The notion of Kripke structure extended with the set of accepting states F is considered here. The following definition provides a formal description of the problem.
parameterized Kripke structure of the model never claim Buchi automaton
Dheck if the product accepts an empty language
normally for each parameterization separately we decide on all parameterizations at once
property is robust GF ([A]>2.5 | [B]>2.5)
return set P of parameter values
violating the property inverse of P in entire parameter space
is the maximal set of valid parameters
Figure 4.2: Intuition behind the coloured LTL model checking method.
Definition 4.1 Let K = (V, S, So, F, —>, L) be a parameterised Kripke structure over AP. Let further
2V. To detect a path cycling on a state s £ F, we compute the set of colourings Succ(s, P) which is a function Succ(s, P) : S —> 2^ assigning to each state s'gS the maximal set of parameter values P' C P such that sA+ s' for allp £ P'.
Algorithm 1 performs the computation in the manner described above. The algorithm iteratively accumulates the parameter value set P and can terminate as soon as P = V, which allows for a fast discovery of paths violating a given property.
Algorithm 1 Coloured LTL Model Checking over the product BA K, x B Require: fC x B
Ensure: p £ P iff so A* sA+ s for some so £ 5*0 and s £ F 1: P <- 0
2: compute Succ(so, P)
3: assign each state s £ S a colouring Col(s) <— Succ(so, V)(s) 4: for all s £ F, Co^(s) \ P / 0 do 5: P <- P USucc(s, Col(s) \P)(s)
Algorithm 2 computes Succ(s, P). It starts with an empty colouring and incrementally updates it. An update is a tuple (s, P), meaning that the set of parameter values P q V should be added to the colouring for the state s £ S. The set of pending updates is stored in Q. The algorithm terminates as soon as there are no more pending updates.
By Q(s) = [J{P C P | (s,P) e Q} we denote the set of parameter values that are currently scheduled for addition to Col(s). To prevent Q from containing multiple updates for the same state, we use the merge operation Q®Q' defined as Q ® Q' = {(s, P) | P = Q(s) U Q'(s) A P / 0} to update Q (line 11).
4.2.2 Interpretation of Parameter Synthesis Results
Coloured LTL MC provides a solution to the parameter synthesis problem stated in Definition 3.1 and can be thus applied to dINMs and cBNMs. In
84
Algorithm 2 Compute Succ(sq, P) over the product BA K x B
Require: fC x B; P C V; sq e 5 (so can be an arbitrary state) Ensure: Vs e 5 : Col(s) = Succ(so,-P)(s) for all s G 5 do CoZ(s) <- 0
Qnew , L, F) automaton belongs to 0(|F||5|2C(|'P|)) as discussed in detail in [33]. The linear dependence on the number of accepting states (affected by the given formula) comes from lines 4-5 in Algorithm 1. Note that the LTL formula is expected to be already translated into the Buchi automaton (LTL model checking is exponential in the size of the formula and so is LTL CMC). The quadratic dependence on the state space size comes from line 11 in Algorithm 2. The dependence on the size of the parameter space V is included in the function
86
(. It depends on the model type and complexity of the colouring maintenance (the union and intersection on lines 10-11 in Algorithm 2). In the case of cBNMs with mathematically independent parameters, these operations on parameter sets can be performed in constant time due to representation of parameter values as continuous intervals of reals. In the case of dINMs, we have considered explicit representation of the parameter sets encoded by means of bit vectors. Implementation of these operations is reduced in this case to operations on bit vectors.
From the practical point of view, we have performed several experiments by employing two independent prototype multi-core implementations working on cBNMs and dINMs respectively. In the case of cBNMs, we were able to compute models up-to 200 thsd states (7-dimensional model) with up to 3 parameters unknown (3-dimensional parameter space) in 30 minutes on a common hardware (details in [38]). In the case of dINMs, we have employed enumerative computation of the intersection/union operations by using binary encoding of the parameter values. This has allowed us to compute large parameter spaces having up-to 100 thousand billions of parameter values on models with thousands of states in several seconds [244]. The low numbers of states are typical for this kind of models.
4.2.4 Publications Summary
First results introducing this method have been published in [38] primarily focused on cBNMs. The paper also provides complexity analysis of the problem and discusses a mechanism for reducing the extent of over-approximation by means of fairness. Consequently, in [33] the method has been generalised for both cBNMs and dINMs and applied to several biological case studies. Please note that for purpose of this thesis we have simplified and changed some of the notation.
The work has been realised in collaboration with my colleagues Luboš Brim, Jiří Barnat with whom I have developed the method. Bringing the method on a paper and in a code has been realised with a significant help of master students Martin Vejnár, Tomáš Vejpustek, Adam Streck and Adam Krejčí. My contribution is in setting up the problem for biological networks, guiding and partially realising the first formalisation of the method including the formal analysis of its correctness. In the case study part, I have made a selection of models useful for fine-tuning and demonstrating of the method to systems biologists. Finally, I have built up a scenario followed in experimentation procedures realised with all these models.
The work on LTL CMC for dINMs has been realised with the help of my master's students Adam Streck and Juraj Kolčák. The first implementation has made a part of the Esther web-based tool [341]. Based on this result, we have started a collaboration with Hannes Klarner and Heike Siebert (FU Berlin) resulting with a paper [244] that gives a new view on using
87
CMC for dINMs by means of introducing quantitative measures allowing to rank the synthesised parameter values concerning their fitness with a given set of experimental data. My contribution to that work was in taking a significant part in designing and setting up the problem and adapting the LTL CMC for the new problem.
4.3 Coloured CTL Model Checking
The problem of coloured CTL model checking for a given PKS and a given formula is in our case to obtain a table where every system state is associated with an exact set of parameter values for which the formula is satisfied. Formally, such table is formalised using a function t, formally defined in the following definition.
Definition 4.2 Let k. = (v, S, Sq, —L) be a parameterised Kripke structure over AP. Let further compute the sets ColSat(ip) = {(p, s) x S | s |=jc(
for maximal genuine ip' G cl(tjj)
return {(p, s) G V x S | (p, s) G ColSat(ip)}
State Colouring
Similarly to the LTL case, the notion of the colouring is employed to represent the assignment of parameter values to states. In contrast to the LTL case where the LTL formula is encoded implicitly in the product automaton, here the Colouring is defined as a function over formulae. One of the
88
reasons is that the labelling algorithm computes satisfaction for all subfor-mulae of the given formula. To that end, for a given CTL formula ip the notion of Colouring is lifted to a relation called tp-satisfying colouring, denoted ColSat(ip), defined as ColSat(ip) = {(p, s) e V x S \ s \=kp V?}- Intuitively, it assigns the formula with a set of pairs containing a parameter value p and a state s such that the formula is satisfied in s under the parameter value p. The algorithm for computing the function T operates recursively on the structure of ip starting from atomic propositions and iteratively computing ColSat(ip). The computation is done in standard way using the labelling algorithm [117]. Its basic idea is described in Algorithm 3. The computation of ColSat(ip) from the results obtained for subformulae of ip (having the size smaller by one wrt ip) reflects the particular temporal operator applied to the subformulae. Since here we focus on parallel algorithms that require further extensions to the standard labelling-based model checking algorithm, description of the labelling procedure extended to the parame-terised settings is included as a part of the following steps.
Parallelisation with Assumption-Based Semantics
To parallelise the procedure, we adapt the assumption-based distributed CTL model checking algorithm [91]. By following that approach, the algorithm is run on a cluster of n computational nodes (workstations). Each workstation owns a part of the original PKS as defined by a partition function. This part is extended with the so-called border states. Intuitively, border states are states that in fact belong to another computational node and represent the missing parts of the state space. They serve as a proxy between two parts.
More precisely, the fundamental notion is a PKS fragment JCi which makes a substructure of the PKS K, satisfying the property that every state in Ki has either no successor in JCi or it has exactly the same successors as in K. The states without any successors in K% are called the border states of K-i. A partition of the PKS K. is a finite set of PKS fragments K.±,..., K.n such that every state of K, is present in exactly one JCi as a non-border state; it may be present in several other JCj as a border state. In fact, every border state is stored several times: as original one on the node that owns it and as duplicates on nodes that own its predecessors.
The information in border states is considered with the notion of the truth under assumption. For a given formula ip, this is captured by the so-called assumption function, A :V x S x cl(ip) —> Bool that for a given parameter value, state and a subformula of ip returns the assumption on satisfaction of the subformula in the given state and under the given parameter value. An important fact is that the value returned by A can be considered undefined. The intuition is the following: A(p, s, ip) = tt if we can assume that ip holds in the state s under parameter value p, A(p, s, ip) = f f if we
89
Algorithm 4 Main Idea of the Parallel Algorithm
Require: parameterised KS JC, CTL formula ip, partitioning function Ensure: T
Partition JC into JC\,..., JCn for all JCi where i e {1,..., n} do in parallel Take the initial assumption function repeat
Update the assumption function using the node algorithm; Exchange relevant information with other nodes; Modify assumption function; until all processes reach fixpoint
can assume that tp does not hold in the state s under parameter value p, and A(p, s, ip) =_L if we cannot assume anything.
The coloured model checking problem is then revisited with the assumption function in the following way:
ColSat(tp) = {(p, s) eV x S \ A(p, s, tp)= tt}
In general, computation of the assumption function adopts the labelling algorithm. It starts from setting its values for all states to _L and updating the results iteratively in individual fragments. Results depending on the bordering states are completed once all the necessary information is collected. Finally, the assumption function is completely defined and must return either true or false for every state.
In more detail, the main idea of the entire parallel computation, summarised in Algorithm 4, is the following. Each fragment JCi is managed by a separate process (node) Proci. These processes are running in parallel (simultaneously on each node). Each process Proci initialises the assumption function Ai to the undefined assumption function A±. After initialisation, it computes the new assumption function from the initial assumption function using the so-called node algorithms described below. The node algorithms extend the standard labelling algorithms defined for individual temporal operators to the assumption-based and parameterised setting.
Once the algorithm has finished computing the assumptions, the node exchanges information about border states with other nodes. It sends to each other node the information it has about that node's border states and receives similar information from other nodes. After this exchange is completed, the computation is restarted. These steps are repeated until the whole network reaches a fixpoint, i.e., until no update is received by any node.
90
Assumption Function Computation
The main operation of the parallel algorithm is the iterative computation of the assumption functions starting from the simplest subformulae of (the atomic propositions) and moving towards tp by structural induction. The algorithm takes into account the assumptions of border states, initially set to _L. Since the algorithm traverses the formula by means of structural induction, the central role in the algorithm play the so-called node algorithms that compute the assumption function for a subformula of a particular shape (determined by a specific temporal operator). Each of these algorithms assumes that all possible assumptions for all subformulae have been already computed (given the current assumptions on border states).
Algorithm 5 Compute explicit assumptions for EXtp
Require: PKS fragment JCi, CTL formula
) = tt}}
4: for (s, P) in init do
5: for (s', P') such that P' = P n V(s', s) / 0 do 6: for all psP' do
7: A(p,s',
.
Algorithm 6 Compute symbolic assumptions for EX-0
Require: PKS fragment JCi, CTL formula tp = EXip, initial assumptions Am Ensure: new assumptions A
1: A : = A%n
2: set A(s, ip) := (f f, tt) for all non-border states s
3: init := {(s, $t, $/) | Ain{s, rp) = ($t, $f)}
4: for (s, <3?t, $f) in init do
5: for s' £ Si such that $S/)S is satisfiable do
6: :=lV,^)V($tA$sV) 7: A*(s',
,s))
In the case of SMT-based parameters encoding, the assumption function is represented symbolically. The symbolic assumption function A is a function that assigns to each pair (s,tp) a pair of SMT formulae (<3?t, <3?/) such that for all p G V: p |= <3?t iff s,tp) = tt and p |= $j iff A(p,s,tp) = ff. Each such function thus divides the set of all parameter values into three sets: those parameters that ensure the satisfaction of ip (<3?t), those that ensure that ip is not satisfied (3>/), and finally those parameter values under which the satisfaction of tp is undefined (-i<3?t A -i<3?/). Algorithm 6 shows how the symbolic assumption function is updated in the case of EX^ formula. The algorithm deals with the two parts (true and false) of the symbolic assumption function separately by using the notation (At(s, ip),Af(s, ip)) = A(s, ip). To that end, the lines 2 and 3 are updated accordingly with respect to the explicit case. Lines 6-7 show the advantage of symbolic encoding. The formula 3>S/)S encodes the set of parameter values V(s', s) enabling the transition from s' to s. Finally, union and intersection operations on parameter sets are replaced with disjunction and conjuction, respectively. Instead of enumerating the parameter value sets the computation is reduced to manipulating with the formulae and performing satisfiability checks. All these steps are delegated to calls of an SMT solver.
It remains to note that node algorithms for other temporal operators are constructed in similar way with the exception of the operator AU where the computation can end up with the assumption function left undefined
92
for some states as discussed in the case of distributed labelling algorithm in [91]. Therefore additional computation is needed in that case.
4.3.2 Interpretation of Parameter Synthesis Results
The employment of the labelling approach implies exhaustiveness in two aspects. First, the satisfying parameters are synthesised for all subformu-lae of the given formula. Second, the resulting parameter values reflecting all subformulae are obtained for all states of the Kripke structure. In particular, for a given PKS and a given formula, the method gives for every subformula a table where every state is associated with a set of satisfying parameter values.
It is important to discuss how the CMC results can be interpreted when applied for parameter synthesis of dINMs and cBNMs. In the former case, there is no limitation and the results in the form of a satisfying/unsatisfying parameter values sets for every state and formula are directly interpretable. However, in the case of cBNMs the approximation and abstraction steps affect the interpretation. For same reasons as in the LTL case, the effect of approximation of cBNMs by means of a PWMA system cannot be characterised. It is therefore necessary to evaluate the results at the level of PWMA system. The over-approximating abstraction affects CTL model checking results in the following way:
• For a formula in the universal fragment of CTL (ACTL), the abstraction causes satisfying parameter values synthesised by model checking to be under-approximated wrt the entire set of parameter values for which the formula is exactly valid in the PWMA system. Unsatisfying parameter values are over-approximated.
• For the existential fragment (ECTL), we obtain over-approximation of the exact set of satisfying parameter values. Unsatisfying parameter values are under-approximated in this case.
4.3.3 Performance Evaluation
On the theoretical side, the worst case time complexity of the overall CTL CMC algorithm for a given formula
and the size of K. However, the size of the PKS is significantly extended wrt the size of unparameterised KS. In particular, the added aspect is the size of the parameter space V. Regarding its role in the most of the node algorithms, this is critical in the explicit assumption function case (e.g., the intersection at line 5 of Algorithm 5). In the case of cBNMs where the parameters are represented as continuous closed intervals of reals, the intersection of two such intervals can be done in constant time as well as
93
their union. This cannot be done easily in case of dINMs where the parameters do not represent numbers but different settings of the regulatory logic. In the worst case, the parameter space size can affect the time complexity quadratically. In case of symbolic encoding, the enumeration of V is avoided but replaced with calls to the SMT solver. In the worst case, the overall computation depends on the number of SMT solver calls linearly.
On the practical side, we have performed several experiments on different parallel platforms to evaluate scalability of the algorithms. The prototype implementation has been provided for cBNMs, we have not yet explored the performance of CTL CMC on dINMs. The results we have obtained show that both multi-core and distributed implementation scale with the number of states and the size of the parameter space almost linearly. In general, despite the compactness of the parameter space in PKS representations, the challenging issue is the size of the parameter space that can be very large in practise. On a common hardware, considering the explicit assumptions, we were able to tackle cBNMs up-to three millions states (6-dimensional models) with up-to 6 unknown parameters in times less than 15 minutes (see [82]). Considering the symbolic assumptions, the performance has been reduced due to a large number of SMT solver calls (running in parallel). In particular, we have used Microsoft Z3 with common settings (see [56]). The profiling has shown the majority of computation is performed in SMT solver calls.
4.3.4 Publications Summary
Results overviewed in this section have been first published in [82] where the algorithm with the explicit assumption function is introduced and analysed. The symbolic assumption function and the respective algorithm is published in [56]. A set of biological case studies providing a deeper view into performance of CTL CMC using both assumption functions is provided in [139]. Please note that for the purpose of this thesis we have simplified and changed some of the notation.
The presented work is a joint work with my colleagues Nikola Benes and Lubos Brim. The implementation of the work, the performance experiments and biological case studies have been realised with significant help of students Samuel Pastva and Martin Demko. My contribution has been in revising the problem of CMC in the context of cBNMs and CTL and guiding the work on several steps leading to a functional prototype of the method. I have also contributed in setting up the experiments and interpreting the achieved results.
94
4.4 Software Tools
Several prototype software tools implementing coloured model checking have been developed. They focus on efficient computation on common models and support scalable execution in terms of parallel implementation.
To address the LTL-based parameter synthesis for discrete abstractions of cBNMs (Section 4.2), the prototype tool PEPMC [38] has been developed. It is implemented to support multi-core parallelisation of Algorithm 1. The tool has been merged with BioDiVinE [35] (a previously developed tool supporting distributed LTL model checking of finite abstractions of cB-NMs, implemented at the top of the DiVinE model checker [36]). The resulting NewBioDiVinE1 supports the procedures of plain model checking and parameter synthesis wrt LTL specifications. The parameter synthesis module utilises multi-core parallelisation whereas the plain model checking module employs distributed state space exploration. The implementation is entirely done in C++.
CTL-based parameter synthesis supporting discrete abstractions of cB-NMs has been first addressed by the prototype tool BioDiVinECTL2 implemented in Java and supporting the distributed algorithm as described in Section 4.3. This version supports the interval-based encoding of parameter sets and is therefore limited to cBNMs with independent parameters. The tool has been further extended with hybrid CTL enhanced with action-labels operators and utilising the SMT-based encoding allowing to analyse models with interdependent parameters. The resulting stable release called Pithya3 embedded with a web GUI4 has been approved by the artefact evaluation procedure of the CAV conference [57]. The tool is publicly available as an online service5.
We have also targeted the parameter synthesis procedure for dINMs wrt LTL. The prototype tool Parsybone6 has been developed to provide an efficient implementation of the algorithms presented in Section 4.2 working directly on parameterised dINMs. Parsybone is entirely developed in C++. It has been embedded within a web GUI Esther [341]. The implementation has been further improved in collaboration with Adams Streck and the group of Heike Siebert at Free University Berlin. The resulting web service TREMPPI7 is publicly available.
^ttps: / / github.com/sybila/NewBioDiVinE 2https: / / github.com/sybila/biodivineCTL 3https: / / github.com/sybila/pithya-core 4https: / / github.com/sybila/pithya-gui shttps://pithya.ics.muni.cz 6https: / / github.com/sybila/Parsybone 7http:/ / tremppi.fi.muni.cz
95
4.5 Applications
The coloured model checking technique has been successfully applied to several biological case studies ranging from small models describing basic biological mechanisms (enzyme kinetics, cell-cycle control, A-phage) to large-scale models providing analysis of several complex biological systems.
In [33] a cBNM model of ammonium transport system in E. coli is analysed using the LTL-based approach. In Chapter 7 we describe a couple of case studies performed with CMC techniques and published in [139] -synthesis of a synthetic TCP-degradation pathway in E. coli (Section 7.2) and analysis of the well-known bistable switch present in a cell cycle of mammalian cells (Section 7.3.3). Additionally, in [210] we have provided a preliminary analysis of signalling pathways of FGFR3 fibro-blast grow-factor in rat cells that also employs CMC. This case study is overviewed in Section 7.4.
4.6 Discussion
To compare the approach presented in Sections 4.2 and 4.3 with the closest previously existing work, it is worth noting that the algorithm in [49] is sequential and relies on execution of two model checking procedures per each class of valuations. In the average case, the number of analysed parameter classes can be reduced by a suitable BDD representation of the parameter space. On the contrary, our approach is supposed to be based on a principally different idea. Enumerative LTL model checking procedure (reduced to detection of accepting cycles in the model-property product automaton) is employed directly on a graph that compactly represents the dynamics of all valuations. The computational effort can be significantly reduced in the average case due to small variations in subgraphs corresponding to different valuations. Moreover, we take the advantage of the choice of enumerative model checking and provide a parallel algorithm that accelerates the computation with the increasing number of CPUs.
The parameter synthesis approaches based on finite abstraction have been studied also in the field of hybrid systems. Owing to the non-linear character, these methods (including the abstraction procedure considered in this thesis) are usually applicable only to certain subclasses of biological models. The approach of [146] supports arbitrary non-linear ODE models. It is based on a hybrid approach built on sensitivity analysis. Approaches [218, 177, 368] inherently rely on reachability analysis in hybrid systems and are sufficiently applicable only to piece-wise affine or multi-affine systems. These algorithms technically rely on computing reachability for parameterised systems by means of polyhedral operations. This re-
96
stricts the parameter synthesis procedure to safety properties only. Live-ness properties, e.g. oscillations, for which any contradicting counterexample is an infinite path requiring monotonous time progress, can be considered only with a specific care [49]. This problem is also related to our method for cBNMs, in particular, the extent of over-approximation of the possibility of non-exiting a rectangle in finite time as discussed in Section 2.4.6 significantly affects the practical application of the rectangular abstraction to liveness analysis of non-linear systems. An improvement targeting this problem is proposed in [67] by lifting the rectangular abstraction to the framework of multi-affine hybrid automata. Besides a general increase in precision, this extension enables the universal handling of time-dependent properties. However, the price is the higher time complexity of the parameter synthesis algorithm and worse scalability.
Another approach to parameter synthesis for cBNMs is given in [269] where ^-complete decision procedures [186] for first-order logic (FOL) formulae are employed to overcome undecidability issues. In this setting, a FOL formula describes the states reachable with a finite number of steps. The parameter identification problem is reduced to finding a satisfying valuation of the parameters for this formula. The approach requires enumerating all the discrete paths of a particular length, which leads to performance degradation for large models.
4.7 Future Work
In the case of cBNMs the critical place for improvement is the approximation/ abstraction procedure. First, an important aspect is to have a tool that will allow to characterise or predict the distance between a signal generated by a cBNM and a signal generated by its PWMA approximation. It will enable to take the approximation method closer to the concept of numerical error known from predictor-corrector algorithms used for numerical simulation of ODEs. Second, the abstraction makes a significant bottleneck of the entire procedure due to the large extent of over-approximation introduced by the rectangular discretisation. Some of the methods discussed in Section 2.4.6 partially solve the problem. However, they are not yet prepared to be employed for models with uncertain parameters.
The technique of CTL CMC has not been yet brought to the framework of dINMs. The reason for that is mainly that symbolic model checking works well with those kinds of models including parameterised versions. However, as shown in [17], it is definitely useful to bring hybrid extensions of CTL to the framework of parameterised dINMs. The reason for that is the necessity of abstract reasoning about various attractors in systems dynamics that might coexist in the system under certain settings of parameters. The case study in Section 7.3 makes an important example of
97
such a system. To that purpose, it seems very relevant to adapt the techniques we have introduced in [58] (bifurcation analysis) and [31] (detection and counting of terminal strongly connected components in parameterised Kripke structures).
98
Chapter 5
Parameterised Unif ormisation
To solve the parameter exploration problem for stochastic models described in Chapter 3, we have developed a method called parameterised uni-formisation. The method is built upon the probabilistic model checking of quantitative properties of the continuous-time systems dynamics occurring in finite time horizon - the so-called transient analysis. The fundamental uniformisation technique is extended to the parameterised setting and works on properties belonging to an appropriate (bounded time) fragment of CSL with rewards.
5.1 Overview
The general workflow of parameter exploration of a sCRNM employing the parameterised uniformisation procedure is depicted in Figure 5.1. We assume that for a given sCRNM M. parameterised by a parameterisation \M within a perturbation space P C VXM, the semantics of M is represented by a parameterised CTMC C = {CM | fi £ P}. As declared in Chapter 3, the problem of parameter exploration is as follows: for each state X £ X compute the landscape function A^-' : P —> M>o that for each parameter valuation fi £ P returns the numerical value of the probability or the expected reward for the formula p. It means that we consider "quantitative" formulae in the form p ::= P=?[?] | R=?[C-*] | R=?[l=t], i.e., the topmost operator of the formula p returns a quantitative result, as used, e.g., in PRISM [261]. In general, the formula p can contain nested probabilistic and reward operators whose evaluations define discrete sets of states further used in the computation of the resulting numerical value. In consequence, the corresponding landscape function is not in general continuous but only piece-wise continuous. An example of a simple landscape function describing a bimodal transient distribution in a given time varying with a parameter is shown in Figure 5.2. The distribution describes a snapshot of behaviour of the model described in Example 2.4. The bimodal distribution is typical for
99
stochastic temporal formulae
(sCRNM)
CP^ameteriled-X -[parameterised CTMCl uniformisation^^ V i
landscape function
Figure 5.1: General workflow of parameter exploration by parameterised uniformisation.
reaction networks with positive feedback. The model is parameterised in such a way that k\ denotes the parameter vsrn(qi)- The corresponding formula CSL that exactly specifies the states distribution in time t is P=?[F-*].
To solve the parameter exploration problem we extend global quantitative model checking techniques described in Section 2.6.3. The most crucial part of the problem is given by the fact that the perturbation space P is continuous and thus the family C is infinite. Especially, to compute the problem we cannot run the global quantitative model checking techniques for each CTMC C^eC individually.
Our solution to the problem is based on a technique called min-max approximation. The key idea is to approximate the landscape function A^'P using a lower bound min^9 = mm{A^-'P(/i) | fi £ P} and an upper bound moi|'P = mai{Aj'P(/i) | /x £ P}. Since the computation of the exact bounds is computationally infeasible, we further approximate these bounds, i.e., we compute approximations min^f and max^ such that mirijf < min^f and max^f > max^f. Although the proposed min-max approximation provides the lower and upper bounds of the landscape function, it introduces a numerical error with respect to parameter exploration, i.e., such approximation can be insufficient for the inspected perturbation space P and the given formula ?. Formally, the inaccuracy for a state X is given as the difference max^ — min^f. In Figure 5.3 there are shown two approximation variants of a landscape function from the example depicted in Figure 5.2. It can be seen that the approximation error can significantly affect the result.
An advantage of the min-max approximation is that it allows us to it-
100
Figure 5.2: Example of a landscape function describing a snapshot of transient distribution of states in a parameterised CTMC varying with a particular perturbation of a single parameter kl.
3.0
■ Upper bound
■ Lower bound
3.0
.a
800 1000
200
400
States
600
800
1000
200"
400
States
600
Figure 5.3: Min-max approximation of states distribution at a certain time of transient analysis of a parameterised CTMC varying with a particular perturbation of a single parameter kl. The two presented cases differ in absolute error bound: (left) 0.01, (right) 0.001.
eratively reduce the error to a required bound. The key idea is based on iterative parameter space decomposition where the perturbation space P is divided into subspaces that are processed independently. The result of such computation is an approximation of the lower bound minx 1 and the upper bound maxx,Fl for each subspace P«. Such decomposition provides more precise approximation of the landscape function XXF and enables to reach the required accuracy bound.
To effectively compute the min-max approximation for the given formula the method called parameterised uniformization is developed. In general, it allows to efficiently approximate the transient probabilities for the parameterised CTMCs. The key idea is a modification of standard uniformization [201] so that an approximation of the minimal and maximal transient probability with respect to the family C can be computed.
101
5.2 Method Principles
5.2.1 Local Transient Analysis
The parameterised uniformisation is a modification of the standard uni-formisation [260], a widely used technique for transient analysis of CTMCs. For the given parameterised CTMCs C, the initial state Xq £ X and time t £ M>0/ the parameterised uniformisation returns vectors t^'^0'* and
C X t
ir±' °' , such that for each state X' £ X the following holds:
7r^Xo'\X') > max{^c^x^(X') | Ca £ C} A7I-C,x0,ip^ < mmj/^A^I') | CM £ C}
where 7rc'"Xo'* denotes the transient state distribution of CTMC Ca in the time t. The key idea is to compute for each state X' the local maximum (resp. minimum) of Tic^x,t(X') over all Ca £ C with respect to the current computation step of -kc^^x^. It means that only the maximal (resp. minimal) values of predecessors of X' from the preceding step are considered. To obtain the local maximum (resp. minimum) of irc>1,x,t{X') we define a function returning for the parameter valuation fi £ P the difference of probability mass inflow and outflow to/from state s.
Remark 5.1 In [89] we have shown that if all reactions are described by mass action kinetics the function is monotonous with respect to any single perturbed stochastic rate parameter included in the parameterisation xm- This allows us to efficiently identify fi £ P that maximises (minimises) the value nc^,x,t(X') and
C X t C X t
thus to obtain the vectors 7rT' °' and it±' °'.
5.2.2 Global Transient Analysis
The aforementioned parameterised uniformisation can be employed also for backward transient analysis that is used for the global model checking procedure discussed in Section 2.6.3. For the given set of states A C X and time t we can efficiently compute the vectors alt' '* and al_|_' '* such that for each state X £ X the following holds:
^A*(I) > maxi^^iX) | Ca £ C} A iL^'A'*(X) < mm{/"'A-t(I) | Cn £ C}
where iLC^'A'*(X) denotes the probability that the set A is reached from X at time t in the CTMC Cu.
5.2.3 Model Checking
The min-max approximation employs the results of the parameterised uni-iormisation (i.e., the vectors ttt , tt± alt' and n± ) to approxi-
102
mate the largest set of states satisfying p>, and the smallest set of states satisfying tp with respect to the perturbation space P. It computes the approximation SatJ,((p) and Sat^((p) such that
SatKv) 5 (J Satc^) A Sat&ip) C f| SatCll(
) iff X satisfies the formula ? in CTMC C^. To obtain such approximations we extended the standard satisfaction relation for CSL logic [89]. For more details about the min-max approximation see [89].
5.2.4 Numerical Errors of Parameterised Uniformisation
It is worth noting that the parameterised uniformization for the parameterised CTMC C in general does not correspond to standard uniformization for any individual CTMC CM £ C. The reason is that we consider a behaviour of a parameterised CTMC that has no equivalent counterpart in any particular C^. First, the parameters (minimising/maximising the inspected value) are determined locally and thus independently for each state. Second, the parameters are determined independently for each computational step.
In a given state X, the inaccuracy of the min-max approximation related to the computation of parametrised uniformization, called unification error, denoted as uniErr^' , and defined:
unibrr^- = [maxx — maxx ) + {minx — minx )
Apart from the unification error the min-max approximation introduces an inaccuracy related to approximation of the landscape function XXF, called approximation error, denoted as appErr^'p, and defined:
appErr^ = maxx — minx
Finally, the overall error of the min-max approximation, denoted as Err^'P, is defined as a sum of both errors:
r-
,>,<,>}) in the top-most operator of ip is ignored (i.e., it is treated as =?). Given these settings the evaluation function can be set in several different ways:
0 (i e B C P
D%{li) = { Evaf^ ^P\B A ~G {>, >} (6.1b)
Eval(C,j,,ip)
/ieP\B A ~e{<,<}
r.r, n f 0 /i e B c P x D*M = \ Eval(C^) ,£P\B (6-lG)
0 fj, e B C P
\Eval(Cfj,,ip) - X\2 /xGP\B
D$(») = { .„ " . v|2 (6.1d)
where X = agr{Eval(C^,ip) \ G P}, agr G {min,max,avg} and B is a subspace of all perturbations, where the system's function is completely
117
missing. The degree of robustness, denoted as rd^P, is defined as the integral of the evaluation function over the perturbation space P:
JP
Definitions (6.1a) and (6.1b) are possible for specifications where the topmost operator of the formula tp includes the threshold r. In the first definition (6.1a) the evaluation function D^{p) returns a qualitative result, therefore the robustness degree rd^P specifies the measure of all perturbations in P for which the property holds in a strictly Boolean sense - it is the fraction of P where the property is valid. This definition can be used, e.g., with the CSL property
= P>o.8[F'°'5'(X > 300)], which specifies that in 80% of cases the population X increases above 300 within 5 seconds. For this property and a model with a parameter k G [0,10] the robustness gives us the fraction of the parametric interval [0,10] for which the model satisfies ip.
In the second definition (6.1b) the evaluation function D^{p) returns the quantitative value that is relative to threshold r. Therefore, robustness can be interpreted as the average relative validity of the property over P. If r corresponds to the validity of ip in conditions considered natural for the inspected parameterised CTMC C (i.e., to the unperturbed state) then this interpretation complies with the original definition of Kitano. Let us consider the same property ipA arid the same parametric space k £ [0,10]. If for all values of k the model has a 60% change that its behaviour will lead to a population of X larger than 300 within 5 seconds than its robustness is 0.6/0.8 = 0.75. If the probability is different in each k then robustness gives us the average value with which our expectations will be met.
The third definition (6.1c) is possible for specifications using the quantitative semantics of formula tp. Here robustness gives the mean validity over all P, regardless of any probability threshold r. This interpretation is convenient when there are no a priori assumptions about the system's expected behaviour.
Finally, to express the fact that the system behaviour remains the same (with respect to the evaluation function) across the space of perturbations, we introduce the fourth definition (6.Id). It uses an aggregation function to compute a mean value and expresses the variance from the mean. This definition enables us to compare models which have the same numerical values of robustness in the sense of definition (6.1c) but which achieve the average value with very different landscapes of evaluation function.
While the last three definitions require precise computation of the probability value in every p G P, the first definition is amenable to approximate solutions. In this case it suffices to ensure that the probability is larger or smaller than r. In many cases it can be achieved without computing the
118
precise value and thus statistical model checking techniques can be used efficiently.
Robustness Analysis Procedure
Having the definition of the evaluation function D^ we can describe an effective method for computation of the robustness degree rd^P. Let us first consider the case where the perturbation space P does not contain different initial states.
The evaluation of D^(p) includes the computation of Eval(C^, p), i.e., the solution of standard CSL model checking problem. Since the problem can be rather complex even for a single perturbation point p, an explicit computation of the integral over the whole space of perturbations is infea-sible. Therefore, we consider an approximation of the evaluation function using the upper bound D^p T and the lower bound D^p ± with respect to P defined as:
^CP,T>™^{^)I^P} ,fi9v
D^±,P,T - 2^ ipl ^¥>,P«,T raftP,l - 2^ ipl ^¥>,P«,-L
i=l 11 i=l 1 1
rdgp - \ (rdgPiT + rd£Pi±) ± Err£p Err£p = i (rd£PiT - rd£Pi±)
As we can see, the key step in our approach is computation of the values T anc^ _l ^or *ne given CTMC C, the formula p and the perturbation space P. In particular, the method of parameterised uniformisation described in Chapter 5 is adapted to effectively approximate the evaluation function D^(p). Intuitively, for a formula p the algorithm computes the upper and lower bounds of the function Eval(C^, p) with respect to all perturbation points p £ P. In consequence, these bounds are used to obtain the values D^F T and D^F ± such that Equation 6.2 is satisfied.
6.3.2 Applications
We have conducted two extensive case studies that have proved the method gives valuable insights into biological systems modelled in terms of sCRNMs. The results are published in [359].
119
First, the method has been applied to a known model of gene regulation of the mammalian cell cycle [345] that explains the transition from Gi-phase to S-phase. In particular, at this point, an irreversible decision is made causing the cell either proceeding with the cell division or ceasing the division. The problem incorporates a so-called biological switch that is in computational systems biology analysed by methods of bifurcation analysis. Since the system is based on interactions among proteins and genes, stochasticity plays a crucial role and therefore it is important to get insights into the robustness of the switch under randomness of individual molecular interactions. After rewriting the original deterministic model (a cINM) into a sCRNM we employed probabilistic robustness analysis of the model. The results have brought new insights into robust stabilisation of the switch in either of the two decisions.
Second, we have applied our technique to provide comparison of two different models (sCRNMs) of a two-component signalling pathway in Es-cherichia Coli with respect to robustness of a signal transfer under randomness of involved molecular interactions and fluctuations of molecule numbers. The analysis gives insights into how robustness of signal transfer can be increased by synthetically modifying the respective reaction network. The systems have been previously analysed in [338] analytically by means of deterministic models confirming higher robustness in a synthetically modified pathway. Our analysis has brought new insights that could not be obtained with the deterministic framework. In particular, we have shown that in the stochastic setting the synthetic pathway gives higher robustness of signal transfer only for low signals whereas in case of strong signals the natural pathway performs significantly better. Thus case study is overviewed in Section 7.5.
6.3.3 Performance Evaluation
The time complexity of our framework in practice depends mainly on the size of the state space, the number of reaction steps that have to be considered, and the number of perturbation sets that have to be analysed to provide the desired precision. The size of the state space is given by the number of species and their populations. The framework is suitable for low populations and is relevant especially in the case of gene regulation. In the first case study we have considered only a single molecule of DNA and thus the state space of resulting CTMC was manageable. In the second case study, we have abstracted the feedback loop mechanism using a sigmoid production function to reduce the state space and to make the analysis feasible. If such an abstraction cannot be used, our framework can be effectively combined with general state space reduction methods for CTMCs, e.g., finite projection techniques [292, 220], dynamic state space truncation [142], and aggregation methods [371]. The number of reaction steps
120
can be reduced using separation of fast and slow reactions as demonstrated in the second case study or using adaptive uniformisation [357,142].
In the first case study, several hundreds of perturbation subsets had to be analysed and the overall robustness analysis took a few hours. However, in the second case study several thousands of perturbation subsets where required to achieve reasonable precision making the computation demanding.
6.3.4 Publications Summary
The work mentioned in this chapter has been entirely published in the paper [359]. The paper includes a supplementary material including formal details related to the method. The main text of the paper is primarily devoted to the biological case studies as they are of main interest to the community. Additionally, the technical aspects of the method are comprehensively provided in the technical report [83].
This research has been done in collaboration with Luboš Brim, Milan Češka, and Sven Dražan. My contribution was in setting the notion of robustness for the framework of stochastic models and temporal formulae. I have contributed to development of the method and formalisation of the elementary steps. Significant part of my contribution is in the case studies where I have guided the procedure and helped with interpretation of the results (Section 7.5).
6.3.5 Discussion
Our framework for robustness analysis of stochastic biochemical systems is entirely based on the parameterised uniformisation discussed in Chapter 5. It allows us to quantify and analyse how the validity of a hypothesis formulated as a temporal property depends on the perturbations of stochastic kinetic parameters. The framework extends the quantitative model checking techniques and numerical methods for CTMCs and adapts them to the needs of stochastic modelling in biology. Therefore, in contrast to statistical techniques such as Monte Carlo simulation, parameter sampling and adaptive grid refinement [40]), our framework is customisable with respect to the required precision of computation. This is obtained by providing the lower and upper bounds of the results that allow us to rigorously focus on a considered perturbation space of interest and to provide a detailed analysis of the evaluation function.
It is worth noting that the evaluation function can be discontinuous or may change its value rapidly on a very small perturbation interval in situations when the given CSL formula contains nested probability operators. In particular, this leads inevitably to the formulation of a hypothesis requiring a detailed temporal program [369] of the biological system (e.g.,
121
temporal ordering of events). This makes another reason why there is a need to guarantee the approximated shape of the evaluation function.
Both applications have demonstrated that the framework can be successfully applied to the robustness analysis of nontrivial sCRNMs. They have shown how to use CSL to specify properties targeting transient behaviour under fluctuations in a finite time horizon. The lesson learned from the applications is that there exist properties that cannot be directly formulated using CSL with rewards. To that end, we have introduced a concept of further processing the evaluation function to express and study the mean quadratic deviation of the molecule population distribution of the signal response regulator protein. This is described within the particular case study in Section 7.5.
6.3.6 Future Work
The crucial points for future work are similar to those stated in Chapter 5. Having robustness analysis techniques deployed on the top of hybrid (combined deterministic and stochastic) dynamics will allow to bring in better scalability necessary for analysis of larger scale models.
Additionally, an ongoing work is to focus on combining the param-eterised uniformisation framework with statistical techniques. This extension has the potential to efficiently scan multidimensional parameter spaces and to identify interesting subspaces that can be analysed in detail using the framework. Several steps have been already done in these directions (see Section 5.6).
A standalone task is to implement a software tool that supports robustness analysis with a suitable GUI allowing visualisation of the results.
6.4 Software Tools
The algorithm for deterministic robustness with STL* is implemented on the top of the tool Parasim [86]. The tool is developed as a modular environment for monitoring and robustness analysis of continuous-time deterministic models. It reads a specification in the form of STL and STL* formulae in an XML format and a model in the SBML level 2 format. The main functionality reflects the scheme in Figure 6.1 employed for cCRNM, cINM or cRINM models represented in SBML. Robustness analysis of the model is performed with respect to a given specified perturbation of model parameters which is sampled upto a given minimal distance among sampled points of the perturbation space. To evaluate the robustness, the quantitative semantics of STL and STL* is computed for each sampled parameter valuation. The tool is open source. It is written in Java and available
122
on GitHub1. The tool has been developed in several phases supported by student projects and theses supervised by myself. The students that have significantly contributed to the tool development are Jan Papoušek, Tomáš Vejpustek, Samuel Pastva, Vojtěch Brůža, and Aleš Pejznoch.
The algorithm for probabilistic robustness analysis with CSL has been implemented by Milan Češka as a prototype using the procedures of the PRISM 4.0 model checker [261]. An experimental implementation employing parallelisation on GPU that has been implemented as a follow-up work by Petr Pilař can be also used as a back-end for robustness analysis. However, a tool that would implement the specific support for robustness analysis is still missing.
^ttps: / / github.com/sybila/parasim
123
Chapter 7
Case Studies
In this chapter, we present a selection of case studies applying the methods presented in previous chapters to several models used in systems biology. Most of the case studies cover a large part of the workflow scheme presented in Figure 2.9 and in most of the cases include reconstruction (or reformulation) of the models. Some of the presented case studies (i.e., Section 7.2 and Section 7.5) present several new insights into the studied biological problem.
7.1 Overview
The presented case studies cover most of the model classes defined in Chapter 2. In particular, biodegradation of 1,2,3-trichloropropane (Section 7.2), regulation of Gi/S cell cycle transition (Section 7.3), mammalian cells signalling pathways (Section 7.4), and population models (Section 7.6) employ the framework of cCRNMs, cRINMs, or cINMs. The study of regulation of Gi/S cell cycle transition also includes a version realised in the framework of sCRNMs. Two-component signalling pathway study (Section 7.5) is done entirely in the framework of sCRNMs.
The case studies in Section 7.2, Section 7.3, and Section 7.4 address the problem of parameter synthesis. Parameter exploration problem is considered in Section 7.3 and robustness analysis is tackled in Section 7.5 and Section 7.6.
124
7.2 Biodegradation of 1,2,3-trichloropropane in E. coli
In this case study, we target a real problem solved in synthetic biology of metabolic pathways. The work has been realised in collaboration with Loschmidt laboratories of the Faculty of Science. The results demonstrate the applicability of the methodology provided in Chapter 4 in systems and synthetic biology workflows. The tool Pythia described in Section 4.4 is employed to perform the analysis presented in this section.
7.2.1 Problem Description
A synthetic pathway for conversion of the highly toxic 1,2,3-trichloropropane (TCP) to glycerol (GLY) in Escherichia coli was assembled as described in [256]. TCP is an emerging toxic groundwater pollutant and suspected carcinogen which spreads to the environment mainly due to improper waste management. According to [256] no naturally occurring bacterial pathway is capable of degradation of TCP. A synthetic reaction network consisting of five intermediates with glycerol as a final product and utilising enzymes from other bacterial species was assembled (Figure 7.1). The individual reactions in this pathway are positively influenced (catalysed) by enzymes which we consider as parameters of the respective model (they are considered constant as they reflect the energetical environment for the pathway).
The employed enzymes are haloalkane dehalogenase (DhaA) from Rhodococcus rhodochrous and haloalcohol dehalogenase (HheC), epoxide hydrolase (EchA) from Agrobacterium radiobacter. They have a major role in this pathway. In order to achieve an efficient implementation of the pathway it is important to quantitatively characterise mutual interplay and optimal concentration levels of these enzymes. In general, the higher is the enzyme concentration the higher is the flux rate. Especially, if a substrate and its intermediates are more or less toxic to a host cell such a requirement becomes critical.
Unfortunately, the solution is not straightforward because each of the enzymes has a distinct rate and some of the intermediate products are much more toxic than the others. Additionally, since these enzymes are not natural proteins in E. coli, they have to be produced at the expense of other substances. This is called a metabolic burden. In other words, there must be a balance in concentrations of these enzymes in order to degrade TCP as fast as possible while not killing the host by enervation. Therefore, we employ the workflow of parameter synthesis to give preliminary results targeting the complex problem of fine-tuning optimal enzyme concentration levels.
125
DhaA HheC EchA HheC EchA TCP -> DCP -► ECH -> CPD -► GDL -► GLY
d[TCP] _ k1-DhaA-[TCP] d[CPD] _ k3-EchA-[ECH] fc4-HheC'■[CPD]
dt ~ KmA + [TCP] dt ~ Km<3 + [ECH} KmA + [CPD]
d[DCP] _ k1-DhaA--[TCP] _ k2-HheC-[DCP] d[GDL] _ fc4-HheC'-[CPD] _ ks-HheC-[GDL]
dt ~ Kmil + [TCP] Km_2 + [DCP] dt ~ KmA + [CPD] Km,s + [GDL]
d[ECH] _ k2-HheC-[DCP] _ k3-EchA-[ECH] d[GLY] _ ks-HheC-[GDL] dt ~ Km<2 + [DCP} Km<3 + [ECH} dt ~ KmS + [GDL]
fci = 1.05, k2 = 0.751, kz = 14.37, k4 = 2.38, k5 = 3.96, atm,i = 1.79, Km,2 = 1.00, xm,3 = 0.09, KmA = 0.86, Km,5 = 3.54
Figure 7.1: (top) An abstract scheme of the original system. Note that enzymes HheC and EchA are employed twice on the pathway. The reverse mass flow is considered negligible and abstracted away, (bottom) The mathematical model. Enzyme concentrations are considered as constant (and unknown) parameters. Units: kx(s^1), Km,x(mM).
7.2.2 Model Encoding
First we denote the individual reactions in the pathway respectively gi,...,q5. Every catalytic reaction is modelled by using Michaelis-Menten kinetics [286] at the place of the respective regulated kinetic function:
where ki,Km^ are fixed as shown in Figure 7.1. For each Qi the kinetic function coefficient vcRiN(Qi) represents the (constant) concentration of the particular enzyme: vcrin{qi) represents DhaA; VcRIn{Q2) = ^cRIn{qa) represent HheC, and vcrin(Q3) = ^cRIn(Q5) represent EchA. There is a shared parameter for the sets of reactions {^2, £4} arid {£3, £5} and therefore three parameters are considered uncertain in total.
7.2.3 Analysis Procedure and Results
The original model taken from [155] was reduced in order to minimise the dimensionality of the system. Redundant reactions are eliminated based on their rates and catalytic efficiency (defined as ^x > see Table 7.1). In general, greater catalytic efficiency means a faster reaction flux towards generation of the product. Since we need to preserve the number of unknown parameters, the very first reaction cannot be omitted. Reaction towards CPD is undeniably the fastest reaction of the model not just due to the best catalytic efficiency but also because of the highest affinity which is an alternative interpretation of the reciprocal Michaelis constant. Therefore this
126
Table 7.1: Model reactions including enzymes, reaction constants and additional information about catalytic efficiency.
reaction enzyme rate (k) Michaelis const. (Km) cat. efficiency
TCP^DCP DhaA 1.050 1.79 0.587
DCP^ECH HheC 0.751 1.00 0.751
ECH^CPD EchA 14.370 0.09 159.670
CPD^GDL HheC 2.380 0.86 2.767
GDL^GLY EchA 3.960 3.54 1.119
reaction can be omitted in our model. The reaction towards GDL has the second fastest flux and since it is much faster than the last reaction it can be omitted as well. Finally, we have reduced the model to only three reactions which significantly helps to reduce the model state space while making the investigation of the three uncertain parameters tractable.
The desired property is defined verbally as "complete degradation of TCP as fast as possible with the least accumulated toxicity". The notion of toxicity is based on the inhibitory concentration of particular molecules. Our framework is designed for manipulation with differential expressions rather than with numerical assignments. Hence we are not able to directly observe the actual amount of toxicity. But the toxicity has a direct connection to the concentrations of intermediates. To this end, we translate the desired property as "TCP completely degrades and the concentration of intermediates does not exceed given bounds". The bounds are based on experimental data of the original model (Figure 7.2) with the default setting of parameters (DhaA = 0.003, HheC = 0.0036, EchA = 0.0029 (mM)) and initial concentrations ([TCP] = 2 mM, [other species] = 0 mM). Constants are shown in Figure 7.1. The presented data reveal that the considered boundary is reasonable for the concentration 0.5 mM or less. In consequence, we proceed by testing various combinations of bounds for GDL and DCP in the interval [0,0.5] of mM.
It has been mentioned that the concentration of enzymes cannot be unlimited due to the metabolic burden (which is not the object of investigation in this paper). According to the default values the initial constraints for these parameters are therefore set to the interval [0.0, 0.02] of mM. The parameter synthesis workflow employing the discrete abstraction of the cRINM as presented in Section 2.4.6 is employed with the CTL CMC procedure (Section 4.3) to find parameter values satisfying the desired property. The template of CTL formula expressing the property (denoted as p) is a combination of several smaller subformulae:
P! = (AG [TCP] < y), p2 = (A([TCP] > x)U(AF pt)), p3 = (AG [GLY] > x), p4 = (A([GLY] < y)U(AF p3)),
127
Time (min)
Figure 7.2: Experimental data from the original model [256]. We are interested just in the progress of TCP, DCP, GDL and GLY taken as variables of our reduced model.
ip6 = (AG [DCP] 5 A ipe), p> = {p>s/\p>7),
where x, y, v and w are estimated values making a particular instance of this property. Here x = 1.9 (according to [256] where authors use the value 2 mM), y = 0.01 (cannot be zero), v e {0.5, 0.3, 0.1} and w e {0.5, 0.25, 0.1} (variations based on an observation of the experimental data in Figure 7.2).
The result of parameter synthesis is the set of initial states (satisfying ip) each accompanied with a set of respective values of the parameters (DhaA, HheC, EchA). Results are encoded as a formula in the SMT-LIB format 2.5 [39]. Consequently, to compare and visualise satisfactory parameter values in a human-readable form some post-processing is necessary. In this case, we run a script that systematically samples and visualises the parameter space encoded by the formula (by calling the SMT solver iteratively). The result is the graphical representation of the parameter subspace constraint by ip and the initial parameter constraints. In Figure 7.3 the results are shown for a specific initial state.
Note that due to the global nature of our algorithm all states satisfying the property have been found. The concentration of all variables in this case study has been restricted to the interval [0.0, 5.0] mM. Our framework reveals parameter values satisfying tp also for initial states beyond the singular initial concentration of particular species considered in [256]. The most interesting are the initial states that increase the upper limit of TCP concentration wrt ip (Figure 7.4).
128
0 0.005 0.01 0.015 0.02
DhaA
° ^ i i r- ° ~i i i n
0 000 0 005 0 010 0 015 0 020 0 000 0.005 0 010 0.015 0.020
DhaA HheC
Figure 7.3: A sample of the resulting parameter space for a particular initial state: TCP G [1.9,1.9586], DCP G [0.448898, 0.5], GDL G [0.0, 0.0669138], GLY G [0.0, 0.01]. Dotted area corresponds to ip (v = 0.5, w = 0.25). Upper left figure the 3D space sampled with 400 points per a layer. Remaining figures display projections of the 3D plot for every combination of unknown parameters. All values are shown in mM.
7.2.4 Performance
Owing to the global nature of the enumerative CTL model checking algorithm all the subformulae are investigated during the process. However, the computation is time and space demanding and the utility of parallel algorithm has to be employed. The computation took more than one day on a single node while less than 2 hours on twenty nodes (each node equipped with a common hardware - Intel Xeon quad-core 2GHz and 16 GB of RAM).
129
Time (s)
Figure 7.4: (left) Resulting parameter space for a specific initial state: TCP G [3.84186,5.0], DCP £ [0.0, 0.448898], GDL e [0.0, 0.0669138], GLY G [0.0, 0.01]. The red dot shows the selected point for parameters values: DhaA = 0.001, HheC = 0.005, EchA = 0.015. (right) Numerical simulation for the selected point. All values are in mM. Simulation was obtained in BIOCHAM.
130
7.3 Regulation oiG\/S cell cycle transition
This case study is focused on demonstrating that the methodologies presented in Chapter 4 and Chapter 5 can provide a valuable (fully automatised) alternative to existing numerical and analytical techniques used for exploring how the system reacts to changes in parameters and what are the critical points in the parameter space that cause significant qualitative changes in systems behaviour.
d\pRB] - frl \e2f1] J„ _ r nn,
dt ~ ^ Kml + [e2f1] Jn + lpRB] lpRBlPn-r>\
a = 0.04, fci = 1, k2 = 1.6, kp = 0.05, -fpRB = 0.005 7£2Fi = 0.1, Jn = 0.5, J12 = 5, Kml = 0.5, Km2 = 4
Figure 7.5: An ODE model of the G±/S transition regulatory network presented in Figure 2.2 (right).
We address the model describing regulatory interactions controlling a transition between two phases of a mammalian cell cycle [345]. In particular, the model explains the core mechanism behind the irreversible decision for cell division described by a two-gene regulatory network of interactions between the tumour suppressor protein pRB and the central transcription factor E2F1. The respective influence network is depicted in Figure 2.2 (right) with the corresponding ODE model shown in Figure 7.5.
7.3.1 Problem Description
For suitable parameter valuations of the two parameters, two distinct stable attractors may exist (the so-called bistability). The problem is typically targeted by the so-called bifurcation analysis that is performed by using analytical and numerical methods. In [345] an analytical bifurcation analysis of E2F1 stable concentration depending on the degradation parameter of pRB ('Jprb) has been provided. Note that traditional methods for bifurcation analysis hardly scale to more than a single model parameter.
7.3.2 Model Encoding
The model does not fit the Definition 2.14 due to the occurence of kp as a summant in the right-hand side of the second differential equation. It represents a constant inflow (basal transcription) of the protein E2F\ and cannot be neglected. To that end, the model is encoded in terms of a
131
cRINM considering the following reactions underlying the influence network given in Figure 2.2 (right):
Qi : E2b 1 —> q% : pRB —>
q2 : h E2F1 g4 : ->• pRB
The regulated kinetic functions are defined for the reactions Q2 and g4 in the following way:
i < \ i i < \ a2 + y2 Ji2
Kmi +y Ju+x
where x = pRB, y = E2F\ and all other parameters are fixed to the values shown in Figure 7.5.
The parameterisation of the model is given by the parameterisation set X = {^2, £4} where the kinetic function coefficients are given as the parameters vcrin{Q2) = h and vcrin{qa) = 1prb-
7.3.3 CTL-based Analysis
In this section we employ the algorithm presented in Section 4.3 to parameter synthesis of the above mentioned model. The technique gives an alternative way to perform bifurcation analysis typically solved by analytical methods limited to restricted classes of system dynamics. In particular, we focus on the synthesis of values of two interdependent parameters.
Problem Settings
The property of bistability expresses that the system is able to settle in two distinct stable states (i.e., levels of concentration) for specific initial conditions and particular parameter values. It implies existence of a decisionmaking point (or area) in the system.
The main outcome of the original analysis is shown in Figure 7.6 (left) (produced by numerical analysis) displaying the dependency of stable concentration of E2F\ on value of 7p_rb (degradation rate). The most interesting area called unstable (for ^prb £ [0.007,0.027]) determines feasible values of 7Pi?s wrt the above property. For ^prb < 0.007 the system converges to a lower-concentration stable equilibrium whereas for ^prb > 0.027 it converges to a higher-concentration stable equilibrium.
The CTL representation of the property in consideration is tpi = (EF[AG[ta]j A EF[AG[high]]) where low = (0.5 < E2F1 < 2.5) (representing safe cell behaviour) and high = (4 < E2F\ < 7.5) (representing
132
0.005 0.01 0.015 0.02 0.03 0.035 0.005 0.01 0.015 0.02
IpRB IpRB
Figure 7.6: (left) Equilibrium diagram reproducing the results achieved in [345]. (right) Visualisation of the parameter synthesis results. The high and low stable regions are represented by the red and blue coloured areas, respectively. The yellow areas denote the states in which the bistable switch formula (p\ is satisfied.
excessive cell division). During the single run of our algorithm all subfor-mulae of tpi have been analysed. Let tp2 = (AG[low]) and ?3 = (AG[high]) as the most interesting.
Analysis Procedure and Results
With respect to the three properties, we have investigated perturbations of a single parameter ^prb with the initial constraint ^prb £ [0.001,0.025]. According to Section 2.4.6 we have first created the PWMA approximation of the original cINM (Figure 7.5 (right)) by approximating each non-linear function in the right-hand side of ODEs with a sum of optimal sequence of piece-wise affine ramp functions (the precision has been set to 70 automatically generated segments per each non-linear function). For such a setting the verification process took less than 10 seconds on twenty nodes. The obtained results are visualised in Figure 7.6 (right). The plot intentionally depicts the same space as the Figure 7.6 (left) to show obvious similarities of these results. The blue area stands for stable concentration of E2F1 (y-axis) with a particular value of ^prb (x-axis) satisfying the property tp2, whereas the red area satisfies the property tp^. The yellow area (in the middle) stands for possibility of reaching both stable concentrations. Due to the mixing of existential and universal quantifiers (see Sec. 4.3.2), the results achieved for tpi cannot be exactly interpreted. On the contrary, the results for tp2 and ?3 are guaranteed due to the conservativeness of the abstraction.
Although the algorithm based on interval-based encoding (Section 2.4.6) performs fast, it is limited to independent parameters only. To overcome this limitation, we have employed the SMT-based encoding (Sec-
133
Figure 7.7: (top left) The resulting parameter space merged for all initial concentrations. Each area corresponds to a different property: tpi (yellow), (f2 (blue) and ?3 (red), (top right) The same parameter space magnified and projected to 7P^s-axis. The framed region agrees with the original numerical bifurcation analysis performed in [345] for ^prb- (bottom) Landscapes of the parameter space according to the quantitative satisfaction degree computed by BIOCHAM for ip2 (left) and ips (right), respectively.
tion 2.4.6) to explore two uncertain mutually dependent parameters. The method is computationally more demanding (about one order of magnitude for each pair of dependent parameters). The goal of our extended analysis is to explore the mutual effect of the degradation parameter of pRB (IpRb) and the production parameter of pRB (ki) on the bistability. Additionally, we perform post-processing of achieved results by employing additional constraints on the parameter space (e.g., imposing a lower and upper bound on the production/degradation parameter ratio) and show an alternative way of presenting the results.
In particular, we involve the SMT-based tool Symba [266] to obtain an approximated interval of the bounds on valid parameter values. Since the considered parameters are linearly dependent, the resulting intervals can-
134
not be simply combined to display the two-dimensional validity area in the parameter space. To this end, we have employed Symba to explore the ratio of the two parameters. By combining initial parameter constraints with the bounds on the parameter ratio, a more accurate parameter subspace is acquired. Such an outcome has been used with the initial constraint IpRB G [0.001,0.1] and kx G [0.001,10] (Figure 7.7 (top left)). Additionally, we have explored a refined parameter space {^prb £ [0.001,0.025] and k\ G [0.001,2]) where a one-dimensional projection on the 7pi?73-axis is highlighted for k\ k, 1, the default value of k\ (Figure 7.7 (top right)).
The analysis took 8 minutes on twenty nodes (excluding postprocessing). The obtained results can be used as a base for further analysis. We employ the feature of BIOCHAM [94] to compute the landscape function that allows investigation of quantitative satisfaction degree of the properties explored (Figure 7.7 (down)). LTL reformulation of tp2 and ?3 has been used (tpi cannot be expressed in LTL). The lighter is the colour the higher the satisfaction degree.
7.3.4 CSL-based Analysis
It is important to note that the cINM described in previous section targets the dynamics from the perspective of the average population view. However, it is well known [195] that in cases of low molecular numbers the stochasticity of the dynamics can affect the behaviour of the system significantly. Especially, in the case of networks addressing interactions among proteins through the transcription/translation apparatus (the so-called gene regulation), the stochasticity has to be taken into account due to the fact that the proteins interact with a single copy of DNA. This fact means that the chance of a successful transcription-inducing interaction between a given regulating protein and a given regulated gene to occur in a certain time is affected by the cellular noise.
Model Reformulation
To address the problem of bistability discussed in the previous section, we have reconstructed the model in the stochastic framework of sCRNMs. This is done because we need to fit within the stochastic mass action kinetics. In particular, we have reformulated the original CIN as a CRN that gives an approximate mechanistic view of the mechanism. The resulting CRN is shown in Figure 7.8. The correspondence with the model presented in the previous section is the following: A denotes the protein pRB, a stands for a gene encoding pRB, B denotes the protein E2F1, and b stands for a gene encoding E2F1. Since the detailed knowledge of elementary chemical reactions occurring in the process of transcription and translation is incomplete, we use the simplified form as suggested in [159]. In
135
the minimalistic setting, the reformulation requires addition of rate parameters describing the transcription factor-gene promoter interaction while neglecting cooperativeness of transcription factors activity. Our pasteurisation is based on time-scale orders known for the individual processes [366] (parameters considered in Moreover, we assume the numbers of A
and B are bounded by 10 molecules. Upper bounds for A and B are set with respect to behaviour of an ensemble of stochastic simulations. We consider minimal population number distinguishing the two stable modes. All other species are bounded by the initial number of DNA molecules (genes a and b) which is conserved and set to 1. The considered parameterisation is limited to the degradation coefficients ^a,7b assigned to the degradation reactions of A and B, respectively. In this settings, the corresponding parameterised CTMC has 1078 states.
Gene a interactions Gene b interactions
a —» a + A aB ->• aB + A 1 1 b^b + B bB ->bB + B 0.05 1
A + a o aA B + a o aB 100;10 100;10 A + b ^ bA B + b^bB 100;10 100;10
Protein degradation
A -> 1A B -> ib
Figure 7.8: Chemical reaction network reformulation of the G\/S regulatory circuit - a, b represent genes, aA,aB,bA,bB represent transcription factor-gene promoter complexes.
Analysis Procedure and Results
We consider three hypotheses: (1) stabilisation in the low mode where B < 3, (2) stabilisation in the high mode where B > 5, (3) stabilisation in the high mode where B > 7 ((3) is more focused than (2)). All the hypotheses are expressed within time horizon 1000 seconds reflecting the time scale of gene regulation response. We employ two alternative CSL formulations to express each of the three hypothesis. According to [345], we consider the parameter space £ [0.005, 0.5] and 75 to be fixed to the default value.
First, we express the property of being inside the given bound during the time interval / = [500,1000] using globally operator: (la) P^?[G7 (B < 3)], (2a) P^?[G7 (B > 5)] and (3a) P^?[G7 (B > 7)}. The interval starts from 500 seconds in order to bridge the initial fluctuation region and let the system stabilise. Since the stochastic noise causes molecules to repeatedly escape the requested bound, the resulting probability is significantly lower than expected. Namely, in cases (2a) and (3a) the resulting probability is close to 0 for the whole parameter space. Moreover, the selection of an ini-
136
tial state has only a negligible impact on the result. Therefore, in Figure 7.9 only the resulting probability for case (la) and a single selected initial state is visualised.
Property (lb) (2b) (3b) (la)
Figure 7.9: Landscape functions of properties (la,lb,2b,3b) for 7^4 e [0.005,0.5] (in s-1) and initial states #0, #997 and #1004. The left Y-axis scale corresponds to (la), the right to (lb,2b,3b).
Second, we use a cumulative reward property to capture the fraction of the time the system has the required number of molecules within the time interval [0,1000]: (16) R^?[C^](5 < 3), (26) R^?[C^](5 > 5), (36) R,v?[C^](£ > 7) where t = 1000 and R^?[C^*](£ ~ X) denotes that state reward p is defined such that Vs £ S.p(s) = 1 iff B ~ X in s. The result is visualised for three selected initial states in Figure 7.9.
Figure 7.9 also illustrates inaccuracy of our approach with respect to the absolute error bound ERR = 0.01 by means of small rectangles depicting approximations of the resulting probabilities and expected rewards. The analyses predict that the distribution of the low steady mode interferes with the distribution of the high steady mode. It confirms bi-stability predicted in [345] but in contrast to the deterministic analysis our method shows how the population of cells distributes around the two stable states. Results of computations including the number of iterations performed during parameterised uniformisation, numbers of resulting subspaces and execution times in hours, are presented in Figure 7.10.
Finally, to see how degradation rates of A and B cooperate in affecting property (36), we explore two-dimensional parameter space (7,4,7s) £ [0.005, 0.1] x [0.05, 0.1]. The computation also required 4.0 • 106 iterations of the parameterised uniformisation, the parameter decomposition resulted in 143 subspaces for ERR = 0.1 and the overall execution took 14 hours. Figure 7.11 illustrates the computed upper bound of the landscape func-
137
Property # iter. #subsp. time[h]
(la) 1.2-106 153 9
(2a) 2.0-106 69 5.5
(3a) 2.0-106 66 4.5
(lb) 4.0-106 159 10.5
(2b) 4.0-106 132 8
(3b) 4.0-106 80 5
Figure 7.10: Computation performance results.
Upper bound of lanscape function Error of computation
0.005 0.02375 0.0525 0.07125 0.1
Figure 7.11: Landscape function for property (36), initial state #0 (A = 0,B = 0,a = 0,6 = 0,aA = 0,aB = l,bA = 0,bB = 1) and two-dimensional parameter space (7,4,7b) £ [0.005,0.1] x [0.05,0.1] (represented in s^1 by X and Y axes, respectively). On the left, the upper bound of the landscape function is illustrated. On the right, the absolute error given as difference between computed upper and lower bounds is depicted. In both cases the color scale is used.
tion for initial state #0 and the absolute error. The result predicts antagonistic relation between the degradation rates which is in agreement with the study provided in [345].
138
7.4 Sustained vs Transient Modes of Cell Signalling
In this section we employ the CTL-based parameter synthesis methods described in Chapter 4 to differentiate models that display or do not display a particular studied dynamical behaviour expressed in the form of a temporal property. Details on the modelling and analysis provided here have been published in [210].
7.4.1 Problem Description
Signalling pathways represent one of the most important biochemical mechanisms studied in current systems biology. In particular, they provide a complex cellular information processing machinery that evaluates input stimuli and transfers them into genome by means of regulation of specific genes expression. A special emphasis is given to distinguishing between monotonous (sustained) and non-monotonous (transient) time-course behaviour of signalling pathways [324, 365]. It is believed that transition between these two modes may cause a significant change of the nominal cell behaviour leading to serious anomalies of internal cellular processes control.
7.4.2 Model Encoding
We consider three cBNM models falling to the class of cINMs. These models describe a general shape of signalling pathways at a high level of abstraction. In particular, we focus on three topologies of influence networks differing in the presence/absence of feedback mechanisms (Figure 7.12). We use Hill kinetics employing sigmoidal functions to describe the response of a signalling component with respect to the input signal (see equations in Figure 7.12). In particular, our models consist of the species a = {A, R, TP} where R is a receptor, A an adapter, and TP a target protein. The adapter forms the main dynamical entity of the model. It is activated by the receptor and inhibited by the target protein. Receptor concentration is considered constant, hence the positive influence of the adapter from the side of the receptor is considered as constantly active and its dynamics is not considered in the model.
We consider three model variants differing in influences of the adapter. Model 1 reflects the topology with no inhibition (the dashed interaction is removed and Equation 7.3 is employed). Note that the constant activity of the receptor influence is reflected by a constant regulation function (a\(x) = 1). Model 2 describes independent inhibition (the dashed interaction is considered to be superposed with the constant activation of A by R, reflected in Equation 7.4). In particular, there is a separate regulation function for each of the influences of A. Model 3 describes dependent
139
inhibition (the effect of the inhibition is multiplied with the activation by R, reflected in Equation 7.5) using a regulation function multiplying the constant activation with the negative influence modelled as a sigmoidal function (Equation 7.2). In this case, there is a single regulation function that combines both influences of A. Target protein dynamics is modelled with a positive regulation function (Equation 7.1) of the adapter in all three models (Equation 7.6).
R A TP
1?
h(X,KM,n) h~(X,KM,n)
[xy
KMn + [X\n ' KM" + [X]>
(7.1)
(7.2)
Figure 7.12: Influence networks representing the considered variants of signalling pathways. Dashed line represents optional inhibition (left). Positive and negative regulation function employing Hill kinetics (right).
d[A] dt
dt
d[TP] dt
d[A] dt
= VA-yA[A\
= VA + VMax.a • h (TP, KM_A, nA) - va[A] d[A]
VA-h-(TP,KM.A,nA)-yA[A}
= Vmax-TP ■ h(A, Km.tp, nTP) - vtp[TP]
(7.3)
(7.4)
(7.5)
(7.6)
Parameter V'A is defined as Va ■ Vmax_a- Default parameters have been set to: V = V'A = Vmax-A = Vmaxjtp = 0.001, Kmj = Km.tp = Va = Vtp = 0.1, i%a = riTP = 2. Initial concentrations of all entities have been set to 0: A(0) = TP(0) = 0.
Parameterisation of the models is set to the production rate coefficients so that the following coefficients are considered as parameters: Va in Model 1, Vmax_a m Model 2, VA in Model 3 and Vmaxjtp in all three models. These parameters are examined in the range [0.0001,10] and other constants are fixed at default values with the only exception of va, Vtp (set to 0.5009) and VA in Model 2 (set to 0.001).
7.4.3 Analysis Procedure and Results
In order to prepare the model for model checking analysis, we have performed the steps described in Section 2.4.6. Firstly, the piece-wise affine approximation (PWMA) of the original non-linear continuous models has
140
been constructed by applying the automatised approximation procedure introduced in [205]. In particular, we have approximated each non-linear function appearing in the right-hand side of the model equations with a sum of ten piece-wise affine ramp functions. Secondly, the abstraction procedure described in Section 2.4.6 has been applied to the PWMA model. The only extension to the procedure is the addition of a new information stored with the transitions. In particular, every transition in the abstracted Kripke structure k,ahst is assigned a label representing the index of the model variable affected by the transition and the corresponding direction (+,—) in which the change occurs.
Properties are formulated in terms of UCTL formulae (the basic CTL logic extended with action-label predicates [346]). The usage of branching time logic is motivated by the fact that the rectangular abstraction produces a non-deterministic system. Combination of action and state predicates is necessary to express local patterns of the dynamics (state predicates) and the character of the transitions (action predicates).
We employ the parameter synthesis based on coloured CTL model checking extended to deal with UCTL operators. Since we restrict ourselves to particular operators as needed in the properties of our interest, the extension is a direct refinement of the algorithm described in Section 4.3. The UCTL operators employed are the following:
• EX{j_}? requiring existence of at least one state satisfying ip to be reached by a transition that decreases the variable i,
• AX{i+}(/;, EX{i+}(/j requiring all states (resp. existence of at least one state) reached by a transition that increases the variable i to satisfy ip,
• AF{^_j(/j, EF{^j_}? requiring all paths (resp. existence of a path) to a state satisfying ip provided that no transition decreases the value of variable i — satisfying paths must be non-strictly increasing,
• EF{^+}(/3 requiring existence of a path to a state satisfying tp on which the variable i is non-strictly decreasing.
The parameter synthesis procedure gives us a global result saying for which initial states and respective parameter values the formula is satisfied. It is important to note that the over-approximating abstraction affects interpretation of the parameter synthesis results. On the one hand, satisfaction is guaranteed for universally quantified formulae. On the other hand, falsification is guaranteed for existentially quantified formulae. Quantifiers cannot be alternated. In both cases of guaranteed results, the obtained parameter values are under-approximated.
To express combined characteristics of the two behaviours of TP, we have employed the following properties in the form of UCTL formulae:
141
• tpi = Init A AX|Tp+}(AF|^Tp-} Stable),
• tp2 = Init A EF(EX{Tp+}(EF{^Tp_}(EF{^TP+}(EX{Tp_} True)))),
• (p3 = Init A EF(EX{Tp+}(EF{^Tp_}(EF{^Tp+}(EX{Tp_} NotUp)))),
where Init stands for the set of initial states (with TP constrained in the range [0.0,0.0001]), NotUp for a state in which the abstracted vector field forbids an increase in TP concentration, Stable for an equilibrium state with both species being stable and, finally, True for any state. Intuitively, property tpi specifies the sustained behaviour and its satisfaction guarantees presence of this behaviour in a given model for given initial states. Formally, this formula restricts all admissible runs to start in Init and to increase TP concentration in the very next step and not to decrease it before reaching Stable. The properties ?2 arid ?3 represent necessary conditions for presence of the transient behaviour (formulae are ordered with respect to the strength of the particular condition). Both restrict any feasible run to start in Init and to increase TP concentration at least once on a run before decreasing it also at least once on a run. The parameterisations violating these properties guarantee the absence of the transient behaviour. All results are summarised in Table 7.2.
model type property initial concentration of A V* x Vmax.tp
Model 1 ^V3 [0.22,11.9] [0.22,12.0] [0.01,12.0] [0.11,5.96] x [0.0,6.81] [0.11,10.0]x [7.23,10.0] [0.23,10.0] x [6.26,10.0] U[5.96,10.0]x [0.0,10.0]
Model 2 fl ^2 ^f3 [0.01,11.5] [0.22,12.0] [0.0,12.0] [0.0,5.76] x [0.0,0.01] [2.29,10.0] x [7.23,10.0] [0.09,10.0] x [7.23,10.0] U[4.47,10.0]x [0.0,10.0]
Model 3 fl ^2 ^f3 [0.01,11.5] [0.22,12.0] [0.0,12.0] [0.01,5.76] x [0.0,0.01] [2.31,10.0] x [7.23,10.0] [0.11,10.0]x [7.23,10.0] U [4.49,10.0] x [0.0,10.0]
Table 7.2: The results obtained for given properties in different models. Initial concentration of A is defined as a union of concentration values in states where the formula holds for stated parameters. V* is the production parameter of A that represents Va in Model 1, Vmax_a in Model 2, V'A in Model 3, respectively. Each parameter interval determines a range of all satisfying parameter values across all states of the respective model where the particular property holds.
It is important to note that the abstraction together with UCTL logic allow us to exactly characterise the inevitability of the sustained behaviour.
142
However, as has been reported in [85], the state predicate Stable is exactly preserved in the abstraction only if a particular equilibrium that must exist in the respective state rectangle is hyperbolic. In consequence, we are not able to fully cover the transient behaviour. More specifically, the transient behaviour might asymptotically converge to an equilibrium that is asymptotically stable. Such a property is not preserved by the abstraction and therefore we limit ourselves only to refuting absolutely transient behaviour (transient behaviour without oscillations around the target equilibrium).
143
7.5 Robustness of Elemental Signalling Pathways
In this section we present a case study utilising the stochastic robustness analysis with CSL described in Section 6.3. The study bring new insights into the mechanism of two-component signalling pathways occuring in bacteria.
Signalling pathways make the main interface between cells and their environment. Their main role is to monitor biochemical conditions outside the cell and to transfer this information into the internal logical circuits (gene regulation) of the cell. Since signal processing is carried out by several dedicated protein complexes (signalling components), it is naturally amenable to intrinsic noise in these protein populations caused by stochas-ticity of transcription/translation processes. Robust input-output signal mapping is crucial for cell functionality. Many models and experimental studies have been conducted attempting to explain mechanisms of robust signal processing in prokaryotic cells, e.g., [47, 330].
7.5.1 Model Reconstruction
In order to construct robust signalling circuits in synthetically modified prokaryotic cells, Steuer et al. [338] has suggested and analysed a modification of a well-studied two-component signalling pathway that is insensitive to signalling components concentration fluctuations. The study was conducted using a simplified model consisting of the two signalling components each considered in both phosphorylated and unphosphorylated forms. The first component, the histidine kinase H, is a membrane-bound receptor phosphorylated by an external signalling ligand S. In its phosphorylated form Hp, the histidine kinase transfers the phospho-group onto the second component - the response regulator R. That way it activates the response regulator by transforming it into the phosphorylated form Rp, which is diffusible and functions as the internal signal for the cell. The basic topology of the pathway is depicted in Figure 7.13A. The modification suggested by Steuer et al. is depicted in Figure 7.13B. The difference is in the addition of catalytic activation of Rp dephosphorylation by the unphosphorylated histidine kinase H. In [338] it has been rigorously proven that under the deterministic setting this modification leads to globally robust steady-state response of the signalling pathway, which is not achievable with the basic topology.
We reformulate the model in the stochastic setting (in terms of a sCRNM) and employ the stochastic robustness analysis method described in Section 6.3 to provide detailed analysis of the input-output signal response under fluctuations in population of both signalling components. In contrast to [338], where the average steady-state population is analysed with respect to fluctuations in signalling components, our analysis refines
144
the steady population in terms of distributions. That way we obtain for a stable input signal a detailed view of distribution of the output response. In particular, instead of studying the effect of perturbations on the average population, we see how perturbations affect the distribution, i.e., the variance (fluctuation) in the output response. That way the stochastic framework gives a more detailed insight into the input-output signal response mechanism.
The biochemical model of both topology variants is given in Figure 7.13C. The input signal S is considered to be fixed and therefore it makes a constant parameter of the model. The signalling components in both phosphorylated and unphosphorylated forms make the model variables H, Hp, R, and Rp.
7.5.2 Problem Description
Depending on which topology is chosen, the original deterministic model [338] exhibits different relationships between the steady-state concentrations of the input signal S and the output signal Rp:
Rp steady-state in model 1 Rp steady-state in model 2
[Rp] = h-ism m = hL[S]
In particular, it can be seen that the steady-state concentration of the output signal [Rp] in model 1 is affected not only by the input signal S but also by the number of unphosphorylated receptors H, which can be interpreted in such a way that the concentration of the signalling components should be kept stable in order to obtain a robust output. This is, however, not an issue in model 2 where Rp depends only on S. Since the steady-state analysis has been carried out under the deterministic setting additionally imposing assumptions of conserved total amounts of H + Hp and R + Rp, it is appropriate only for high molecular populations.
The question we want to answer is "Is there a difference in the way the two models handle noise (fluctuations) for low molecular numbers of signalling components?" In such conditions, populations of H + Hp and R + Rp cannot be considered conserved, since the proteins are subject to degradation and production. Production of proteins from genes, as well as degradation, is inherently noisy as demonstrated in the previous case study. Different levels of noise can be affected by, e.g., regulatory feedback loops or varying numbers of gene copies. Even for a noiseless output signal S these internal fluctuations of protein concentrations transfer noise to Rp.
145
General signal transmission reactions
H + S A. Hp + S Hp + R -^-> H + Rp k, = 0.1 k2 = 0.1
Basic topology (model 1) || Modified topology (model 2)
Rp^_>R k3, = 1.0 Rp + H^»R + H k32 = 15.0
Signalling components degradation
H 0 Hp^-> 0 R 0 Rp A» 0 kd = 0.01 K = 0.01
Signalling components synthesis
0i>R 0i»H kp = 0.3
Figure 7.13: Model of a two-component signalling pathway. (A) Basic topology of the two-component signalling pathway. (B) Modified topology of the two-component signalling pathway, additionally, histidine kinase H catalyses dephosporylation of the response regulator R. (C) Reactions specifying the biochemical model of the two considered topologies of the two-component signalling pathway.
146
7.5.3 Post-Processing Functions
To analyse the robustness of signal response, we need to characterise the level of noise (fluctuations) observed on the component Rp in a given time. This can be achieved by computing the mean quadratic deviation of the distribution of Rp in time.
Standard reward operators do not allow to additionally process the result during the transient analysis. For the purpose of computing the mean quadratic deviation of the distribution of Rp in time, such a procedure is needed. To that end, we extend CSL with an operator of so-called post-processing functions denoted E=?[l*] and defined over probability density vectors 7rC'X°'*. For the mean quadratic deviation of noise the postprocessing function is denoted Post^ir) and defined as Post(-7TC,Xo,t) = Exex\#(A) ~ rnean(-KC^\ A)\2 ■ 7rC'X°'*(X) where gives the pop-
ulation of species A in state X and mean(irc,Xo,t, A) is the mean of the distribution 7rc'x°'* defined as meara(7rC'Xo'*, A) = Y.xex#(A) ' ^C,Xo,t{X). The formula E=?[l*] stands for "The mean quadratic deviation of the distribution of species A at time instant t."
7.5A Analysis Procedure and Results
For the model to have low numbers of molecules exhibiting stochastic fluctuations and to enable responses to varying levels of S, we have chosen kp = 0.3 molecules-s~x and kj = 0.01 which leads to an average total population of 30 molecules for both H + Hp and R + Rp. To make the analysis straightforward we assume the same speed of degradation of phosphorylated and unphosphorylated variants of each protein.
To reduce the size of the state space we have truncated total populations to 25 < H + Hp < 35 and 25 < R + Rp < 35, which leads to 116281 states in total. The initial state is considered with populations so = (H = 30, Hp = 0,R = 30, Rp = 0).
Modelling Noise in Signalling Components
In order to control fluctuations in protein production we extend our model with two populations of genes, one for H and one for R, respectively, and for each of the genes we introduce an autoregulatory negative feedback loop via binding the proteins to their corresponding genes. That way we restrict the protein production. By modifying the number of gene copies in the cell and the rate of protein-gene binding, we are able to regulate the overall noise in the transcription. This approach, however, significantly increases the state space size because it introduces new variables representing genes and protein-gene complexes. To make the analysis feasible, we abstract from details of the underlying autoregulatory mechanism and
147
model it using a sigmoid production function, which mimics the desired behaviour accordingly. Using numerical analysis, we have verified that such an approximation can be employed in the stochastic framework. The function is defined in the following way:
0 st9^n) X sig(kp, n) = --' K
1 + (.30/
where n is the so-called Hill coefficient controlling the steepness of the sigmoid (caused by cooperativity of transcription factors in protein-gene interactions) and kp is the maximal production rate. We use this approach for modelling the production of both species H and R by sigmoid coefficients denoted uh and tir, respectively. The sigmoid function regulates the population by enabling production when it is below average and repressing it when the population is above the average. Larger n is leads to steeper sigmoid functions, which leads to stronger regulation and lower noise. The case n = 0 corresponds to an unregulated model and when increased to n = 20 it corresponds to over 10 copies of each gene in the fully modelled feedback loop mechanism.
Transient Analysis of Noise in Signalling Components
To see the long-term effects of intrinsic noise we decided to examine the system in the situation when the output response is stabilised. Since the min-max approximation method cannot be employed with steady-state computation, transient analysis in a suitable time horizon has been performed instead. To estimate the closest time t when the system's behaviour can be observed as stable, we have computed values of output response noise for the unregulated variant of the model (n = 0) using standard numerical steady state analysis (we employed the tool PRISM [261]) and compare it to probability distributions obtained by transient analysis in t = 20, t = 50 and t = 100 seconds. Consequently, we have compared the probability distribution in the steady state with the probability distribution in t = 100 seconds. The results clearly show that the difference in distributions is negligible and the transient distribution can be considered stable after t = 100.
To further speed up the computation, we have precomputed the distribution of H and R in the time horizon t = 100 without enabling phosphorylation reactions. This has led to a significant reduction to 121 states. Starting with the achieved probability distribution, we have subsequently computed the transient analysis with enabled phosphorylation reactions in the next 5 seconds. The rationale behind is that the protein production and degradation are two orders of magnitude slower than phosphorylation. Therefore, the total populations of H and R dictate the time at which
148
the system is nearly stable and thus the next 5 seconds are sufficient for the fast-scale phosphorylation to stabilise the fractions and .
To compute the noise (variance) in Rp we employ the mean quadratic deviation post-processing function for state space distributions. Our goal is to compare the levels of Rp noise in both models for different levels of the output signal S and for different values of intrinsic noise appearing in protein production (controlled by sigmoid coefficients nu and tir). After computing lower and upper bounds of the state space distributions, we have computed the lower and upper bounds of the post-processing function using the parameterised uniformisation algorithm discussed in Chapter 5. Consequently, we obtain robustness values for the output response Rp over the respective perturbation subspaces in the form average ± error. Finally, we define the perturbation space of the interest. In particular, for the signal we choose the value interval S £ [2.0,20.0] and for sigmoid coefficients nH,nR G [0.1,10.0].
Since the full computation over the 3-dimensional perturbation space has turned out to be intractable, we have to find a way to reduce its dimension. To this end, we focus on a subspace S = 15.0, (n#, njA e [3.0, 4.0] x [3.0,4.0] where both models have symmetric sensitivity to both sigmoid production coefficients uh, riR. This symmetry allows us to merge riH,nR into a single coefficient n. Results of this experiment are visualised in Figure 7.14, where it can be seen that in Model 1 the influence of nu and riR is almost perfectly symmetrical with nu being slightly more influential. In Model 2 the influence is evidently stronger in riR but the response seems to be symmetrical enough to justify the sigmoid coefficients merging. An interesting property of the parameterised uniformisation and the perturbation space decomposition algorithm can be seen in Figure 7.14, where the decomposition of the perturbation spaces around both sigmoid coefficients set to 3.1 is very dense. This is due to the non-linearity of the sigmoid production functions, which leads to the non-monotonicity of probability inflow/outflow differences in states during parameterised uniformisation. In order to preserve the conservativeness of estimates we have to locally over/under approximate these inflow/outflow rates thus gaining an increase of error. To obtain the desired level of accuracy, we dynamically refine all those subspaces where this has occurred.
Response of Noise to Different Signal Levels
Finally, we inspect selected subintervals of the perturbation space given by five exclusive intervals of the input signal value domain, S 6 [2, 3] U [6, 7] U [10,11] U [14,15] U [19, 20], and three distinct levels of production noise represented by sigmoid coefficient n G {0.1,4.0,10.0}. The results of this main experiment can be seen in Figure 7.15 and Figure 7.16. The trends that can be seen in Figure 7.15 are that for lower signals up to S = 10. Model 2 has
149
A Model 1: Rp noise w.r.t nH and nR, S = 15
Robustness average Robustness error
nH nH
Figure 7.14: Influence of genetic regulation on noise in model 1 and 2. In the upper part, the Rp noise in model 1 is displayed over perturbations of both sigmoid production constants uh and ur in [3.0,4.0] x [3.0,4.0]. The upper and lower bounds on noise (mean quadratic deviation of the resulting probability distribution projected onto populations of Rp) axe recomputed into the form average ± error, the average values are shown on the left and errors are shown on the right. In the lower part, the Rp noise in model 2 is displayed.
150
Rp noise w.r.t. different levels of signal S and genetic regulation n
2.395 ±0.057
2.396 ±0.058
2.397 ±0.058
7.645 ±0.198 7.527 ±0.196 7.357 ±0.191
5.811 ±0.160 5.818 ±0.161 5.820 ±0.162
5.850 ±0.324 5.608 ±0.300 5.307 ±0.287
5.740 ±0.307 5.753 ±0.239 5.759 ± 0.244
Signal S
11.468 ±0.330 11.418 ±0.334 11.385 ±0.334
5.324 ±0.286 5.972 ± 0.275 5.564 ± 0.262
18.729 ±0.622 18.525 ±0.639 18.286 ±0.616
•A
coefficient n Model 1 Model 2
9.636 ± 0.243 9.176 ±0.234 8.664 ±0.221
0.1 4.0 10.0
15
16
Figure 7.15: Comparison of models by Rp noise robustness. Robustness Rp noise in both models has been computed with respect to perturbations of signal S over five selected intervals of the input signal.
encountered lower noise in Rp than Model 1 but in the higher signal region it is outperformed by Model 1, which quickly converges to values between 8 and 10. However, Rp noise produced in Model 2 linearly increases with increasing value of the input signal S. For most of the inspected subspaces a stronger regulation of H and R production by the sigmoid coefficient n leads to a reduction of Rp noise. An exception to this observation can be seen in Model 2 at the signal interval [19.0, 20.0] where this trend is inverted. To show that this is an emergent behaviour arising from non-trivial interaction between phosphorylation and dephosphorylation reactions not present in the production and degradation of components H and R, their respective influences are displayed in Figure 7.16. There we can see that in Model 1 both H and R follow an initial increase of noise with increasing S but then the noise stabilises. This leads us to a hypothesis that the regulation of noise in signalling components dynamics loses its influence as signal S increases. This is however due to the fact that more S leads to faster phosphorylation of H, which effectively reduces the population of H thus also reducing its absolute noise. In the case of Model 2 the situation is different since we can observe a permanent increase of noise in both H and R populations.
7.5.5 Performance
The parameter space decomposition procedure has iterated through several thousands of perturbation subsets that where required to achieve reasonable precision. In order to speedup the computation we analysed the
151
Model 1 & 2: H noise w.r.t signal S
coefficient n Model 1 Model 2
9.685 ± 0.225 9.524 ±0.214 8.016 ±0.193 7.931 ±0.184 6.353 ±0.154 6.340 ±0.148
0.1 4.0 10.0
11.049 ±0.303 10.123 ±0.260 9.758 ±0.271 9.140 ±0.237 8.457 ±0.236 8.128 ±0.212
12.439 ±0.437 11.360 ± 0.308 10.264 ±0.284
10.056 ±0.367 9.456 ± 0.331 8.809 ± 0.306
27.268 ± 0.873 24.926 ± 0.822 22.672 ± 0.728 1
114.689 ±0.423 ■ 13.475 ±0.394 ,12.309 ±0.360
8.639 ± 0.266
8.482 ±0.217 8.241 ±0.212l
Signal S
Model 1 & 2: R noise w.r.t signal S
coefficient n Model 1 Model 2
0.1 4.0 10.0
12.579 ±0.286 11.103±0.258 10.653 ±0.247 9.124 ±0.217 8.718 ±0.205 7.163 ±0.173
14.544 ±0.393 14.462 ±0.369 12.562 ±0.347 12.789 ±0.331 10.595 ±0.294 11.079 ± 0.287
| 17.564 ±0.607 . 15.561 ±0.419 I 13.575 ±0.376
I 13.615 ±0.496 I 12.277 ±0.427 I 10.898 ±0.375
Signal S
33.005 ± 1.084 29.737 ± 1.006
26.600 ± 0.872
I 20.736 ± 0.590 ■ 18.528 ±0.538 16.395 ±0.478
I 12.425 ±0.378 I 11.345 ±0.346 1 10.234 ±0.313
11.309 ±0.287 10.442 ±0.265 9.550 ± 0.242
18
19
Figure 7.16: Noise in populations or H and R in both models Noise in H (A) and R (B) in both models is displayed with respect to perturbations of signal S over five selected intervals and for three distinct levels of inherent production noise.
152
subsets in parallel using a high-performance multi-core workstation were the analysis took several hours.
To further improve the accuracy of the robustness analysis without decreasing the performance, piece-wise linear approximation can be employed. It is computed by linearly interpolating the grid points in which the upper and lower bounds of the evaluation function may be computed more precisely as the minimum resp. maximum of the values from all parameter subintervals sharing boundary grid points. It allows us to obtain more precise results without increasing the number of perturbation sets, yet it does not guarantee conservative error bounds.
153
7.6 Robustness of Population Dynamics
By employing the Parasim tool discussed in Section 6.4 we have conducted two case studies on two basic population dynamics models. The experiments have also served us to evaluate the method presented in Section 6.2 (in the setting of the Parasim tool).
7.6.1 SIR Model
First, we demonstrate the robustness analysis on the model simulating an outbreak of an infectious disease in a population [236]. The simulated population is divided into three categories: susceptible (S), infected (I) and recovered (R). A susceptible individual can become infected by contact with another infected individual and an infected individual may recover. The considered variant of a SIR model is represented as a CRN in the following way:
S + I-%1
iAr
where a is the contact rate parameter which correlates to probability of disease transmission, while /3, the recovery rate parameter, takes into account the standard length of recovery. The corresponding parameterised cRNM is given by means of the following system of differential equations:
dS dl dR
—— = —aSI — = aSI — pi —— = pi
at at at
A typical simulation of this model (see Figure 7.17a) includes a rapid increase in infected individuals, which is then followed by their gradual recovery.
In this case study, we compare robustness analysis based on a formula containing value-freezing with respect to a freezing-free formula analysis exploiting a similar behavioural pattern. In particular, we consider the following formulae:
STL : Vl = F[li5](J > 50) STL* : 50 A * G[0.25,5](/* > /))
Both formulae require the number of infected individuals to be greater than 50 at some time in the interval [1, 5], while tp2 also requires this number to be the local maximum (the number of infected individuals is required to decrease after reaching this maximum).
The robustness with respect to both properties was analysed on perturbations of both contact rate and recovery rate. Results are presented in Figure 7.18.
154
50
(a)
(b)
Figure 7.17: (a) Typical development of SIR model, showing the number of susceptible (green), infected (red) and recovered (blue) individuals, (b) Typical development of populations in predator-prey model, showing number of prey (green) and predator (red).
While the satisfaction sets of tpi and tp2 (delineated by positive robustness) are essentially identical, the actual robustness values show a significant difference. Generally, when they are positive, the value of robustness with respect to tpi at given point is considerably greater than the corresponding value of robustness with respect to ?2- In Figure 7.18, this can be seen as lighter shade of green points in Figure 7.18b. Also, lower robustness causes the apparent increase in the number of points.
The reason for the rapid change in robustness comes from evaluation of the subformula * G[0.25,5](^* > I) that describes the local extreme. When evaluated in time t, robustness is proportional to the difference (/[£] — I[t + 0.25]) (by Definition 6.1). In practise, the difference is small provided that the descent of / is not extremely steep. This causes such formulae to have typically low robustness values on common signals.
7.6.2 Predator-Prey Model
In the second case study we analyse the predator-prey model [272, 360], which attains oscillating behaviour for a wide variety of parameters. We use a variant of the Lotka-Volterra model represented as a CRN in the fol-
155
° \-1-1-1-1-° h-1-1-1-1-
0.00 0.01 0.02 0.03 0.04 0.05 0.00 0.01 0.02 0.03 0.04 0.05
contact rate contact rate
(a) Robustness wrt Y*)
1P2 = G[0j300] {{X > 1 A Y > 1 A F[0j50] *(F[0>75] (X* - X > 25) AF[0M(X-X* >25))))
The property V>i requires that for each time point £ 6 [0, 300], there is a subsequent time point £' 6 [£, £ + 100] such that population of prey in £' is greater than population of predators in £. According to Definition 6.1 its corresponding robustness can be expressed as follows:
, n . x[t']-y[t]
p(p,n = mm max -
te [0,300] t'e[t,t+ioo] 2
where X[t'] and Y[t] denote values of s associated with given species at given time. The robustness value is maximized with respect to £' and minimized with respect to £, therefore, it uses maximal values of both X and Y. Consequently, this property can be interpreted as maximum population of prey being greater then maximum population of predators (restricted to given intervals).
Formula V>2 is based on the similar principle. While rejecting aberrant behaviour where population of one of the species drops below one individual, intuitively, it requires that there always is time in the future when population of prey can increase or decrease by 25 individuals, which is stated by the following subformula:
F[0,50] * (F[0,75] (X* - X > 25) A F[0i75] [X - X* > 25)) .
Therefore, V is satisfied when the difference between maximal and minimal prey population is greater than 50 and the associated robustness is proportional to this difference. Again, we have avoided use of the extreme property, which would adversely affect robustness value.
Results of this analysis are presented in Figure 7.19. Here, we should point out that small prey natality produced behaviour where predator population approached zero and period of oscillations was greatly increased. For such behaviour, intervals used in ipi and V>2 were shorter than one period.
Apparently, satisfaction of ip\ is not affected by predation rate. More interestingly, when prey natality increases, predator population exceeds that of prey (see Figure 7.19 (top)). Figure 7.19 (bottom) shows that amplitude of prey population oscillation is affected by both prey natality and predation rate.
7.6.3 Performance
Performance of robustness analysis is summarized in Table 7.3. All results have been obtained by executing the algorithm implementation on a 4 core
157
d \-1-1-1-1-1-1-1—1
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
prey natality
Figure 7.19: Robustness of predator-prey model with respect to Vi (up) and ip2 (bottom) for variable prey natality and predation rate. Robustness was positive in green points and negative in orange points. Darker colour represents greater absolute value of robustness.
2 GHz CPU with 4 GB RAM. Each computation has been arranged into 8 threads. For each analysis we have set an optimal resolution of the trajectories (number of simulated points). The number of simulated trajectories has been bounded by the number of refinement iterations in the Parasim parameter space sampling procedure.
It is worth noting that all analysed properties consist only of *F and *G operators for which the procedure is optimized by employing Lemire
158
queues in the same way as proposed in [149]. This is based on an optimal streaming algorithm for computing maxima (resp. minima) of a numerical sequence and allows to reduce the quadratic complexity wrt formula size to linear.
Property (model) formula size # signals # points per a signal time
2 has been slowed down due to insufficient memory.
7.7 Publications Summary
The case study presented in Section 7.2 has been published in [139]. The work has been initiated by Jiří Damborský (Faculty of Science, MU) and it makes a part of a larger synthetic biology project on which I have worked with a doctoral student Martin Demko. My contribution to the presented part of the work is in setting up the problem, formalising the workflow employing the parameter synthesis technology, adapting the techniques to the concrete problem, and assisting in individual steps of the underlying in silico experiments.
The problem studied in Section 7.3 has provided a continuous case study we have used for calibration of our techniques. The CTL-based results have been published in [82] (independent parameters) and [56] (dependent parameters). The CSL-based analysis has been published in [89]. My contribution is in setting up the problem, formalising the workflow employing the parameter synthesis/exploration technology, adapting the concrete problem to the technical framework, and guiding individual tasks of the conducted experiments.
The study presented in Section 7.4 has been published in [210]. The work makes a part of a larger systems biology project lead by Pavel Krejčí (Faculty of Medicine, MU) on which I have worked with a doctoral student Matej Hajnal. My contribution to the presented part of the work is in setting up the computational modelling problem, adapting the existing workflow to the settings of UCTL, and assisting in individual steps of the underlying in silico experiments.
159
The robustness analysis case study presented in Section 7.5 has been motivated by discussions with Ralf Steuer (Humboldt University Berlin) during the joint work on the CyanoTeam project. The work on the individual analysis tasks and results visualisation has been realised in collaboration with Milan Češka and Sven Dražan. My contribution is in setting up the problem, adapting the models to the settings of the used techniques, guiding individual steps of the underlying experiments, and interpreting the achieved results.
The analysis of population models described in Section 7.6 has been published in [90]. The work has been realised in collaboration with my master-level student Tomáš Vejpustek. My contribution is in setting up the problem and guiding individual steps of the underlying workflow.
7.8 Discussion
Most of the case studies are focused to application of a single selected technique targeting the nature of the model employed. However, the case study of G\/S cell cycle transition (Section 7.3) shows a combination of techniques. This can be done due to the fact that the model structure is formulated by means of a CRN and therefore the two different semantics provided by a cCRNM and a sCRNM can be both used. It can be seen that the stochastic model analysis gives refined insight into the dynamics that capture the modelled decision at the level of an individual cell. This feature is not captured in the deterministic continuous model. However, the resulting behaviour of the transient distribution is in agreement with the deterministic settings. Additionally, it can be seen that the computational demands are much more on the side of the parameterised uniformisation in spite of the manual reductions performed on the model.
In general, the parameterised uniformisation method is much more demanding than the coloured model checking procedure. This is due to the fact the problem is enormously more complex in the amount of quantitative information that needs to be processed while meeting a required level of precision.
Both methods - the coloured model checking as well as parameterised uniformisation - require a non-trivial amount of work provided by the user manually. This is easily seen namely in the case studies provided on larger models where several model reduction steps had to be done.
Additionally, the CMC employed for the metabolic network in Section 7.2 has shown a disadvantage of not supporting observable external variables, the so-called assignments, that aggregate the values captured in a current state. This feature would allow direct modelling of (and reasoning about) the metabolic burden. In the case of parameterised uniformisation the power of bounded-time fragment CSL has been extended with
160
Case Study Model Logic Problem # par am Time
Lotka-Volterra cCRNM STL* robustness 2 8-13 s (m-c)
SIR cCRNM STL* robustness 2 2-5 min (m-c)
Sus/Trans Sig. cINM UCTL synthesis 2 < 5 min (m-c)
d/S cRINM CTL synthesis 2 < 8 min (m-c)
TCP-biodeg. cRINM ACTL synthesis 3 < 2 hrs (m-c)
d/S sCRNM CSL exploration 2 4-8 hrs
elem. sig. sCRNM CSL+pp robustness 2 < 10 hrs
Table 7.4: Performance summary of the individual case studies. The times denoted 'm-c' were achieved with implementations utilising multi-core computing. They reflect optimal settings of the parallelisation. CSL+pp denotes usage of CSL with post-processing functions. The numbers of parameters show the maximal numbers of unknown parameters considered in a single experiment.
the utility of post-processing functions. They seem to be very important for such kind of analysis since they support the possibility of aggregating and observing the information captured in states that was missing in the metabolic pathway case study.
The robustness analysis of population models described in Section 7.6 has the purpose of demonstrating the features of the method rather than inferring new insights at the level of population modelling. It can be seen that a region separating the satisfying parameter valuations from unsatisfying ones has been identified in both population models. This method did not require any significant intervention from the side of the user. However, the models are small in size and they do not incorporate significantly different time-scales for which numerical simulation might become challenging.
Performance summary where the individual case studies are ordered by the computation times is shown in Table 7.4. Time intervals cover the minimal and maximal times achieved during particular experiments with a given model (e.g., varying the analysed formula).
Finally, it is important to mention that the presented experiments do not cover usage of LTL-based methods. Several examples using the algorithms presented in Section 4.2 are described in [244] (for dINMs) and in [33] (for cRINMs and dINMs). The models considered there are comparable in complexity and size to models presented in this chapter. In particular, in [33] we have analysed the bistability of Gi/S model by using LTL formulae. It is apparent that the analysis of bistability with CTL as presented in Section 7.3 gives more precise results due to the fact that branching-time operators and state-based approach can capture the bistability phenomena quite well. This cannot be achieved with the path-centric view of LTL. The computation times achieved with LTL are comparable to the case of CTL.
161
Conclusion
A concrete summary and a discussion of possible future work have been included into every chapter of the previous text. In this section, however, we briefly summarise the contents of the work once again and we also sum up and generalise the possible directions for future research.
7.9 Summary
In this thesis, we have addressed the problems of parameter synthesis and robustness analysis from the perspective of methods based on temporal logics. The models that have been considered throughout the thesis make a selection that reflects well the current standards used in systems biology.
First, we have considered several variants of continuous-time deterministic models used to model biological phenomena at the macroscopic level where the average behaviour of the population of species (concentration of molecules) is considered as representative. Regarding the parameter synthesis, the first group of techniques we have provided for these models works at the level of a finite discrete abstraction of the continuous dynamics. The quality of the results obtained thus depends on the quality of the abstraction. At this place, the particular shape of the model can also affect the extent of spurious results caused by the abstraction. However, regardless the quality of abstraction, the method is suitable for global parameter synthesis as it allows to capture compactly the continuum of parameter values and to provide guaranteed results for synthesised sets at the level of piece-wise multi-affine representations of the continuous models. However, due to the highly over-approximating characteristics of the abstraction we have not found the abstraction practicable to be used as a based for robustness analysis.
To address the problem of robustness for cBNMs, we have exploited the method of analytical evaluation of the robustness measure on finitely sampled bounded signals obtained for example by numerical simulation. Our method employs the value-freezing extension of signal temporal logic thus lifting the utility of robustness analysis to complex phenomena that cannot be formulated in the plain version of the logic. An important example of
162
such properties are various forms of oscillation that are very relevant in the settings of biological models. These results have a significant impact in the general field of cyber-physical systems and run-time verification.
Second, we have addressed continuous-time stochastic models that provide a detailed (mesoscopic) view of the molecular dynamics. In fact, the models at this level of abstraction fit well into the general framework of time-homogeneous continuous-time Markov processes. Being motivated by the inverse problem in biology in the context of these models, we have introduced a novel general method that is applicable to continuous-time Markov chains with parameterised transition rates. The method works with a limited fragment of the expressive continuous stochastic logic that suffices for typical cases of studying a given mechanism in such a detailed level. The quantitative nature of stochastic models providing probability as a well-defined measure has lead us to extend the method to robustness analysis of stochastic systems. Our implementation of robustness uses probability as the measure of systems performance quantifying a given analysed functionality.
Third, our method of parameter synthesis works well with discrete models of influence networks. Although in this thesis the main focus has been put to quantitative models, our work has shown that the technique of coloured model checking is highly-relevant for Boolean networks and other models fitting the framework of dINMs. These models are typically employed to reveal properties of interwoven non-linear interactions at a large-scale level. The challenge is thus to develop methods scalable for large models consisting of hundreds or thousands of species (e.g., E. coli consists of approximately 4500 different genes).
7.10 Future Work
Optimisation of Existing Techniques
An important task for future work is to improve the methods from two main perspectives. The first aspect is the functionality that has to give precise results or approximate results supplied with a numerically characterised error. The second aspect is the need to address the challenge of applicability of the methods in settings of large-scale models.
The first issue targets especially the framework of cBNMs where the quality of the obtained results strongly depends on settings of the approximation/ abstraction method. The methods work quite well in cases of untimed analysis of restricted classes of systems that are very close to piece-wise multi-affine dynamics. However, models with complicated non-linear dynamics with multidimensional regulation functions can be highly distorted by using the concept of PWMA approximation. More-
163
over, the abstraction method applied to the PWMA does not work well for liveness properties such as oscillations where the abstracted (and highly-overapproximated) timing aspects play a crucial role.
The issue of applicability to large-scale problems addresses the entire spectrum of considered model classes. In the case of cBNMs, the need for explicit representation of the rectangular abstraction limits the applicability to larger models. To that end, techniques combining global abstraction methods with (relatively scalable) methods over-approximating the local behaviour (flow-pipes [109, 25]) can be potentially investigated for that purpose. Moreover, the methods of ^-decision algorithms [187, 157] that have been significantly improved very recently, have also a good potential to be combined with global model abstraction techniques.
The method of parameterised uniformisation follows the typical problems of formal analysis of continuous-time stochastic models regarding the scalability. This is a serious issue even in the non-parametric case. Although several improvements to the parameter uniformisation method have been introduced in the follow-up work [103] including its parallelisa-tion [234], the method can be practically applied only for CTMCs with relatively low-dimensional spaces of unknown parameters with limited numbers of states. Combining the method with hybridisation and moment-closure methods addressed in Section 5.7 makes a research direction that has yet to be investigated.
A promising method to address the large-scale challenge for dINMs is static analysis. An important feature of static analysis is the ability to characterise behaviour of a complex network model just by looking into the network structure. We have already exploited several preliminary scenarios of static analysis that work well for reachability analysis of parameterised dINMs [248]. However, the challenge is to combine static analysis techniques with (coloured) model checking in order to bring better scalability to parameter synthesis methods by using the knowledge obtained from network structure.
Filling Existing Gaps
Several gaps remain regarding parameter synthesis and robustness analysis of models fitting in the modelling framework considered in this thesis. First, we have not exploited yet the LTL coloured model checking method with the SMT-encoded parameter sets. In spite of the fact that the method scales significantly better with interval-based encoding (as shown in Section 7.3.3), it is a valuable exercise to see how the explicit-state/symbolic-parameters representation method works with LTL. Second, the method of robustness analysis has not yet been considered in the settings of discrete abstraction of cBNMs. However, assuming there exists a quantitative measure of the error introduced during the approximation step, it would be in-
164
teresting to incorporate such quantity into the systems states and paths and allow quantitative interpretation of temporal logics on discrete abstractions up-to the given error. Third, in spite of the powerful utility of symbolic model checking, it seems valuable to bring the CTL-based coloured model checking technique to dINMs. Especially this is promising in combination with the static analysis methods discussed above.
Another aspect to be considered for future work is an extension of the entire parameter synthesis framework to the settings of hybrid logics and algorithms utilising analysis of attractors in terms of strongly connected components. These techniques seem to be highly relevant for improving the methods of digital bifurcation analysis already targeted from the perspective of CTL-based model checking and discrete abstractions of cBNMs in [58]. These methods need to be brought into the settings of dINMs and even cRNMs where they are not yet established at all.
Software Development and Methods Applications
Finally, there is a lot of improvement to be done on the side of prototype implementations of the discussed techniques. A challenge is to create a unified framework for all the model classes discussed in this thesis. The ability to automatically transfer from high-level models such as dINMs to the quantitative setting of cINMs is an example of model refinement that is highly relevant in synthetic biology and molecular programming. Methods allowing that are extensively studied in other research.
A valuable feedback for software development comes from applications of the techniques in systems biology workflows. A grand challenge is to deploy the developed software to the online comprehensive modelling platform currently developed for cyanobacteria models [245,353].
From the case studies presented in Chapter 7 it becomes evident that in most cases the methods and tools could not be performed fully automatically without any intervention from the user. For example, the approximation/ abstraction procedures of cBNMs have to be fine-tuned to allow good and correct approximation of the original model. Another example can be seen in Sections 7.3.3 and 7.5 where manual reductions and simplifications of the models were needed in order to allow tractability of the algorithms. Development of methods that in some cases will allow to (semi-)automa-tise such model reduction tasks will contribute to getting back to the basic philosophy behind model checking - a push-button technology working without the need of manual tasks requiring detailed knowledge of the underlying technology.
165
Bibliography
[1] H. Abbas, A. Rodionova, E. Bartocci, S. A. Smolka, and R. Grosu. Quantitative regular expressions for arrhythmia detection algorithms. In Computational Methods in Systems Biology (CMSB 2017), volume 10545 of Lecture Notes in Computer Science, pages 23-39. Springer, 2017.
[2] Z. Ahmed, D. Benque, S. Berezin, A. C. E. Dahl, J. Fisher, B. A. Hall, S. Ishtiaq, J. Nanavati, N. Piterman, M. Riechert, and N. Skoblov. Bringing ltl model checking to biologists. In Verification, Model Checking, and Abstract Interpretation, pages 1-13. Springer, 2017.
[3] U. Alon. Network motifs: theory and experimental approaches. Nature Reviews Genetics, 8(6):450, 2007.
[4] U. Alon, M. Surette, N. Barkai, and S. Leibler. Robustness in bacterial chemotaxis. Nature, 6715(397):168-171,1999.
[5] M. Althoff. Reachability analysis of nonlinear systems using conservative polynomialization and non-convex sets. In Hybrid Systems: Computation and Control (HSCC 2013), pages 173-182. ACM, 2013.
[6] M. Althoff and J. M. Dolan. Online verification of automated road vehicles using reachability analysis. IEEE Transactions on Robotics, 30(4):903-918,2014.
[7] R. Alur, C. Courcoubetis, and D. Dill. Model-checking in dense realtime. Information and Computation, 104:2-34,1993.
[8] R. Alur, T. Feder, and T. A. Henzinger. The benefits of relaxing punctuality. /. ACM, 43(1):116-146,1996.
[9] R. Alur and T. A. Henzinger. A really temporal logic. /. ACM, 41(l):181-203,1994.
[10] R. Alur, T. A. Henzinger, G. Lafferriere, and G. J. Pappas. Discrete abstractions of hybrid systems. Proceedings of the IEEE, 88(7):971-984, 2000.
166
[11] R. Alur, K. Mamouras, and D. Ulus. Derivatives of quantitative regular expressions. In Models, Algorithms, Logics and Tools - Essays Dedicated to Kim Guldstrand Larsen on the Occasion of His 60th Birthday, volume 10460, pages 75-95. Springer, 2017.
[12] A. Andreychenko, L. Mikeev, D. Spieler, and V. Wolf. Parameter Identification for Markov Models of Biochemical Reactions. In Computer Aided Verification (CAV 2011), Lecture Notes in Computer Science, pages 83-98. Springer, 2011.
[13] A. Andreychenko, L. Mikeev, and V. Wolf. Model reconstruction for moment-based stochastic chemical kinetics. ACM Trans. Model. Corn-put. Simul, 25(2):12:1-12:19, 2015.
[14] Y. S. R. Annapureddy, C. Liu, G. E. Fainekos, and S. Sankara-narayanan. S-TaLiRo: A Tool for Temporal Logic Falsification for Hybrid Systems. In Tools and Algorithms for the Construction and Analysis of Systems, volume 6605 of Lecture Notes in Computer Science, pages 254-257. Springer, 2011.
[15] M. Antoniotti, A. Policriti, N. Ugel, and B. Mishra. Model building and model checking for biochemical processes. Cell Biochemistry and Biophysics, 38:271-286,2003.
[16] C. Areces and B. ten Cate. Hybrid logics. In Handbook of Modal Logic. Elsevier, 2007.
[17] G. Arellano, J. Argil, E. Azpeitia, M. Benitez, M. Carrillo, P. Gongora, D. A. Rosenblueth, and E. R. Alvarez-Buylla. "Antelope": a hybrid-logic model checker for branching-time Boolean GRN analysis. BMC Bioinformatics, 12(1):1-15,2011.
[18] E. Asarin, T. Dang, and A. Girard. Hybridization methods for the analysis of nonlinear systems. Acta Informatica, 43:451-476,2007.
[19] E. Asarin, A. Donze, O. Maler, and D. Nickovic. Parametric identification of temporal properties. In Runtime Verification, pages 147-160. Springer, 2012.
[20] A. Aziz, K. Sanwal, V. Singhal, and R. Brayton. Verifying continuous time Markov chains. In Computer Aided Verification (CAV 1996), volume 1102 of Lecture Notes in Computer Science, pages 269-276. Springer, 1996.
[21] S. Baarir, M. Beccuti, D. Cerotti, M. De Pierro, S. Donatelli, and G. Franceschinis. The greatspn tool: recent enhancements. ACM SIGMETRICS Performance Evaluation Review, 36(4):4-9, 2009.
167
[22] M. Backenkohler, L. Bortolussi, and V. Wolf. Moment-based parameter estimation for stochastic reaction networks in equilibrium.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 15(4):1180-1192, 2018.
[23] C. Baier, B. Haverkort, H. Hermanns, and J.-P. Katoen. Model Checking Continuous-Time Markov Chains by Transient Analysis. In Computer Aided Verification (CAV 2000), volume 1855 of Lecture Notes in Computer Science, pages 358-372. Springer, 2000.
[24] C. Baier and J.-P. Katoen. Principles of Model Checking. MIT Press, 2008.
[25] S. Bak and M. Caccamo. Computing reachability for nonlinear systems with hycreate, 2013. HSCC 2013. Poster Presentation.
[26] A. Bakhirkin, T. Ferrěre, O. Maler, and D. Ulus. On the quantitative semantics of regular expressions over real-valued signals. In Formal Modeling and Analysis of Timed Systems, pages 189-206. Springer, 2017.
[27] P. Ballarini, M. Forlin, T. Mazza, and D. Prandi. Efficient parallel statistical model checking of biochemical networks. In Parallel and Distributed Methods in verification (PDMC), volume 14 of Electronic Proceedings in Theoretical Computer Science, pages 47-61, 2009.
[28] B. Barbot, M. Z. Kwiatkowska, A. Mereacre, and N. Paoletti. Estimation and verification of hybrid heart models for personalised medical and wearable devices. In Computational Methods in Systems Biology (CMSB 2015), volume 9308 of Lecture Notes in Computer Science, pages 3-7. Springer, 2015.
[29] R. Barbuti, G. Caravagna, A. Maggiolo-Schettini, P. Milazzo, and S. Tini. Foundational aspects of multiscale modeling of biological systems with process algebras. Theor. Comput. Sci., 431:96-116,2012.
[30] N. Barkai and S. Leibler. Robustness in simple biochemical networks. Nature, (387):913-917,1997.
[31] J. Barnat, N. Beneš, L. Brim, M. Demko, M. Hajnal, S. Pastva, and D. Šafránek. Detecting attractors in biological models with uncertain parameters. In Computational Methods in Systems Biology (CMSB 2017), volume 10545 of Lecture Notes in Computer Science, pages 40-56. Springer, 2017.
[32] J. Barnat, L. Brim, and J. Chaloupka. Parallel breadth-first search ltl model-checking. In Automated Software Engineering (ASE 2003), pages 106-115, 2003.
168
[33] J. Barnat, L. Brim, A. Krejčí, A. Streck, D. Šafránek, M. Vejnár, and T. Vejpustek. On Parameter Synthesis by Parallel Model Checking.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(3):693-705, 2012.
[34] J. Barnat, L. Brim, and P. Ročkai. Divine multi-core — a parallel ltl model-checker. In Proceedings of the 6th International Symposium on Automated Technology for Verification and Analysis, ATVA '08, pages 234-239. Springer, 2008.
[35] J. Barnat, L. Brim, I. Černá, S. Dražan, J. Fabriková, J. Láník, D. Šafránek, and H. Ma. BioDiVinE: A Framework for Parallel Analysis of Biological Models. In Computational Models for Cell Processes (COMPMOD), volume 6 of Electronic Proceedings in Theoretical Computer Science, pages 31-45, 2009.
[36] J. Barnat, L. Brim, I. Černá, P. Moravec, P. Ročkai, and P. Šimeček. DiVinE - A Tool for Distributed Verification. In Computer Aided Verification (CAV 2006), volume 4144 of Lecture Notes in Computer Science, pages 278-281. Springer, 2006.
[37] J. Barnat, L. Brim, and D. Šafránek. High-Performance Analysis of Biological Systems Dynamics with the DiVinE Model Checker. Briefings in Bioinformatics, 11(3):301-312,2010.
[38] J. Barnat, L. Brim, D. Šafránek, and M. Vejnar. Parameter scanning by parallel model checking with applications in systems biology. In Parallel and Distributed Methods in Verification, 2010 Ninth International Workshop on, and High Performance Computational Systems Biology, Second International Workshop on, pages 95-104. IEEE, 2010.
[39] C. Barrett, P. Fontaine, and C. Tinelli. The SMT-LIB Standard: Version 2.5. Technical report, Department of Computer Science, The University of Iowa, 2015.
[40] E. Bartocci, L. Bortolussi, L. Nenzi, and G. Sanguinetti. On the Robustness of Temporal Properties for Stochastic Models. ArXiv e-prints, 2013.
[41] E. Bartocci, L. Bortolussi, and G. Sanguinetti. Data-driven statistical learning of temporal logic properties. In Formal Modeling and Analysis of Timed Systems, pages 23-37. Springer, 2014.
[42] E. Bartocci, F. Corradini, M. R. Di Berardini, E. Merelli, and L. Te-sei. Shape calculus, a spatial mobile calculus for 3d shapes. Scientific Annals of Computer Science, 20:1, 2010.
169
[43] E. Bartocci, F. Corradini, E. Merelli, and L. Tesei. Detecting synchronisation of biological oscillators by model checking. Theoretical Computer Science, 411(20):1999 - 2018, 2010.
[44] E. Bartocci, J. Deshmukh, A. Donzé, G. Fainekos, O. Maler, D. Ničkovič, and S. Sankaranarayanan. Specification-Based Monitoring of Cyber-Physical Systems: A Survey on Theory, Tools and Applications, pages 135-175. Springer, 2018.
[45] E. Bartocci and P. Lio. Computational modeling, formal analysis, and tools for systems biology. PLOS Computational Biology, 12(l):l-22, 2016.
[46] E. Bartocci, P. Liö, E. Merelli, and N. Paoletti. Multiple verification in complex biological systems: The bone remodelling case study. In Transactions on Computational Systems Biology XIV, pages 53-76. Springer, 2012.
[47] E. Batchelor and M. Goulian. Robustness and the cycle of phosphorylation and dephosphorylation in a two-component regulatory system. Proceedings of the National Academy of Sciences, 100(2):691-696, 2003.
[48] D. G. Bates and C. Cosentino. Validation and invalidation of systems biology models using robustness analysis. IET Systems Biology, 5(4):229-244, 2011.
[49] G. Batt, C. Belta, and R. Weiss. Model checking genetic regulatory networks with parameter uncertainty. In Hybrid Systems: Computation and Control (HSCC), volume 4416 of Lecture Notes in Computer Science, pages 61-75. Springer, 2007.
[50] G. Batt, D. Bergamini, H. de Jong, H. Garavel, and R. Mateescu. Model checking genetic regulatory networks using gna and cadp. In Model Checking Software, pages 158-163. Springer, 2004.
[51] G. Batt, M. Page, I. Cantone, G. Gössler, P. Monteiro, and H. de Jong. Efficient parameter search for qualitative models of regulatory networks using symbolic model checking. Bioinformatics, 26(18):603-610, 2010.
[52] G. Batt, D. Ropers, H. de Jong, J. Geiselmann, R. Mateescu, M. Page, and D. Schneider. Validation of qualitative models of genetic regulatory networks by model checking: analysis of the nutritional stress response in escherichia coli. Bioinformatics, 21(suppl_l):il9-i28, 2005.
170
[53] G. Batt, R. B. Salah, and O. Maler. On timed models of gene networks. In Formal Modeling and Analysis of Timed Systems (FORMATS), Lecture Notes in Computer Science, pages 38-52. Springer, 2007.
[54] G. Batt, B. Yordanov, R. Weiss, and C. Belta. Robustness analysis and tuning of synthetic gene networks. Bioinformatics, 23(18):2415-2422, 2007.
[55] C. Belta and L. Habets. Controlling a class of nonlinear systems on rectangles. IEEE Transactions on Automatic Control, 51(11):1749-1759, 2006.
[56] N. Beneš, L. Brim, M. Demko, S. Pastva, and D. Šafránek. Parallel SMT-based parameter synthesis with application to piecewise multi-affine systems. In ATVA, volume 9936 of Lecture Notes in Computer Science, pages 1-17. Springer, 2016.
[57] N. Beneš, L. Brim, M. Demko, S. Pastva, and D. Šafránek. Pithya: A parallel tool for parameter synthesis of piecewise multi-affine dynamical systems. In Computer Aided Verification (CAV 2017), pages 591-598. Springer, 2017.
[58] N. Beneš, L. Brim, M. Demko, S. Pastva, and D. Šafránek. A model checking approach to discrete bifurcation analysis. In FM 2016, volume 9995 of Lecture Notes in Computer Science, pages 85-101. Springer, 2016.
[59] D. Benque, S. Bourton, C. Cockerton, B. Cook, J. Fisher, S. Ishtiaq, N. Piterman, A. Taylor, and M. Y. Vardi. Bma: Visual tool for modeling and analyzing biological networks. In Computer Aided Verification (CAV 2012), pages 686-692. Springer, 2012.
[60] S. Berman, A. Halász, and V. Kumar. MARCO: a reachability algorithm for multi-affine systems with applications to biological systems. In Hybrid Systems: Computation and Control (HSCC 2007), Lecture Notes in Computer Science, pages 76-89. Springer, 2007.
[61] F. Bernardini, C. Biggs, J. Derrick, M. Gheorghe, M. Niranjan, and G. Sanguinetti. Parameter Estimation and Model Checking in a Model of Prokaryotic Autoregulation. Technical report, University of Sheffield, 2007.
[62] G. Bernot, J.-P. Comet, A. Richard, and J. Guespin. Application of formal methods to biological regulatory networks: extending thomas asynchronous logical approach with temporal logic. Journal of Theoretical Biology, 229(3):339-347,2004.
171
[63] M. L. Blinov, J. R. Faeder, B. Goldstein, and W. S. Hlavacek. Bionet-gen: software for rule-based modeling of signal transduction based on the interactions of molecular domains. Bioinformatics, 20(17):3289-3291, 2004.
[64] R. Blossey, L. Cardelli, and A. Phillips. A compositional approach to the stochastic dynamics of gene networks. In Transactions on Computational Systems Biology IV, pages 99-122. Springer, 2006.
[65] E. Boczko, W. D. Kalies, and K. Mischaikow. Polygonal approximation of flows. Topology and its Applications, 154(13):2501 - 2520, 2007.
[66] S. Bogomolov, M. Giacobbe, T. A. Henzinger, and H. Kong. Conic abstractions for hybrid systems. In FORMATS 2017, pages 116-132. Springer, 2017.
[67] S. Bogomolov, C. Schilling, E. Bartocci, G. Batt, H. Kong, and R. Grosu. Abstraction-based parameter synthesis for multiaffine systems. In Hardware and Software: Verification and Testing, volume 9434 of Lecture Notes in Computer Science, pages 19-35. Springer, 2015.
[68] G. Bombara, C.-I. Vasile, F. Penedo, H. Yasuoka, and C. Belta. A decision tree approach to data classification using signal temporal logic. In Hybrid Systems: Computation and Control (HSCC 2016), HSCC 2016, pages 1-10. ACM, 2016.
[69] B. Bonet, P. Haslum, S. Hickmott, and S. Thiebaux. Directed Unfolding of Petri Nets, pages 172-198. Springer, 2008.
[70] N. Bonzanni, E. Krepska, K. A. Feenstra, W. Fokkink, T. Kielmann, H. E. Bal, and J. Heringa. Executing multicellular differentiation: quantitative predictive modelling of C.elegans vulval development. Bioinformatics, 25(16):2049-2056, 2009.
[71] L. Bortolussi, L. Cardelli, M. Kwiatkowska, and L. Laurenti. Approximation of probabilistic reachability for chemical reaction networks using the linear noise approximation. In Quantitative Evaluation of Systems (QEST 2016), volume 9826, pages 72-88. Springer, 2016.
[72] L. Bortolussi, D. Milios, and G. Sanguinetti. U-check: Model checking and parameter synthesis under uncertainty. In Quantitative Evaluation of Systems (QEST 2015), pages 89-104. Springer, 2015.
[73] L. Bortolussi, D. Milios, and G. Sanguinetti. Smoothed model checking for uncertain continuous-time markov chains. Information and Computation, 247:235 - 253, 2016.
172
[74] L. Bortolussi and R. Paškauskas. Mean-field approximation and quasi-equilibrium reduction of markov population models. In Quantitative Evaluation of Systems (QEST 2014), volume 8657 of Lecture Notes in Computer Science, pages 106-121. Springer, 2014.
[75] L. Bortolussi and A. Policriti. Hybrid systems and biology. In Formal Methods for Computational Systems Biology, 8th International School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM 2008, Lecture Notes in Computer Science, pages 424-448. Springer, 2008.
[76] L. Bortolussi and A. Policriti. Modeling biological systems in stochastic concurrent constraint programming. Constraints, 13(l-2):66-90, 2008.
[77] D. Bosnacki, H. M. M. ten Eikelder, M. N. Steijaert, and E. P. de Vink. Stochastic analysis of amino acid substitution in protein synthesis. In Computational Methods in Systems Biology (CMSB 2008), pages 367-386, 2008.
[78] P. Boutillier, M. Maasha, X. Li, H. F. Medina-Abarca, J. Krivine, J. Feret, I. Cristescu, A. G. Forbes, and W. Fontána. The kappa platform for rule-based modeling. Bioinformatics, 34(13):i583-i592, 2018.
[79] L. Brim, N. Beneš, D. J., S. Pastva, and D. Šafránek. Facetal abstraction of non-linear dynamical systems, 2018. Submitted.
[80] L. Brim, I. Černá, P. Krčál, and R. Pelánek. Distributed ltl model checking based on negative cycle detection. In FST TCS 2001: Foundations of Software Technology and Theoretical Computer Science, pages 96-107. Springer, 2001.
[81] L. Brim, I. Černá, P. Moravec, and J. Simša. Accepting predecessors are better than back edges in distributed LTL model-checking. In
Formal Methods in Computer Aided Design (FMCAD), volume 4144 of Lecture Notes in Computer Science, pages 352-366. Springer, 2004.
[82] L. Brim, M. Češka, M. Demko, S. Pastva, and D. Šafránek. Parameter synthesis by parallel coloured CTL model checking. In CMSB, volume 9308 of Lecture Notes in Computer Science, pages 251-263. Springer, 2015.
[83] L. Brim, M. Češka, S. Dražan, and D. Šafránek. On Robustness Analysis of Stochastic Biochemical Systems by Probabilistic Model Checking. ArXiv e-prints, 2013.
173
[84] L. Brim, M. Češka, and D. Šafránek. Model checking of biological systems. In Formal Methods for Dynamical Systems, pages 63-112. Springer, 2013.
[85] L. Brim, M. Demko, S. Pastva, and D. Šafránek. High-performance discrete bifurcation analysis for piecewise-affine dynamical systems. In Hybrid Systems Biology, pages 58-74. Springer, 2015.
[86] L. Brim, P. Dluhoš, D. Šafránek, and T. Vejpustek. STL*: Extending signal temporal logic with signal-value freezing operator. Information and Computation, 236:52-67, 2014.
[87] L. Brim, J. Fabriková, S. Drazan, and D. Šafránek. On approximative reachability analysis of biochemical dynamical systems. Trans. Computational Systems Biology, 7625(14):77-101, 2012.
[88] L. Brim, J. Nižnan, and D. Šafránek. Compact representation of photosynthesis dynamics by rule-based models. Electronic Notes in Theoretical Computer Science, 316:17-27,2015.
[89] L. Brim, M. Češka, S. Dražan, and D. Šafránek. Exploring parameter space of stochastic biochemical systems using quantitative model checking. In Computer Aided Verification (CAV 2013), volume 8044 of Lecture Notes in Computer Science, pages 107-123. Springer, 2013.
[90] L. Brim, T. Vejpustek, D. Šafránek, and J. Fabriková. Robustness analysis for value-freezing signal temporal logic. arXiv preprint arXiv.1309.0867, 2013.
[91] L. Brim, K. Yorav, and J. Zídková. Assumption-based distribution of ctl model checking. International journal on Software Tools for Technology Transfer, 7(l):61-73,2005.
[92] M. Calder, V. Vyshemirsky, D. Gilbert, and R. Orton. Analysis of signalling pathways using continuous time markov chains. In Transactions on Computational Systems Biology VI, volume 4220 of Lecture Notes in Computer Science, pages 44-67. Springer, 2006.
[93] L. Calzone, N. Chabrier-Rivier, F. Fages, and S. Soliman. Machine learning biochemical networks from temporal logic properties. In Transactions on Computational Systems Biology VI, Lecture Notes in Computer Science, pages 68-94. Springer, 2006.
[94] L. Calzone, F. Fages, and S. Soliman. BIOCHAM: an environment for modeling biological systems and formalizing experimental knowledge. Bioinformatics, 22(14):1805-1807, 2006.
174
[95] D. Campagna and C. Piazza. Hybrid automata in systems biology: How far can we go? In From Biology to Concurrency and Back (FBTC), volume 229 of Electronic Notes in Theoretical Computer Science, pages 93 -108, 2009.
[96] F. Camporesi, J. Feret, and K. Q. Lý. Kade: A tool to compile kappa rules into (reduced) ODE models. In Computational Methods in Systems Biology (CMSB 2017), volume 10545 of Lecture Notes in Computer Science, pages 291-299. Springer, 2017.
[97] G. Caravagna and J. Hillston. Modeling biological systems with delays in Bio-PEPA. In Proceedings Fourth Workshop on Membrane Computing and Biologically Inspired Process Calculi 2010, volume 40 of Electronic Proceedings in Theoretical Computer Science, pages 85-101, 2010.
[98] L. Cardelli. Brane calculi. In Computational Methods in Systems Biology (CMSB 2005), pages 257-278. Springer, 2005.
[99] L. Cardelli. Morphisms of reaction networks that couple structure to function. BMC Systems Biology, 8(1):84, 2014.
[100] L. Cardelli, M. Češka, M. Fránzle, M. Kwiatkowska, L. Laurenti, N. Paoletti, and M. Whitby. Syntax-Guided Optimal Synthesis for Chemical Reaction Networks, pages 375-395. Springer, 2017.
[101] R. Carter and E. M. Navarro-López. Dynamically-driven timed automaton abstractions for proving liveness of continuous systems. In FORMATS 2012, volume 7595 of Lecture Notes in Computer Science, pages 59-74. Springer, 2012.
[102] A. Casagrande and C. Piazza. Unwinding biological systems. Theoretical Computer Science, 587:26 - 48, 2015.
[103] M. Češka, P. Pilař, N. Paoletti, L. Brim, and M. Kwiatkowska. Prism-psy: Precise gpu-accelerated parameter synthesis for stochastic systems. In Tools and Algorithms for the Construction and Analysis of Systems, pages 367-384. Springer, 2016.
[104] A. Champneys and K. Tsaneva-Atanasova. Dynamical Systems Theory, Bifurcation Analysis, pages 632-637. Springer New York, 2013.
[105] C. Chaouiya. Petri net modelling of biological networks. Briefings in Bioinformatics, 8(4):210-219, 2007.
[106] C. Chaouiya, E. Remy, B. Mossé, and D. Thieffry. Qualitative analysis of regulatory graphs: A computational tool based on a discrete formal framework. In Positive Systems, volume 294 of LNCIS, pages 830-832. Springer, 2003.
175
[107] T. Chatain, S. Haar, L. Jezequel, L. Paulevé, and S. Schwoon. Characterization of reachable attractors using petri net unfoldings. In Computational Methods in Systems Biology (CMSB 2014), volume 8859 of Lecture Notes in Computer Science, pages 129-142. Springer, 2014.
[108] T. Chatain, S. Haar, and L. Paulevé. Boolean networks: Beyond generalized asynchronicity. In Cellular Automata and Discrete Complex Systems, pages 29-42. Springer, 2018.
[109] X. Chen, E. Abrahám, and S. Sankaranarayanan. Flow*: An analyzer for non-linear hybrid systems. In Computer Aided Verification (CAV 2013), volume 8044 of Lecture Notes in Computer Science, pages 258-263. Springer, 2013.
[110] A. Chutinan and B. H. Krogh. Computational techniques for hybrid system verification. IEEE Transactions on Automatic Control, 48(1):64-75, 2003.
[Ill] A. Cimatti, E. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. NuSMV 2: An Open-Source Tool for Symbolic Model Checking. In Computer Aided Verification (CAV 2002), volume 2404 of Lecture Notes in Computer Science, pages 359-364. Springer, 2002.
[112] A. Cimatti, E. Clarke, F. Giunchiglia, and M. Roveri. NuSMV: a new symbolic model checker. /. Softw. Tools Technol. Transf., 2(410-425), 2000.
[113] F. Ciocchetta, S. Gilmore, M. L. Guerriero, and J. Hillston. Integrated simulation and model-checking for the analysis of biochemical systems. Electronic Notes in Theoretical Computer Science, 232:17-38,2009.
[114] F. Ciocchetta and J. Hillston. Bio-PEPA: A framework for the modelling and analysis of biological systems. Theor. Comput. Set, 410(33-34):3065-3084, 2009.
[115] E. Clarke, O. Grumberg, and D. Peled. Model Checking. MIT Press, 2000.
[116] E. Clarke and P. Zuliani. Statistical Model Checking for Cyber-Physical Systems. In Automated Technology for Verification and Analysis (ATVA), volume 6996 of Lecture Notes in Computer Science, pages 1-12. Springer, 2011.
[117] E. M. Clarke, E. A. Emerson, and A. P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Trans. Program. Lang. Syst., 8(2):244-263,1986.
176
[118] F.H.Clarke. Optimization and Nonsmooth Analysis. S.I.A.M., 1983.
[119] P. Collins and A. Goldsztejn. The reach-and-evolve algorithm for reachability analysis of nonlinear dynamical systems. Electronic Notes in Theoretical Computer Science, 223:87 - 102,2008.
[120] P. Collins, L. C. Habets, J. H. van Schuppen, I. Černá, J. Fabriková, and D. Šafránek. Abstraction of biochemical reaction systems on polytopes. In Proceedings of the 18th IFAC World Congress, volume 18, pages 14869-14875, 2011.
[121] G. Craciun, Y. Tang, and M. Feinberg. Understanding bistability in complex enzyme-driven reaction networks. Proceedings of the National Academy of Sciences, 103(23):8697-8702,2006.
[122] A. Crudu, A. Debussche, and O. Radulescu. Hybrid stochastic simplifications for multiscale gene networks. BMC Systems Biology, 3(1):89, 2009.
[123] P. Cull. Difference equations as biological models. Sci. Math. Jpn., (2):217-233, 2006.
[124] B. Daigle, M. Roh, L. Petzold, and J. Niemi. Accelerated Maximum Likelihood Parameter Estimation for Stochastic Biochemical Systems. BMC Bioinformatics, 13(1):68-71,2012.
[125] A. E. Dalsgaard, A. Laarman, K. G. Larsen, M. C. Olesen, and J. van de Pol. Multi-core reachability for timed automata. In Formal Modeling and Analysis of Timed Systems, pages 91-106. Springer, 2012.
[126] T. Dang, A. Donzé, O. Maler, and N. Shalev. Sensitive state-space exploration. In IEEE Conference on Decision and Control, pages 4049-4054,2008.
[127] T. Dang, T. Dreossi, and C. Piazza. Parameter synthesis through temporal logic specifications. In FM 2015: Formal Methods - 20th International Symposium, Oslo, Norway, June 24-26, 2015, Proceedings, volume 9109 of Lecture Notes in Computer Science, pages 213-230. Springer, 2015.
[128] T. Dang, C. L. Guernic, and O. Maler. Computing reachable states for nonlinear biological models. Theor. Comput. Sci., 412(21):2095-2107, 2011.
[129] T. Dang, C. Le Guernic, and O. Maler. Computing reachable states for nonlinear biological models. In CMSB 2009, volume 5688 of Lecture Notes in Computer Science, pages 126-141. Springer, 2009.
177
[130] V. Danos and C. Laneve. Formal molecular biology. Theor. Comput. Scz.,325(l):69-110, 2004.
[131] R. Darling and J. Norris. Differential equation approximations for markov chains. Probab. Surveys, 5:37-79,2008.
[132] A. Darlington, J. Kim, and D. Bates. Robustness analysis of a synthetic translational resource allocation controller. IEEE Control Systems Letters, August 2018.
[133] A. David, D. Du, K. G. Larsen, A. Legay, M. Mikucionis, D. B. Poulsen, and S. Sedwards. Statistical model checking for stochastic hybrid systems. In Hybrid Systems and Biology (HSB 2012), volume 92 of Electronic Proceedings in Theoretical Computer Science, pages 122-136, 2012.
[134] H. de Jong. Modeling and Simulation of Genetic Regulatory Systems: A Literature Review. Journal of Computational Biology, 9(1):67-103, 2002.
[135] H. de Jong, J. Gouzé, C. Hernandez, M. Page, T. Sari, and J. Geiselmann. Qualitative simulations of genetic regulatory networks using piecewise linear models. Bull. Math. Biol, 66:301-340,2004.
[136] T. Děd, D. Šafránek, M. Troják, M. Klement, J. Salagovič, and L. Brim. Formal biochemical space with semantics in kappa and bngl. Electronic Notes in Theoretical Computer Science, 326:27-49,2016.
[137] C. Dehnert, S. Junges, J.-P. Katoen, and M. Volk. A storm is coming: A modern probabilistic model checker. In Computer Aided Verification (CAV 2017), pages 592-600. Springer, 2017.
[138] L. Dematté, C. Priami, and A. Romanel. The beta workbench: a computational tool to study the dynamics of biological systems. Briefings in Bioinformatics, 9(5):437-449, 2008.
[139] M. Demko, N. Beneš, L. Brim, S. Pastva, and D. Šafránek. High-performance symbolic parameter synthesis of biological models: A case study. In CMSB 2016, volume 9859 of Lecture Notes in Computer Science, pages 82-97. Springer, 2016.
[140] J. V. Deshmukh, A. Donzé, S. Ghosh, X. Jin, G. Juniwal, and S. A. Seshia. Robust online monitoring of signal temporal logic. In Runtime Verification, pages 55-70. Springer, 2015.
[141] J. V. Deshmukh, A. Donzé, S. Ghosh, X. Jin, G. Juniwal, and S. A. Seshia. Robust online monitoring of signal temporal logic. Formal Methods in System Design, 51(l):5-30, 2017.
178
[142] F. Didier, T. A. Henzinger, M. Mateescu, and V. Wolf. Fast Adaptive Uniformization for the Chemical Master Equation. In Parallel and Distributed Methods in Verification and High Performance Computational Systems Biology (HiBi/PDMC 2009), pages 118-127. IEEE Computer Society, 2009.
[143] P. Dluhoš, L. Brim, and D. Safránek. On expressing and monitoring oscillatory dynamics. In Hybrid Systems and Biology (HSB 2012), volume 92 of Electronic Proceedings in Theoretical Computer Science, pages 73-87, 2012.
[144] A. Doi, S. Fujita, H. Matsuno, M. Nagasaki, and S. Miyano. Constructing Biological Pathway Models with Hybrid Functional Petri Nets. In Silico Biology, 4(3):271-291,2004.
[145] A. Donzé. Breach, A Toolbox for Verification and Parameter Synthesis of Hybrid Systems. In Computer Aided Verification (CAV 2010), volume 6174 of Lecture Notes in Computer Science, pages 167-170. Springer, 2010.
[146] A. Donzé, G. Clermont, and C. Langmead. Parameter synthesis in nonlinear dynamical systems: Application to systems biology. Journal of Computational Biology, 17(3):325-336, 2010.
[147] A. Donzé, G. Clermont, A. Legay, and C. J. Langmead. Parameter synthesis in nonlinear dynamical systems: Application to systems biology. In Research in Computational Molecular Biology, pages 155-169. Springer, 2009.
[148] A. Donzé, E. Fanchon, L. M. Gattepaille, O. Maler, and P. Tracqui. Robustness analysis and behavior discrimination in enzymatic reaction networks. PLoS ONE, 6(9):e24246, 2011.
[149] A. Donzé, T. Ferrére, and O. Maler. Efficient robust monitoring for stl. In Computer Aided Verification (CAV 2013), volume 8044 of Lecture Notes in Computer Science, pages 264-279. Springer, 2013.
[150] A. Donzé and O. Maler. Systematic simulation using sensitivity analysis. In Hybrid Systems: Computation and Control, pages 174-189. Springer, 2007.
[151] A. Donzé and O. Maler. Robust satisfaction of temporal logic over real-valued signals. In Formal Modeling and Analysis of Timed Systems, volume 6246 of Lecture Notes in Computer Science, pages 92-106. Springer, 2010.
179
[152] T. Dreossi. Sapo: Reachability computation and parameter synthesis of polynomial dynamical systems. In Hybrid Systems: Computation and Control (HSCC 2017), pages 29-34. ACM, 2017.
[153] T. Dreossi, C. Piazza, and T. Dang. Reachability Computation and Parameter Synthesis for Polynomial Dynamical Systems. PhD thesis, PhD thesis, 2016.
[154] D. Drusinsky. Monitoring temporal rules combined with time series. In Computer Aided Verification (CAV 2003), pages 114-117. Springer, 2003.
[155] P. Dvorak. Engineering of the synthetic metabolic pathway for biodegrada-tion of environmental pollutant. PhD thesis, Masaryk University, 2014.
[156] A. Eggers, M. Franzle, and C. Herde. SAT Modulo ODE: A Direct SAT Approach to Hybrid Systems. In ATVA 2008, pages 171-185. Springer, 2008.
[157] A. Eggers, N. Ramdani, N. S. Nedialkov, and M. Franzle. Improving the SAT modulo ODE approach to hybrid systems analysis by combining different enclosure methods. Software and System Modeling, 14(1):121-148, 2015.
[158] S. Eker, M. Knapp, K. Laderoute, P. Lincoln, J. Meseguer, and K. Son-mez. Pathway logic: Symbolic analysis of biological signaling. In Pacific Symposium on Biocomputing, pages 400-412,2002.
[159] H. El Samad, M. Khammash, L. Petzold, and D. Gillespie. Stochastic Modelling of Gene Regulatory Networks. Int. }. of Robust and Nonlinear Control, 15(15):691-711, 2005.
[160] H. W. Engl, C. Flamm, P. Kugler, J. Lu, S. Muller, and P. Schuster. Inverse problems in systems biology. Inverse Problems, 25(12):123014, 2009.
[161] J. Faeder, M. Blinov, and W. Hlavacek. Rule-based modeling of biochemical systems with bionetgen. Methods Mol Biol., 500:113-167, 2009.
[162] F. Fages and A. Rizk. On the analysis of numerical data time series in temporal logic. In Computational Methods in Systems Biology (CMSB 2007), pages 48-63. Springer, 2007.
[163] F. Fages and A. Rizk. On temporal logic constraint solving for analyzing numerical data time series. Theor. Comput. Sci., 408(l):55-65, 2008.
180
[164] F. Fages and A. Rizk. From model-checking to temporal logic constraint solving. In Principles and Practice of Constraint Programming -CP 2009, pages 319-334. Springer, 2009.
[165] F. Fages and S. Soliman. Formal cell biology in Biocham. In 8th International School on Formal Methods for the Design of Computer, Communication and Software Systems: Computational Systems Biology SFM08, volume 5016, pages 54-80, 2008.
[166] F. Fages and S. Soliman. On robustness computation and optimization inbiocham-4. In Computational Methods in Systems Biology (CMSB 2018), pages 292-299. Springer, 2018.
[167] F. Fages, S. Soliman, and N. Chabrier-Rivier. Modelling and querying interaction networks in the biochemical abstract machine biocham. Journal of Biological Physics and Chemistry, 4(2):64-73,2004.
[168] G. Fainekos and G. Pappas. Robustness of temporal logic specifications for continuous-time signals. Theoretical Computer Science, 410(42):4262-4291, 2009.
[169] M. Feinberg and F. J. Horn. Dynamics of open chemical systems and the algebraic structure of the underlying reaction network. Chemical Engineering Science, 29(3):775 - 787,1974.
[170] J. Feret and K. Q. Ly. Local traces: An over-approximation of the behavior of the proteins in rule-based models. IEEE/ACM Trans. Corn-put. Biology Bioinform., 15(4):1124-1137, 2018.
[171] J. Feret and K. Q. Ly. Reachability analysis via orthogonal sets of patterns. Electr. Notes Theor. Comput. Sci., 335:27-48, 2018.
[172] J. Fisher, D. Harel, and T. A. Henzinger. Biology as reactivity. Communications of the ACM, 54(10):72-82, 2011.
[173] J. Fisher and T. Henzinger. Executable cell biology. Nature biotechnology, 25(11):1239-1249, 2007.
[174] J. Fisher and N. Piterman. Model Checking in Biology, pages 255-279. Springer, 2014.
[175] L. F. Fitime, O. Roux, C. Guziolowski, and L. Pauleve. Identification of bifurcation transitions in biological regulatory networks using answer-set programming. Algorithms Mol Biol, 12(19), 2017.
[176] B. L. Fox and P. W. Glynn. Computing Poisson Probabilities. CACM, 31(4):440^45,1988.
181
[177] G. Frehse, S. Jha, and B. Krogh. A counterexample-guided approach to parameter synthesis for linear hybrid automata. In Hybrid Systems: Computation and Control (HSCC 2008), volume 4981 of Lecture Notes in Computer Science, pages 187-200. Springer, 2008.
[178] G. Frehse, C. Le Guernic, A. Donze, S. Cotton, R. Ray, O. Lebeltel, R. Ripado, A. Girard, T. Dang, and O. Maler. SpaceEx: Scalable Verification of Hybrid Systems. In Computer Aided Verification (CAV 2011), pages 379-395. Springer, 2011.
[179] F. Fröhlich, F. Theis, and J. Hasenauer. Uncertainty analysis for non-identifiable dynamical systems: Profile likelihoods, bootstrapping and more. In CMBS, volume 8859 of Lecture Notes in Computer Science, pages 61-72. Springer, 2014.
[180] F. Frölich, P. Thomas, A. Kazeroonian, F. J. Theis, R. Grima, and J. Hasenauer. Inference for stochastic chemical kinetics using moment equations and system size expansion. PLoS Computational Biology, 12:1-28, 2016.
[181] J. Fromentin, D. Eveillard, and O. Roux. Hybrid modeling of biological networks: mixing temporal and qualitative biological properties. BMC Systems Biology, 4(1):79, 2010.
[182] A. Funahashi, M. Morohashi, H. Kitano, and N. Tanimura. Celldesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO, 1(5):159 - 162, 2003.
[183] A. Gabor and J. R. Banga. Improved parameter estimation in kinetic models: Selection and tuning of regularization methods. In CMSB 2014, volume 8859 of Lecture Notes in Computer Science, pages 45-60. Springer, 2014.
[184] E. Gallet, M. Manceny, P. Le Gall, and P. Ballarini. Formal Methods and Software Engineering: 16th International Conference on Formal Engineering Methods, ICFEM 2014, Luxembourg, Luxembourg, November 3-5, 2014. Proceedings, chapter An LTL Model Checking Approach for Biological Parameter Inference, pages 155-170. Springer, 2014.
[185] V. Galpin, J. Hillston, and L. Bortolussi. HYPE Applied to the Modelling of Hybrid Biological Systems. Electronic Notes in Theoretical Computer Science, 218:33-51,2008.
[186] S. Gao, J. Avigad, and E. M. Clarke, ^-complete decision procedures for satisfiabilityoverthe reals. In Automated Reasoning, pages 286-300. Springer, 2012.
182
[187] S. Gao, S. Kong, and E. M. Clarke. dReal: An SMT solver for nonlinear theories over the reals. In CADE-24, volume 7898 of Lecture Notes in Computer Science, pages 208-214. Springer, 2013.
[188] H. Garavel, F. Lang, and R. Mateescu. CADP 2006: A toolbox for the construction and analysis of distributed processes. In Computer Aided Verification (CAV 2007), volume 4590 of Lecture Notes in Computer Science, pages 153-163. Springer, 2007.
[189] H. Gamier and L. Wang. Identification of Continuous-time Models from Sampled Data. Springer, 1st edition, 2008.
[190] S. Gay, F. Fages, T. Martinez, S. Soliman, and C. Solnon. On the subgraph epimorphism problem. Discrete Applied Mathematics, 162:214-228, 2014.
[191] S. Gay, S. Soliman, and F. Fages. A graphical method for reducing and relating models in systems biology. Bioinformatics, 26(18):i575-i581, 2010.
[192] M. Giacobbe, C. C. Guet, A. Gupta, T. A. Henzinger, T. Paixao, and T. Petrov. Model checking the evolution of gene regulatory networks. Acta Inf., 54(8):765-787,2017.
[193] D. Gilbert, R. Breitling, M. Heiner, and R. Donaldson. An introduction to biomodel engineering, illustrated for signal transduction pathways. In Membrane Computing, volume 5391 of Lecture Notes in Computer Science, pages 13-28. Springer, 2009.
[194] D. Gilbert, M. Heiner, F. Liu, and N. Saunders. Colouring space -a coloured framework for spatial modelling in systems biology. In Application and Theory of Petri Nets and Concurrency, pages 230-249. Springer, 2013.
[195] D. T. Gillespie. Exact Stochastic Simulation of Coupled Chemical Reactions. Journal of Physical Chemistry, 81(25):2340-2381,1977.
[196] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A: Statistical Mechanics and its Applications, 188(13):404 -425,1992.
[197] D. T. Gillespie. Stochastic Simulation of Chemical Kinetics. Annual Review of Physical Chemistry, 58(l):35-55, 2007.
[198] A. Girard, C. Le Guernic, and O. Maler. Efficient computation of reachable sets of linear time-invariant systems with inputs. In Hybrid Systems: Computation and Control (HSCC 2006), volume 3927 of Lecture Notes in Computer Science, pages 257-271. Springer, 2006.
183
[199] S. V. Goethem, J.-M. Jacquet, L. Brim, and D. Šafránek. Timed modelling of gene networks with arbitrary expression level discretization. In Interactions between Computer Science and Biology, Electronic Notes in Theoretical Computer Science. Elsevier, 2013. in press.
[200] A. Golightly and D. J. Wilkinson. Bayesian Parameter Inference for Stochastic Biochemical Network Models Using Particle Markov Chain Monte Carlo. Interface Focus, l(6):807-820, 2011.
[201] W. Grassmann. Transient Solutions in Markovian Queueing Systems. Computers & Operations Research, 4(1):47 - 53,1977.
[202] D. Gratie and I. Petre. Full structural model refinement as type refinement of colored petri nets. In In Biological Processes & Petri Nets (BPPN 2015), volume 1373 of CEUR Workshop Proceedings, pages 70-84. CEUR-WS.org, 2015.
[203] L. Grieco, L. Calzone, I. Bernard-Pierrot, F. Radvanyi, B. Kahn-Peries, and D. Thief fry. Integrative modelling of the influence of mapk network on cancer cell fate decision. PLoS Computational Biology, 9(10):1-15, 2013.
[204] G. Grofimann, C. Kyriakopoulos, L. Bortolussi, and V. Wolf. Lumping the approximate master equation for multistate processes on complex networks. In Quantitative Evaluation of Systems (QEST 2018), pages 157-172. Springer, 2018.
[205] R. Grosu, G. Batt, F. H. Fenton, J. Glimm, C. Le Guernic, S. A. Smolka, and E. Bartocci. From cardiac cells to genetic regulatory networks. In
Computer Aided Verification (CAV 2011), volume 6806 of Lecture Notes in Computer Science, pages 396^11. Springer, 2011.
[206] C. M. Guldberg and P. Waage. Studies concerning affinity. Forhan-delinger: Videnskabs-Selskabet i Christiana, 1864.
[207] R. Guzzi, T. Colombo, and P. Paci. Inverse Problems in Systems Biology: A Critical Review, pages 69-94. Springer, 2018.
[208] J. E. Haag, A. Vande Wouwer, and M. Remy. A general model of reaction kinetics in biological systems. Bioprocess and Biosystems Engineering, 27(5):303-309,2005.
[209] L. Habets and J. H. van Schuppen. A control problem for affine dynamical systems on a full-dimensional polytope. Automatica, 40(1):21 - 35, 2004.
[210] M. Hajnal, D. Šafránek, M. Demko, S. Pastva, P. Krejčí, and L. Brim. Toward modelling and analysis of transient and sustained behaviour
184
of signalling pathways. In Hybrid Systems Biology (HSB 2016), pages 57-66. Springer, 2016.
[211] K. Hamaguchi, H. Hiraishi, and S. Yajima. Formal verification of speed-dependent asynchronous circuits using symbolic model checking of branching time regular temporal logic. Lecture Notes in Computer Science, 575:410-421,1991.
[212] T. Han, J. Katoen, and A. Mereacre. Approximate parameter synthesis for probabilistic time-bounded reachability. In 2008 Real-Time Systems Symposium, pages 173-182, 2008.
[213] K. Havelund and G. Rou. Monitoring Java programs with Java pathexplorer. Electronic Notes in Theoretical Computer Science, 55(2):200 -217, 2001.
[214] B. R. Haverkort, A. Bell, and H. C. Bohnenkamp. On the efficient sequential and distributed generation of very large Markov chains from stochastic Petri nets. In Petri Net and Performance Models (PNPM), pages 12-21. IEEE Computer Society Press, 1999.
[215] M. Heiner, D. Gilbert, and R. Donaldson. Petri nets for systems and synthetic biology. In Formal methods for the design of computer, communication, and software systems 8th international conference on Formal methods for computational systems biology (SFM), volume 5016 of Lecture Notes in Computer Science, pages 215-264. Springer, 2008.
[216] M. Heiner, M. Herajy, F. Liu, C. Rohr, and M. Schwarick. Snoopy - a unifying petri net tool. In Application and Theory of Petri Nets, pages 398-407. Springer, 2012.
[217] T. Henzinger. The theory of hybrid automata. In Logic in Computer Science (LICS), pages 278 -292. IEEE Computer Society, 1996.
[218] T. Henzinger and H. Wong-Toi. Using hytech to synthesize control parameters for a steam boiler. In In Formal Methods for Industrial Applications: Specifying and Programming the Steam Boiler Control, Lecture Notes in Computer Science 1165, pages 265-282. Springer, 1996.
[219] T. A. Henzinger, P. W. Kopke, A. Puri, and P. Varaiya. What's decid-able about hybrid automata? In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, pages 373-382. ACM, 1995.
[220] T. A. Henzinger, M. Mateescu, and V. Wolf. Sliding Window Abstraction for Infinite Markov Chains. In Computer Aided Verification (CAV 2009), volume 5643 of Lecture Notes in Computer Science, pages 337-352. Springer, 2009.
185
[221] H.-M. Ho, J. Ouaknine, and J. Worrell. Online monitoring of metric temporal logic. In Runtime Verification, pages 178-192. Springer, 2014.
[222] G. J. Holzmann. The model checker spin. IEEE Trans. Softw. Eng., 23(5):279-295,1997.
[223] R. Honorato-Zimmer et al. Chromar, a Language of Parametrised Objects. Theoretical Computer Science, 2017.
[224] S. Hoops, S. Sahle, R. Gauges, C. Lee, J. Pahle, N. Simus, M. Singhal, L. Xu, P. Mendes, and U. Kummer. Copasi - a complex pathway simulator. Bioinformatics, 22(24) :3067-3074,2006.
[225] F. Horn and R. Jackson. General mass action kinetics. Archive for Rational Mechanics and Analysis, 47:81-116,1972.
[226] B. Hoxha, A. Dokhanchi, and G. Fainekos. Mining parametric temporal logic properties in model-based design for cyber-physical systems. International journal on Software Tools for Technology Transfer, 20(l):79-93, 2018.
[227] B. Hoxha, N. Mavridis, and G. Fainekos. Vispec: A graphical tool for elicitation of mtl requirements. In 2015IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3486-3492,2015.
[228] Hucka, M. et al. The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models. Bioinformatics, 19(4)524-531, 2003.
[229] C. A. Hunt, G. E. Ropella, S. Park, and J. Engelberg. Dichotomies between computational and mathematical models. Nature biotechnology, 26(7):737, 2008.
[230] M. A. Islam, G. Byrne, S. Kong, E. Clarke, R. Cleaveland, F. Fenton, R. Grosu, P. Jones, and S. Smolka. Bifurcation analysis of cardiac alternans using ^-decidability. In CMSB, volume 9859 of Lecture Notes in Computer Science, pages 132-146,2016.
[231] C. Jarzynski. Stochastic and Macroscopic Thermodynamics of Strongly Coupled Systems. Physical Review X, 7(1):011008, 2017.
[232] S. K. Jha, E. M. Clarke, C. Langmead, A. Legay, A. Platzer, and P. Zu-liani. A bayesian approach to model checking biological systems. In Computational Methods in Systems Biology (CMSB 2009), volume 5688 of Lecture Notes in Computer Science, pages 218-234. Springer, 2009.
[233] B. Joshi and A. Shiu. Atoms of multistationarity in chemical reaction networks. Journal of Mathematical Chemistry, 51(1):153-178, 2013.
186
[234] M. C. Jr., F. Dannenberg, N. Paoletti, M. Kwiatkowska, and L. Brim. Precise parameter synthesis for stochastic biochemical systems. Acta Inf., 54(6):589-623,2017.
[235] N. Kashtan, A. E. Mayo, T. Kalisky, and U. Alon. An analytically solvable model for rapid evolution of modular structure. PLoS Computational Biology, 5(4):1-14, 2009.
[236] W. O. Kermack and A. G. McKendrick. A Contribution to the Mathematical Theory of Epidemics. Proceedings of the Royal Society of London. Series A, 115(772):700-721,1927.
[237] Z. Khalis, J. Comet, A. Richard, and G. Bernot. The smbionet method for discovering models of gene regulatory networks. Genes, Genomes and Genomics, 3(1), 2009.
[238] M. Khammash. An engineering viewpoint on biological robustness. BMC Biology, 14(1):22, Mar 2016.
[239] B. N. Kholodenko. Cell-signalling dynamics in time and space. Nature Molecular Cell Biology, 7:165-176, 2006.
[240] H. Kitano. Computational systems biology. Nature, 420:206-210, 2002.
[241] H. Kitano. Systems biology: A brief overview. Science, 295(5560):1662-1664, 2002.
[242] H. Kitano. Towards a theory of biological robustness. Molecular Systems Biology, 3(137), 2007.
[243] H. Klarner, A. Streck, and H. Siebert. Pyboolnet: a python package for the generation, analysis and visualization of boolean networks. Bioinformatics, 33(5):770-772,2017.
[244] H. Klarner, A. Streck, D. Šafránek, J. Kolčák, and H. Siebert. Parameter identification and model ranking of thomas networks. In Computational Methods in Systems Biology (CMSB 2012), Lecture Notes in Computer Science, pages 207-226. Springer, 2012.
[245] M. Klement, D. Šafrňek, T. Děd, A. Pejznoch, L. Nedbal, R. Steuer, J. Červený, and S. Mueller. A Comprehensive Web-based Platform For Domain-Specific Biological Models . Electronic Notes in Theoretical Computer Science, 299(0):61 - 67, 2013.
[246] C. H. Koh, M. Nagasaki, A. Saito, C. Li, L. Wong, and S. Miyano. MIRACH: efficient model checker for quantitative biological pathway models. Bioinformatics, 27(5): 734-735,2011.
187
[247] C. H. Koh, S. Palaniappan, P. Thiagarajan, and L. Wong. Improved Statistical Model Checking Methods for Pathway Analysis. BMC Bioinformatics, 13(Suppl 17):S15, 2012.
[248] J. Kolčák, D. Šafránek, S. Haar, and L. Paulevé. Parameter space abstraction and unfolding semantics of discrete regulatory networks. Theoretical Computer Science, 2018.
[249] M. Komorowski, M. J. Costa, D. A. Rand, and M. P. H. Stumpf. Sensitivity, robustness, and identifiability in stochastic chemical kinetics models. Proceedings of the National Academy of Sciences, 108(21):8645-8650,2011.
[250] H. Kong, E. Bartocci, S. Bogomolov, R. Grosu, T. A. Henzinger, Y. Jiang, and C. Schilling. Discrete abstraction of multiaffine systems. In Hybrid Systems Biology (HSB 2016), volume 9957 of Lecture Notes in Computer Science, pages 128-144, 2016.
[251] H. Kong, E. Bartocci, and T. A. Henzinger. Reachable set over-approximation for nonlinear systems using piecewise barrier tubes. In Computer Aided Verification (CAV 2018), pages 449-467. Springer, 2018.
[252] S. Kong, S. Gao, W. Chen, and E. Clarke. dReach: ^-Reachability Analysis for Hybrid Systems. In TACAS 2015, volume 9035 of Lecture Notes in Computer Science, pages 200-205. Springer, 2015.
[253] S. Konur and M. Gheorghe. A property-driven methodology for formal analysis of synthetic biology systems. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 12(2):360-371, 2015.
[254] R. Koymans. Specifying real-time properties with metric temporal logic. Real-Time Systems, 2:255-299,1990.
[255] J. A. Kromer, S. Márcker, S. Lange, C. Baier, and B. M. Friedrich. Decision making improves sperm chemotaxis in the presence of noise. PLoS computational biology, 14(4):el006109,2018.
[256] N. P. Kurumbang et al. Computer-assisted engineering of the synthetic pathway for biodegradation of a toxic persistent pollutant. ACS synthetic biology, 3(3):172-181, 2013.
[257] M. Kvasnica, P. Grieder, and M. Baotič. Multi-Parametric Toolbox (MPT), 2004.
[258] M. Kwiatkowska and J. Heath. Biological pathways as communicating computer systems. Journal of Cell Science, 122(16):2793-2800,2009.
188
[259] M. Kwiatkowska, G. Norman, and A. Pacheco. Model Checking Expected Time and Expected Reward Formulae with Random Time Bounds. Compu. Math. Appl., 51(2):305 - 316,2006.
[260] M. Kwiatkowska, G. Norman, and D. Parker. Stochastic model checking. In Formal Methods for Performance Evaluation: 7th International School on Formal Methods for the Design of Computer, Communication, and Software Systems (SFM), volume 4486 of Lecture Notes in Computer Science, pages 220-270. Springer, 2007.
[261] M. Kwiatkowska, G. Norman, and D. Parker. PRISM 4.0: Verification of probabilistic real-time systems. In Computer Aided Verification (CAV 2011), volume 6806 of Lecture Notes in Computer Science, pages 585-591. Springer, 2011.
[262] M. Lapin, L. Mikeev, and V. Wolf. Shave: Stochastic hybrid analysis of markov population models. In Hybrid Systems: Computation and Control (HSCC 2011), HHSCC 2011, pages 311-312. ACM, 2011.
[263] K. G. Larsen, P. Pettersson, and W. Yi. Uppaal in a nutshell. International journal on software tools for technology transfer, 1(1-2):134-152, 1997.
[264] N. Le Novere, M. Hucka, H. Mi, S. Moodie, F. Schreiber, A. Sorokin, E. Demir, K. Wegner, M. I. Aladjem, S. M. Wimalaratne, et al. The systems biology graphical notation. Nature biotechnology, 27(8):735, 2009.
[265] C. Li, M. Donizelli, N. Rodriguez, H. Dharuri, L. Endler, V. Chel-liah, L. Li, E. He, A. Henry, M. I. Stefan, J. L. Snoep, M. Hucka, N. Le Novere, and C. Laibe. BioModels Database: An enhanced, cu-rated and annotated resource for published quantitative kinetic models. BMC Systems Biology, 4:92, 2010.
[266] Y. Li, A. Albarghouthi, Z. Kincaid, A. Gurfinkel, and M. Chechik. Symbolic optimization with SMT solvers. In POPL '14, pages 607-618. ACM, 2014.
[267] L. Lindemann. Robust and Abstraction-free Control of Dynamical Systems under Signal Temporal Logic Specifications. PhD thesis, KTH, 2018.
[268] P. Lid, N. Paoletti, M. A. Moni, K. Atwell, E. Merelli, and M. Viceconti. Modelling osteomyelitis. BMC bioinformatics, 13(14):S12, 2012.
[269] B. Liu, S. Kong, S. Gao, P. Zuliani, and E. M. Clarke. Parameter synthesis for cardiac cell hybrid models using ^-decisions. In Computational Methods in Systems Biology (CMSB 2014), pages 99-113. Springer, 2014.
189
[270] X. Liu, J. Jiang, O. Ajayi, X. Gu, D. Gilbert, and R. Sinnott. Bionessie: a grid enabled biochemical networks simulation environment. Studies in Health Technology and Informatics, 138:147-157,2008.
[271] C. M. Lloyd, M. D. Halstead, and P. F. Nielsen. CellML: its future, present and past. Progress in biophysics and molecular biology, 85(2-3):433-450, 2004.
[272] A. J. Lotka. Elements of Physical Biology. Nature, 116:461^61,1925.
[273] A. Luck and V. Wolf. Generalized method of moments for estimating parameters of stochastic reaction networks. BMC Systems Biology, 10(1):98, 2016.
[274] C. Madsen, C. Myers, N. Roehner, C. Winstead, and Z. Zhang. Utilizing Stochastic Model Checking to Analyze Genetic Circuits. In IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pages 379-386,2012.
[275] C. Madsen, F. Shmarov, and P. Zuliani. BioPSy: An SMT-based tool for guaranteed parameter set synthesis of biological models. In Computational Methods in Systems Biology (CMSB 2015), volume 9308 of Lecture Notes in Computer Science, pages 182-194. Springer, 2015.
[276] O. Maler and G. Batt. Approximating continuous systems by timed automata. In Formal Methods in Systems Biology (FMSB), Lecture Notes in Computer Science, pages 77-89. Springer, 2008.
[277] O. Maler and D. Nickovic. Monitoring temporal properties of continuous signals. In Formal Modelling and Analysis of Timed Systems and Formal Techniques in Real-Time and Fault Tolerant System (FOR-MATS/FTRTFT), volume 3253 of Lecture Notes in Computer Science, pages 152-166. Springer, 2004.
[278] O. Maler and D. Nickovic. Monitoring properties of analog and mixed-signal circuits. International Journal on Software Tools for Technology Transfer, 15(3):247-268, 2013.
[279] E. D. Maria, F. Fages, A. Rizk, and S. Soliman. Design, optimization and predictions of a coupled model of the cell cycle, circadian clock, dna repair system, irinotecan metabolism and exposure control under temporal logic constraints. Theoretical Computer Science, 412(21):2108-2127, 2011.
[280] R. Mateescu, P. T. Monteiro, E. Dumas, and H. de Jong. CTRL: Extension of CTL with regular expressions and fairness operators to verify genetic regulatory networks. Theoretical Computer Science, 412(26):2854-2883, 2011.
190
[281] H. Matsuno, A. Doi, M. Nagasaki, and S. Miyano. Hybrid petri net representation of gene regulatory network. Pac Symp Biocomput., pages 341-352,2000.
[282] A. Mbodj, E. H. Gustafson, L. Ciglar, G. Junion, A. Gonzalez, C. Gi-rardot, L. Perrin, E. E. M. Furlong, and D. Thieffry. Qualitative dynamical modelling can formally explain mesoderm specification and predict novel developmental phenotypes. PLoS Computational Biology, 12(9):1-17, 2016.
[283] T. Melham, J. Bard, E. Werner, and D. Noble. Conceptual foundations of systems biology. Prog. Biophys. Mol. Biol., 2012.
[284] P. Mendes. Modeling large scale biological systems from functional genomic data: parameter estimation, pages 163 - 186. MIT Press, Cambridge, MA, 2001.
[285] L. Michaelis and M. Menten. Die kinetik der invertinwirkung. Biochem Z., pages 333-369,1913.
[286] L. Michaelis, M. Menten, K. Johnson, and R. Goody. The original michaelis constant: translation of the 1913 michaelis-menten paper. Biochemistry, 50(39):8264-8269, 2011.
[287] R. Milo, P. Jorgensen, U. Moran, G. Weber, and M. Springer. Bion-umbersthe database of key numbers in molecular and cell biology. Nucleic Acids Research, 38(suppl_l):D750-D753,2010.
[288] C. G. Moles, P. Mendes, and J. R. Banga. Parameter estimation in biochemical pathways: A comparison of global optimization methods. Genome Research, 13(ll):2467-2474, 2003.
[289] P. Monteiro, W. Abou-Jaoud, D. Thieffry, and C. Chaouiya. Model checking logical regulatory networks. IFAC Proceedings Volumes, 47(2):170-175, 2014.
[290] P. Monteiro, D. Ropers, R. Mateescu, A. Freitas, and H. de Jong. Temporal logic patterns for querying dynamic models of cellular interaction networks. Bioinformatics, 24(16):i227-i233, 2008.
[291] P. T. Monteiro, D. Ropers, R. Mateescu, A. T. Freitas, and H. de Jong. Temporal Logic Patterns for Querying Qualitative Models of Genetic Regulatory Networks. In ECAI, volume 178 of FAIA, pages 229-233. IOS Press, 2008.
[292] B. Munsky and M. Khammash. The finite state projection algorithm for the solution of the chemical master equation. The Journal of chemical physics, 124:044104, 2006.
191
[293] C. J. Myers. Engineering Genetic Circuits. CRC Press, 2009.
[294] A. Naldi, P. T. Monteiro, C. Mssel, the Consortium for Logical Models, Tools, H. A. Kestler, D. Thieffry, I. Xenarios, J. Saez-Rodriguez, T. Helikar, and C. Chaouiya. Cooperative development of logical modelling standards and tools with colomoto. Bioinformatics, 31(7):1154-1159, 2015.
[295] L. Nenzi, S. Silvetti, E. Bartocci, and L. Bortolussi. A robust genetic algorithm for learning temporal specifications from data. In Quantitative Evaluation of Systems (QEST 2018), pages 323-338. Springer, 2018.
[296] D. Ničkovič. Monitoring and measuring hybrid behaviors. In Runtime Verification, pages 378-402. Springer, 2015.
[297] D. Ničkovič, O. Lebeltel, O. Malér, T. Ferrére, and D. Ulus. Amt 2.0: Qualitative and quantitative trace analysis with extended signal temporal logic. In Tools and Algorithms for the Construction and Analysis of Systems, pages 303-319. Springer, 2018.
[298] D. Nickovic and O. Maler. Amt: A property-based monitoring tool for analog systems. In Formal Modeling and Analysis of Timed Systems, pages 304-319. Springer, 2007.
[299] J. R. Norris. Continuous-time Markov chains I, page 60107. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 1997.
[300] M. Ostrowski, L. Paulevé, T. Schaub, A. Siegel, and C. Guziolowski. Boolean network identification from multiplex time series data. In Computational Methods in Systems Biology (CMSB 2015), pages 170-181. Springer, 2015.
[301] B. Pal, A. Banerjee, P. Dasgupta, and P. Chakrabarti. Buspec: A framework for generation of verification aids for standard bus protocol specifications. Integration, 40(3):285-304,2007.
[302] M. Pedersen et al. A High-level Language for Rule-based Modelling. Plos One, 10:1-26, 2015.
[303] C. A. Petri and W. Reisig. Petri net. Scholarpedia, 3(4):6477, 2008.
[304] A. Phillips and L. Cardelli. Efficient, correct simulation of biological processes in the stochastic 7r-calculus. In Computational Methods in Systems Biology (CMSB 2007), Lecture Notes in Computer Science, pages 184-199. Springer, 2007.
192
[305] A. Platzer and J.-D. Quesel. Keymaera: A hybrid theorem prover for hybrid systems (system description). In Automated Reasoning, volume 5195 of Lecture Notes in Computer Science, pages 171-178. Springer, 2008.
[306] A. Pnueli. The temporal semantics of concurrent programs. Theoretical Computer Science, 13(1):45 - 60,1981.
[307] L. Popova-Zeugmann, M. Heiner, and I. Koch. Time Petri Nets for Modelling and Analysis of Biochemical Networks. Fundamenta Infor-matica, 67(1-3):149-162, 2005.
[308] S. Prajna. Barrier certificates for nonlinear model validation. Auto-matica, 42(1):117 - 126, 2006.
[309] C. Priami. Algorithmic systems biology. Communications of the ACM, 52(5):80-88, 2009.
[310] C. Priami, A. Regev, E. Shapiro, and W. Silverman. Application of a stochastic name-passing calculus to representation and simulation of molecular processes. Information Processing Letters, 80(1):25-31, 2001.
[311] J. P. Queille and J. Sifakis. Specification and verification of concurrent systems in cesar. In International Symposium on Programming, pages 337-351. Springer, 1982.
[312] V. Raman, A. Donze, D. Sadigh, R. M. Murray, and S. A. Seshia. Reactive synthesis from signal temporal logic specifications. In Hybrid Systems: Computation and Control (HSCC 2015), pages 239-248. ACM, 2015.
[313] S. Ratschan and Z. She. Safety verification of hybrid systems by constraint propagation-based abstraction refinement. ACM Transactions on Embedded Computing Systems, 6(1), 2007.
[314] A. Raue, J. Karlsson, M. P. Saccomani, M. Jirstrand, and J. Timmer. Comparison of approaches for parameter identifiability analysis of biological systems. Bioinformatics, 30(10):1440-1448, 2014.
[315] A. Regev, E. M. Panina, W. Silverman, L. Cardelli, and E. Shapiro. Bioambients: an abstraction for biological compartments. Theoretical Computer Science, 325(1):141-167, 2004.
[316] A. Regev, W. Silverman, and E. Y. Shapiro. Representation and Simulation of Biochemical Processes Using the 7r-Calculus Process Algebra. In Pacific Symposium on Biocomputing, pages 459-470, 2001.
193
[317] D. Reguera, J. M. Rubi, and J. M. G. Vilar. The mesoscopic dynamics of thermodynamic systems. The Journal of Physical Chemistry B, 109(46):21502-21515, 2005.
[318] S. Reinker, R. Altman, and J. Timmer. Parameter Estimation in Stochastic Biochemical Reactions. IEEE Proc. Syst. Biol, 153(4):168-78, 2006.
[319] A. Rizk, G. Batt, F. Fages, and S. Soliman. A general computational method for robustness analysis with applications to synthetic gene networks. Bioinformatics, 25:169-178,2009.
[320] M. Rybihski, M. Lula, P. Banasik, S. Lasota, and A. Gambin. Tav4sb: integrating tools for analysis of kinetic models of biological systems. BMC Systems Biology, 6(1):25, 2012.
[321] A. Sackmann, M. Heiner, and I. Koch. Application of petri net based analysis techniques to signal transduction pathways. BMC Bioinformatics, 7(1):482, 2006.
[322] K. Sanft, D. Gillespie, and L. Petzold. Legitimacy of the stochastic michaelis-menten approximation. Systems Biology, IET, 5(l):58-69, 2011.
[323] S. Sankaranarayanan and A. Tiwari. Relational abstractions for continuous and hybrid systems. In Computer Aided Verification (CAV 2011), volume 6806 of Lecture Notes in Computer Science, pages 686-702. Springer, 2011.
[324] S. Sasagawa, Y.-i. Ozaki, K. Fujita, and S. Kuroda. Prediction and validation of the distinct dynamics of transient and sustained erk activation. Nature cell biology, 7(4):365-373,2005.
[325] M. Schaub, T. Henzinger, and J. Fisher. Qualitative networks: a symbolic approach to analyze biological signaling networks. BMC Systems Biology, 1(1):4, 2007.
[326] F. Schlögl. Chemical Reaction Models for Non-Equilibrium Phase Transitions. Zeitschrift fur Physik, 253:147-161,1972.
[327] I. Schomburg, L. Jeske, M. Ulbrich, S. Placzek, A. Chang, and D. Schomburg. The brenda enzyme information systemfrom a database to an expert system. Journal of Biotechnology, 261:194 - 206, 2017.
[328] M. Schwarick, C. Rohr, and M. Heiner. MARCIE - Model checking And Reachability analysis done effiCIEntly . In Quantitative Evaluation of SysTems (QEST 2011), pages 91-100. IEEE Computer Society, 2011.
194
[329] K. Sen, M. Viswanathan, and G. Agha. Vesta: A statistical model-checker and analyzer for probabilistic systems. In Quantitative Evaluation of Systems (QEST 2005), pages 251-252, 2005.
[330] G. Shinar, R. Milo, M. R. Martnez, and U. Alon. Inputoutput robustness in simple bacterial signaling systems. Proceedings of the National Academy of Sciences, 104(50):19931-19935, 2007.
[331] F. Shmarov and P. Zuliani. Probreach: Verified probabilistic delta-reachability for stochastic hybrid systems. In Hybrid Systems: Computation and Control (HSCC 2015), HSCC 2015, pages 134-139. ACM, 2015.
[332] I. Shmulevich and W. Zhang. Binary analysis and optimization-based normalization of gene expression data. Bioinformatics, 18(4):555-565, 2002.
[333] H. Siebert and A. Bockmayr. Incorporating time delays into the logical analysis of gene regulatory networks. In Computational Methods in Systems Biology (CMSB 2006), volume 4210 of Lecture Notes in Computer Science, pages 169-183. Springer, 2006.
[334] A. Singh and R. Grima. The linear-noise approximation and moment-closure approximations for stochastic chemical kinetics. arXiv preprint arXiv:1711.07383, 2017.
[335] A. Singh and J. a. P. Hespanha. Stochastic hybrid systems for studying biochemical processes. Physical and Engineering Sciences, 368(1930):4995-5011, 2010.
[336] M. W. Sneddon, J. R. Faeder, and T. Emonet. Efficient modeling, simulation and coarse-graining of biological complexity with nfsim. Nature Methods, 8(2):177-183, 2011.
[337] M. I. Stefan and N. Le Novre. Cooperative binding. PLoS Computational Biology, 9(6):l-6, 2013.
[338] R. Steuer, S. Waldherr, V. Sourjik, and M. Kollmann. Robust signal processing in living cells. PLoS computational biology, 7(ll):el002218, 2011.
[339] W. J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1995.
[340] A. Streck. Toolkit for Reverse Engineering of Molecular Pathways via Parameter Identification. PhD thesis, Free University of Berlin, Dahlem, Germany, 2016.
195
[341] A. Streck, J. Kolcák, H. Siebert, and D. Šafránek. Esther: Introducing an online platform for parameter identification of boolean networks. In CMSB, pages 257-258. Springer, 2013.
[342] A. Streck, K. Thobe, and H. Siebert. Analysing cell line specific egfr signalling via optimized automata based model checking. In Computational Methods in Systems Biology (CMSB 2015), pages 264-276. Springer, 2015.
[343] S. Streif, K.-K. K. Kim, P. Rumschinski, M. Kishida, D. E. Shen, R. Findeisen, and R. D. Braatz. Robustness analysis, prediction and estimation for uncertain biochemical networks. IFAC Proceedings Volumes, 46(32):1 - 20, 2013.
[344] O. Stursberg, S. Kowalewski, I. Hoffmann, and J. Preufiig. Comparing timed and hybrid automata as approximations of continuous systems. In Hybrid Systems IV, pages 361-377. Springer, 1997.
[345] M. Swat, A. Kel, and H. Herzel. Bifurcation analysis of the regulatory modules of the mammalian Gl/S transition. Bioinformatics, 20(10):1506-1511,2004.
[346] M. H. ter Beek, A. Fantechi, S. Gnesi, and F. Mazzanti. A state/event-based model-checking approach for the analysis of abstract system properties. Sci. Comput. Program., 76:119-135, 2011.
[347] D. Thieffry and R. Thomas. Dynamical behaviour of biological regulatory networks—ii. immunity control in bacteriophage lambda. Bulletin of Mathematical Biology, 57(2):277-297,1995.
[348] R. Thomas. Boolean formalization of genetic control circuits. Journal of Theoretical Biology, 42(3):563 - 585,1973.
[349] R. Thomas. Regulatory networks seen as asynchronous automata: A logical description. Journal of Theoretical Biology, 153(l):l-23,1991.
[350] A. Tiwari. Abstractions for hybrid systems. Formal Methods in System Design, 32(l):57-83, 2008.
[351] P. Traynard, F. Fages, and S. Soliman. Trace simplifications preserving temporal logic formulae with case study in a coupled model of the cell cycle and the circadian clock. In Computational Methods in Systems Biology (CMSB 2014), pages 114-128. Springer, 2014.
[352] P. Traynard, A. Faur, F. Fages, and D. Thieffry. Logical model specification aided by model-checking techniques: application to the mammalian cell cycle regulation. Bioinformatics, 32(17):i772-i780, 2016.
196
[353] M. Troják, D. Šafránek, J. Hrabec, J. Salagovic, F. Romanovská, and J. Červený. E-cyanobacterium.org: A web-based platform for systems biology of cyanobacteria. In Computational Methods in Systems Biology (CMSB 2016), volume 9859 of Lecture Notes in Computer Science, pages 316-322. Springer, 2016.
[354] J. J. Tyson. Classification of instabilities in chemical reaction systems. The Journal of Chemical Physics, 62(3):1010-1015,1975.
[355] D. Ulus, T. Ferrěre, E. Asarin, and O. Maler. Timed pattern matching. In Formal Modeling and Analysis of Timed Systems, pages 222-236. Springer, 2014.
[356] T. van Dijk, A. Laarman, and J. van de Pol. Multi-core bdd operations for symbolic reachability. In 11th International Workshop on Parallel and Distributed Methods in verification, PDMC 2012, Electronic Notes in Theoretical Computer Science, pages 127-143. ELSEVIER, 2012.
[357] A. P. A. van Moorsel and W. H. Sanders. Adaptive uniformization.
ORSA Communications in Statistics: Stochastic Models, vol. 10, no. 3, pages 619-648,1994.
[358] M. Y. Vardi and P. Wolper. An automata-theoretic approach to automatic program verification (preliminary report). In Proceedings of the Symposium on Logic in Computer Science (LICS '86), Cambridge, Massachusetts, USA, June 16-18,1986, pages 332-344. IEEE Computer Society, 1986.
[359] M. Češka, D. Šafránek, S. Dražan, and L. Brim. Robustness analysis of stochastic biochemical systems. PLoS ONE, 9(4):l-23, 2014.
[360] V. Volterra. Fluctuations in the abundance of a species considered mathematically. Nature, 118:558-560,1926.
[361] D. Šafránek, J. Červený, M. Klement, J. Pospíšilová, L. Brim, D. Lazar, and L. Nedbal. E-photosynthesis: Web-based platform for modeling of complex photosynthetic processes. Biosystems, 103(2):115-124, 2011.
[362] Q. Wang and E. M. Clarke. Formal modeling of biological systems. In 2026 IEEE International High Level Design Validation and Test Workshop (HLDVT), pages 178-184, 2016.
[363] L. H. Watanabe and C. J. Myers. Hierarchical stochastic simulation of genetic circuits. In Proceedings of the Symposium on Theory of Modeling & Simulation - DEVS Integrative, DEVS '14, pages 37:1-37:8. Society for Computer Simulation International, 2014.
197
[364] R. Wisniewski and C. Sloth. Abstraction of Dynamical Systems by Timed Automata. Modeling, Identification and Control, 32(2):79-90, 2011.
[365] S. Yamada, T. Taketomi, and A. Yoshimura. Model analysis of difference between egf pathway and fgf pathway. Biochemical and biophysical research communications, 314(4):1113-1120,2004.
[366] E. Yang, E. van Nimwegen, M. Zavolan, N. Rajewsky, M. Schroeder, M. Magnasco, and J. E. Darnell. Decay Rates of Human mRNAs: Correlation With Functional Characteristics and Sequence Attributes. Genome Research, 13(8):1863-1872,2003.
[367] H.-T. Yang and M. S. H. Ko. Stochastic modeling for the expression of a gene regulated by competing transcription factors. PLoS ONE, 7(3):e32376, 2012.
[368] B. Yordanov and C. Belta. Parameter synthesis for piecewise affine systems from temporal logic specifications. In Hybrid Systems: Computation and Control (HSCC 2008), pages 542-555. Springer, 2008.
[369] A. Zaslaver, A. E. Mayo, R. Rosenberg, P. Bashkin, H. Sberro, M. Tsalyuk, M. G. Surette, and U. Alon. Just-in-time transcription program in metabolic pathways. Nature Genetics, 36(5):486-491,2004.
[370] H. Zhang, J. Du, L. Cao, and G. Zhu. A full symbolic reachability analysis algorithm of timed automata based on bdd. In 2015 IEEE Twelfth International Symposium on Autonomous Decentralized Systems, pages 301-304,2015.
[371] J. Zhang, L. T. Watson, and Y. Cao. Adaptive aggregation method for the chemical master equation. International Journal of Computational Biology and Drug Design, 2(2):134-148, 2009.
[372] J. Zhou, R. Ramanathan, W.-F. Wong, and P. S. Thiagarajan. Automated property synthesis of odes based bio-pathways models. In Computational Methods in Systems Biology (CMSB 2017), volume 10545 of Lecture Notes in Computer Science, pages 265-282. Springer, 2017.
[373] P. Zuliani. Statistical model checking for biological applications. International Journal on Software Tools for Technology Transfer, 17(4):527-536, 2015.
198