Process algebras in systems biology
Federica Ciocchetta1
and Jane Hillston1
Laboratory for Foundations of Computer Science, The University of Edinburgh,
Edinburgh EH9 3JZ, Scotland
(fchiocche,jeh)@inf.ed.ac.uk
Abstract. In this work we present Bio-PEPA, a process algebra for the modelling
and the analysis of biochemical networks. It is a modiﬁcation of PEPA, originally
deﬁned for the performance analysis of computer systems, in order to handle
some features of biological models, such as stoichiometry and the use of general
kinetic laws. The domain of application is the one of biochemical networks.
Bio-PEPA may be seen as an intermediate, formal, compositional representation
of biological systems, on which diﬀerent kinds of analysis can be carried out.
Bio-PEPA is enriched with some notions of equivalence. Speciﬁcally, the isomorphism
and strong bisimulation for PEPA have been considered. Finally, we
show the translation of three biological models into the new language and we
report some analysis results.
1 Introduction
In recent years there has been increasing interest in the application of process algebras
in the modelling and analysis of biological systems [60, 26, 28, 58, 19, 49, 14]. Process
algebras have some interesting properties that make them particularly useful in describing
biological systems. First of all, they oﬀer compositionality, i.e. the possibility of
deﬁning the whole system starting from the deﬁnition of its subcomponents. Secondly,
process algebras give a formal representation of the system avoiding ambiguity. Thirdly,
biological systems can be abstracted by concurrent systems described by process algebras:
species may be seen as processes that can interact with each other and reactions
may be modelled using actions. Finally, diﬀerent kinds of analysis can be performed on
a process algebra model. These analyses provide conceptual tools which are complementary
to established techniques: it is possible to detect and correct potential inaccuracies,
to validate the model and to predict its possible behaviours.
The process algebra PEPA, originally deﬁned for the performance analysis of computer
systems, has been recently applied in the context of signalling pathways [14, 15].
Two diﬀerent approaches have been proposed: one based on reagents (the so-called
reagent-centric view) and another based on pathways (pathway-centric view). In both
cases the species concentrations are discretized into levels, each level abstracting an
interval of concentration values. In the reagent-centric view the PEPA sequential components
represent the diﬀerent concentration levels of the species. In this approach the
abstraction is “processes as species” and not “processes as molecules”, as in other process
algebras such as the π-calculus and Beta-binders [60, 58]. In the pathway-centric
approach we have a more abstract view: the processes represent sub-pathways. Here
multiple copies of components represent levels of concentration. The two views have
been shown to be equivalent [15].
Even though PEPA has proved useful in studying signalling pathways, it does not
allow us to represent all the features of biological networks. The main diﬃculties are
the deﬁnition of stoichiometric coeﬃcients (i.e. the coeﬃcients used to show the quantitative
relationships of the reactants and products in a biochemical reaction) and the
representation of kinetic laws. Indeed, stoichiometry is not represented explicitly and
the reactions are assumed to be elementary (with constant rate). The problem of extending
to the domain of kinetic laws beyond basic mass-action (hereafter called general
kinetic laws) is particularly relevant, as these kinds of reactions are frequently found in
the literature as abstractions of complex situations whose details are unknown. Reducing
all reactions to the elementary steps is complex and often impractical. This problem
impacts also on other process algebras. Indeed, generally they rely on Gillespie’s
stochastic simulation for analysis which considers only elementary reactions. Some recent
works have extended the approach of Gillespie to deal with complex reactions [1,
17] but these extensions are yet to be reﬂected in the work using process algebras. Previous
work concerning the use of general kinetic laws in process algebras and formal
methods was presented in [9, 20]. These are discussed in Section ??.
In this chapter we make a tutorial introduction to Bio-PEPA, a new language for the
modelling and analysis of biochemical networks. A preliminary version of the language
was proposed in [22], with full details presented in [?]. Here, in addition to deﬁning the
language we illustrate its use with a number of examples.
Bio-PEPA is based on the reagent-centric view in PEPA, modiﬁed in order to represent
explicitly some features of biochemical models, such as stoichiometry and the role
of the diﬀerent species in a given reaction. A major feature of Bio-PEPA is the introduction
of functional rates to express general kinetic laws. Each action type represents
a reaction in the model and is associated with a functional rate.
The idea underlying our work is represented schematically in the diagram in Fig. 1.
The context of application is biochemical networks. Broadly speaking, biochemical netBio−PEPA
SYSTEMBIOCHEMICAL NETWORKS
(Gillespie)
CTMC (with levels)
ODEs
(SBML, KEGG,...)
(model checking)
Stochastic simulation
PRISM
Fig. 1. Schema of the Bio-PEPA framework
works consist of some chemical species, which interact with each other through chemical
reactions. The dynamics of reaction are described in terms of some kinetic laws.
The biochemical networks can be obtained from databases such as KEGG [46, 45] and
BioModels Database [8, 56]. From the biological model, we develop the Bio-PEPA
speciﬁcation of the system. This is an intermediate, formal, compositional representation
of the biological model. At this point we can apply diﬀerent kinds of analysis,
including stochastic simulation [36], analysis based on ordinary diﬀerential equations,
numerical solution of continuous time Markov chains (CTMC) and stochastic model
checking using PRISM [61, 40]. It is worth noting that each of these analyses can help
in understanding the system. The choice of one or more methods depends on the context
of application [68]. There exist some relations between the diﬀerent kinds of analysis.
It is well-known that the results of stochastic simulations tend to the ODEs solution
when the number of elements is relatively high. Similarly, it is shown in [35] that the
numerical solution of the CTMC with levels (derived from the PEPA pathway-centric
view) tends to the solution of the ODEs when the number of levels increases.
The rest of the chapter is organised as follows. In the next section we outline our
modelling domain, biochemical networks, and discuss related work in which process
algebras are used to model such systems. The recent development of Bio-PEPA has
been informed by earlier work using the PEPA process algebra so this work is also
presented. In Section 3 we give a detailed acount of Bio-PEPA; its syntax, semantics,
equivalence relations and analysis techniques. The use of the language is illustrated in
Section 4, in which the translation of three biological models into Bio-PEPA and their
subsequent analysis is described. Finally, in Section 5 we present some conclusions and
future perspectives.
2 Background
In this section we present the application domain of this tutorial and we give an overview
on process algebras.
2.1 Application domain: biochemical networks
The application domain of this tutorial concerns biochemical networks, such as those
collected in the Biomodels Database [56] and KEGG [46].
A biochemical system M is composed of:
1. a set of compartments C. These represent the locations of the various species;
2. a set of chemical species S. These species may be genes, proteins, etc.;
3. a set of reactions R. We consider irreversible reactions. Reversible reactions are
decomposed as a pair of forward and inverse reactions.
The general form of an irreversible reaction j is given by:
κ1 jA1 + κ2 jA2 + .... + κnj jAnj
E1,E2,...I1,I2,...; fj
−−−−−−−−−−−−→ κ1 jB1 + κ2 jB2 + .... + κmj jBmj
(1)
where Ah, h = 1, ..., nj, are the reactants, Bl, l = 1, ..., mj, are the products, Ev are the
enzymes and Iu, the inhibitors. Enzymes and inhibitors are represented diﬀerently
from the reactants and products. Their role is to enhance or inhibit the reaction,
respectively. We call species that are involved in a reaction without changing their
concentration (i.e. enzymes and inhibitors) modiﬁers. The parameters κhj and κl j are
the stoichiometry coeﬃcients. These express the degree to which species participate
in a reaction. The dynamics associated with the reaction is described by a kinetic
law fj, depending on some parameters and on the concentrations of some species.
The best known kinetic law is mass-action: the rate of the reaction is proportional to the
product of the reactants’ concentrations. In published models it is common to ﬁnd general
kinetic laws, which describe approximations of sequences of reactions. They are
useful when it is diﬃcult to derive certain information from the experiments, e.g. the reaction
rates of elementary steps, or when there are diﬀerent time-scales for the reactions.
Generally these laws are valid under some conditions, such as the quasi-steady-state assumption
(QSSA). This describes the situation where one or more reaction steps may be
considered faster than the others and so the intermediate elements can be considered to
be constant. There is a long list of kinetic laws; for details see [65].
2.2 Overview on Process algebras
Process algebras are calculi that were introduced in computer science to specify formally
and model concurrent systems [51, 43]. They have been used to deal with complex
systems characterized by concurrency, communication, synchronization and nondeterminism.
Process algebras oﬀer several attractive features which are not necessarily
available in existing performance modelling paradigms. The most important of these
are compositionality, the ability to model a system as the interaction of its subsystems,
formality, giving a precise meaning to all terms in the language, and abstraction, the
ability to build up complex models from detailed components but disregarding internal
behaviour when it is appropriate to do so.
The most widely known process algebras are Milner’s Calculus of Communicating
Systems (CCS) [51] and Hoare’s Communicating Sequential Processes (CSP) [43].
Models in CCS and CSP have been used to establish the correct behaviour of complex
systems by deriving qualitative properties such as freedom from deadlock or livelock.
Process algebras are deﬁned by a simple syntax and semantics. The semantics is
given by axioms and inference rules expressed in an operational way [57]. A system
is deﬁned as a collection of agents which execute atomic actions. Some operators are
introduced for combining the primitives. For instance in CCS the main operators are:
– preﬁx a.P after action a the agent
– parallel composition P|Q agents P and Q proceed in parallel
– choice P + Q the agent behaves as P or Q
– restriction A\M the set of labels M is hidden
– relabelling A[a1/a0, ..] in this agent label a1 is renamed a0
– the null agent 0 this agent cannot act (deadlock)
One of the main peculiarities of process algebras is the possibility to express communication
between two processes. In some cases, such as CCS [51] and π-calculus
[52], a communication between two parallel processes is enabled if one process can
perform an action a (receive) and the other process can perform the complementary action
¯a (send). So the actions must be complementary (input-output) and share the same
name (a in the case considered). The resulting communication has the distinguished
label τ, that indicates an internal (invisible) action. A peculiarity of π-calculus with
respect to CCS is the possibility to represent name-passing: communicating processes
can exchange names over channels and consequently they may change their interaction
topology.
The communication mechanism in CSP is diﬀerent from the one described above
as there is no notion of complementary actions. In CSP two agents communicate by
simultaneously executing actions with the same label. Since during the communication
the joint action remains visible to the environment, it can be reused by other concurrent
processes so that more than two processes can be involved in the communication
(multiway synchronisation).
The analysis of the behaviours of the model, represented in a formal language, is
generally produced through a Labeled Transition System (LTS). From this a derivative
tree or graph in which language terms form the nodes and transitions are the arcs, may
be constructed. This structure is a useful tool for reasoning about agents and the systems
they represent. It is also the basis of the bisimulation style of equivalence. In this style
of equivalence, the actions of an agent characterise it, so two agents are considered to
be equivalent if they are observed to perform exactly the same actions. Strong and weak
forms of equivalence are deﬁned depending on whether the internal actions of an agent
are deemed to be observable.
In CCS and CSP, since the objective is qualitative analysis rather than quantitative,
time and uncertainty are abstracted away. In the last decade various suggestions for incorporating
time and probability into these formalisms have been investigated (see [54]
for an overview of process algebras with time). For example, Temporal CCS (TCCS)
[53] extends CCS with ﬁxed delays and wait-for synchronisation (asynchronous waiting).
Note that most of the timed extensions, including TCCS, retain the assumption
that actions are instantaneous and regard time progression as orthogonal to the activity
of the system. These assumptions make such models unsuitable for performance analysis.
Probabilistic extensions of process algebras, such as PCCS [44], allow uncertainty
to be quantiﬁed using a probabilistic choice combinator. In this case a probability is
associated with each possible outcome of a choice.
In the early 1990s several stochastic extensions of process algebra (stochastic process
algebras or SPA), have been introduced. SPA add quantiﬁcation to models making
them suitable for performance modelling. The general idea is to associate a random
variable with all the actions of the calculus. In most cases, the random variables are
assumed to be exponentially distributed and a rate is added to each preﬁx to represent
the parameter of the exponential distribution that describes the dynamic behaviour of
the associated action. A race condition has then been considered: all the activities that
are enabled in a given state compete and the fastest one succeeds. The choice of exponential
distribution permits to associate the process algebra model to a continuous time
Markov chain (CTMC) and therefore to carry out analysis based on this approach. Some
examples of SPA are TIPP [38], EMPA1
[?,5], PEPA [42], SPADE2
[67] and Stochastic
π-calculus [59].
2.3 Process algebras in systems biology
Recently, as a response to the need of modeling the dynamics of complex biological
systems, there have been some applications of process calculi in systems biology [60,
62, 28, 58, 27, 18, 19, 14]. These techniques are appropriate for describing formally and
analyzing a biological system as a whole and for reasoning about protein/gene interactions.
Indeed, there exists a strong correspondence between concurrent systems described
by process algebras and biological ones: biological entities may be abstracted as
processes that can interact each other and reactions may be modeled by using actions.
Process calculi have some properties that make them useful for studying biological
systems:
– they allow the formal speciﬁcation of the system;
– they oﬀer diﬀerent levels of abstraction for the same biological system;
– they can make interactions explicit, in particular biological elements may be seen
as entities that interact and evolve;
– they support modularity and compositionality;
– they provide well-established techniques for reasoning about possible behaviours.
They may be used not only for the simulation of the system, but also for the veriﬁcation
of formal properties and for behaviour comparison through equivalences.
Several process calculi have been proposed in biology. Each of them has diﬀerent properties
able to render diﬀerent aspects of biological phenomena. They may be divided
into two main categories:
– calculi deﬁned originally in computer science and then applied in biology, such as
the biochemical stochastic π-calculus, [60], CCS-R [28], PEPA [42];
– calculi deﬁned ad hoc by observing biological structures and phenomena, such as
BioAmbients [19], Brane Calculi [18], κ-calculus [27], and Beta-binders [58].
One of the ﬁrst process algebras used in systems biology is the biochemical πcalculus
[60], a variant of the π-calculus deﬁned to model biological systems. The underlying
idea of application of the π-calculus to biology is the molecule-as-computation
abstraction [63, 60]: each biological entity and interaction is associated with a speciﬁcation
in the calculus. Speciﬁcally, molecules are modeled as processes, interaction capabilities
as channels, interactions as communications between processes, modiﬁcations
as state and channel changes and, ﬁnally, compartments and membranes as restrictions.
Two stochastic simulation tools based on Gillespie [36] have been deﬁned (BIOSPI
[6] and SPIM [66]), various applications have been shown [50, 26, 49, 21] and some
modiﬁed versions have been proposed (e.g. SPICO[48] and Sp@ [69]).
1
Originally called simply MPA.
2
Originally called CCS+.
CCS-R [28] is a variant of CCS with new elements that allows the manage of reversibility.
The interactions are described in terms of binary synchronized communications,
similarly to π-calculus. The successor of CSS-R is the Reversible CCS (RCCS)
[31]. This calculus allows processes to backtrack if this is in agreement with a notion
of casual equivalence deﬁned in the paper.
Another language used for the modelling of biological systems is the κ-calculus [27],
based on the description of protein interactions. Processes describe proteins and their
compounds, a set of processes models solutions and protein behaviour is given by a
set of rewriting rules, driven by suitable side-conditions. The two main rules concern
activation and complexation. A stochastic simulator for κ-calculus is described in [30].
A few applications are reported, as in [29].
Beta-binders [58, 64] is an extension of the π-calculus inspired by biological phenomena.
This calculus is based on the concept of bio-process, a box with some sites
(beta-binders) to express the interaction capabilities of the element, in which π-like
processes (pi-processes) are encapsulated. Beta-binders enrich the standard π-calculus
with some constructs that allow us to represent biological features, such as the join
between two bio-processes, the split of one bio-process into two, the change of the bioprocess
interface by hiding, unhiding and exposing a site. The Beta Workbench [64] is
a collection of tools for the modelling, simulation and analysis of Beta-binders system.
The BetaWB simulator is based on a variant of Gillespie’s algorithm.
In most of the calculi considered it is not possible to represent all the features
of biochemical networks. Generally the kinetic laws are assumed to be mass-action
and reactions can have at most two reactants. Indeed these calculi refer to the standard
Gillespie’s algorithm for the analysis and this assumes elementary reactions (i.e.
monomolecular or bimolecular) with constant rates. Furthermore, biological reactions
are abstracted as communications/interactions between agents and in some process algebras
such as π-calculus, CCS and Beta-binders these actions are pairwise. Therefore
multiple-reactant multiple-product reactions cannot be modelled in these calculi.
In order to represent multiple-reactant multiple-product reactions π-calculus and Betabinders
have been enriched with transactions [24, 25].
A ﬁrst proposal to deal with general kinetic laws has been shown in [9]. The authors
of [9] present a stochastic extension of Concurrent Constraint Programming (CCP) and
show how to apply it in the case of biological systems. Here each species is represented
by a variable and the reactions are expressed by constraints on these variables. The
domain of application is extended to any kind of reactions and the rate can be expressed
by a generic function.
The possibility to represent general kinetic laws is also oﬀered by BIOCHAM [20],
a programming environment for modeling biochemical systems, making simulations
and querying the model in temporal logic. This language is not a process algebra, but it
is based on a rule-based language for modeling biochemical systems, in which species
are expressed by objects and reactions by reaction rules.
Finally, some calculi has been deﬁned to model compartments and membranes.
Here we describe brieﬂy Bio-ambients [19] and Brane calculi [18]. Bio-ambients is
centered on ambients, bounded places where processes are contained and where communication
may happen. Ambients can be nested and organized as a hierarchy. This
hierarchy may be modiﬁed by suitable operations that have a biological interpretation.
It is possible to have an enter n and an exit n primitives to move an ambient into or out of
another ambient or a merge+ n/merge- n for merging two ambients together. Ambients
contain compounds that interact via communications. Bio-ambients have been used to
model compartments in BIOSPI [6]. A stochastic semantics for Bio-ambients has been
formalized in [10]. There are few applications, for instance [3].
In Brane calculi [18] a system consists of nested membranes, which are collections
of actions. Membranes may shift, merge, break apart and may be replenished, leading
to very expressive models, in which actions occur on membranes. Membranes may be
seen as oriented objects that must obey to some restrictions on orientation. In particular
they must preserve bitonality, which requires nested membranes to have opposite
orientations.
2.4 PEPA and biological systems
PEPA is a stochastic process algebra originally deﬁned for the performance modelling
of systems with concurrent behaviour [42]. In PEPA each action is assumed to have a
duration, represented by a random variable with a negative exponential distribution. We
informally introduced the syntax of the language below. For more details see [42].
Preﬁx The basic term is the preﬁx combinator (α, r).S . It denotes a component which
has action of type α and an exponentially distributed duration with parameter r
(mean duration 1/r), and it subsequently behaves as S .
Choice The component S + R represents a system which may behave either as S or
as R. The activities of both S and R are enabled. The ﬁrst activity to complete
distinguishes one of them and the other is discarded.
Constant Constants are components whose meaning is given by a deﬁning equation
C
def
= S . They allow us to assign names to patterns of behaviour associated with
components.
Hiding In S/H the set H identiﬁes those activities which can be considered internal
or private to the component S .
Cooperation The term P L
Q denotes cooperation between P and Q over the cooperation
set L, that determines those activities on which the cooperands are forced
to synchronise. PEPA supports multiway synchronisation between components: the
result of synchronising on an activity α is thus another α, available for further synchronisation.
For action types not in L, the components proceed independently and
concurrently with their enabled activities. In the context of performance evaluation
the rate for the synchronised activities is the minimum of the rates of the synchronising
activities.
Recently, PEPA has been applied to the modelling and analysis of signalling pathways.
A ﬁrst study concerns the inﬂuence of the Raf Kinase Inhibitor Protein (RKIP)
on the Extracellular signal Regulated Kinase (ERK) [14], whereas in [15] the PEPA
system for Schoeberl’s model [32] involving the MAP kinase and EFG receptors is reported.
In [14] two diﬀerent modelling styles have been proposed, one based on the
reagent-centric view and the other on the pathway-centric view. The former focuses on
the variation in the concentrations of the reagents: the concentrations are discretized in
levels, each level representing an interval of concentration values. The level l can assume
values between 0 and Nmax (maximum level). The pathway-centric style provides
a more abstract view of the system and focuses on the subpathways. The two representations
were shown to be equivalent [14]. In addition to the standard analysis oﬀered by
process algebras, in [13] a mapping from reagent-centric PEPA models to a system of
ordinary diﬀerential equations (ODEs), has been proposed.
Note that modelling approaches proposed in PEPA are diﬀerent from other process
algebras such as the π-calculus [60] and Beta-binders [58]. Indeed PEPA uses the
abstraction “processes as species” instead of “processes as molecules”: components
capture a pattern of behaviour of a whole set of molecules rather than the identical
behaviour of molecules having to be represented individually.
From the applications discussed above PEPA has been shown to be appropriate for
the modelling of biological systems: it oﬀers a high level of abstraction for the model
and focuses on compositionality and on the interactions. Furthermore, by using PEPA
as a modelling language it is possible to apply diﬀerent kinds of analysis, not only
stochastic simulation, but also diﬀerential equations and model checking.
However, not all the features of biochemical networks can be expressed using the
present version of PEPA: the various kinetic laws are not considered and stoichiometry
is added by hand in the conversion of PEPA into ODEs. As observed above, with a few
exceptions (e.g. [9]) and a few cases (dimerization), these features cannot be represented
in other process algebras either.
3 Bio-PEPA: deﬁnition of the language
Our earlier experience using PEPA, and other stochastic process algebras, to model
biochemical networks developed insights which we then used in the deﬁnition of BioPEPA.
We felt it was important to have a language which can represent all reactions
in a straightforward way as well as handle stoichiometry and general kinetic laws. We
retained the reagent-centric view previously used in PEPA models of biochemical path-
ways.
We adopt a high level of abstraction similar to the one proposed in formalisms such
as SBML [7] 3
, which have been widely adopted by biologists. Furthermore we made
the following assumptions:
1. Compartments are static, i.e. compartments are not actively involved in the reactions
— they are simply containers. The transport of a species from one compartment
to another is modelled by introducing two distinct components for representing
the species. The translocation is abstracted by a transformation of one species
into another. Compartments are included in the deﬁnition of a Bio-PEPA system
because the volume of the containing compartment can impact on reactions of a
species.
2. Reactions are irreversible reactions. A reversible reaction is represented as a pair
of irreversible reactions.
3
This is a widely used XML-based format for representing models of biochemical reaction
networks. Many SBML models are collected in the BioModels Database [8, 56].
3.1 Discrete concentrations and granularity
Following the reagent-centric view, models are based not on individual molecules,
but on discrete levels of concentration within a species: each component represents
a species and it is parametric in terms of concentration level. Some advantages of this
view are:
– It allows us to deal with uncertainty/incomplete information in the exact number of
elements (semi-quantitative data);
– In a discrete state space representation the focus is on the concentration levels rather
than the number of elements: this means that the state space is reduced as there are
less states for each component.
– The population level view, in terms of continuously changing concentrations, and
the individual level view, counting molecules, are both easily recovered from this
abstract view.
This view was presented in [16]. The authors focused on the case of reactions with
mass-action kinetics and stoichiometry equal to one for all the reactants and products.
The granularity of the system has been expressed in terms of the number of levels,
representing concentration intervals. Furthermore they considered the same same step
size h and the same maximum level N for all the species.
In Bio-PEPA we adapt this approach to general kinetic laws, stoichiometry greater
than one and diﬀerent numbers of levels for the species. The granularity of the system
is deﬁned in terms of the step size h of the concentration intervals instead of the number
of levels. We deﬁne the same step size h for all the species4
. This is motivated by
the fact that, following the law of conservation of mass, there must be a “balance”
between the concentrations consumed (reactants) and the ones created (products). In the
case the stoichiometry is greater than one we need to consider concentration quantities
proportional to stoichiometric coeﬃcients. Given a species i, we can assume that it has
a maximum ﬁnite concentration Mi. The number of levels for the species i is given by
Ni + 1 where Ni = Mi
h (the integer value greater than or equal to Mi
h ). Each species can
assume the discrete concentration levels from 0 (concentration null) to Ni (maximum
concentration).
If li is the concentration level for the species i, the concentration is taken to be
xi = li × h.
When a ﬁnite state space CTMC is to be generated, for numerical analysis or
stochastic model checking, we must assume that there is a maximum concentration
for each species. However, we can have a species without a limiting value: we use a
maximum level to capture all values greater than a given (high) value.
3.2 The syntax
The syntax of Bio-PEPA is similar to that of PEPA but with some important diﬀerences.
As in PEPA a model is made up of a number of sequential components; here there is
4
There can be some exceptions to this assumption: 1) since modiﬁers remain constant during
reaction, we may deﬁne a diﬀerent step size for each species which is only a modiﬁer; 2) any
species which is involved on in creation/degradation reactions may have a diﬀerent step size.
one sequential component for each species. As we will see, the syntax of Bio-PEPA
is designed in order to collect the biological information that we need. For example,
instead of a single preﬁx combinator there are a number of diﬀerent operators which
capture the role that the species plays with respect to this reaction.
S ::= (α, κ) op S | S + S | C P ::= P L
P | S (l)
where op = ↓ | ↑ | ⊕ | | .
The component S is called sequential component (or species component) and represents
the species. The component P, called a model component, describes the system
and the interactions among components. We suppose a countable set of model components
C and a countable set of action types A. The parameter l ∈ N represents the
discrete level of concentration. The preﬁx term, (α, κ) op S , contains information about
the role of the species in the reaction associated with the action type α:
– (α, κ) is the activity or reaction, where α ∈ A is the action type and κ is the stoichiometry
coeﬃcient of the species in that reaction; information about the rate of
the reaction is deﬁned elsewhere (in contrast to PEPA);
– the preﬁx combinator “op” represents the role of the element in the reaction. Specifically,
↓ indicates a reactant, ↑ a product, ⊕ an activator, an inhibitor and a
generic modiﬁer.
The choice operator, cooperation and deﬁnition of constant are unchanged. In contrast
to PEPA the hiding operator is omitted.
In order to fully describe a biochemical network in Bio-PEPA we need to deﬁne
structures that collect information about the compartments, the maximum concentrations,
number of levels for all the species, the constant parameters and the functional
rates which specify the rates of reactions. In the following the function name returns
the names of the elements of a given Bio-PEPA component.
First of all we deﬁne the set of compartments.
Deﬁnition 1. Each compartment is described by “V: v unit”, where V is the compartment
name, “v” is a positive real number expressing the compartment size and the
(optional) “unit” denotes the unit associated with the compartment size. The set of
compartments is denoted V.
In Bio-PEPA compartments are static and they cannot change their structure/size.
The set of compartments must contain at least one element. When we have no information
about compartments we add a default compartment whose size is 1 and whose unit
depends on the model.
Deﬁnition 2. For each species we deﬁne the element C : H, N, M0, M, V, where:
– C is the species component name,
– H ∈ N is the step size,
– N ∈ N is the maximum level,
– M0 ∈ R+
∪ { } is the initial concentration,
– M ∈ R+
∪ { } is the maximum concentration,
– V ∈ name(V) ∪ { } is the name of the enclosing compartment.
The set of all the elements C : H, N, M0, M, V is denoted N.
In the deﬁnition the symbol “ ” denotes the empty string, indicating that the last three
components are optional. The initial concentration may added when we want to compare
our model results with the results in the literature. The maximum concentration is
used in the deﬁnition of the number of levels, but generally it can be derived from the
step size and the maximum number of levels. Finally, if there is only one compartment
for all the species in the model we can omit it in the deﬁnition of N.
In order to specify the dynamics of the system we associate a functional rate fαj
with each action αj. This function represents the kinetic law of the associated reaction.
For the deﬁnition of functional rates we consider mathematical expressions with simple
operations and operators involving constant parameters and components. All the kinetic
laws proposed in the book by Segel [65] can be deﬁned in this way. In addition, for
convenience, we include some predeﬁned functions to express the most commonly used
kinetic laws.
The predeﬁned kinetic laws considered are mass-action ( f MA), Michaelis-Menten
(f MM) and Hill kinetics ( f H). They depend only on some parameters; the components/species
are derived from the context5
. The functional rates are deﬁned externally
to the components and are evaluated when the system is derived. They are used to derive
the transition rates of the system. In the functional rates some parameter constants
can be used. These must be deﬁned in the model by means of the set of parameter
deﬁnitions K.
Deﬁnition 3. Each parameter is deﬁned by “kname = value unit”, where “kname C” is
the parameter name, “value” denotes a positive real number and the (optional) “unit”
denotes the unit associated with the parameter. The set of the parameters is denoted K.
Finally, we have the following deﬁnition for the set of sequential components:
Deﬁnition 4. The set Comp of sequential components is deﬁned as
Comp ::= {C
def
= S, where S is a sequential component }
We can deﬁne a Bio-PEPA system in the following way:
Deﬁnition 5. A Bio-PEPA system P is a 6-uple V, N, K, FR,Comp, P , where:
– V is the set of compartments;
– N is the set of quantities describing each species;
– K is the set of parameter deﬁnitions;
– FR is the set of functional rate deﬁnitions;
5
In the case of mass-action, the function f MA(r) is r ×
nj
i=1(Ci)κi , where Ci i = 1, ..., nj are the
nj distinct reactants involved in the reaction and κi is the associated stoichiometric coeﬃcients.
The information about the reactants are derived from the Bio-PEPA speciﬁcations of the system.
In the case of Michaelis-Menten, the function f MM(vM, KM) is vM × E × S/(KM + S ),
where E is the enzyme and S the substrate. Also in this case E and S are derived from the BioPEPA
speciﬁcations. In the case of Hill kinetics, the function f H(v, K, n) is v × Cn
/(K + Cn
),
where C is the element involved in the reaction.
– Comp is the set of deﬁnitions of sequential components;
– P is the model component describing the system.
In a well-deﬁned Bio-PEPA system each element has to satisfy some (reasonable)
conditions. Details can be found in [23]. In the remainder of the chapter we consider
only well-deﬁned Bio-PEPA systems. The set of such systems is denoted ˜P.
3.3 The semantics
The semantics of Bio-PEPA is deﬁned in terms of an operational semantics. We deﬁne
two relations over the processes. The former, called the capability relation, supports the
derivation of quantitative information and it is auxiliary to the latter which is called the
stochastic relation. The stochastic relation gives us the rates associated with each action.
The rates are obtained by evaluating the functional rate associated with the action,
divided by the step size of the species involved, using quantitative information derived
from the capability relation.
The capability relation is −→c ⊆ C × Θ × C, where the label θ ∈ Θ contains the
quantitative information needed to evaluate the functional rate. We deﬁne the labels θ
as:
θ := (α, w)
where w is deﬁned as w ::= [S : op(l, κ)] | w :: w, with S ∈ C, l the level and κ the
stoichiometry coeﬃcient of the components. The order of the components is not important.
The relation −→c is deﬁned as the minimum relation satisfying the rules reported in
Table 1.
The ﬁrst three axioms describe the behaviour of the three diﬀerent preﬁx terms. In
the case of a reactant, the level decreases; in the case of a product, the level increases;
whereas in the case of modiﬁers, the level remains the same. For reactants and products,
the number of levels increment or decrement depends on the stoichiometric coeﬃcient
κ. This expresses the degree to which a species (reactant or product) participates in
a reaction. Therefore some side conditions concerning the present concentration level
must be added to the rules. Speciﬁcally, for the reactants the level has to be greater
than or equal to κ, whereas for the products the level has to be less than or equal to
(N − κ), where N is the maximum level. The modiﬁers can have any possible value
between 0 and N. In all three cases the label θ records the level and the stoichiometry
of the associated component. The rules choice1 and choice2 have the usual meaning.
The rule constant is used to deﬁne the behaviour of the constant term, deﬁned by
one or more preﬁx terms in summation. The label contains the information about the
level and the stoichiometric coeﬃcient related to the action α. The last three rules report
the case of cooperation. The rules coop1 and coop2 concern the case when the action
enabled does not belong to the cooperation set. In this case the label in the conclusion
contains only the information about the component that ﬁres the action. The rule coop3
describes the case in which the two components synchronize and the label reports the
information from both the components. The concatenation operator of lists @ is used
for this purpose.
In order to associate the rates with the transitions we introduce the stochastic relation
−→s ⊆ ˜P × Γ × ˜P, where the label γ ∈ Γ is deﬁned as γ := (α, rα), with rα ∈ R+
.
prefixReac ((α, κ)↓S )(l)
(α,[S :↓(l,κ)])
−−−−−−−−−→c S (l − κ) κ ≤ l ≤ N
prefixProd ((α, κ)↑S )(l)
(α,[S :↑(l,κ)])
−−−−−−−−−→c S (l + κ) 0 ≤ l ≤ (N − κ)
prefixMod ((α, κ) op S )(l)
(α,[S :op(l,κ)])
−−−−−−−−−→c S (l) with op = , ⊕, and 0 ≤ l ≤ N
choice1
S1(l)
(α,w)
−−−→c S1(l )
(S1 + S2)(l)
(α,w)
−−−→c S1(l )
choice2
S2(l)
(α,w)
−−−→c S 2(l )
(S1 + S2)(l)
(α,w)
−−−→c S2(l )
constant
S (l)
(α,S :[op(l,κ)])
−−−−−−−−−−→c S (l )
C(l)
(α,C:[op(l,κ)])
−−−−−−−−−→c S (l )
with C
def
= S
coop1
P1
(α,w)
−−−→c P1
P1 L
P2
(α,w)
−−−→c P1 L
P2
with α L
coop2
P2
(α,w)
−−−→c P2
P1 L
P2
(α,w)
−−−→c P1 L
P2
with α L
coop3
P1
(α,w1)
−−−−→c P1 P2
(α,w2)
−−−−→c P2
P1 L
P2
(α,w1@w2)
−−−−−−−→c P1 L
P2
with α ∈ L
Table 1. Axioms and rules for Bio-PEPA.
In this deﬁnition rα represents the parameter of a negative exponential distribution. The
dynamic behaviour of processes is determined by a race condition: all activities enabled
attempt to proceed but only the fastest succeeds.
The relation −→s is deﬁned as the minimal relation satisfying the rule
Final
P
(αj,w)
−−−−→cP
V, N, K, F ,Comp, P
(αj,rα[w,N,K])
−−−−−−−−−−−→s V, N, K, F ,Comp, P
The second component in the label of the conclusion represents the rate associated with
the transition. The rate is calculated from the functional rate fα in the following way:
rα[w, N, K] =
fα[w, N, K]
h
where h is the step size and fα[w, N, K] denotes the function fα is evaluated over w,
N and K. Speciﬁcally, for each component Ci we derive the concentration as li × h.
Then we replace each free occurrence of Ci by (li × h)κij
, where κij is the stoichiometric
coeﬃcient of the species i with respect to the reaction Rj. The derivation of rates is
discussed in some more detail later.
A Stochastic Labelled Transition System can be deﬁned for a Bio-PEPA system.
Deﬁnition 6. The Stochastic Labelled Transition System (SLTS) for a Bio-PEPA system
is ( ˜P, Γ, −→s), where −→s is the minimal relation satisfying the rule Final.
The states of SLTS are deﬁned in terms of the concentration levels of the species
components and the transitions from one state to another represent reactions that cause
changes in the concentration levels of some components.
Note that using the relation −→c it is possible to deﬁne another labelled transition
system (LTS) as (C, Θ, −→c) which diﬀers only in the transition labels.
Derivation of rates In the SLTS the states represent levels of concentration and the
transitions cause a change in these levels for one or more species. The number of levels
depends on the stoichiometric coeﬃcients of the species involved.
Consider a reaction j described by a kinetic law fj and with all stoichiometric coeﬃcients
equal to one. Following [16], we can deﬁne the transition rate as (∆t)−1
, where ∆t
is the time to have a variation in the concentration of one step for both the reactants and
the products of the reaction. Let y be a variable describing one product of the reaction.
We can consider the rate equation for that species with respect to the given reaction.
This is dy/dt = fj(¯x(t)), where ¯x is the set (or a subset) of the reactants/modiﬁers of the
reaction. We can apply the Taylor expansion up to the second term and we obtain
yn+1 ≈ yn + f(¯xn) × (tn+1 − tn)
Now we can ﬁx yn+1 − yn = h and then derived the time interval (tn+1 − tn) = ∆t as
∆t ≈ h/ f(¯xn). From this we obtain the transition rate as f(¯xn)/h.
When the reaction has stoichiometric coeﬃcients diﬀerent from one, we can consider
an approach similar to the one above. Let y be a product of the reaction. The
approximation gives:
yn+1 ≈ yn + r × κ ×
nr
i=1
xκi
i,n × (tn+1 − tn)
where r is the reaction constant rate, κ is stoichiometric coeﬃcient of the product y, xi
i = 1, ..., nr are the reactants of the reaction, κi i = 1, ..., nr are the associated stoichiometric
coeﬃcients, nr is the number of distinct reactants.
Now we can ﬁx yn+1 − yn = κ × h and then derive the respective (tn+1 − tn) = ∆t as
∆t ≈ h/(r × nr
i=1 xκi
i,n). From this expression we can derive the rate as usual.
Note that this approach is based on an approximation; this depends on the time/concentration
steps. Moreover, we assume that the species can vary by one step size h (ﬁxed) in a time
interval (if all the reactants and products are not at zero or the maximum level). In particular,
reactants are assumed to decrease until zero is obtained and products increase
until a maximum value. This implies that the kinetic law has to satisfy some properties.
Speciﬁcally, it must be monotonic (non-decreasing in terms of the reactant concentration).
Mass-action, Hill-kinetics and Michaelis-Menten are all monotonic, as are many
other kinetic laws.
From biochemical networks to Bio-PEPA We deﬁne a translation, tr BM BP, from
a biochemical network M to a Bio-PEPA system P = V, N, K, FR,Comp, P , based
on the following abstraction:
1. Each compartment is deﬁned in the set V in terms of a name and an associated
volume. Recall that currently in Bio-PEPA, compartments are not involved actively
in the reactions and therefore are not represented by processes.
2. Each species i in the network is described by a species component Ci ∈ Comp. The
constant component Ci is deﬁned by the “sum” of elementary components describing
the interaction capabilities of the species. We suppose that there is at most one
term in each species component with an action of type α. A single deﬁnition can
express the behaviour of the species at any level.
3. Each reaction j is associated with an action type αj and its dynamics is described
by a speciﬁc function fαj
∈ FR. The constant parameters used in the function can
be deﬁned in K.
4. The model P is deﬁned as the cooperation of the diﬀerent components Ci.
3.4 Some examples
Now we present some simple examples in order to show how Bio-PEPA can be used to
capture some biological situations.
Example 1: Mass-action kinetics Consider the reaction 2X + Y
; fM
−−−→3Z, described by
the mass-action kinetic law fM = r × X2
× Y. The three species can be speciﬁed by the
syntax:
X
def
= (α, 2)↓X Y
def
= (α, 1)↓Y Z
def
= (α, 3)↑Z
The system is described by (X(lX0) {α}
Y(lY0)) {α}
Z(lZ0), where lX0, lY0 and lZ0 denote the
initial concentration level of the three components. The functional rate is fα = f MA(r).
The rate associated with a transition is given by:
rα =
r × (lX × h)2
× (lY × h)
h
where lX, lY are the concentration levels for the species X and Y in a given state and h is
the step size of all the species. The reaction can happen only if we have at least 3 levels
(0, 1, 2) for X and 4 levels (0, 1, 2, 3) for Z.
Example 2: Michaelis-Menten kinetics One of the most commonly used kinetic laws
is Michaelis-Menten. It describes a basic enzymatic reaction from the substrate S to the
product P and is written as S
E; fE
−−−→P, where E is the enzyme involved in the reaction.
This reaction is an approximation of a sequence of two reactions, under the quasi-steady
state assumption (QSSA). The whole sequence of reactions is described by the kinetic
law fE = vM×E×S
(KM+S ) . For more details about the derivation of this kinetic law and the
meaning of parameters see [65].
The three species can be speciﬁed in Bio-PEPA by the following components:
S
def
= (α, 1)↓S P
def
= (α, 1)↑P E
def
= (α, 1) ⊕ E
The system is described by (S (lS 0) {α}
E(lE0)) {α}
P(lP0) and the functional rate is
fα = f MM(vM, KM).
The transition rate is given by:
rα =
vM × (lS × h) × (lE × h)
(KM + lS × h)
×
1
h
where lS , lE are the concentration levels for the species S and E in a given state and h is
the step size of all the species. The reaction can happen only if we have at least 3 levels
(0, 1, 2) for all the species involved.
Example 3: competitive inhibition Competitive inhibition is a form of enzyme inhibition
where binding of the inhibitor to the enzyme prevents binding of the substrate and
vice versa. In classical competitive inhibition, the inhibitor binds to the same active site
as the normal enzyme substrate, without undergoing a reaction. The substrate molecule
cannot enter the active site while the inhibitor is there, and the inhibitor cannot enter
the site when the substrate is there. This reaction is described as:
S + E + I ←→ SE −→ P + E
EI
where S is the substrate, E the enzyme, I the inhibitor and P the product. Under QSSA
the intermediate species SE and EI are constant and we can approximate the reactions
above by a unique reaction S
E,I: fI
−−−−→P, with rate fI =
vc × S × E
S + KM(1 + I
KI
)
, where vc is the
the turnover number (catalytic constant), KM is the Michaelis-constant and KI is the
inhibition constant.
The speciﬁcation in Bio-PEPA is:
S
def
= (α, 1)↓S P
def
= (α, 1)↑P E
def
= (α, 1) ⊕ E I
def
= (α, 1) I
The system is described by ((S (lS 0) {α}
E(lE0)) {α}
I(lI0)) {α}
P(lP0) with functional rate
fα = fCI((vc, KM, KI), S, E, I) =
vc × S × E
S + KM(1 + I
KI
)
.
The transition rate is given by:
rα =
vc × (lS × h) × (lE × h)
(lS × h + KM(1 + lI ×h
KI
))
×
1
h
where lS , lE, lI are the concentration levels for the species S , E, I in a given state and
h is the step size of all the species. The reaction can happen only if we have at least 3
levels (0, 1, 2) for all the species involved.
Example 4: degradation and synthesis of a species Two particular reactions are those
which describe the degradation and the creation of a species. In order to model these
reactions we need to add two auxiliary species components to represent respectively the
residue (Res) of the reaction and the creation factor (CF), i.e. genes or DNA.
Let us consider the degradation reaction A−→∅. We describe this reaction in BioPEPA
by introducing the component Res as the residue/product of the reaction. The
two species A and Res are deﬁned as:
A
def
= (α, 1)↓A Res
def
= (α, 1) Res
The component Res is described by one or more sub-terms each of which describes a
diﬀerent degradation reaction.
In contrast the synthesis of a species ∅−→A is described by a new component CF.
The two species A and CF are described by:
A
def
= (α, 1)↑A CF
def
= (α, 1) CF
In the deﬁnitions of the components Res and CF we use the symbol to indicate
that they do not change with the reaction.
3.5 Equivalences
It is sometimes useful to consider equivalences between models in order to determine
whether the systems represented are in some sense the “same”. In this section we discuss
some notions of equivalence for Bio-PEPA. We consider two styles of equivalence
which are commonly considered for process algebras: isomorphism, a structural
equivalence, and bisimulation, a behavioural equivalence. Some characteristics of the
language impact on the deﬁnitions of equivalence and we start by highlighting those.
Firstly, there is no hiding operator or τ actions. Therefore, in Bio-PEPA we do not have
weaker forms of equivalence based on abstracting τ actions. Secondly, in well-deﬁned
systems we have at most one action of a given type in each sequential term and each
component describes the behaviour of a single species. So we cannot have processes of
the form “S + S ” and terms such as “A = a.C” (where A and C diﬀer). Thirdly, if we
have two transitions between the processes P and P , they involve diﬀerent action types
and they represent similar reactions that diﬀer only in the kind/number of modiﬁers. Finally,
we have deﬁned two relations within the semantics. In one case the labels contain
the information about the action type and about the elements involved. This is used as
an auxiliary relation for the derivation of the second one, in which the labels contain
the information about the action type and the rate (similarly to PEPA activity). Thus we
have a choice of which relation on which to base each notion of equivalence.
Recall that in Bio-PEPA we make a distinction between systems and model components.
However note that the only element that is modiﬁed by the transitions of a BioPEPA
system is the model component. All the other components remain unchanged.
Thus we deﬁne equivalences for the Bio-PEPA systems in terms of equivalences for
the model components. Speciﬁcally, we say that two Bio-PEPA systems P1 and P2 are
equivalent if their respective model components are equivalent.
Auxiliary deﬁnitions Before we proceed it will be useful to make some auxiliary
deﬁnitions. Firstly we consider the derivative of a component, the derivative set and
the derivative graph. We refer to the relation −→s, the case of −→c is analogous, the only
diﬀerences are in the label and in the fact that the former relation refers to Bio-PEPA
systems and the latter refers to model components.
Deﬁnition 7. If P
(α,r)
−−−→sP then P is a one-step −→s system derivative of P.
If P
(α1,r1)
−−−−→sP1
(α2,r2)
−−−−→s....
(αn,rn)
−−−−→sP then P is a system derivative of P.
We can indicate the sequence
γ1
−→s
γ2
−→s....
γn
−→s with
µ
−→s, where µ denotes the sequence
γ1γ2, ...γn (possibly empty).
Deﬁnition 8. A system α-derivative of P is a system P such that P
(α,r)
−−−→sP . For each
α ∈ A we have at most one system α-derivative of a system P.
Deﬁnition 9. The system derivative set ds(P) is the smallest set such that:
– P ∈ ds(P);
– if P ∈ ds(P) and there exists α ∈ A(P ) such that P
(α,r)
−−−→sP then P ∈ ds(P).
Deﬁnition 10. The system derivative graph D(P) is the labelled directed multi-graph
whose set of nodes is ds(P) and whose multi-set of arcs are elements in ds(P)×ds(P)×Γ.
Note that in well-deﬁned Bio-PEPA components the multiplicity of Pi, Pj, γ is always
one.
The deﬁnitions above refer to Bio-PEPA systems. The only components of the system
P = V, N, K, F ,Comp, P that evolves is the model component P. The other
components collect information about the compartments, the species, the rates and report
the deﬁnition of the species components. They remain unchanged in the evolution
of the system. In some cases it can be useful (and simpler) to focus on the model component
instead of considering the whole system and use the other components for the
derivation of the rates. We deﬁne a function πP(P) = P, that, given a Bio-PEPA system
returns the model component. Then we deﬁne a (component) derivative of P by considering
the model component P of the system derivative of P. Similarly, we deﬁne a
(component) α-derivative of P, (component) derivative set ds(P) and the (component)
derivative graph D(P) starting from the deﬁnitions for the associated system P.
In the derivation of the CTMC (see Section 3.6) we need to identify the actions
describing the interactions from one state to another.
Deﬁnition 11. Let P be a Bio-PEPA system and let P = πP(P). Let Pu, Pv be two
derivatives of a model component P with Pv a one-step derivative of Pu. The set of
action types associated with the transitions from the process Pu to the process Pv is
denoted A(Pu|Pv).
The next deﬁnition concerns the complete action type set of a system P and a component
P.
Deﬁnition 12. The complete action type set of a system P is deﬁned as:
¯A = ∪Pi∈ds(P)A(Pi)
The complete action type set of a component P is deﬁned similarly.
Other useful deﬁnitions are the ones concerning the exit rate and transition rates. In
the following we report the deﬁnition for the model components, but a similar deﬁnition
can be used for Bio-PEPA systems.
Deﬁnition 13. Let us consider a Bio-PEPA system P = V, N, K, F ,Comp, P and let
P1, P2 ∈ ds(P). The exit rate of a process P1 is deﬁned as:
rate(P1) =
{α|∃P2.P1
(α,rα[w,N,K])
−−−−−−−−−−→sP2, P1=πP(P1)}
rα[w, N, K]
Similarly, the transition rate is deﬁned as:
rate(P1 | P2) =
{α|P1
(α,rα[w,N,K])
−−−−−−−−−−→sP2, P1=πP(P1), P2=πP(P2)}
rα[w, N, K]
For the label γ in the stochastic relation, the function action(γ) = α extracts the ﬁrst
component of the pair (i.e. the action type) and the function rate(γ) = r ∈ R returns the
second component (i.e. the rate).
In the following we use the same symbol to denote equivalences for both the system
and the corresponding model component. In this section we present deﬁnitions of isomorphism
and strong bisimulation which are similar to the relations deﬁned for PEPA
in [42]. Furthermore we show some relationships between the deﬁned equivalences.
Isomorphism Isomorphism is a strong notion of equivalence based on the derivation
graph of the components (systems). Broadly speaking, two components (systems) are
isomorphic if they generate derivation graphs with the same structure and capable of
carrying out exactly the same activities.
We have the following deﬁnition of isomorphism based on the capability relation:
Deﬁnition 14. Let P1, P2 be two Bio-PEPA systems whose model components are
P and Q, respectively. A function F : ds(P) → ds(Q) is a component isomorphism
between P and Q, with respect to −→c, if F is an injective function and for any component
P ∈ ds(P), A(P ) = A(F (P )), with rα[P , N, K] = rα[F (P ), N , K ] for each α ∈
A(P), and for all α ∈ A the set of α-derivatives of F (P ) is the same as the set of
F −images of the α-derivatives of P , with respect to −→c.
This is a very strong relation because the labels associated with the capability relation
contain a lot of information, all of which must be matched. Formally, we can deﬁne
isomorphic components in the following way:
Deﬁnition 15. Let P1, P2 be two Bio-PEPA systems whose model components are P
and Q. P and Q are isomorphic with respect to −→c (denoted P =c Q), if there exists a
component isomorphism F between them such that D(F (P)) = D(Q), where D denotes
the derivative graph.
We can now deﬁne when two Bio-PEPA systems are isomorphic.
Deﬁnition 16. Let P1, P2 be two Bio-PEPA systems whose model components are P
and Q. P1 and P2 are isomorphic with respect to −→c (denoted P1 =c P2), if P =c Q.
A similar structural relation based on the stochastic relation can also be deﬁned
and used to characterise another form of isomorphism between systems (components)
=s (see [?] for details). Both isomorphisms, =c and =s are equivalence relations, and
congruences with respect to the combinators of Bio-PEPA. In both cases they retain
enough information about the structure and behaviour of the isomorphic components
to ensure that they give rise to identical underlying Markov processes. However, =c is
more strict than =s, i.e. there will be pairs of systems (components) which satisfy =s
but do not satisfy =c.
Equational laws Once an equivalence relation has been deﬁned it can be used to establish
equational laws which may be used to manipulate models and recognise equivalent
terms. In the following the symbol “=” denotes either =c or =s. The proof of the laws
follow from the deﬁnition of isomorphism and the semantic rules.
Choice
1) (P + Q) L
S = (Q + P) L
S
2) (P + (Q + R)) L
S = ((P + Q) + R) L
S
Cooperation
1. P L
Q = Q L
P
2. P L
(Q L
R) = (P L
Q) L
R
3. P K
Q = P L
Q if K ∩ ( ¯A(P) ∪ ¯A(Q)) = L
4. (P L
Q) K
R =
P L
(Q K
R) if ¯A(R) ∩ (L\K) = ∅ ∧ ¯A(P) ∩ (K\L) = ∅
Q L
(P K
R) if ¯A(R) ∩ (L\K) = ∅ ∧ ¯A(Q) ∩ (K\L) = ∅
Constant If A
def
= P then A = P
Bio-PEPA systems
Let P1 and P2 be two Bio-PEPA systems, with P = πP(P1) and Q = πP(P2).
If P = Q then P1 = P2.
Strong bisimulation The deﬁnition of bisimulation is based on the labelled transition
system. Strong bisimulation captures the idea that bisimilar components (systems) are
able to perform the same actions with same rates resulting in derivatives that are themselves
bisimilar. This makes the components (systems) indistinguishable to an external
observer. As with isomorphism we can develop two deﬁnition based on the two semantic
relations. This time for illustration we present the deﬁnitions based on the stochastic
relation. The strong capability bisimulation, ∼c, is deﬁned similarly (see [?] for details).
Deﬁnition 17. A binary relation R ⊆ ˜P × ˜P is a strong stochastic bisimulation, if
(P1, P2) ∈ R implies for all α ∈ A:
– if P1
γ1
−→sP 1 then, for some P 2 and γ2, P2
γ2
−→sP 2 with (P 1, P 2) ∈ R and
1. action(γ1) = action(γ2) = α
2. rate(γ1) = rate(γ2)
– the symmetric deﬁnition with P2 replacing P1.
Deﬁnition 18. Let P1, P2 be two Bio-PEPA systems whose model components are
P and Q, respectively. P and Q are strong stochastic bisimilar, written P ∼s Q, if
(P1, P2) ∈ R for some strong stochastic bisimulation R.
Deﬁnition 19. Let P1, P2 be two Bio-PEPA systems whose model components are P
and Q, respectively. P1, P2 are strong stochastic bisimilar, written P1 ∼s P2, if P ∼s Q.
Both ∼c and ∼s are equivalence relations and congruences with respect to the combinators
of Bio-PEPA. Moreover it is straightforward to see that isomorphism implies
strong bisimulation in both cases.
Example Consider the following systems representing two biological systems. The
former system P1 represents a system described by an enzymatic reaction with kinetic
law
v1 × E × S
K1 + S
, where S is the substrate and E the enzyme. We have that the set N is
deﬁned as “S : h, NS ; P : h, NP; E : 1, 1; ” for some step size h and maximum levels
NS and NP. The component and the model components are deﬁned as:
S
def
= (α, 1)↓S E
def
= (α, 1) ⊕ E P
def
= (α, 1)↑P
The model component P1 is (S (lS 0) {α}
E(1)) {α}
P(lP0). The functional rate is fα =
f MM(v1, K1).
The second system P2 describes an enzymatic reaction where the enzyme is left
implicit (it is constant). The rate is given by
v1 × S
K1 + S
, where S is the substrate.
We have that the set N is deﬁned as “S : h, NS ; P : h, NP ; ”.
The components are deﬁned as S
def
= (α, 1)↓S and P
def
= (α, 1)↑P and the model
component P2 is S (lS 0) {α}
P (lP0). In this case fα = f MM ((v1, K1), S ) =
v1 × S
K1 + S
and
the component S and P have the same number of levels/maximum concentration of S
and P.
We have that P1 ∼s P2, but P1 c P2, because the number of enzymes is diﬀerent.
The same relations are valid if the systems rather than the model components are
considered.
3.6 Analysis
A Bio-PEPA system is an intermediate, formal, compositional representation of the biological
model. Based on this representation we can perform diﬀerent kinds of analysis.
In this section we discuss brieﬂy how to use a Bio-PEPA system to derive a CTMC with
levels, a set of Ordinary Diﬀerential Equations (ODEs), a Gillespie simulation and a
PRISM model.
From Bio-PEPA to CTMC As for the reagent-centric view of PEPA, the CTMC associated
with the system refers to the concentration levels of the species components.
Speciﬁcally, the states of the CTMC are deﬁned in terms of concentration levels and the
transitions from one state to the other describe some variations in these levels. Hereafter
we call the CTMC derived from a Bio-PEPA system (or from a PEPA reagent-centric
view system) CTMC with levels.
Theorem 1. For any ﬁnite Bio-PEPA system P = V, N, K, FR,Comp, P , if we deﬁne
the stochastic process X(t) such that X(t) = Pi indicates that the system behaves as the
component Pi at time t, then X(t) is a Markov Process.
The proof is not reproduced here but it is analogous the one presented for PEPA [42].
Instead of the PEPA activity we consider the label γ and the rate is obtained by evaluating
the functional rate in the system. We consider ﬁnite models to ensure that a solution
for the CTMC is feasible. This is equivalent to supposing that each species in the model
has a maximum level of concentration.
Theorem 2. Given ( ˜P, Γ, −→s), let P be a Bio-PEPA system, with model component P.
Let nc = |ds(P)|, where ds(P) is the derivative set of P. Then the inﬁnitesimal generator
matrix of the CTMC for P is a square matrix Q (nc ×nc) whose elements qu,v are deﬁned
as
qu,v =
αj∈A(Pu|Pv)
rαj
[wu, N, K] if u v qu,u = −
u v
qu,v otherwise.
where Pu, Pv are two derivatives of P.
It is worth noting that the states of CTMC are deﬁned in terms of the derivatives of
the model component. These derivatives are uniquely identiﬁed by the levels of species
components in the system, so we can give the following deﬁnition of CTMC states:
Deﬁnition 20. The CTMC states derived from a Bio-PEPA system can be deﬁned as
vectors of levels σ = (l1, l2, ..., ln), where li , for i = 1, 2, ..., n is the level of the species
i and n is the total number of species. We can avoid consideration of the two levels for
Res and CF as they are always constant.
This leads to the following proposition.
Proposition 1. Let P be a Bio-PEPA system with model component P. Let Pu and Pv
be two derivatives of P such that the latter is one-step derivative of the former. If there
exist two action types α1 and α2 that belong to A(Pu|Pv) then:
1. α1 α2;
2. the two action types refer to two transitions/biological reactions that diﬀer only in
the modiﬁers.
If two transitions are possible between a pair of states, the actions involved are diﬀerent
and they represent reactions that diﬀer only in the modiﬁers and/or the number of
enzymes used. The former point follows from the deﬁnition of well-deﬁned Bio-PEPA
system. The second point follows because the only possibility to have two transitions
between two given states is that the associated reactions have the same reactants and
products. We can see this by observing that the states depend on the levels and the reactions
cause some changes in these levels. The only elements that do not change during
a reaction are the modiﬁers.
From Bio-PEPA to ODEs The translation into ODEs is similar to the method proposed
for PEPA (reagent-centric view) [13]. It is based on the syntactic presentation of the
model and on the derivation of the stoichiometry matrix D = {dij} from the deﬁnition
of the components. The entries of the matrix are the stoichiometric coeﬃcients of the
reactions and are obtained in the following way: for each component Ci consider the
preﬁx subterms Cij representing the contribution of the species i to the reaction j. If the
term represents a reactant we write the corresponding stoichiometry κij as −κij in the
entry dij. For a product we write +κij in the entry dij. All other cases are null.
The derivation of ODEs from the Bio-PEPA system P, hereafer called tODE, is based
on the following steps:
1. deﬁnition of the stoichiometry (n × m) matrix D, where n is the number of species
and m is the number of molecules;
2. deﬁnition of the kinetic law vector (m × 1) vKL containing the kinetic laws of each
reaction;
3. association of the variable xi with each component Ci and deﬁnition of the vector
(n × 1) ¯x.
The ODE system is then obtained as:
d ¯x
dt
= D × vKL
with initial concentrations xi0 = li0 × h, for i = 1, ..., n.
The following property holds:
Property 1. For a biochemical network M and a Bio-PEPA system P=tr BM BP(M),
we have that tODE(P) = tBODE(M), where tODE and tBODE are the translation functions
from Bio-PEPA and the biological system into ODEs, respectively.
The ODE system derived from a Bio-PEPA system P is “equal” to the one obtained
directly from the biological network itself. This means that in the translation into BioPEPA
no information for the derivation of ODEs is lost. This result follows from the
fact that in both cases we derive the stoichiometric matrix and, for construction, they
are the same in both cases. However the Bio-PEPA model can collect generally more
information than the respective ODEs. We have this further result:
Property 2. Given two biochemical networks M1 and M2 we deﬁne the corresponding
Bio-PEPA models P1 = tr BM BP(M1) and P2 = tr BM BP(M2). Let tODE(P1) and
tODE(P2) be the two ODE systems obtained from P1 and P2 respectively. If tODE(P1) =
tODE(P2) it does not imply that P1 is “equivalent” to P2.
The above result can be easily seen with appropriate counterexamples. For example
consider the Bio-PEPA models corresponding to the following sets of reactions: {A
r
−→B+
C; A
r
−→B; A
r
−→C + D} and {A
2r
−→B + C; A
r
−→D}. The two Bio-PEPA models are diﬀerent,
but the ODE systems that we derived from them coincide.
From Bio-PEPA to stochastic simulation Gillespie’s stochastic simulation algorithm
[36] is a widely-used method for the simulation of biochemical reactions. It deals with
homogenous, well-stirred systems in thermal equilibrium and constant volume, composed
of n diﬀerent species that interact through m reactions. Broadly speaking, the
goal is to describe the evolution the system X(t), described in terms of the number of
elements of each species, starting from an initial state. Every reaction is characterized
by a stochastic rate constant cj, termed the basal rate (derived from the constant rate
r by means of some simple relations proposed in [36, 68]). Using this it is possible to
calculate the actual rate aj(X(t)) of the reaction, that is the probability of the reaction
Rj occurring in time (t, t + ∆t) given that the system is in a speciﬁc state.
The algorithm is based essentially on the following two steps:
– calculation of the next reaction that occurs in the system;
– calculation of the time when the next reaction occurs.
We derive the information above from two conditional density functions:
p( j | X(t)) = aj(X(t))/a0, that is, the probability that the next reaction is Rj and
p(τ|X(t)) = a0ea0X(t)τ
, the probability that the next reaction occurs in [t + τ, t + τ + dτ],
where a0 =
m
v=1
av(X(t)).
The translation of a Bio-PEPA model to a Gillespie’s simulation is similar to the
approach proposed for ODEs. The main drawbacks are the deﬁnition of the rates and
the correctness of the approach in the case of general kinetic laws. Indeed Gillespie’s
stochastic simulation algorithm supposes elementary reactions and constant rates (massaction
kinetics). If the model contains only this kind of reactions the translation is
straightforward. If there are non-elementary reactions and general kinetic laws, it is
a widely-used approach to consider them translated directly into a stochastic context.
This is not always valid and some counterexamples have been demonstrated [11]. The
authors of [11] showed that when Gillespie’s algorithm is applied to Hill kinetics in the
context of the transcription initiation of autoregulated genes, the magnitude of ﬂuctuations
is overestimated. The application of Gillespie’s algorithm in the case of general
kinetics laws is discussed by several authors [1, 17]. Rao and Arkin [1] show that this
approach is valid in the case of some speciﬁc kinetic laws, such as Michaelis-Menten
and inhibition. However, it is important to remember that these laws are approximations,
based on some assumptions that speciﬁc conditions (such as “S E” in the
case of Michaelis-Menten) hold. The approach followed here is as in [47]: we apply
Gillespie’s algorithm, but particular attention must be paid to the interpretation of the
simulation results and to their validity.
The deﬁnition of a Gillespie model is based on:
– deﬁnition of the state vector ¯X. It is composed of n components Xi, representing the
number of elements for each species i.
– Deﬁnition of the initial condition ¯X0. The values are given by:
Xi0 = li0 × h × NA × vi molecules
where NA is the Avogadro’s number that indicates the number of molecules in a
mole of a substance and vi is the volume size of the containing compartment Vi.
– Deﬁnition of the actual rate for each reaction. We have two cases:
1. reactions whose dynamics is described by mass-action law and with constant
rate rj. The actual rate for the reaction is:
aj( ¯Xj) = cj × fh( ¯Xj)
where cj is the stochastic rate constant, fh is a function that gives the number
of distinct combinations of reactant molecules and ¯Xj are the species involved
in the reaction j. The stochastic rate constant is deﬁned in [68] as:
cj =
rj
(NA × v)ntot−1
×
nj
u=1
κuj!
where nj is the number of distinct reactants in the reaction j, rj is the rate of
the reaction and ntot =
nj
u=1
κuj is the total number of reactants 6
.
Finally, the number of possible combinations of reactants is deﬁned as
fh(( ¯Xj) =
nj
u=1
Xp(u,j)
κuj
∼
nj
u=1
(Xp(u,j))κuj
nj
u=1
κuj!
2. reactions with general kinetic laws fαj
(¯k, ¯C). The actual rate is:
aj( ¯Xj) = fαj
(¯k, ¯Xj)
From Bio-PEPA to PRISM PRISM [61] is a probabilistic model checker, a tool for
the formal modelling and analysis of systems which exhibit random or probabilistic behaviour.
PRISM has been used to analyse systems from a wide range of application domains.
Models are described using the PRISM language, a simple state-based language
6
We assume that all the species that are involved in the reaction as reactants are inside the same
compartment with volume v.
and it is possible to specify quantitative properties of the system using a temporal logic,
called CSL [2] (Continuous Stochatic Logic). For our purposes the underlying mathematical
model of a PRISM model is a CTMC and the PRISM models we generate from
Bio-PEPA correspond to the CTMCs with levels. However we present the translation
separately as the models are speciﬁed in the PRISM language.
The PRISM language is composed of modules and variables. A model is composed
of a number of modules which can interact with each other. A module contains a number
of local variables. The values of these variables at any given time constitute the state of
the module. The global state of the whole model is determined by the local state of all
modules. The behaviour of each module is described by a set of commands. Each update
describes a transition which the module can make if the guard is true. A transition is
speciﬁed by giving the new values of the variables in the module, possibly as a function
of other variables. Each update is also assigned a probability (or in some cases a rate)
which will be assigned to the corresponding transition. It is straightforward to translate
a Bio-PEPA system into a PRISM model. We have the following correspondences:
– The model is deﬁned as stochastic (this term is used in PRISM for CTMC).
– Each element in the set of parameters K is deﬁned as a global constant.
– The maximum levels, the concentration steps and the volume sizes are deﬁned as
global constants.
– Each species component is represented by a PRISM module. The species component
concentration is represented by a local variable and it can (generally) assume
values between 0 and Ni. For each sub-term (i.e. reaction where the species is involved)
we have a deﬁnition of a command. The name of the command is related
to the action α (and then to the associated reaction). The guards and the change
in levels are deﬁned according to whether the element is a reactant, a product or a
modiﬁer of the reactions.
– The functional rates are deﬁned inside an auxiliary module.
– In PRISM the rate associated with an action is the product of the rates of the commands
in the diﬀerent modules that cooperate. For each reaction, we give the value
“1” to the rate of each command involved in the reaction, with the exception of the
command in the module containing the functional rates. In this case the rate is the
functional rate f, expressing the kinetic law. The rate associated with a reaction is
given by 1 × 1 × ... × f = f, as desired.
4 Examples
This section reports the translation of three biological models into Bio-PEPA and some
analysis results. The ﬁrst example is taken from [37] and describes a minimal model
for the cascade of post-translational modiﬁcations that modulate the activity of cdc2
kinase during the cell cycle. The second model is taken from [11] and concerns a simple
genetic network with negative feedback loop. The last example is the repressilator [33],
a synthetic genetic network with an oscillating behaviour.
In the present work the stochastic and deterministic simulations are obtained exporting
the Bio-PEPA system by means of the maps described in Section 3.6. An automatic
translation is under implementation.
4.1 The Goldbeter’s model
In the following we show the translation of the Goldbeter’s model presented in [37]
into Bio-PEPA and we discuss the kinds of analysis that are possible from it. Broadly
speaking, the model describes the activity of the protein cyclin in the cell cycle. The
cyclin promotes the activation of a cdk (cdc2) which in turn activates a cyclin protease.
This protease promotes cyclin degradation, and therefore a negative feedback loop is
obtained.
CYCLIN (C)
cdc2 inactive (M’)
Protease inactive (X’) Protease active (X)
R1
R3
R4
R7
cdc2 active (M)
R2
R6
R5
Fig. 2. Goldbeter’s model.
The biological model A schema of the model is shown in Fig. 2. There are three
distinct species involved:
– cyclin, the protein protagonist of the cycle, represented by variable C;
– cdc2 kinase, in both active (i.e. dephosphorylated) and inactive form (i.e. phosphorylated).
The variables used to represent them are M and M , respectively;
– cyclin protease, in both active (i.e. phosphorylated) and inactive form (i.e. dephosphorylated).
The variables are X and X .
A detailed list of reactions is reported in Table 2. The ﬁrst two reactions are the
creation of cyclin and its degradation. The reactions R3-R6 are enzymatic reactions
describing the activation/deactivation of the biological species cdc2 and protease. These
reactions are activated through phosphorylation/dephosphorylation. The last reaction is
the degradation of the cyclin triggered by the protease.
Concerning the kinetic laws, the ﬁrst two reactions have mass-action kinetics, whereas
the others all have Michaelis-Menten kinetics. We have some kinetic laws in which the
enzyme is explicit (reactions 3, 5), other ones in which is not explicit (reactions 4, 6) as
it is constant and abstracted within the Michaelis-Menten parameter Vi.
id name react. prod. mod. kinetic laws
R1 creation of cyclin - C - vi
R2 degradation of cyclin C - - kd × C
R3 activation of cdc2 kinase M M - C×V1
(Kc+C)
M
(K1+M )
R4 deactivation of cdc2 kinase M M - M×V2
(K2+M)
R5 activation of cyclin protease X X M X ×M×V3
(K3+X )
R6 deactivation of cyclin protease X X - X×V4
K4+X
R7 degradation of cyclin C - X C×Vd×X
C+Kd
triggered by protease
Table 2. Goldbeter model. The list of reactions.
The Bio-PEPA system The translation of the Goldbeter’s model into Bio-PEPA is
achieved in the following steps.
– Deﬁnition of the list V. In the model compartments are not considered. Here we
add the default compartment:
cell : 1.10−14
litre
– Deﬁnition of the set N. This is deﬁned as:
C : h, NC, 0.01, 0.6, V; M : h, NM , 0.99, 1, V; M : h, NM, 0.01, 0.7, V;
X : h, NX , 0.99, 1, V; X : h, NX, 0.01, 0.65, V; Res : 1, 1, , , ; CF : 1, 1, , , ;
The components Res and CF are added to represent degradation reactions and the
synthesis of the cyclin, respectively. The information about the initial and the maximum
concentrations are derived from the paper. We can ﬁx the step size to 0.05. In
this case the maximum levels are: NC = 12, NM = 14, NX = 13, NM = NX = 20.
If we wanted to consider the ﬁner granularity h = 0.01 (corresponding to the initial
concentration of some of the species) we would have NC = 60, NM = 70, NX = 65,
NM = NX = 100.
– Deﬁnition of functional rates (FR) and parameters (K). The functional rates are:
fα1
= f MA(vi); fα2
= f MA(kd); fα4
= f MM(V2, K2);
fα5
= f MM(V3, K3); fα6
= f MM(V4, K4); fα7
= f MM(Vd, Kd);
fα3
= f MM ((V1, Kc, K1), M ,C) =
V1 × C
Kc + C
M
K1 + M
;
The parameters are those reported in the original paper and we have:
vi = 0.025 µM.min−1
; kd = 0.01 min−1
; V1 = 12 µM.min−1
; K1 = 0.02 µM;
V2 = 1.5 µM.min−1
; K2 = 0.02 µM; V3 = 12 min−1
; K3 = 0.02 µM;
Vd = 0.0625 µM.min−1
; V4 = 2 µM.min−1
; K4 = 0.02 µM; Kd = 0.02 µM ;
Kc = 0.5 µM
– Deﬁnition of species components (Comp) and of the model component (P).
C
def
= (α1, 1)↑C + (α2, 1)↓C + (α7, 1)↓C + (α3, 1) ⊕ C;
M
def
= (α4, 1)↑M + (α3, 1)↓M ;
M
def
= (α3, 1)↑M + (α4, 1)↓M + (α5, 1) ⊕ M;
X
def
= (α6, 1)↑X + (α5, 1)↓X ;
X
def
= (α5, 1)↑X + (α6, 1)↓X + (α7, 1) ⊕ X;
Res
def
= (α2, 1) Res; CF = (α1, 1) CF;
C(l0C) {α3}
M(l0M) {α3,α4}
M (l0M ) {α5,α7}
X(l0X) {α5,α6}
X (l0X ) {α2}
Res(0) {α1}
CF(1)
The levels represent the initial values of the system and are set to l0C = l0M = l0X =
0 and l0M = l0X = 20.
Analysis In the following we report some observations about the analysis of the BioPEPA
system.
SLTS and CTMC By considering the step size h = 0.05 and the number of levels given
in the Bio-PEPA system we obtain a CTMC with 52 states and 185 transitions. The
states are described by the vector:
(C(lC), M (lM ), M(lM), X (lX ), X(lX))
where the diﬀerent components can assume diﬀerent values according to the possible
number of levels for each species. This CTMC is not reported.
In the following we present a simpler CTMC for our model, obtained assuming
h = 1 and considering only two levels for each species. The vector N is modiﬁed
accordingly. We show how to deﬁne the states and the transition rates of this CTMC
starting from the Bio-PEPA system and the associated transition system. The initial situation
is with C, M and X absent (0) and the other elements present (1). The initial state
is (C(0), M (1), M(0), X (1), X(0)). Figure 3 reports the stochastic transition system in
this simpliﬁed case.
The numbers indicate the diﬀerent transitions. Each transition is characterized by a
label γi containing the information about the action type and the rate. We have:
γ1 = (α1, r1) γ2 = (α2, r2) γ3 = (α3, r3) γ4 = (α4, r4)
γ5 = (α5, r5) γ6 = (α6, r6) γ7 = (α4, r7) γ8 = (α3, r8)
γ9 = (α1, r9) γ10 = (α2, r10) γ11 = (α7, r11) γ12 = (α4, r12)
γ13 = (α3, r13) γ14 = (α5, r14) γ15 = (α6, r15) γ16 = (α4, r16)
γ17 = (α2, r17) γ18 = (α6, r18) γ19 = (α1, r19) γ20 = (α2, r20)
γ21 = (α7, r21) γ22 = (α1, r22) γ23 = (α6, r23)
where
(0,0,1,1,0)(0,1,0,1,0)
(0,1,0,0,1)(1,0,1,1,0)
(1,1,0,1,0) (0,0,1,0,1)
(1,0,1,0,1) (1,1,0,0,1)
43
16
5 6
8
7
1 2 1814 15
12 13
9 10 11
22
21
20
19
17
23
Fig. 3. The transition system for the Goldbeter’s model in the case of two levels.
r1 = r9 = r19 = r22 = vi = 0.025, r2 = r10 = r20 = r17 = kd × MC = 0.0001,
r3 = r13 = V1∗MC
Kc+MC
MM
(K1+MM ) = 0.23, r4 = r7 = r12 = r16 = V2×MM
(K2+MM) = 2.66,
r5 = r14 = V3×MM×MX
(K3+MX ) = 0.117, r6 = r15 = r23 = r18 = V4×MX
(K4+MX) = 2.66,
r11 = r21 = Vd×MC×MX
(Kd+MC) = 0.00086
The states and transitions of the CTMC correspond to those of the SLTS with
the exception of multiple transitions between the same two states. In this case in the
CTMC we have only two transitions whose rate is the sum of the rates of two single
transitions in the SLTS. In the graph above these cases correspond to the degradation
of cyclin, that can happen both with and without the protease. In the CTMC the
rate associated with the transition between the states (C(1), M (0), M(1), X (0), X(1))
and (C(0), M (0), M(1), X (0), X(1)) and between (C(1), M (1), M(0), X (0), X(1)) and
(C(0), M (1), M(0), X (0), X(1)) is given by the sum of the rates of the two degradation
reactions kd × MC + vd×MC×MX
(Kd+MC) = 0.00093 µM.min−1
. The rates associated with the other
transitions are the ones contained in the labels γi above.
ODEs The stoichiometry matrix D associated with the Bio-PEPA system above is
R1 R2 R3 R4 R5 R6 R7
C +1 -1 0 0 0 0 -1 xC
M 0 0 -1 +1 0 0 0 xM
M 0 0 +1 -1 0 0 0 xM
X 0 0 0 0 -1 +1 0 xX
X 0 0 0 0 +1 -1 0 xX
The vector that contains the kinetic laws is:
vT
KL = vi × 1, kd × xC,
V1 × xC
Kc + xC
xM
(K1 + xM )
,
V2 × xM
(K2 + xM)
,
V3 × xM × xX
(K3 + xX )
,
V4 × xX
(K4 + xX)
,
vd × xC × xX
(Kd + xC)
where “T” indicates the transpose of a vector (or a matrix). The system of ODEs is
obtained as d ¯x
dt = D × vKL, with ¯xT
:= (xC, xM , xM, xX , xX), the vector of the species
variables:
dxC
dt
= vi × 1 − kd × xC −
vd × xC × xX
(Kd + xC)
;
dxM
dt
= −
V1 × xC
Kc + xC
×
xM
(K1 + xM )
+
V2 × xM
(K2 + xM)
;
dxM
dt
=
V1 × xC
Kc + xC
×
xM
(K1 + xM )
−
V2 × xM
(K2 + xM)
;
dxX
dt
= −
V3 × xM × xX
(K3 + xX )
+
V4 × xX
(K4 + xX)
;
dxX
dt
=
V3 × xM × xX
(K3 + xX )
−
V4 × xX
(K4 + xX)
;
The initial conditions are the ones reported in the set N. It is worth noting that the
system is equivalent, after some arithmetic manipulations, to the ODE model presented
in [37]. The analysis of the model using ODEs is reported in Figure 11. The graphs
coincide with results in the original paper.
PRISM The full translation of the model into PRISM is reported in the Appendix A. The
number of levels, the maximum concentrations and the parameters used in the kinetic
laws are expressed using global constants. For each species a module is constructed.
The module representing the cyclin is:
module cyclin
cyclin : [0..Nc] init 0;
[creationC] cyclin < Nc → (cyclin = cyclin + 1);
[degradationC] cyclin > 0 → (cyclin = cyclin − 1);
[activationM] cyclin > 0 → (cyclin = cyclin);
[degradationCX] cyclin > 0 → (cyclin = cyclin − 1);
endmodule
The variable cyclin is local and represents the species “cyclin”. The possible values
are [0..Nc] (where Nc is the maximum level for cyclin) and the initial value is set to
0. Cyclin is involved in four diﬀerent reactions represented by four commands. The
name in the square brackets denotes the reaction. The guards are deﬁned according
to whether cyclin is a reactant, product or modiﬁer of the reaction (this can be derived
from the Bio-PEPA speciﬁcation of the model). The rate associated with each command
is “1” with the exception of the command in the module describing the functional rates.
The functional rates are deﬁned in a speciﬁc module.
Fig. 4. ODE simulation results. In both the ﬁgures we consider the same parameters
with the exception of Michaelis-Menten constants. For Ki i = 1, 2, 3, 4 we have that
Ki = 0.02 µM for the graph on the left and Ki = 40 µM for the graph on the right. The
initial values are the ones reported in the original Goldbeter’s paper: 0.01 µM for C, X
and M. The simulation time is 100 minutes. In the ﬁgure on the left we have sustained
oscillations whereas in the ﬁgure on the right we have no oscillations.
Extension of the model with a control mechanism based on inhibition The authors
of [34] proposed an extension of Goldbeter’s model in order to represent a control
mechanism for the cell division cycle (CDC). Their approach is based on the introduction
of a protein that binds to and inhibits one of the proteins involved in the CDC. This
inﬂuences the initiation and the conclusion of cell division and modulates the frequency
of oscillations. Their approach is based on the basic biochemical network of the CDC
oscillations and not on the details of the model so that it may work for other model of
this kind. One possible extension for Goldbeter’s model is reported in Figure 5.
Generally speaking, given a general CDC model with l proteins U1, U2, ...,Ul, Gardner
et al. show that the ODE model is modiﬁed in the following way (see [34] for
details):
dU1
dt
= f1(U1, U2, · · · , Ul) − a1 × U1 × Y + (a2 + θ × d1);
dU2
dt
= f2(U1, U2, · · · , Ul);
...
dUl
dt
= fl(U1, U2, · · · , Ul);
dY
dt
= vs − d1 × Y − a1 × U1 × Y + (a2 + θ × kd) × Z;
dZ
dt
= a1 × U1 × Y − (a2 + θ × d1 + θ × kd) × Z;
where:
– fi(U1, U2, ...) with i = 1, 2, ..., l are the functions of the standard model;
Protease inactive (X’) Protease active (X)
R7
R6
cdc2 active (M)
R4
cdc2 inactive (M’)
CYCLIN (C)
R1
INHIBITOR−CYCLIN (IC)
INHIBITOR (I)
R10
R3
R5
R8R9
R11
R13
R12
R2
Fig. 5. Extension of the Goldbeter’s model. An inhibitor is added.
– U1 is the concentration of the target protein of the inhibitor, Y is the inhibitor and
Z denotes the concentration of the inhibition-target complex. U2,...,Ul are the other
proteins involved in the cycle;
– a1 and a2 are the constant rates for the binding and for the release;
– vs and d1 are the rate for the inhibitor synthesis and degradation;
– θ < 1 is the fraction of the degradation rates for the complex Z.
In the following we show how to modify the Bio-PEPA system in order to capture
the new reactions and species. Bio-PEPA oﬀers a compositional approach: it is possible
to compose the whole system by deﬁning the simple subcomponents that compose it.
As observed in 1, compositionality is one of the main property of process algebras, that
makes them particularly useful in th case of complex models. In our example, the new
reactions and species are indeed added in a straightforward way, with minor modiﬁcations
of the system speciﬁcation. Broadly speaking, we need to deﬁne components
for the new species, some new terms to describe the new reactions and new functional
rates. Finally, the new components are added to the system component.
Here we consider l = 3, U1 = C, U2 = M and U3 = X. However we can obtain
modulation of CDC frequency by using an inhibitor of any of the proteins.
We need to extend the Bio-PEPA model in the following way:
C
def
= · · · + (α8, 1)↓C + (α9, 1)↑C + (α12, 1)↑C
...
...
Res
def
= · · · + (α11, 1) Res CF
def
= · · · + (α10, 1) CF
I
def
= (α8, 1)↓I + (α9, 1)↑I + (α10, 1)↑I + (α11, 1)↓I + (α13, 1)↑I
IC
def
= (α8, 1)↑IC + (α9, 1)↓IC + (α12, 1)↓IC + (α13, 1)↓IC
where I stands for the inhibitor and IC for the inhibitor-cyclin complex in Figure 5. The
new functional rates, all described by mass-action kinetics, are reported below.
fα8
= vs; fα9
= f MA(d1); fα10
= f MA(a1);
fα11
= f MA(a2); fα12
= f MA(θ × d1); fα13
= f MA(θ × kd)
The list of parameters is extended in order to consider the new values.
Finally the Bio-PEPA model is:
C(l0C) {α3}
M(l0M) {α3,α4}
M (l0M ) {α5,α7}
X(l0X) {α5,α6}
X (l0X )
{α2}
Res(0) {α1}
CF(1) {α8,α9,α10,α11}
I(l0I) {α8,α9,α12,α13}
IC(l0IC)
The results of the ODE simulations corresponding to the new model are reported in
Fig. 6.
Fig. 6. ODE simulation results for the extended model. The parameters of Goldbeter’s
model are as before. For the new parameters, in all the graphs d1 = 0.05, θ = 0.1 and
Kdiss = a1
a2
= 1. In the model on the left a1 = a2 = 0.3 and vs = 0.6, in the graph in
the middle a1 = a2 = 0.7 and vs = 1.4 and in the graph on the right a1 = a2 = 0.05
and vs = 0.1. The initial values of C, X, M and I are 0.01µM. Simulation time is 100
minutes.
4.2 A simple genetic network
Information processing in biological cells is often implemented by a genetic network.
The state of such a network is represented by the concentrations and locations of the
diﬀerent species of molecules. The interactions between these molecules occur in a
random fashion. In order to prevent large ﬂuctuations in the number of molecules of
some species, the genetic network itself can contain negative feedback mechanisms
that suppress these ﬂuctuations.
In order to show how to model genetic networks in Bio-PEPA, we consider a model
from [11]. The model describes a general genetic network with negative feedback
through dimers, such as the one representing the control circuit for the λ repressor protein
CI of λ-phage in E.Coli. Here we focus on the second model presented in the paper,
which uses an inhibition reaction to describe the negative feedback loop (3 species and
5 reactions, one of which is reversible).
In the present work the stochastic and deterministic simulations are obtained exporting
the Bio-PEPA system by means of the maps described in the previous section.
An automatic translation is under implementation.
The biological model A schema of the model is reported in Figure 7. The model is
composed of three biological entities that interact with each other through ﬁve reactions
(of which one is reversible). The biological entities are the mRNA molecule (M), the
protein in monomer form (P) and the protein in dimeric form (P2). The ﬁrst reaction (1)
mRNA (M)
Degradation (3)
Protein (P)
Degradation (4)
Dimer protein (P2)
Dimerisation (5− 5i)
Transcription (1)
Translation (2)
Fig. 7. Genetic network model
is the transcription of the mRNA (M) from the genes/DNA (not considered explicitly).
The protein P in the dimer form (P2), which is the ﬁnal result of the network, has
an inhibitory eﬀect on this process. The second reaction (2) is the translation of the
protein P from M. Another two reactions represent the degradation of M (3) and the
degradation of P (4). Finally there is the dimerization of P and its inverse process (5,5i).
All the reactions are described by mass-action kinetics with the exception of the ﬁrst
reaction, which has an inhibition/Michaelis-Menten kinetics.
The Bio-PEPA system The translation of the model in Bio-PEPA is based on the
following steps:
– Deﬁnition of compartments. The only compartment is deﬁned as vcell : 1 (nM)−1
.
– Deﬁnition of the set N. We need to deﬁne the step size and the number of levels for
each species:
M : hM, NM; P : hP, NP; P2 : hP2, NP2; Res : 1, 1; CF : 1, 1
We omit the information concerning the initial and maximum concentrations. Furthermore
the compartments are not considered as all the species are in the same
compartment. From the original paper the maximum concentrations are MN =
1 nM, MP = 60 nM and MP2 = 180 nM. We can consider hM = 1 nM and
hP = hP2 = 30 nM. Indeed the stoichiometry of P in the dimerization reaction
is 2 and we need to consider at least two levels for P. The maximum levels for the
three species are: NM = 1, NP = 2 and NP2 = 6.
– Deﬁnition of the set of functional rates FR.
fα1
= f I((v, KM), [P2,CF]) = v×CF
KM+P2
;
fα2
= f MA(k2); fα3
= f MA(k3); fα4
= f MA(k4);
fα5
= f MA(k5); fα5i
= f MA(k5i)
where the suﬃx of the action type α refers to the number of the reaction as reported
in Fig. 7.
– Deﬁnition of the set of parameters. The parameter values are
KM = 356 nM; v = 2.19 s−1
; k2 = 0.043 s−1
; k3 = 0.0039 s−1
;
k4 = 0.0007 s−1
; k5 = 0.025 s−1
nM−1
; k5i = 0.5 s−1
– Deﬁnition of the set of species components and of the model component.
M
def
= (α2, 1) ⊕ M + (α3, 1)↓M + (α1, 1)↑M;
P
def
= (α4, 1)↓P + (α5, 2)↓P + (α5i, 2)↑P) + (α2, 0)↑P;
P2
def
= (α1, 1) P2 + (α5i, 1)↓P2 + (α5, 1)↑P2;
Res
def
= (α3, 1) Res + (α4, 1) Res;
CF
def
= (α1, 1) CF;
(((CF(1) {α1}
M(0)) {α2}
P(0)) {α5,α5i}
P2(0)) {α3,α4}
Res(0)
Analysis The model is amenable to a number of diﬀerent analyses as we report in the
following paragraphs.
SLTS and CTMC From the Bio-PEPA model we can derive the SLTS and the CTMC.
The transition system consists of 42 states and 108 transitions, in the case we consider
the information about species listed above.
The states are described by the levels of the single components. Speciﬁcally, we can
deﬁne a state using a vector (CF(l1), M(l2), P(l3), P2(l4), Res(l5)),where li i = 1, ..., 5
represents the level of each component. The parameter li can assume the values 0 and 1
in the case of M, the values 0, 1, 2 for P and values between 0 and 6 for P2. Res and
CF can assume only one value (0 and 1, respectively). The labels γt of the stochastic
transition system contain the action type αj and the rate rαj
, calculated by applying
the associated function fαj
to the quantitative information collected in the labels of the
capability relation and dividing this by the step size of the reactants/products involved
in the reaction. These rates are the ones associated with the CTMC transitions.
ODEs and Gillespie A second kind of analysis concerns diﬀerential equations. The
stoichiometry matrix D associated with the system is
R1 R2 R3 R4 R5 R5i
CF 0 0 0 0 0 0 xCF
M +1 0 -1 0 0 0 x1
P 0 +1 0 -1 -2 +2 x2
P2 0 0 0 0 +1 -1 x3
Res 0 0 0 0 0 0 xRes
The kinetic-law vector is vT
KL = (v×xCF
K+x3
; k2 × x1; k3 × x1; k4 × x2; k5 × x2
2; k5i × x3). The
system of ODEs is obtained as d ¯x
dt = D × vKL:
dx1
dt
=
v × 1
K + x3
− k3 × x1
dx2
dt
= k2 × x1 − k4 × x2 − 2 × k5 × x2
2 + 2 × k5i × x3
dx2
dt
= k5 × x2
2 − k5i × x3
The derivation of the Gillespie’s simulation model is straightforward and not reported
here.
The simulation results are depicted in Figure 12. We consider both deterministic
and stochastic simulation. The two simulation graphs show the same behaviour (with
the exception of some noise in the Gillespie’s simulation), as expected.
PRISM The full translation of the model into PRISM is reported in [?]. Each species is
represented by a PRISM module and the reactions in which it is involved are captured
by commands. In the following we report the deﬁnition of the modules representing the
protein in the monomer and dimer form respectively.
Fig. 8. ODE and Gillespie simulation results. In the case of Gillespie we consider 10
runs. In both cases the rates are as in the original paper.
module p
p : [0..Np] init 0;
[a2] p < Np → (p = p + 1);
[a4] p > 0 → (p = p − 1);
[a5] p > 0 → (p = p − 2);
[a5i] p < Np → (p = p + 2);
endmodule
module pd
pd : [0..Npd] init 0;
[a5i] pd > 0 → (pd = pd − 1);
[a5] pd < Npd → (pd = pd + 1);
endmodule
The variables p and pd are local with respect to each of the two modules and represent
the species “protein in monomer form” and “protein in dimer form”, respectively.
The possible values are [0..Np] for p and [0..Npd] for pd, while the initial values are 0.
The monomer P is involved in four diﬀerent reactions while the dimer form P2 in just
two. We have an additional module with the functional rates.
Properties of the system can be expressed formally in CSL and analysed against
the constructed model. Two examples of possible queries are considered below. A ﬁrst
query considers the probability that the monomer is at level i at time T. The property
is expressed by the form “P =?[...]”, that returns a numerical value representing
the probability of the proposition inside the square brackets. In our case the query is
P =?[true U[T, T] p = i], where U is the bounded until operator and [T, T] indicates
a single time instant. A property of the form “prop1 U[time] prop2” is true for a path
if time deﬁnes an interval of real values and the path is such that prop2 becomes true at
a time instant which falls within the interval and prop1 is true in all time instants up to
that point. The second query concerns the proportion of the protein in monomer form
(P) relative to the total quantity of the protein (i.e. P + P2). In order to deﬁne this property,
we need a reward structure. State rewards can be speciﬁed using multiple reward
items, each of the form “guard:reward;”, where guard is a predicate and reward is an
expression. States of the model which satisfy the predicate in the guard are assigned the
corresponding reward. Speciﬁcally, in our case we deﬁne the reward:
rewards
true : p
(p+pd)
;
endrewards
This reward assigns the value p
(p+pd) to each state of the system. We can ask for the
frequency of P by using the query R =?[I = T]. This is an instantaneous reward property,
i.e. it refers to the reward of a model at a particular instant in time T. The property
“I = T” associates with a path the reward in the state of that path when exactly T time
units have elapsed. The letter “R” indicates that the property refers to a reward structure.
The results of the two queries are reported in Fig. 9.
Fig. 9. PRISM query results. In the ﬁgure on the left reports the graph of the proportion
of monomer P over the total protein with respect to time. On the right it is depicted the
probability that the monomer protein is at level 1 and 2, with respect to time.
4.3 The repressilator
The repressilator is a synthetic genetic regulatory network with oscillating behaviour
reported in [33]. The repressilator consists of three genes connected in a feedback loop,
such that the transcription of a gene is inhibited by one of the other proteins. In the
following we present the translation of the original model into Bio-PEPA and we report
some analysis results.
The biological model A schema of network is reported in Figure 10.
The species involved are:
– three kinds of genes, hereafter denoted G1, G2, G3. These represent the genes lacl,
tetR and cI, respectively;
– the mRNAs transcripted from the three genes, hereafter denoted mRNA1, mRNA2,
mRNA3, respectively;
– the proteins corresponding to the three genes, denoted P1, P2, P3, respectively.
These represent the protein associated to the previous genes and are Lacl, TetR, CI.
The reactions are:
– the transcription of the three mRNAs with inhibition by one of the proteins. These
reactions are indicated as tr1, tr2, tr3. The genes are constant and are kept implicit;
– the translation of mRNAs into the proteins, indicated as trl1, trl2, trl3;
– degradation of both mRNAs and proteins, indicated as di with i = 1, ..., 6.
The transcription reactions are described by Hill kinetics, while the other reactions have
mass-action kinetic laws.
P2mRNA2G2
G3 mRNA3 P3 G1 mRNA1 P1
trl2tr2
d2
d5
tr1 trl1
d2d1d6
trl3tr3
d3
Fig. 10. Repressilator model.
The Bio-PEPA system The deﬁnition of the Bio-PEPA corresponding to the repressilator
model is reported below. The parameters and the initial concentrations are one of
the possibilities deﬁned in the paper [33].
– Deﬁnition of compartments. There are no compartments deﬁned explicitly in the
model. We consider the default compartment:
vCell : 1;
– Deﬁnition of the set N It is deﬁned as:
mRNA1 : 1, 1, 0, , vCell; mRNA2 : 1, 1, 0, , vCell; mRNA3 : 1, 1, 0, , vCell;
P1 : 1, 1, 5, , vCell; P2 : 1, 1, 0, , vCell; P3 : 1, 1, 15, , vCell;
Res : 1, 1, , , vCell; CF : 1, 1, , , vCell;
It is worth noting that in the original model the genes are not represented explicitly.
In Bio-PEPA we introduce CF to deﬁne the transcription. For all the species
we consider two levels (high and low) and step h = 1. The initial values (third
components) are the ones reported in the paper.
– Deﬁnition of the set FR and of the set of parameters. The set of functional rates is:
ftr1 = f I((α, α0), [P3], 2) = α
1+P32 + α0;
ftr2 = f I((α, α0), [P1], 2) = α
1+P12 + α0;
ftr3 = f I((α, α0), [P2], 2) = α
1+P22 + α0;
ftrl1 = f MA(β)
ftrl2 = f MA(β)
ftrl3 = f MA(β)
fdi = f MA(1) i = 1, 2, 3, 4, 5, 6;
All the three repressors have same behaviour except for their DNA-binding speciﬁcities.
We assume that all the degradation reactions have rate 1.
The other parameters are:
α = 250; α0 = 0; β = 5
These parameters have the following meaning:
• α0 is the number of protein copies per cell produced from a given promoter
type during growth in the presence of saturing amounts of the repressor. In the
case of the absence of the repressor this number is α0 + α;
• β is the ratio of the protein decay rate to the mRNA decay rate.
– Deﬁnition of the species components. The species components are:
mRNA1
def
= (d1, 1)↓mRNA1 + (tr1, 1)↑mRNA1 + (trl1, 1) ⊕ mRNA1;
mRNA2
def
= (d2, 1)↓mRNA2 + (tr2, 1)↑mRNA2 + (trl2, 1) ⊕ mRNA2;
mRNA3
def
= (d3, 1)↓mRNA3 + (tr3, 1)↑mRNA3 + (trl3, 1) ⊕ mRNA3;
P1
def
= (d4, 1)↓P1 + (trl1, 1)↑P1 + (tr3, 1) P1;
P2
def
= (d5, 1)↓P2 + (trl2, 1)↑P2 + (tr1, 1) P2;
P3
def
= (d6, 1)↓P3 + (trl3, 1)↑P3 + (tr2, 1) P3;
CF
def
= (tr1, 1) CF + (tr1, 1) CF + (tr1, 1) CF;
Res
def
= (d1, 1) Res + (d2, 1) Res + (d3, 1) Res+
(d4, 1) Res + (d5, 1) Res + (d6, 1) Res;
– Deﬁnition of the model component. The model is deﬁned as:
((((((M1(lM10) <> M2(lM20)) M3(lM30)) {trl1,tr3}
P1(lP10)) {trl2,tr1}
P2(lP20)) {trl3,tr2}
P3(lP30)) {tr1,tr2,tr3} {tr1,tr2,tr3}
CF(1)) {d1,d2,d3,d4,d5,d6}
Res(0)
The initial levels are deﬁned according to the initial values of the model.
Analysis We consider both ODEs and stochastic simulation by Gillespie. The analysis
results are reported in Figures 11 and 12. In Figure 11 we have used the parameters reported
in the paper. On the left, the ODE simulation results are reported. An oscillating
behaviour is shown by all the three proteins. On the right stochastic simulation, averaged
over 100 runs, is reported. In this case the average oscillating behaviour becomes
weaker after some time. The diﬀerence between determistic and stochastic simulations
is probably due to the use of Hill kinetics with Gillespie. Indeed, as discussed in Section
[?] the use of this kinetic law with Gillespie can lead to some wrong results. Nothe that
varying the values of α and β for the diﬀerent elements we obtain diﬀerent amplitudes
for the oscillations. In the case of Figure 12, the three proteins reach a steady state, both
with ODE and stochastic simulation.
5 Conclusions and future perspectives
In this paper we have presented Bio-PEPA, a modiﬁcation of the process algebra PEPA
for the modelling and the analysis of biochemical networks. Bio-PEPA allows us to
Fig. 11. Analysis of the model: ODE (on the left) and Gillespie (on the right). The
parameters are the ones reported in the text. In the case of Gillespie’s simulation 100
runs are considered.
Fig. 12. Analysis of the model: ODE (on the left) and Gillespie (on the right). The
parameters are diﬀerent from the ones reported in the text. The parameter α0 is 25
and the initial values are P1 = 5, P2 = 10 and P3 = 15. In Gillespie, 100 runs are
considered.
represent explicitly some features of biochemical networks, such as stoichiometry and
general kinetic laws. Thus not only elementary reactions with constant rates, but also
complex reactions described by general kinetic laws can be considered. Indeed each
reaction in the model is associated with an action type and with a functional rate. The
potential to consider various kinds of kinetic laws permits us to model a vast number
of biochemical networks. Indeed complex reactions are frequently found in models as
abstractions of sequences of elementary steps and reducing to elementary reactions is
often impossible and undesirable.
Bio-PEPA in enriched with some notions of equivalence. We have presented deﬁnitions
of isomorphism and strong bisimulation which are similar to the relations deﬁned
for PEPA in [42]. These equivalence are quite strict. A further investigation concerns
the deﬁnition of other forms of equivalence, more appropriate for studying biological
systems.
A principal feature of Bio-PEPA is the possibility of mapping the system to different
kinds of analysis. In this work we have shown how to derive a CTMC from a
Bio-PEPA model and we have discussed the derivation of ODEs, stochastic simulation
and PRISM models. Indeed Bio-PEPA has been deﬁned as an intermediate language
for the formal representation of the model. We have extended the deﬁnition of CTMC
with levels, deﬁned in [16], to the case of general kinetic laws and to diﬀerent levels for
the species. The main beniﬁt of this approach is a reduction in the state space. Some
assumptions are made. First of all all the species must have a ﬁnite maximum concentration.
This is to ensure a ﬁnite state space in the corresponding CTMC, making
numerical solution feasible. However, we can have a species without a limiting value.
In these cases we can consider a maximum level for the values greater than a certain
(high) value. A second point concerns the assumption that all the species have the same
step size. This may be a problem when the species can have maximum concentrations
belonging to diﬀerent concentration scales. In this case some species can have only few
levels whereas other can have many. Furthermore, some species (for instance genes)
are present in the system only in few copies and in this case the representation in terms
of continuous concentration is wrong. In order to handle this situation, Bio-PEPA can
be enriched with discrete variables. The possibility to consider diﬀerent step sizes and
discrete variables is a topic for future work.
The diﬀerent kinds of analysis proposed for Bio-PEPA are strongly related. An area
forfurther work will concern a deeper study of these relationships, in particular for the
CTMC with levels and Gillespie models. An outstanding problem is the application of
Gillespie’s stochastic simulation with general kinetic laws. The approach proposed in
this work is to use Gillespie simulation also in this context, but to be careful about the
interpretation of the results. An important topic for the future will be deeper investigation
of the relation between Gillespie simulation and general kinetic laws. Another
aspect that needs further study is the determination of the number of concentration levels
for each species. The deﬁnition of these numbers is critical for the deﬁnition of
the CTMC and must be done carefully. Indeed for some analyses (ODEs and Gillespie
simulation) two levels are enough to capture the behaviour of some systems, but for
numerical analysis of the CTMC and PRISM model checking a ﬁner granularity for
levels is necessary to capture the full behaviour of the system.
Finally, a tool for the analysis of biochemical networks using Bio-PEPA is under
implementation and a translation from SBML into Bio-PEPA is planned.
A Appendix A: PRISM speciﬁcation of the Goldbeter’s model
//Kind of model
stochastic
//Volume
const double cell = 1;
// Levels
const int Nc = 1;
const int Nm = 1;
const int Nx = 1;
const int Nxi = 1;
const int Nmi = 1;
//Steps
const double Hc = 0.01;
const double Hm = 0.01;
const double Hx = 0.01;
const double Hxi = 0.01;
const double Hmi = 0.01;
//Parameters
const double vi = 0.05;
const double vd = 0.025;
const double kd = 0.01;
const double Kc = 0.5;
const double V1 = 3;
const double V3 = 1;
const double Kd = 0.2;
const double V2 = 1.5;
const double V4 = 0.5;
const double K1 = 0.005;
const double K2 = 0.005;
const double K3 = 0.005;
const double K4 = 0.005;
//Modules
//module Cyclin
module cyclin
cyclin : [0..Nc] init 0;
[creationC] cyclin<Nc --> 1: (cyclin’ = cyclin+1);
[degradationC] cyclin>0 --> 1: (cyclin’ = cyclin-1);
[activationM] cyclin>0 --> 1: (cyclin’ = cyclin);
[degradationCX] cyclin>0 --> 1: (cyclin’ = cyclin-1);
endmodule
//module kinase inactive
module kinasei
kinasei : [0..Nmi] init 1;
[activationM] kinasei>0 --> 1: (kinasei’ = kinasei-1);
[deactivationM] kinasei<Nmi --> 1: (kinasei’= kinasei+1);
endmodule
//module kinase active
module kinase
kinase : [0..Nm] init 0;
[activationM] kinase<Nm --> 1: (kinase’= kinase+1);
[deactivationM] kinase>0 --> 1: (kinase’ = kinase-1);
[activationX] kinase>0 --> 1: (kinase’ = kinase);
endmodule
//module protease inactive
module proteasei
proteasei : [0..Nxi] init 1;
[activationX] proteasei>0 --> 1: (proteasei’= proteasei-1);
[deactivationX] proteasei<Nxi --> 1: (proteasei’= proteasei+1);
endmodule
//module protease active
module protease
protease : [0..Nx] init 0;
[activationX] protease<Nx --> 1: (protease’ = protease+1);
[deactivationX] protease>0 --> 1: (protease’ = protease-1);
[degradationCX] protease>0 --> 1: (protease’ = protease);
endmodule
module Functional_rates
dummy: bool init true;
[creationC] cyclin<Nc --> vi/Hc: (dummy’=dummy);
[degradationC] cyclin<Nc --> (kd*cyclin*Hc)/Hc: (dummy’=dummy);
[activationM] cyclin>0 & kinasei>0 -->
((cyclin*Hc*V1 )/(Kc + cyclin*Hc))*((kinasei*Hmi)/(K1+kinasei*Hmi))
*(1/Hmi): (dummy’=dummy);
[activationX] kinase>0 & proteasei>0 -->
(kinase*Hm*proteasei*Hxi*V3/(K3+proteasei*Hxi))*(1/Hxi):
(dummy’=dummy);
[deactivationM] kinase>0 --> ((kinase*Hm*V2)/(K2 +kinase*Hm))*(1/Hm):
(dummy’=dummy);
[deactivationX] protease>0 --> (protease*Hx*V4/(K4 + protease*Hx))
*(1/Hx): (dummy’=dummy);
[degradationCX] cyclin>0 & protease>0 -->
((cyclin*Hc*vd *protease*Hx)/(cyclin*Hc + Kd))*(1/Hc): (dummy’=dummy);
endmodule
B AppendixB: PRISM speciﬁcation of the genetic network model
//Kind of model
stochastic
//Volume
const double cell = 1*10ˆ{-6};
//Levels
const int NCF = 1;
const int Nm = 1;
const int Np = 2;
const int Npd = 6;
//Steps
const double HCF = 1;
const double Hm = 1.0;
const double Hp = 30.0;
const double Hpd = 30.0;
//Parameters
const double v = 2.19;
const double KM = 356;
const double k3 = 0.0039;
const double k2 = 0.043;
const double k4 = 0.0007;
const double k5 = 0.025;
const double k5i = 0.5;
//Modules
module CF
CF: [0..NCF] init NCF;
[a1] CF>0 --> 1: (CF’=CF);
endmodule
module p
p: [0..Np] init 0;
[a2] p<Np --> 1: (p’=p+1);
[a4] p>0 --> 1: (p’=p-1);
[a5] p>0 --> 1: (p’=p-2);
[a5i] p<Np --> 1: (p’=p+2);
endmodule
module pd
pd: [0..Npd] init 0;
[a5i] pd>0 --> 1: (pd’=pd-1);
[a5] pd<Npd --> 1: (pd’=pd+1);
endmodule
module m
m: [0..Nm] init 0;
[a1] m<Nm --> 1: (m’=m+1);
[a2] m>0 --> 1: (m’=m);
[a3] m>0 --> 1: (m’=m-1);
endmodule
module Functional_rates
dummy: bool init true;
[a1] dummy = true --> (v/(KM + pd*Hpd))*(1/Hm): (dummy’ = dummy);
[a2] dummy = true --> (k2*m*Hm)/Hm: (dummy’ = dummy);
[a3] dummy = true --> (k3*m*Hm)/Hm: (dummy’ = dummy);
[a4] dummy = true --> (k4*p*Hp)/Hp: (dummy’ = dummy);
[a5] dummy = true --> (k5*(p*Hp)ˆ2)/Hp: (dummy’ = dummy);
[a5i] dummy = true --> (k5i*pd*Hpd)/Hpd: (dummy’ = dummy);
endmodule
References
1. A.P. Arkin and C.V. Rao. Stochastic chemical kinetics and the quasi-steady-state assumption:
application to the Gillespie algorithm. Journal of Chemical Physics, volume 11, pages
4999–5010, 2003.
2. A. Aziz, K. Kanwal, V. Singhal and V. Brayton. Verifying continuous time Markov chains.
Proc. 8th International Conference on Computer Aided Veriﬁcation (CAV’96), volume 1102
of LNCS, pages 269–276, Springer, 1996.
3. S. van Bakel, I. Kahn, M. Vigliotti and J. Heath. Modelling intracellular fate of FGF receptors
with Bio-Ambients. Electronic Notes in Computer Science, 2007.
4. M. Bernardo, R. Gorrieri and L. Donatiello. MPA: A Stochastic Process Algebra. Technical
report UBLCS-94-10, Laboratory of Computer Science, University of Bologna, 1994.
5. M. Bernardo and R. Gorrieri. A tutorial on EMPA: a theory of concurrent processes with
nondeterminism, priorites, probabilities and time. Theoretical Computer Science, volume
202, pages 1–54, 1998.
6. The BIOSPI Project, Available at http://www.wisdom.weizmann.ac.il/ biospi/.
7. B.J. Bornstein, J.C. Doyle, A. Finney, A. Funahashi, M. Hucka, S.M. Keating, H. Kitano,
B.L. Kovitz, J. Matthews, B.E. Shapiro and M.J. Schilstra. Evolving a Lingua Franca and
Associated Software Infrastructure for Computational Systems Biology: The Systems Biology
Markup Language (SBML) Project. Systems Biology, volume 1, pages 41–53, 2004.
8. BioModels Database. http://www.ebi.ac.uk/biomodels/
9. L. Bortolussi and A. Policriti. Modeling Biological Systems in Stochastic Concurrent Constraint
Programming, Proc. of WCB 2006, 2006.
10. L. Brodo, P. Degano and C. Priami. A Stochastic Semantics for Bio-Ambients. In Proc.
of International Conference on Parallel Computing Technologies (PaCT), LNCS, volume
4671, pages 22–34, Springer-Verlag, 2007.
11. R. Bundschuh, F. Hayot and C. Jayaprakash. Fluctuations and Slow Variables in Genetic
Networks, Biophys. J., volume 84, pages 1606–1615, 2003.
12. N. Busi and C. Zandron. Modeling and analysis of biological processes by membrane calculi
and systems. Proc. of the Winter Simulation Conference (WSC 2006).
13. M. Calder, S. Gilmore and J. Hillston. Automatically deriving ODEs from process algebra
models of signalling pathways. Proc. of CMSB’05, pages 204–215, 2005.
14. M. Calder, S. Gilmore, and J. Hillston. Modelling the inﬂuence of RKIP on the ERK signalling
pathway using the stochastic process algebra PEPA. T. Comp. Sys. Biology, VII,
volume 4230 of LNCS, pages 1–23, Springer, 2006.
15. M. Calder, A. Duguid, S. Gilmore and J. Hillston. Stronger computational modelling of
signalling pathways using both continuous and discrete-space methods. Proc. of CMSB’06,
volume 4210 of LNCS, pages 63–77, 2006.
16. M. Calder, V. Vyshemirsky, D. Gilbert and R. Orton. Analysis of Signalling Pathways using
Continuous Time Markov Chains. T. Comp. Sys. Biology, VI, volume 4220 of LNCS, pages
44–67, Springer, 2006.
17. Y. Cao, D.T. Gillespie and L. Petzold. Accelerated Stochastic Simulation of the Stiﬀ
Enzyme-Substrate Reaction. J. Chem. Phys., volume 123, number 14, pages 144917–
144929, 2005.
18. L. Cardelli. Brane Calculi - Interactions of Biological Membranes. Proceedings of Workshop
on Computational Methods in Systems Biology (CMSB04), Lecture Notes in Computer
Science, volume 3082, pages 257–278, 2005.
19. L. Cardelli, E. M. Panina, A. Regev, E. Shapiro and W. Silverman. BioAmbients: An Abstraction
for Biological Compartments. Theoretical Computer Science, volume 325, number
1, pages 141–167, Elsevier, 2004.
20. N. Chabrier-Rivier, F. Fages and S. Soliman. Modelling and querying interaction networks
in the biochemical abstract machine BIOCHAM. Journal of Biological Physics and Chemistry,
volume 4, pages 64–73, 2004.
21. D. Chiarugi, P. Degano and R. Marangoni. A Computational Approach to the Functional
Screening of Genomes. Plos Comput. Biol., volume 9, pages 1801–1806, 2007.
22. F. Ciocchetta, and J. Hillston. Bio-PEPA: an extension of the process algebra PEPA for
biochemical networks. Proc. of FBTC 2007, Electronic Notes in Computer Science, volume
194, number 3, pages 103–117, 2008.
23. F. Ciocchetta and J. Hillston. Bio-PEPA: a framework for the modelling and analysis of
biological systems. Technical Report of the School of Informatics, University of Edinburgh,
EDI-INF-RR-1231, 2008.
24. F. Ciocchetta and C. Priami. Biological transactions for quantitative models. Proc. of
MeCBIC 2006, ENTCS, volume 171, number 2, pages 55–67, 2007.
25. F. Ciocchetta and C. Priami. Beta-binders with Biological Transactions. Technical report
TR-10-2006, The Microsoft Research-University of Trento Centre for Computational and
Systems Biology, 2006.
26. G. Costantin, C. Laudanna, P. Lecca, C. Priami, P. Quaglia and B. Rossi. Language modeling
and simulation of autoreactive lymphocytes recruitment in inﬂamed brain vessels.
SIMULATION: Transactions of The Society for Modeling and Simulation International,
volume 80, pages 273–288, 2003.
27. V. Danos and C. Laneve. Formal molecular biology. Theor. Comput. Sci., volume 325, number
1, pages 69–110, 2004.
28. V. Danos and J. Krivine. Formal molecular biology done in CCS-R. Proc. of Workshop on
Concurrent Models in Molecular Biology (BioConcur’03), 2003.
29. V. Danos, J. Feret, W. Fontana, R. Harmer and J. Krivine. Ruled-based modelling of cellular
signalling. Proc. of CONCUR’07, LNCS, volume 4703, 2007.
30. V. Danos, J. Feret, W. Fontana and J. Krivine. Scalable simulation of cellular signalling
networks. Proc. of APLAS’07 2007.
31. V. Danos and J. Krivine. Reversible Communicating Systems. Proc. of Concur’04, LNCS,
volume 3170, pages 292–307, 2004.
32. C. Eichler-Jonsson, E.D. Gilles, G. Muller and B. Schoeberl. Computational modeling of
the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors.
Nature Biotechnology, volume 20, pages 370–375, 2002.
33. M.B. Elowitz and S. Leibler. A synthetic oscillatory network of transcriptional regulators.
Nature, volume 403, number 6767, pages 335–338, 2000.
34. T.S. Gardner, M. Dolnik, and J.J. Collins. A theory for controlling cell cycle dynamics using
a reversibly binding inhibitor. Proc. Nat. Acad. Sci. USA, volume 95, pages 14190–14195,
1998.
35. N. Geisweiller, J. Hillston and M. Stenico. Relating continuous and discrete PEPA models
of signalling pathways. To appear in Theoretical Computer Science, 2007.
36. D.T. Gillespie. Exact stochastic simulation of coupled chemical reactions. Journal of Physical
Chemistry, volume 81, pages 2340–2361, 1977.
37. A. Goldbeter. A Minimal Cascade Model for the Mitotic Oscillator Involving Cyclin and
Cdc2 kinase. Proc. Nat. Acad. Sci., volume 8, pages 9107–9111, 1991.
38. N. G¨otz, U. Herzog and M. Rettelbach. TIPP—a language for timed processes and performance
evaluation. Technical report 4/92, IMMD7, University of Erlangen-N¨urnberg, Germany,
1992.
39. E.L. Haseltine and J.B. Rawlings. Approximate simulation of coupled fast and slow reactions
for stochastic chemical kinetics. J. Chem. Phys., volume 117, pages 6959–6969,
2006.
40. J. Heath, M. Kwiatkowska, G. Norman, D. Parker and O. Tymchyshyn. Probabilistic Model
Checking of Complex Biological Pathways. Theoretical Computer Science (Special Issue
on Converging Sciences: Informatics and Biology), 2007.
41. H. Hermanns. Interactive Markov Chains: The Quest for Quantiﬁed Quality. Volume 2428
of LNCS, Springer, 2002.
42. J. Hillston. A Compositional Approach to Performance Modelling, Cambridge University
Press, 1996.
43. C.A.R. Hoare. Communicating sequential processes, Prentice hall, 1985.
44. C.C. Jou and S. Smoka. Equivalences, Congruences and Complete Axiomatizations of
Probabilistic Processes. Proc. of Concur’90, volume 458 of LNCS, pages 367–383, 1990.
45. M. Kanehisa. A database for post-genome analysis. Trends Genet., volume 13, pages 375–
376, 1997.
46. KEGG home page, available at http://sbml.org/kegg2sbml.html.
47. A.M. Kierzek and J. Puchalka. Bridging the gap between stochastic and deterministic
regimes in the kinetic simulations of the biochemical reaction networks. BIOPHYS J., volume
86, pages 1357–1372, 2004.
48. C. Kuttler, C. Lhoussaine and J. Niehren. A Stochastic Pi Calculus for Concurrent Objects.
Technical report RR-6076 INRIA, 2006.
49. C. Kuttler and J. Niehren. Gene regulation in the π-calculus: simulating cooperativity at
the lambda switch. Transactions on Computational Systems Biology VII, volume 4230 of
LNCS, pages 24–55, Springer, 2006.
50. P. Lecca and C. Priami. Cell Cycle control in Eukaryotes: a BioSpi model. Proced. of Bioconcur
03, 2003.
51. R. Milner. Communication and Concurrency. Prentice Hall, International Series in Computer
Science, 1989.
52. R. Milner. Communicating and mobile systems: the π-calculus. Cambridge University
Press, 1999.
53. F. Moller, Tofts, C. A Temporal Calculus for Communicating Systems. Proc. of Concur’90.
Volume 458 of LNCS, pages 401–415, 1990.
54. X. Nicollin and J. Sifakis. An Overview and Synthesis on Timed Process Algebras. In RealTime:
Theory in Practice. Volume 600 of LNCS, pages 526–548, 1991.
55. NuMSV model checker, available at http://nusmv.irst.itc.it.
56. N. Le Nov´ere, B. Bornstein, A. Broicher, M. Courtot, M. Donizelli, H. Dharuri, L. Li, H.
Sauro, M. Schilstra, B. Shapiro, J.L. Snoep, and M. Hucka. BioModels Database: a Free,
Centralized Database of Curated, Published, Quantitative Kinetic Models of Biochemical
and Cellular Systems. Nucleic Acids Research, volume 34, pages D689–D691, 2006.
57. G.D. Plotkin. A structural Approach to Operational Semantics. Technical report DAIMI
FM-19, Computer Science Department, Aarhus University, 1981.
58. C. Priami and P. Quaglia. Beta-binders for biological interactions. Proc. of CMSB’04, Volume
3082 of LNCS, pages 20–33, Springer, 2005.
59. C. Priami. Stochastic π-calculus. The Computer Journal, volume 38, number 6, pages 578–
589, 1995.
60. C. Priami, A. Regev, W. Silverman and E. Shapiro. Application of a stochastic namepassing
calculus to representation and simulation of molecular processes. Information Processing
Letters, volume 80, pages 25–31, 2001.
61. Prism web site. http://www.prismmodelchecker.org/
62. A. Regev. Representation and simulation of molecular pathways in the stochastic πcalculus.
Proceedings of the 2nd workshop on Computation of Biochemical Pathways and
Genetic Networks, 2001.
63. A. Regev and E. Shapiro. Cells as computation. Nature, volume 419, page 343, 2002.
64. A. Romanel, L. Dematte’ and C. Priami. The Beta Workbench. Technical report TR-03-
2007, The Microsoft Research-University of Trento Centre for Computational and Systems
Biology, 2007.
65. I.H. Segel. Enzyme Kinetics: Behaviour and Analysis of Rapid Equilibrium and SteadyState
Enzyme Systems, Wiley-Interscience, New-York, 1993.
66. SPIM, The stochastic Pi-Machine, Available at www.doc.ic.ac.uk/∼anp/spim/.
67. B. Strulo. Process Algebra for Discrete Event Simulation. PhD Thesis, Imperial College,
1993.
68. O. Wolkenhauer, M. Ullah, W. Kolch and K. H. Cho. Modelling and Simulation of IntraCellular
Dynamics: Choosing an Appropriate Framework. IEEE Transactions on NanoBioScience,
volume 3, pages 200–207, 2004.
69. C. Versari and N. Busi. Eﬃcient stochastic simulation of biological systems with multiple
variable volumes. Proc. of FBTC 2007, Electronic Notes in Computer Science, volume 194,
number 3, 2008.