REVIEW

Executable cell biology

Jasmin Fisher1,2 & Thomas A Henzinger2,3

Computational modeling of biological systems is becoming increasingly important in efforts to better understand complex biological behaviors. In this review, we distinguish between two types of biological models, mathematical and computational, which differ in their representations of biological phenomena. We call the approach of constructing computational models of biological systems 'executable biology', as it focuses on the design of executable computer algorithms that mimic biological phenomena. We survey the main modeling efforts in this direction, emphasize the applicability and benefits of executable models in biological research and highlight some of the challenges that executable biology poses for biology and computer science. We claim that for executable biology to reach its full potential as a mainstream biological technique, formal and algorithmic approaches must be integrated into biological research. This will drive biology toward a more precise engineering discipline.

Over the past decade, biological research has reached a point where the accumulated data exceed the human capacity to analyze it. The vast information generated by DNA microarrays, genome sequencers and other large-scale technologies requires computer power for storage, searching and integration into a coherent picture. Systems biology, which combines biology, chemistry, physics, mathematics, electrical engineering and computer science, among other disciplines, aims to integrate the data concerning individual genes and proteins and to investigate the behavior and relationships of various elements in a biological system to explain how it functions1-3. At the core of systems biology lies the construction of models describing biological systems. Over the years, biologists have used diagrammatic models to summarize a mechanistic understanding of a set of observations.
Despite the many benefits of such models, as well as their simplicity, they give a rather static picture of cellular processes. The growing need to translate these models into more dynamic forms that can capture time-dependent processes, together with increases in the models' scale and complexity, has prompted biologists to harness computers to build and analyze ever-larger models. The long-term vision is that large-scale models should revolutionize biology and medicine and enable design of new therapies. We distinguish between two types of models: (i) those that use computer power to analyze mathematical relationships between quantities and (ii) a new variety, resembling a computer program, which is central to an emerging field that we call executable biology. Here, we explain the differences between these two approaches, explore some recent executable biology models and emphasize some challenges facing this new field.

1Microsoft Research, Cambridge CB3 0FB, UK. 2School of Computer and Communication Sciences, EPFL, Lausanne CH-1015, Switzerland. 3Electrical Engineering & Computer Sciences, University of California at Berkeley, California 94720-1770, USA. Correspondence should be addressed to J.F. (jasmin.fisher@microsoft.com) or T.A.H. (tah@epfl.ch).
Published online 7 November 2007; doi:10.1038/nbt1356

Mathematical versus computational models
Mathematical models, such as those based on differential equations, can represent many situations in the natural sciences and engineering. Although they were developed before computation became feasible on a grand scale, they are now profiting from our increasing computational ability. In contrast, computational models present a recipe (an algorithm) for an abstract execution engine to mimic a design or natural phenomenon. Such models are ideally suited to representing complicated chains of events.
They have been used recently to model biochemical processes4-7, thymocyte development and cell fate determination during Caenorhabditis elegans development8-14. Mathematical and computational models (Box 1) differ in the languages in which they are specified. Whereas the former are specified in mathematics, typically equations, the latter are specified by computer programs, often very high-level code written in a modeling language such as Statecharts15 or Reactive Modules16. Consequently, the two types of models yield different kinds of insights. The differences are exemplified by comparing different modeling approaches to cell fate determination during C. elegans vulval development11,13,17, which concentrate on different aspects and consequently provide different kinds of insights into the same system. In contrast with a mathematical model17 that predicts rates of intercellular reactions and suggests a time frame in which cell fate determination is established, the computational models11,13 predict the timing and order of signaling events as well as new modes of interaction between the epidermal growth factor receptor and LIN-12/Notch signaling pathways. Mathematical models can be simulated and possibly solved. The basic entity of a mathematical model is the transfer function, which relates different numerical quantities to each other. A transfer function may be specified, for example, by a differential equation that relates an input to an output quantity. Complex mathematical models are constructed through the composition of transfer functions, yielding a network of interdependent quantities. 
If the constraints for individual transfer functions are relatively simple (e.g., linear differential equations), then mathematical models are amenable to mathematical analysis.

NATURE BIOTECHNOLOGY VOLUME 25 NUMBER 11 NOVEMBER 2007 1239

Box 1 Mathematical versus computational models
A computational model is a formal model whose primary semantics is operational; that is, the model prescribes a sequence of steps or instructions that can be executed by an abstract machine, which can be implemented on a real computer. A mathematical model is a formal model whose primary semantics is denotational; that is, the model describes by equations a relationship between quantities and how they change over time. The equations do not determine an algorithm for solving them; in general, there may be many different solution algorithms and often such algorithms compute only approximate solutions. There is an entire sub-field of computer science that studies the relationships and differences between computational (operational) and mathematical (denotational) views of a system. Whereas for computational models, the computer implementation is by definition a faithful representation of the model, for mathematical models, there is a gap between the meaning of the model and its implementation on a computer. This gap needs to be bridged, for example, by proving that a certain algorithm solves a certain equation with a certain precision. This is not to say that for computational models, the representation gap magically disappears; rather it is shifted and reappears between the biological system and the model. Bridging that gap requires the adequacy of abstractions, not the faithfulness of implementations. Although computational models are further from the biological system and closer to the computer, a good computational model, if one can be found, may explain the mechanisms behind a biological system in more intuitive and more easily analyzable terms than a mathematical model.
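Box 1 observes that the equations of a mathematical model do not fix a solution algorithm and that such algorithms often yield only approximate solutions. As a minimal sketch of that gap, assuming a hypothetical first-order decay equation dx/dt = -k x (chosen here for illustration, not taken from the review), the following Python code compares one possible algorithm, the forward Euler method, with the known closed-form solution. The function names and parameter values are illustrative assumptions.

```python
import math


def euler_decay(x0: float, k: float, t_end: float, steps: int) -> float:
    """Approximate x(t_end) for dx/dt = -k * x with the forward Euler method."""
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)  # one explicit Euler step
    return x


def exact_decay(x0: float, k: float, t_end: float) -> float:
    """Closed-form solution x(t) = x0 * exp(-k * t) of the same equation."""
    return x0 * math.exp(-k * t_end)


if __name__ == "__main__":
    x0, k, t_end = 1.0, 0.5, 4.0  # illustrative values only
    exact = exact_decay(x0, k, t_end)
    for steps in (4, 40, 400):
        approx = euler_decay(x0, k, t_end, steps)
        print(f"steps={steps:4d}  approx={approx:.6f}  error={abs(approx - exact):.6f}")
```

The error shrinks as the step count grows, but it never vanishes for a finite step size. This is one concrete form of the distance between the meaning of an equation and its implementation on a computer.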
In more complicated cases (e.g., nonlinear or stochastic differential equations) and in very high-dimensional cases (where the number of variables is large), mathematical models require computational simulation to plot changes in quantities of substances over time.

Computational models can be executed. By contrast, the basic entity of computational models is the state machine, which relates different qualitative configurations ('states') to each other. A state machine may be specified by simple computer programs that define how, given certain events, one state is transformed into another. Complex computational models are constructed through the composition of state machines, yielding a reactive system (Box 2). The components of such a system represent biological entities, such as cells, which react to events involving neighboring components by state transformations. This is often useful in cell biology, because it requires the modeler to think in terms of 'cause and effect' rather than rates of change. Such computational models can have a very large number of states, are often highly nonlinear and nondeterministic (Box 2) and are generally not amenable to mathematical analysis. Whereas an algorithm must be devised to simulate a mathematical model, a computational model prescribes the steps taken by an abstract machine and is therefore inherently and immediately executable. As the primary semantics of computational models are operational, we use the term execution instead of simulation; hence executable biology. The efficiency with which computers can execute instructions, which exceeds their ability to solve or simulate mathematical equations, makes them ideally suited to the execution of very large computational models.

Quantitative versus qualitative modeling of biology.
In biology, mathematical models for many quantitative relationships between variables, such as molecule concentrations and gene activity levels, have been devised to represent cell signaling pathways in a physically and biologically realistic manner and have been shown repeatedly to generate novel and useful hypotheses18-27. Such models, however, are difficult to obtain and analyze if the number of interdependent variables grows and if the relationships depend on qualitative events, such as a concentration reaching a threshold value. Computational models offer an effective alternative if precise quantitative relationships are unknown, if they involve many different variables or if they change over time, depending on certain events. Because computational models are qualitative, they do not presuppose a precision absent from the experimental data; because they are nondeterministic or stochastic, they allow many possible outcomes of a chain of events, which is often observed in biological systems. A significant advantage of qualitative models is that different models can be used to describe the same system at different levels of detail and that the various levels can be related formally. There are several natural levels of abstraction for describing biological systems using computational models. For example, the individual components may represent molecules or, at a less detailed level, they may represent cells. In such models, it may not be necessary to know exactly how a certain process (e.g., protein synthesis) achieves a certain output, provided that the behavior of the process can be defined qualitatively in a robust manner. Hence, computational models can be useful even when not every detail about a system is known. Computational models can be analyzed by model checking. Computational models can be used for testing and comparing hypotheses. Suppose that we have collected experimental data. 
A computational model represents a hypothesis about the mechanism that results in the data. An execution of the model can be used to check whether a possible outcome of the mechanism conforms to the data (Fig. 1). Owing to nondeterminism or stochastic choices, each repeated execution may yield a different outcome. Therefore it is impossible to check by executing the model whether all possible outcomes conform to the data, or whether the distribution of outcomes conforms to the data. This, however, can be done by a technique called model checking28, which systematically analyzes all of the infinitely many possible outcomes of a computational model without executing them one by one. Intuitively, this is done by exploring the states and possible state changes of a model, rather than by exploring all possible executions of the model. Model checking is effective, because there are usually many more executions than states. A state that may repeat can give rise to infinitely many possible executions. If model checking tells us that all possible outcomes of the computational model agree with the experimental data and that all experimental outcomes can be reproduced by the model, then the model represents a mechanism that satisfactorily explains the experimental data. If, on one hand, some of the experimental data cannot be reproduced, then the hypothesis is wrong. In this case, the model must either be improved to produce the additional outcomes that are present in the data or be completely revised. If, on the other hand, some outcomes of the computational model disagree with the experimental data, then the situation is more interesting. In this case, the mechanistic hypothesis represented by the model may be wrong and one may attempt to restrict the model so it does not produce outcomes that are not supported by the data, as recently illustrated by a model of crosstalk between Notch and Wnt signaling29 and a model of C. elegans vulval development13.
Alternatively, the experimental data may be incomplete and not exhibit some possible observations that would become evident if more data were collected. In this case, model checking can offer suggestions for additional, targeted experiments that would either confirm or invalidate the mechanistic hypothesis represented by the computational model (Fig. 1).

Models for executable biology
Here we summarize several research efforts aimed at realizing the executable biology framework. These are explained further in Box 3.

Boolean networks for analyzing systems robustness and stability. Boolean networks were first introduced by Kauffman in the early 1970s30,31 and are the oldest form of executable biology models. Boolean networks approximate the dynamics of biological networks by considering each molecule (e.g., gene or protein) in the network as either active (1) or inactive (0); intermediate expression levels are neglected. Thus, the state of the system corresponds to the activation state of each of the molecules that make up the network. A molecule is considered to become active if the sum of its activations is larger than the sum of its inhibitions and inactive if the sum of its activations is smaller than the sum of its inhibitions (Fig. 2a). Hence, we obtain a system whose state evolves according to the postulated connections between its molecules (Fig. 2b). Despite this clearly simplified view of biological networks, several examples from models of genetic regulatory networks show that Boolean approaches give meaningful biological information32-34. Boolean networks have proved useful in analyzing system dynamics and reasoning about the stability and robustness of biological systems34,35. The possible states of a Boolean network are drawn as nodes of a graph and possible state changes are drawn as edges. Loops in the graph are used to deduce which are the stable states of the system. The number of loops can be used to reason about system robustness (Fig. 2c). The strong simplifying assumptions on the structure and dynamics of a genetic regulatory system enable the efficient analysis of large regulatory networks. Boolean networks were also among the first formalisms for which algorithms were devised to infer genetic interactions from gene expression data36-41. From a computational point of view, it is difficult to compose larger models from smaller building blocks using Boolean networks. Hierarchical structuring, which makes the design and analysis of models simpler, is not possible in Boolean networks. Recently, Schaub et al. introduced an extension to Boolean networks, called 'qualitative networks', in an attempt to support hierarchical structuring29.

Box 2 Glossary of terms
Reactive system. A system that consists of parallel processes, where each process may change state in reaction to another process changing state. Biological systems are highly reactive (e.g., cells constantly send and receive signals and operate under various conditions simultaneously).
Nondeterministic system. A system that may have several possible reactions to the same stimulus. In biological systems, for example, we can observe various patterns of cell fate under the same genotype. Hence, nondeterministic models capture the diverse behavior often observed in biological systems by allowing different choices of execution, without assigning priorities or probabilities to each choice.
Distributed system. A system that consists of a collection of autonomous computers, connected through a network that enables the computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility.
Concurrency. In computer science, a property of systems that consist of many processes running in parallel and sharing common resources.
Tokens in Petri nets. These describe the presence or absence of a condition, a signal, or a resource. In the case of metabolic networks, the number of tokens in a place stands for the number of molecules of that metabolite existing at a given moment. Alternatively, tokens may correspond to a predefined unit measuring the amount of a substance, such as mole and millimole.
Visual languages. Languages that allow programming with visual expressions (such as diagrams, drawings, animations and icons) as opposed to conventional textual languages that use only textual code. Visual programming environments provide graphical interfaces, which can be manipulated by the user in an interactive way.
Reactive animation. A visual front-end that can be set up to yield interactive animation movies that follow in real time the events taking place during model execution and which can be manipulated and changed during run-time.
Compositional analysis. Analysis through which the properties of a system can be derived from properties of its parts.
The Delta-Notch decision. A signaling process where two equipotent cells that initially express equal amounts of Notch and its ligand Delta gradually express either Notch or Delta. The Notch-expressing cell receives activation signals from the Delta-presenting neighboring cell, resulting in these two cells adopting very different cell fates.
Very-large-scale integration (VLSI). The process of creating integrated circuits by combining many thousands of transistor-based circuits into a single chip.

Figure 1 The methodology of executable biology (diagram labels: Executable biology, Experimental biology, Adjust model, Suggest new experiments). Our view of executable biology is an interplay between collecting data in experiments (experimental biology) and constructing executable models that capture some mechanistic understanding of how the systems under study work.
By executing the models under various conditions that correspond to the experiments and by comparing the outcomes to the experimental data, one can identify discrepancies between hypothetical mechanisms and the experimental observations. These differences can be used to suggest new hypotheses, which serve to adjust the model and need to be validated experimentally, or new experiments, which can confirm or falsify modeling hypotheses.

Box 3 Computational models
Boolean networks. These models are computational, because from a given activation state of all molecules, they prescribe which molecules become active in the next step. The execution of a Boolean network thus illuminates the causal and temporal relationships between the activation of different molecules. The main drawback of Boolean networks is that they do not support the composition of larger models from smaller ones. To allow integration of several interacting mechanisms, a model needs to offer a so-called composition operation. Interacting state machines and process calculi support such a composition operation.
Petri nets. These models are computational, because from a given assignment of tokens to places, they prescribe which tokens can change place in the next step. Petri nets are more general than Boolean networks, because their execution semantics allows for true concurrency: several tokens may change place independently in the same step. Also, whereas Boolean networks are deterministic (that is, the outcome of execution is unique), Petri nets may be nondeterministic (execution may have many different outcomes; e.g., when there are multiple options to move tokens), or stochastic (that is, there is a probability distribution of possible outcomes), or both (when there are several different probability distributions of possible outcomes). Like Boolean networks, Petri nets do not support the composition of several networks.
Interacting state machines. Several languages are available to specify these models, for example, the language of Reactive Modules. They are computational because from given states of all interacting machines, the model prescribes the next states of the machines. The interaction may be synchronous, when some machines change state simultaneously because of causal dependencies, or asynchronous, when some machines change state independently, in any order. Asynchronous interaction gives rise to nondeterminism, because different orders may give different results. Like Petri nets, interacting state machines may be nondeterministic, stochastic, or both. Unlike Boolean networks and Petri nets, interacting state machines are compositional, because several machines can be put together and will interact with each other. State machines can also be equipped with a hierarchical structure, as in Statecharts. A hierarchical machine may change state at a microlevel and several microsteps together make up a macro-step, which is a single state change of a higher-level machine providing a more abstract (that is, less detailed) view of the system. Process calculi. Like interacting state machines, these models are computational and compositional. They may be nondeterministic, stochastic, or both. The main difference between interacting state machines and process calculi is that in the former case, the most basic notion is that of a state and the model prescribes how the state changes, whereas in the latter case, the most basic notion is that of an event and the model prescribes how events either cause or are independent of other events. Although an event can be represented by a state change and a state by a history of events, the two views give rise to different styles of modeling. Hybrid models. The discrete part of these models is computational and the continuous part, mathematical. 
Executing the continuous change of variables, as described by differential equations, requires an algorithm that is independent of the model and often gives only approximate results. For example, the possibility of a discrete state transition may be missed if that possibility depends on the exact value of a continuous variable.

Petri nets for analyzing biological networks. Petri nets represent a well-established technique in computer science for modeling distributed systems (Box 2). The model stresses concurrency (Box 2), which is important when modeling biological systems. A Petri net is a graph with two types of nodes: places, which represent the resources of the system, and transitions, which correspond to events that can change the state of the resources. The edges of the graph connect places to transitions and transitions to places (Fig. 3a). The state of the system is represented by places holding so-called tokens (Box 2); one place may hold multiple tokens. Thus, different assignments of tokens to places induce different states of the system. Transitions change the state of the system by moving tokens along edges. In a given state of the system, there may be more than one transition that can move a token, leading to nondeterminism. Petri nets are well-suited for modeling the concurrent behavior of biochemical networks42,43 and have been used to represent metabolic pathways44 and protein synthesis45,46. Figure 3 shows a Petri net model of the biosynthesis of tryptophan in Escherichia coli. Some of the main advantages of Petri nets are that they are visual, have different flavors and can be designed and analyzed by a range of tools. The simple type of Petri net described above subsumes Boolean networks: a place represents a molecule and a token at that place represents the active state of the molecule. Choosing to model Boolean networks using Petri nets has the added advantage of ready-made visual design and analysis tools.
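The execution rule for simple Petri nets described above (places holding tokens, transitions that consume and produce tokens, and a nondeterministic choice among enabled transitions) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the enzyme-substrate net, the place and transition names, and the helper functions `enabled`, `fire` and `run` are hypothetical and are not the tryptophan model of Figure 3 or the API of any Petri net tool.

```python
import random
from typing import Dict, List, Tuple

Marking = Dict[str, int]                  # place name -> number of tokens
Transition = Tuple[List[str], List[str]]  # (input places, output places)

# Hypothetical toy net (not from the review): an enzyme binds a substrate,
# then either releases it unchanged or converts it into product.
TRANSITIONS: Dict[str, Transition] = {
    "bind": (["substrate", "enzyme"], ["complex"]),
    "unbind": (["complex"], ["substrate", "enzyme"]),
    "catalyze": (["complex"], ["product", "enzyme"]),
}


def enabled(marking: Marking, transitions: Dict[str, Transition]) -> List[str]:
    """Names of transitions whose input places all hold enough tokens."""
    names = []
    for name, (inputs, _outputs) in transitions.items():
        if all(marking.get(p, 0) >= inputs.count(p) for p in inputs):
            names.append(name)
    return names


def fire(marking: Marking, transition: Transition) -> Marking:
    """Remove one token per input edge and add one token per output edge."""
    inputs, outputs = transition
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] = new_marking.get(place, 0) - 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def run(marking: Marking, transitions: Dict[str, Transition],
        max_steps: int, rng: random.Random) -> List[Marking]:
    """Execute the net, resolving nondeterminism by a random choice."""
    trace = [marking]
    for _ in range(max_steps):
        choices = enabled(marking, transitions)
        if not choices:
            break  # no transition can fire: the execution has ended
        marking = fire(marking, transitions[rng.choice(choices)])
        trace.append(marking)
    return trace


if __name__ == "__main__":
    start = {"substrate": 2, "enzyme": 1, "complex": 0, "product": 0}
    for m in run(start, TRANSITIONS, max_steps=20, rng=random.Random(0)):
        print(m)
```

Once the complex has formed, both `unbind` and `catalyze` are enabled, so repeated runs with different random seeds may follow different token paths. This is the nondeterminism that the text attributes to states with more than one transition able to move a token.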
Recently an automatic translation of Boolean networks to Petri nets has been suggested47. However, much like Boolean networks, Petri nets do not support hierarchical structuring, which makes them difficult to use for large-scale models. More complex flavors of Petri nets provide additional possibilities in modeling. For example, in colored Petri nets, different-colored tokens induce multiple possible values for each place, allowing different activation levels to be assigned to resources. Colored Petri nets have been used to analyze metabolic pathways48. Stochastic Petri nets add probabilities to the different choices of the transitions and have been used to analyze signaling pathways49-51, where the number of molecules of a given type is represented by the color of a place and probabilities represent reaction rates. The Pathalyzer is a software tool that builds on the availability of Petri net analysis and design tools to standardize and collect information about signal transduction pathways52. It uses analysis techniques for Petri nets to answer queries such as "what could cause the activation of a certain substance?", or "is it possible that a certain substance will reach activation in the absence of a different substance?"52.

Interacting state machine models for biological mechanisms. State machine models define the behavior of objects over time, based on the various states that an object can be in over its lifetime. In other words, states are abstract situations in an object's life cycle. Interacting state machines can specify causal relationships between state changes in different machines. These models describe both how objects communicate and collaborate as well as how they behave under different circumstances. Usually, the state of an object is determined by the states of its parts. For example, the state of a cell is determined by the states of various genes and proteins, each having its own reaction to the presence or absence of some other molecules.
Changes in the state of the cell are determined by the interdependent state changes of all parts. A hierarchical structure allows one to view a system at different levels of detail (e.g., whole organism, tissues, cells; Fig. 4a).

Figure 2 Boolean networks. (a) An isolated part of a Boolean network representing the behavior of one substance. Arrows indicate activation and bars denote inhibition. The next value of the substance is determined by the sum of activations minus the sum of inhibitions. In this example, if we denote the values of a1, a2, a3 and a4 at time t by a1, a2, a3 and a4, then the value of substance b at time t + 1 will be 1 if a1 + a2 - (a3 + a4) is positive and 0 otherwise. Sometimes arrows are given strengths, and then we take the sum of strengths of activation arrows whose source is active (that is, set to 1) minus the sum of strengths of inhibition arrows whose source is active. (b) Simplified cell-cycle network of the budding yeast (node labels surviving from the figure: Cell Size, Clb5,6, Mcm1/SFF, Cdc20&Cdc14, Swi5). (c) Analysis of the yeast cell-cycle network using Boolean networks. Each dot represents a state of the proteins in the system, where each of the proteins is either active or inactive. Each arrow represents a transition from one state to another. The blue transitions correspond to the cell-cycle sequence. Starting from any point in the graph, in order to avoid reaching the stable state at the bottom of the diagram, one would have to continuously perturb the system. Hence, the normal behavior converges fast to the stable state at the bottom of the diagram, corresponding to the G1 stationary state in which the cell awaits a signal that will start another round of division. This demonstrates that the yeast cell-cycle regulatory network is stable and robust for its function. Figures reproduced with permission from ref. 34.
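The update rule given in the Figure 2a caption (a substance becomes 1 if its active activators outnumber its active inhibitors and 0 otherwise) and the loop-based analysis of Figure 2c can be illustrated with a small Python sketch. The three-node wiring below is a hypothetical toy network, not the yeast cell-cycle network of ref. 34, and the function names are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[int, ...]

# Hypothetical wiring: node 0 sustains itself and activates node 1,
# node 1 activates node 2, and node 2 inhibits node 0.
ACTIVATORS: Dict[int, List[int]] = {0: [0], 1: [0], 2: [1]}
INHIBITORS: Dict[int, List[int]] = {0: [2], 1: [], 2: []}


def step(state: State, activators: Dict[int, List[int]],
         inhibitors: Dict[int, List[int]]) -> State:
    """Synchronous update: 1 if activations minus inhibitions is positive."""
    nxt = []
    for i in range(len(state)):
        total = (sum(state[j] for j in activators[i])
                 - sum(state[j] for j in inhibitors[i]))
        nxt.append(1 if total > 0 else 0)
    return tuple(nxt)


def attractor(start: State, activators: Dict[int, List[int]],
              inhibitors: Dict[int, List[int]]) -> List[State]:
    """Follow the trajectory until a state repeats; return the final loop."""
    seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, activators, inhibitors)
    return trajectory[seen[state]:]


def basins(n: int, activators: Dict[int, List[int]],
           inhibitors: Dict[int, List[int]]) -> Dict[Tuple[State, ...], int]:
    """Count how many of the 2**n start states end in each loop."""
    sizes: Dict[Tuple[State, ...], int] = {}
    for start in product((0, 1), repeat=n):
        key = tuple(sorted(attractor(start, activators, inhibitors)))
        sizes[key] = sizes.get(key, 0) + 1
    return sizes


if __name__ == "__main__":
    print(attractor((1, 0, 0), ACTIVATORS, INHIBITORS))
    print(basins(3, ACTIVATORS, INHIBITORS))
```

In this toy wiring every start state ends in the same one-state loop, which loosely mirrors the kind of convergence argument made for Figure 2c. Ties are resolved to 0 here because the caption says "0 otherwise"; other conventions for ties are possible and would change the dynamics.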
Models of this kind have been used to model T-cell activation and differentiation8,9, as well as C. elegans development10,11,13,14. Interacting state machine models are particularly suitable for describing mechanistic models of biological systems that are well understood qualitatively. Such models do not require quantitative data relating to the number of molecules and reaction rates. They allow the creation of abstract high-level models and the application of strong analysis tools such as model checking. The possibility of hierarchical structuring is extremely useful in cases where the behavior is distributed over many cells and where multiple copies of the same process are executed in parallel. There are many different languages to express interacting state machine models. Using the visual language (Box 2) of Statecharts15, Kam et al. developed a model that described the various stages in the life span of a T-cell and the transitions between these stages8. The initial T-cell model was followed by a more extensive animated model of T-cell differentiation in the thymus9. A major advantage of Statecharts compared to other state-based formalisms, such as Reactive Modules16, is the fact that this language is visual. The user can draw states and state changes and the tool automatically creates an executable model, enabling relatively easy and intuitive programming even for nonspecialists. Efroni et al. used reactive animation (Box 2)9,53, where a reactive system drives the display of animation software to visualize the model. These studies were followed by ongoing efforts to model C. elegans development10,11,13,14, which used Statecharts and a visual language called Live Sequence Charts54 and more recently a language called Reactive Modules16 that supports compositional analysis techniques (Box 2). Figure 3 Petri nets, (a) A simple, standard Petri net. The circles denote places, whereas the boxes denote transitions. 
The distribution of tokens (black dots) in the places at a given time defines a marking. Transitions change the marking by removing a token from each incoming arrow and adding a token to each outgoing arrow. (b) Simplified logical regulatory graph for the biosynthesis of tryptophan in E. coli. Each node of the regulatory graph represents an active component: tryptophan (Trp), the active enzyme (TrpE) and the active repressor (TrpR). The node marked by a rectangle accounts for the import of Trp from external medium. All nodes are binary (that is, can take the value 0 or 1), except Trp, which is represented by a ternary variable (taking the values 0, 1, 2). Arrows represent activation and bars denote inhibition. (c) Petri net of the Trp regulatory network. Each of the four components of b is represented by two complementary places and all the different situations that lead to a change of the state are modeled by one of the nine transitions (t1-t9). Figures reproduced with permission from ref. 46.

[Figure 4 image: panels include a Statecharts diagram of a vulval precursor cell (state labels such as VulNotMutated, VulMutated, NoVulSignal, MuvNotMutated, MuvMutated, No MuvInhibition and LateralSignal) and micrographs of the cells P3.p to P8.p.]

Figure 4 Interacting state machine models. (a) Hierarchical diagram representing a tissue comprising three cells of type A and three cells of type B.
All cells of the same type work according to the same program, (b) Diagrammatic mechanistic model for the signaling events underlying vulval precursor cell (VPC) fate specification. IS, inductive signal; LS, lateral signal; cell fates: 1°, primary; 2°, secondary; 3°, tertiary, (c) Statecharts model of a vulval precursor cell. Rectangles represent states and arrows represent transitions. A short arrow exiting a small circle marks the initial state of the object. The circled C denotes a condition between two states. A transition from a condition is taken if the guard is true. Areas separated by dashed lines are concurrent components; that is, the cell is present in all these components simultaneously. Figure reproduced with permission from ref. 11. (d) Experimental validation of the loss of sequential activation in lin-15 mutants, as predicted by the computational model. The pictures visualize cell fate specification in C. elegans using blue and yellow fluorescent proteins (EGL-17::CFPand LIP-1-YFP) expressed during activation of the inductive and lateral signaling pathways, respectively. The upper and middle rows show examples of wild-type animals at mid and late L2 stage, expressing the EGL-17 marker in P6.p and the LIP-1 marker in P5.p and P7.p, respectively. The lower row shows examples of a lin-15 mutant at the late L2 stage showing simultaneous expression of both markers in P5.p and P7.p. Figure reproduced from ref. 13. Fisher et al.11 created a formal dynamic model of vulval fate specification based on the proposed mechanistic model of Sternberg and Horvitz55. This work revealed that state-based mechanistic modeling is well-suited to developmental genetics and can provide new insights into the temporal aspects of cell fate specification during C. elegans vulval development. Subsequent work13 was based on the more sophisticated current understanding of vulval fate specification (Fig. 4b). 
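To make the idea of interacting state machines concrete, the following Python sketch models a row of six precursor cells that all run the same program and interact only through the signals they emit. It is a toy under loudly stated assumptions: the names `Cell`, `Fate` and `simulate`, the Boolean `inductive_high` input, the nearest-neighbor lateral signal and the `deadline` timeout are hypothetical simplifications chosen for illustration, and the code is not the Statecharts or Reactive Modules model of refs. 11 and 13.

```python
from dataclasses import dataclass
from enum import Enum


class Fate(Enum):
    UNDETERMINED = "?"
    PRIMARY = "1"
    SECONDARY = "2"
    TERTIARY = "3"


@dataclass
class Cell:
    """One state machine; every cell runs the same program."""
    name: str
    inductive_high: bool  # hypothetical input: True only next to the signal source
    fate: Fate = Fate.UNDETERMINED
    age: int = 0

    def emits_lateral(self) -> bool:
        # Toy rule: a primary cell signals its direct neighbors.
        return self.fate is Fate.PRIMARY

    def step(self, lateral_in: bool, deadline: int) -> None:
        """Take at most one guarded transition per round."""
        self.age += 1
        if self.fate is not Fate.UNDETERMINED:
            return  # fates are terminal states
        if self.inductive_high:
            self.fate = Fate.PRIMARY
        elif lateral_in:
            self.fate = Fate.SECONDARY
        elif self.age >= deadline:
            self.fate = Fate.TERTIARY  # assumed default fate after a timeout


def simulate(inductive, deadline=3):
    """Run `deadline` synchronous rounds and return the fate of each cell."""
    names = [f"P{i}.p" for i in range(3, 3 + len(inductive))]
    cells = [Cell(n, high) for n, high in zip(names, inductive)]
    for _ in range(deadline):
        # Synchronous round: read every output first, then update every cell.
        out = [c.emits_lateral() for c in cells]
        for i, c in enumerate(cells):
            left = out[i - 1] if i > 0 else False
            right = out[i + 1] if i + 1 < len(cells) else False
            c.step(left or right, deadline)
    return {c.name: c.fate.value for c in cells}


wild_type = simulate([False, False, False, True, False, False])
print(wild_type)
```

With the inductive signal high only at the fourth cell, this toy program yields the fates 3, 3, 2, 1, 2, 3 for P3.p to P8.p; with no inductive signal at all, every cell takes the tertiary fate. A model checker would explore all possible orders of events rather than the single synchronous run shown here.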
Model checking allowed us to test the consistency of the current conceptual model for vulval precursor cell fate specification with an extensive set of observed behaviors and experimental perturbations of the vulval system. The analysis of this model predicted new genetic interactions between the signaling pathways involved in the patterning process, together with temporal constraints that may further elucidate the mechanisms underlying precise pattern formation during animal development. These predictions were validated experimentally (Fig. 4).

Process calculi for executing molecular processes. A different approach stresses the importance of interactions between molecules as the driving force for biological processes. As opposed to previous approaches where execution results in a sequence of states, here execution is defined through a sequence of events and their causal dependencies. This approach uses process calculi, languages that have been developed to model networks of communicating processes56. In this context, a process is a state machine for which some state changes can be observed as events. Events provide communication between processes. To model biological behaviors, a process is associated with a molecule and many copies of the same process run in parallel to simulate the existence of many molecules. Communication between processes is used to model interactions between molecules. For example, the activation of a certain molecule by the energy released from ATP hydrolysis can be modeled by two processes and a communication event between them as follows: a process associated with ATP proceeds from the 'ATP-state' to the 'ADP-state', a process associated with the specific molecule proceeds from the 'inactive-state' to the 'active-state', and the two state changes are simultaneous because of a communication event. The inactivation of the same molecule can then be modeled by an independent state change.
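The ATP example can be sketched in Python as two state machines that synchronize on a shared channel. This is a minimal illustration, not pi-calculus or BioSPI syntax: the `Process` class, the `step` function, the channel name `energy` and the label `tau` for an independent state change are all assumptions introduced here. A send (`energy!`) and a matching receive (`energy?`) fire together as one event, so both processes change state simultaneously.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A state machine whose moves are sends ("ch!"), receives ("ch?") or "tau"."""
    name: str
    state: str
    transitions: dict  # state -> list of (action, next_state)

    def offers(self):
        return self.transitions.get(self.state, [])


def step(procs):
    """Fire one event: a synchronization if one is enabled, else an internal move."""
    for p in procs:
        for action, p_next in p.offers():
            if not action.endswith("!"):
                continue
            channel = action[:-1]
            for q in procs:
                if q is p:
                    continue
                for q_action, q_next in q.offers():
                    if q_action == channel + "?":
                        # Communication event: both state changes happen at once.
                        p.state, q.state = p_next, q_next
                        return f"{p.name} -{channel}-> {q.name}"
    for p in procs:
        for action, p_next in p.offers():
            if action == "tau":
                p.state = p_next  # independent state change
                return f"{p.name} tau"
    return None  # nothing is enabled


atp = Process("ATP", "ATP-state", {"ATP-state": [("energy!", "ADP-state")]})
molecule = Process(
    "M",
    "inactive-state",
    {
        "inactive-state": [("energy?", "active-state")],
        "active-state": [("tau", "inactive-state")],
    },
)

events = [step([atp, molecule]) for _ in range(3)]
print(events, atp.state, molecule.state)
```

The run produces a sequence of events rather than a sequence of global states: first the communication that activates the molecule, then its independent inactivation, after which no further event is enabled because no ATP process remains to send on the channel.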
This modeling approach is applicable to molecular interactions that occur stochastically. It can be used for the detailed analysis of the stochastic behavior of a molecular interaction network using model checking. Currently, owing to scalability issues, such an analysis can be applied only to relatively small models. However, the information and insights provided by this kind of analysis suggest that it is beneficial to create oversimplified models of large and complex networks.

Figure 5 Pi calculus. (a) Possible molecular interactions in the fibroblast growth factor (FGF) pathway. Figure reproduced from reference 63. (b) Partial summary of reactions between the components presented in the diagram, including reaction rates obtained from the literature. (c) A fragment of the stochastic pi-calculus code (in the textual format of BioSPI) relating to FGF receptor (FGFR) and its interactions with FGF and Src. (d) The BioSPI system inputs the pi-calculus code and performs simulations using the Gillespie algorithm. The curves show the amount of relocated FGFR and Grb2 bound to FGFR over time, for an average of ten simulations. Figures reproduced from ref. 72.

Figure 5b, reactions and rates:
1. FGF binds/releases FGFR
   FGF + FGFR → FGFR:FGF                    k1 = 5e+8 M^-1 s^-1
   FGF + FGFR ← FGFR:FGF                    k2 = 0.002 s^-1
2. Phosphorylation of FGFR (whilst FGFR:FGF)
   FGFR:FGF + FGFR1 → FGFR:FGF + FGFR1P     k3 = 0.1 s^-1
   FGFR:FGF + FGFR2 → FGFR:FGF + FGFR2P     k4 = 0.1 s^-1
3. Dephosphorylation of FGFR
   FGFR1P → FGFR1                           k5 = 0.1 s^-1
   FGFR2P → FGFR2                           k6 = 0.1 s^-1
4. Effectors bind phosphorylated FGFR
   SRC + FGFR1P → SRC:FGFR                  k7 = 1e+6 M^-1 s^-1
   SRC + FGFR1P ← SRC:FGFR                  k8 = 0.02 s^-1
   GRB2 + FGFR2P → GRB2:FGFR                k9 = 1e+6 M^-1 s^-1
   GRB2 + FGFR2P ← GRB2:FGFR                k10 = 0.02 s^-1
5. Relocation of FGFR (whilst SRC:FGFR)
   SRC:FGFR → relocFGFR                     k11 = 1.1e-3 s^-1
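As a hedged illustration of the kind of stochastic simulation mentioned in the caption of Figure 5d, the sketch below applies the direct method of the Gillespie algorithm to the first reaction pair of Figure 5b only, the reversible binding of FGF to FGFR. It is not BioSPI and not the model of ref. 63. The release constant 0.002 s^-1 is taken from Figure 5b; the binding constant `c_bind`, the molecule counts and the function name `gillespie_binding` are hypothetical, because converting the bimolecular rate k1 (in M^-1 s^-1) to a per-molecule-pair rate requires a reaction volume that the text does not give.

```python
import random


def gillespie_binding(fgf, fgfr, bound, c_bind, c_release, t_end, seed=0):
    """Direct-method Gillespie simulation of FGF + FGFR <-> FGFR:FGF.

    Counts are numbers of molecules; c_bind is a per-pair rate (assumed value),
    c_release is a per-complex rate in s^-1.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, fgf, fgfr, bound)]
    while True:
        a_bind = c_bind * fgf * fgfr   # propensity of binding
        a_release = c_release * bound  # propensity of release
        a_total = a_bind + a_release
        if a_total == 0.0:
            break                      # no reaction can fire
        t += rng.expovariate(a_total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_bind:
            fgf, fgfr, bound = fgf - 1, fgfr - 1, bound + 1
        else:
            fgf, fgfr, bound = fgf + 1, fgfr + 1, bound - 1
        trajectory.append((t, fgf, fgfr, bound))
    return trajectory


traj = gillespie_binding(
    fgf=100, fgfr=50, bound=0,
    c_bind=1e-3,      # hypothetical per-pair binding rate, s^-1
    c_release=0.002,  # k2 from Figure 5b, s^-1
    t_end=600.0,
)
print(len(traj), traj[-1])
```

Each run is one random trajectory; averaging several runs with different seeds gives curves of the kind the caption describes. Extending the sketch to the remaining reactions amounts to adding one propensity and one state update per reaction.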
Initial work along this line used a process calculus called pi-calculus56 as a modeling language for molecular interactions5. These studies included the modeling of the receptor tyrosine kinase and the mitogen-activated protein kinase signal-transduction pathway and the construction of a simulation environment called BioSPI. The stochastic pi-calculus57 was later used to model a gene regulatory positive-feedback loop4. Many other studies have followed this direction, including experiments using the ambient calculus58 and the brane calculus59. The methodology has also been used to model transcription factor activation and glycolysis60, Raf kinase inhibitory protein inhibition of extracellular signal-regulated kinase61 and more recently, the mitogen-activated protein kinase cascade (including a comparison with a similar model using differential equations)62 and the fibroblast growth factor pathway (Fig. 5) (and its extensive analysis using stochastic simulation and probabilistic model checking)63. A recent review7 discusses the process calculus approach in depth.

Figure 5c (BioSPI code fragment for FGFR; the listing covers binding and release of FGF, phosphorylation and dephosphorylation, binding and release of Src, and relocation).

Figure 6 Hybrid systems. (a) In hybrid system models, discrete state changes modify the way continuous variables change. The discrete changes are governed by the values of the continuous variables. Each discrete state has its own differential equations, which govern the dynamics of continuous variables. (b) Influence diagram for the Delta-Notch protein signaling network in a hexagonal close-packed lattice. (c) Plots of the continuous changes of the levels of Delta (x1) and Notch (x2) proteins as governed by changing differential equations that match three different discrete states, blue, brown and green. An isolated single cell will converge to a steady state where it has a high level of Delta protein and a low level of Notch protein. (d) From left to right: layout of the four-cell Delta-Notch network showing the variables associated with each cell; biologically consistent steady states of the four-cell network. A shaded cell represents a high steady-state concentration of Delta protein and a low level of Notch protein; an unshaded cell has low Delta protein and high Notch protein at steady state. Figures reproduced from ref. 66.

Hybrid models combining mathematical and computational models. Hybrid systems combine in a single framework variables that span discrete and continuous domains64. The discrete variables are controlled by discrete state changes that may depend on the values of continuous variables. The changes in continuous variables are governed by differential equations (preferably linear), which depend on discrete states (that is, the combined value of all discrete variables determines the discrete state and the discrete state determines which differential equations are to be used to govern the rates of change