SASB 2015

Formal Biochemical Space with Semantics in Kappa and BNGL

T. Děd, D. Šafránek, M. Troják, M. Klement, J. Salagovič, L. Brim
Faculty of Informatics, Masaryk University, Brno, Czech Republic

This work has been supported by the Czech Science Foundation grant No. GA15-11089S.

Abstract

Biochemical Space (BCS) has been introduced as a semi-formal notation for reaction networks of biological processes. It provides a concise mapping of mathematical models to their biological description established at a desired level of abstraction. In this paper, we first turn BCS into a completely formal language with rigorously defined semantics by means of a simplified Kappa calculus. On the practical side, we support BCS with a translation to BNGL, a well-known and widely used rule-based language. Finally, we show the current status of BCS defined for cyanobacteria processes.

1 Introduction

To provide a rigorous representation of complex biological processes without burdening users with overcomplicated syntax, we have enriched our online platform for modelling of cyanobacteria processes, e-cyanobacterium (http://www.e-cyanobacterium.org, http://www.cyanotearn.org), with a semi-formal textual notation called Biochemical Space (BCS) [10]. BCS represents reaction networks of the studied processes and provides a concise mapping of mathematical models to a precise biological description that is established at a consortium-agreed level of abstraction. The concept of BCS is a crucial methodological part of the Comprehensive Modelling Platform (CMP), a general platform for computational modelling and analysis of biological processes, first introduced in [15] as a concept for unambiguous representation of internally consistent reduced mathematical models of oxygenic photosynthesis [17] and further refined to a general online modelling platform as described in [11].
In general, the main goal of BCS as a part of CMP is to simplify systems-level model-building tasks by providing a simple and clear notation that is easily understandable by in silico modellers on the one hand and experimental biologists on the other.

Fig. 1. Graphical representation of Comprehensive Modelling Platform (CMP).

In [1] we have shown that rule-based methods can be directly used for rewriting existing kinetic models of oxygenic photosynthesis into a compact non-redundant form obtained by applying a set of automated syntactic reductions defined in Kappa [4]. That achievement led us to employ a rule-based definition of biological processes as the framework for a qualitative description of the consortium-agreed understanding of the chemical reactions behind the processes. Existing quantitative models can then be mapped onto the qualitative rule-based BCS.

BCS borrows concepts from two worlds: the formal rule-based languages and semi-formal reaction network annotation bases such as KEGG [9]. The BCS language is defined with a clear relation to BNGL [6], a practical tool-supported rule-based language compatible with Kappa. The most important requirement of the consortium-driven modelling platform is a simple-to-use format well adjusted to the level of abstraction employed for biological process description. For this reason we were not able to directly employ any of the well-established rule-based languages and instead defined a new language with a clear relation to the existing formats. In particular, BNGL and Kappa consider too many details for our purpose. Most importantly, BNGL requires bindings inside complex structures to be specified. This demands a binding site specification for each molecule and a unique label for each interaction. In BCS, these structural details are abstracted out: it is enough to know that molecules interact and form a complex.
Another issue is that existing formalisms consider biological entities as agents all defined at the same level of abstraction. In BCS we allow hierarchical construction of agents from simple molecules to composite structures and complexes. Finally, the algebraic representation of Kappa and BNGL is quite far from common chemical notation and is not human readable. BCS attempts to avoid this.

In [10] we have presented the general ideas behind BCS. The language has been defined as a semi-formal notation. In this paper, we turn BCS into a completely formal language with clearly defined syntax and semantics (by embedding it into Kappa). We define the relation between BCS and BNGL, which allows us to translate specifications between both languages. In Section 7, we show the current status of the BCS description implemented for cyanobacteria.

1.1 Related Work

On the bioinformatics side, the closest format to BCS is KEGG [9]. In contrast to BCS, KEGG does not support a rule-based description allowing compact representation of combinatorial states. Moreover, it does not support logical organisation of entities and reactions into an organism-specific hierarchy that may significantly simplify understanding of the complex processes driving the organism's physiology and its interaction with the environment. Since the notation relies on a simple textual base and focuses on a simple but still reasonably precise and compact description maintainable by biologists, the format of a BCS specification is compliant with KEGG.

BCS should also be compared to the well-acclaimed standard provided by SBML [8,13], which might also be used for representation of a biochemical space. BCS completely avoids issues related to dynamical models. As an annotation platform purely focused on process-level description, BCS goes beyond SBML level 2 in generalisation of entities to hierarchical agents, in introducing entity states, and in dealing with the related combinatorial explosion.
These issues are solved in detail by rule-based approaches [4,7], and a draft of a package for SBML level 3 (multi) is in preparation [18]. In comparison with process algebraic languages treating chemical reactions mechanistically as communicating concurrent processes [2,5], BCS keeps a purely qualitative level of description close to chemical reactions and remains as simple as possible to cover the consortium-agreed level of abstraction. The language defined in [16] targets a similar level of abstraction as BCS. However, it is intended more as a programming language for biological systems than as an annotation format.

2 Background

We define simplified Kappa (kappas) using a process-like notation as presented in [3]. The syntax and the notions of structural equivalence and matching are entirely taken from [3]:

$$
\begin{array}{llcl}
\text{expression} & E & ::= & \emptyset \mid a, E \\
\text{agent} & a & ::= & N(\sigma) \\
\text{agent name} & N & ::= & A \in \mathcal{A} \\
\text{interface} & \sigma & ::= & \emptyset \mid s, \sigma \\
\text{site} & s & ::= & n_{\iota}^{\lambda} \\
\text{site name} & n & ::= & x \in \mathcal{S} \\
\text{internal state} & \iota & ::= & \epsilon \mid m \in \mathbb{V} \\
\text{binding state} & \lambda & ::= & \epsilon \mid i \in \mathbb{N}
\end{array}
$$

where $\mathcal{A}$ is a finite set of agent names, $\mathcal{S}$ is a finite set of site names, and $\mathbb{V}$ is a finite set of values representing modified states of the sites. We use the notation $\sigma(a)$ for a signature associated to an agent $a$. An agent is denoted by its name and its interface. An interface consists of a sequence of sites. $x_{\iota}^{\lambda}$ denotes a site $x$ with internal state $\iota$ and binding state $\lambda$. If the binding state is $\epsilon$ then the site is free, otherwise it is bound. By convention, when a binding or internal state is not specified, $\epsilon$ is assumed.

Note that full Kappa is richer. It allows a binding state meaning a free or bound site, denoted by a question mark. We also omit rates from the rules.

Definition 2.1 An expression is well-formed if a site name occurs only once in an interface and if each binding state ($\neq \epsilon$) present in the expression occurs exactly twice. The set of all well-formed expressions is denoted as $\mathcal{E}$.
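Definition 2.1 can be checked mechanically. The following is a minimal Python sketch, not part of the original formalisation. The tuple-based encoding of expressions, agents, and sites, and the function name is_well_formed, are assumptions chosen purely for illustration; None stands for the empty state.

```python
from collections import Counter

# Hypothetical encoding (not from the paper):
#   expression = list of agents
#   agent      = (agent_name, interface)
#   interface  = list of sites
#   site       = (site_name, internal_state, binding_label)
# None plays the role of the empty state (epsilon).


def is_well_formed(expression):
    """Check the two conditions of Definition 2.1 on the encoding above."""
    bond_counts = Counter()
    for _agent_name, interface in expression:
        site_names = [site_name for site_name, _internal, _bond in interface]
        # Condition 1: a site name occurs only once in an interface.
        if len(site_names) != len(set(site_names)):
            return False
        for _site_name, _internal, bond in interface:
            if bond is not None:
                bond_counts[bond] += 1
    # Condition 2: every non-empty binding state occurs exactly twice.
    return all(count == 2 for count in bond_counts.values())


# Two agents sharing bond label 1: well-formed.
bound_pair = [("A", [("x", "n", 1)]), ("B", [("y", None, 1)])]
# Bond label 1 occurs only once: not well-formed.
dangling = [("A", [("x", "n", 1)]), ("B", [("y", None, None)])]
# Site x is repeated inside one interface: not well-formed.
duplicate_site = [("A", [("x", None, None), ("x", "p", None)])]
```

The check works on a single representative of an expression; structural equivalence is not needed because neither condition depends on the order of agents or sites.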
We assume a standard structural equivalence on well-formed expressions that treats as equivalent all expressions differing in the order of sites in interfaces, the order of agents in an expression, and the naming of binding sites. A rule is a pair of expressions $E_l, E_r$ (usually written as $E_l \to E_r$). The set of all rules is denoted as $\mathcal{R}$. The left-hand side $E_l$ of the rule describes the solution taking part in the reaction and the right-hand side $E_r$ describes the effects of the rule. A rule can be either a binding rule or a modification rule. A binding (unbinding) rule binds two free sites together (or unbinds two bound sites). A modification rule modifies some internal state [3].

Matching is a relation $\models\ \subseteq \mathcal{E} \times \mathcal{E}$ and replacement is a function $\mathcal{E} \times \mathcal{E} \to \mathcal{E}$; both are defined inductively as in [3]. A rule is applied by $[E] \to_{E_l, E_r} [E'[E_r]]$ whenever $\exists E' \in [E].\ E' \models E_l$.

An agent signature is a pair of mappings $\Sigma : \mathcal{A} \to 2^{\mathcal{S}}$ and $I : \mathcal{A} \times \mathcal{S} \to 2^{\mathbb{V}}$. Informally, $\Sigma$ restricts for each agent name $A \in \mathcal{A}$ the set of site names that can occur in an agent with name $A$, and $I$ restricts the set of internal states a particular site can attain. Additionally, expressions are treated as complete if their agents employ all sites and states of the signature. For formal definitions see [1] or the original paper [3].

A rule-based model $\mathcal{M}$ is a tuple $(\Sigma, I, \mathcal{R})$ such that $\mathcal{R}$ satisfies the signature $(\Sigma, I)$. An initialised model $\mathcal{M}_0$ is a pair $(\mathcal{M}, E_i)$ where $\mathcal{M} = (\Sigma, I, \mathcal{R})$ is a rule-based model and $E_i$ is an expression representing the initial solution such that $E_i$ is complete for the signature $(\Sigma, I)$.

Definition 2.2 A state space of an initialised model $\mathcal{M}_0 = (\mathcal{M}, E_i)$ is a pair $(\mathit{Solutions}(\mathcal{M}_0) \subseteq \mathcal{E},\ \mathit{Reactions}(\mathcal{M}_0) \subseteq \mathcal{E} \times \mathcal{E})$ defined inductively as follows:
(i) $[E_i] \in \mathit{Solutions}(\mathcal{M}_0)$
(ii) $[E] \in \mathit{Solutions}(\mathcal{M}_0)$ and $\exists r \in \mathit{Rules}(\mathcal{M}).\ t([E], r) = [E']$ if and only if $[E'] \in \mathit{Solutions}(\mathcal{M}_0)$ and $([E], [E']) \in \mathit{Reactions}(\mathcal{M}_0)$

In BNGL, agents are called molecules and they are specified in a similar manner as in kappas.
An example of a molecule is A(x~n!1), where the site x has an internal state n (separated from the site by a tilde) and a binding state 1 (separated by an exclamation mark). The BNGL counterparts of agent signatures are called molecule types and they are defined using the notation demonstrated in the following example: A(x~n~b, y~n~a). Here, the allowed internal states of the individual sites are separated by tildes (site x can have an internal state n or b). Rules are written in the lhs -> rhs notation (or lhs <-> rhs in the case of reversible rules). The individual model components (molecule types, reaction rules, seed species, observables) are delimited in BNGL by the begin and end keywords.

3 Biochemical Space

BCS provides a well-described biological background for mathematical models of processes taking place in a specific organism. A complete BCS model provides a connection between existing ontologies and partial mathematical models. A BCS model is represented as a text file. This file offers a human-readable format of BCS which can be easily edited in a dedicated editor and visualised on the website. The first part of a BCS model is a set of entities (to be compliant with process-algebraic frameworks we call entities agents), while the second part contains rules (abstractly represented chemical reactions defined over the set of entities). In our case study, a consortium of scientists is involved in modelling several cyanobacterial processes and in establishing the respective BCS model.

When building the BCS model, emphasis is put on well-defined and complete annotations. Therefore, links to relevant ontologies must be specified for each entity and rule. Unique IDs provided by ontologies can help to automatically detect duplicates. IDs are also used to create hypertext links to related ontologies on the web, thus providing one part of the already mentioned connection between ontologies and models.
At this moment, links to KEGG, ChEBI, CyanoBase [14] and other databases are supported. A single entity or rule can have multiple links to several external databases. An example is the presence of a particular entity in ChEBI as well as in KEGG. When annotating enzymatic rules, an EC number (here acting as a descriptor of the mechanism behind the catalytic reaction) is associated with the enzyme via the respective KEGG ID. For an entity that represents a protein, the annotation can be enriched with a sequence of genes that encode the protein. A single link (in our case to the genome browser in CyanoBase) is created for every gene separately. If more than one gene sequence is present, additional information about each particular sequence is specified in the notes. In general, NOTES records carry internal information about an entity or a rule. Finally, a comma is used as a separator between records within the links and notes fields.

In most cases, ontologies contain general information about entities and about rule mechanisms. If this is not available, a verbal description of the role of an entity or a rule can be specified directly within the particular record.

Example 3.1 Description, links, and notes information for an entity.
DESCRIPTION: Protein involved in hydrolysis of N-acetylated amino acids
LINKS: KEGG::ec3.5.1.14, CBS::slr1653, CBS::sll0100
NOTES: ChEBI link is missing

Most fields in entity and rule definitions are tightly coupled with information from linked ontologies, which is why we have started by describing the annotation attributes. One of these attributes is ENTITY NAME, which is taken from ontologies or follows the standard naming conventions. The ENTITY ID of every entity is fixed by the consortium. A KEGG ID, ChEBI ID or internal ID is used if no reasonable ID is available. IDs of rules are internal and assigned automatically.

Example 3.2 Complete information given for an atomic entity.
ENTITY ID: HCO3
STATES: {-, +}
LOCATIONS: cyt, liq
COMPOSITION:
ENTITY NAME: hydrogencarbonate
CLASSIFICATION: small molecule
DESCRIPTION: Plays major role in carbon concentrating mechanism (CCM).
LINKS: CHEBI::17544
ORGANISM: Synechococcus elongatus PCC 7942

An entity in our interpretation is a bounded space (a so-called compartment) or a structural part of a specific organism. BCS covers a hierarchy of entities ranging from small molecules (atomic entities (agents)) through composite structures (structure entities (agents)) to large complex molecules (complex entities (agents)).

Our goal is to make BCS as simple as possible. In existing ontologies, entities residing in several different states (oxidised, reduced, etc.) are usually treated as separate entities, which makes the total number of entities enormous. To reduce this complexity, the concept of STATES is defined in BCS. All states are enclosed in curly brackets and are comma-separated. The entity-state relationship is of the parent-child form. All information about an entity is inherited by its states unless it is specified explicitly for a particular state. The ID of an entity together with its state in curly brackets forms a unique entity identifier. If no state is specified, the default value is the 'neutral' (ground) state.

BCS extends the traditional concept of compartmentalisation with a hierarchy at the level of entities. A particular entity can reside in several different compartments as specified in the LOCATIONS field. Additionally, the CLASSIFICATION field specifies the type of an entity in the sense of functional or structural characterisation. An entity can be a part of a structurally more complex entity. We consider two kinds of composite entities: structure and complex entities.
A structure entity represents a partially specified composite species. We employ the partial composition operator '|', e.g., a photosystem complex partially specified with prosthetic groups of interest is written ps2(chl | yz | oec). A complex entity represents a fully specified composite species. We employ the full composition operator '.', e.g., a homodimer KaiC.KaiC. The composition of a composite entity is given in the field COMPOSITION. We employ a so-called localisation operator '::' to express the fact that an entity plays the role of a location for a structurally simpler entity (e.g., chlorophyll chl located in a photosystem ps2 is written chl :: ps2). In Example 3.3 a protein KaiC is specified as a partial composition of two amino acids of interest, serine (S) and threonine (T). In such a configuration, the serine-phosphorylated state of KaiC can be written as S{p} :: KaiC.

Example 3.3 Complete information given for a structure entity.
ENTITY ID: KaiC
STATES:
LOCATIONS: cyt
COMPOSITION: S | T
ENTITY NAME: circadian clock protein kinase KaiC
CLASSIFICATION: enzyme
DESCRIPTION: Monomer component representing a core component of the circadian clock system.
LINKS: uniprot::Q79PF4, cyanobase::Synpcc7942_1216
ORGANISM: Synechococcus elongatus PCC 7942

Rules are specified by rule equations enriched with additional annotation information. When defining a rule equation, identifiers of substrates and products are used to keep the notation of rules compact. Every entity appearing in a RULE EQUATION has to be followed by the localisation operator associating it with a particular compartment. This is especially important for rules that act on both sides of a membrane. That way, a rule is always precisely localised in or between compartments. A natural-number stoichiometric coefficient can be placed before any entity in a rule equation. Irreversible and reversible rules are distinguished by the operators '=>' and '<=>'.
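The rule-equation conventions of this section (a trailing '::' compartment on every entity, an optional stoichiometric coefficient, '+' between entities, and '=>' or '<=>' between the two sides) can be illustrated by a small parser. This is a minimal Python sketch under stated assumptions: the function names parse_side and parse_rule_equation and the returned dictionary layout are hypothetical, and the sketch does not cover variables or validate entity names against a BCS model.

```python
import re


def parse_side(side):
    """Split one side of a rule equation into (coefficient, agent, compartment)."""
    terms = []
    # Split on '+' surrounded by whitespace so that states such as h{+} survive.
    for raw in re.split(r"\s+\+\s+", side.strip()):
        match = re.match(r"^(\d+)\s+(.*)$", raw)
        coefficient = int(match.group(1)) if match else 1
        body = match.group(2) if match else raw
        parts = [part.strip() for part in body.split("::")]
        if len(parts) < 2:
            raise ValueError(f"entity without compartment: {raw!r}")
        # The last '::' segment is the compartment; earlier ones nest the agent.
        agent = " :: ".join(parts[:-1])
        terms.append((coefficient, agent, parts[-1]))
    return terms


def parse_rule_equation(equation):
    """Parse 'lhs => rhs' or 'lhs <=> rhs' into substrates and products."""
    if "<=>" in equation:
        left, right = equation.split("<=>")
        reversible = True
    elif "=>" in equation:
        left, right = equation.split("=>")
        reversible = False
    else:
        raise ValueError("rule equation needs '=>' or '<=>'")
    return {
        "substrates": parse_side(left),
        "products": parse_side(right),
        "reversible": reversible,
    }


formation = parse_rule_equation("6 KaiB :: cyt + KaiC6 :: cyt <=> KaiB6C6 :: cyt")
phosphorylation = parse_rule_equation("S{u} :: KaiC :: cyt => S{p} :: KaiC :: cyt")
```

Checking '<=>' before '=>' matters because the irreversible operator is a substring of the reversible one.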
The '+' symbol is used as a separator between individual substrates and between individual products. A rule can also have an assigned classification. Rule classification assigns a list of higher-level biophysical processes in which the rule is involved.

Example 3.4 Complete information for a rule.
RULE ID: NADPH oxid.
RULE EQUATION: NADPH :: cyt + 5 h{+} :: cyt + pq :: cym => NADP{+} :: cyt + 4 h{+} :: pps + pqh2 :: cym
MODIFIER: NDH1
RULE NAME: plastoquinone reduction in the cytoplasmic membrane
CLASSIFICATION: reduction, oxidation
DESCRIPTION: Oxidation of NADPH and reduction of plastoquinone in the cytoplasmic membrane.

In some cases, emphasis on a detailed description leads to very complex BCS models. Abstraction of some processes is therefore needed to keep BCS models as simple as possible. To this end, rules expressing enzymatic reactions are considered in a simplified form. Strictly speaking, an enzymatic reaction should be described by at least two different rules (one for substrate binding and another for the catalytic step). Instead, since an enzyme is not affected during the reaction, it is attached to the rule as a MODIFIER. However, it is difficult to define the precise meaning of a modifier in this case. We therefore treat the modifier field informally as an entity which has to be present for the rule to be enabled. The exact reaction mechanism of an enzyme is not always clear and therefore it is abstracted out (see Example 3.4).

Example 3.5 A rule employing structure entity state change.
RULE ID: FGFR2 phosph.
RULE EQUATION: Thr{u} :: FGF :: FGFR2 :: cyt <=> Thr{p} :: FGF :: FGFR2 :: cyt
MODIFIER: NDH1
RULE NAME: FGFR2 threonine residue (de)phosphorylation
CLASSIFICATION: phosphorylation, dephosphorylation
DESCRIPTION: FGF enzyme is phosphorylated on threonine residue in FGFR2 complex.

A higher level of abstraction is needed when several electrons play 'musical chairs' inside protein complexes.
The issue is that parts of a protein complex can pass through different unstable states within a short period of time. When one tries to define all rules among these proteins, a combinatorial explosion of the number of states of the complex arises. Not all of these combinations are biologically correct. Even when biologically inadmissible cases are excluded, the number of states is still enormous. For the purpose of BCS, we introduce a solution inspired by the enzymatic rule mentioned above. We treat a protein complex as a structure entity in which structurally simpler entities (not necessarily proteins) change their state, and we abstract from background processes. A particular rule can then be seen as a change of state of a structure entity (see Example 3.5).

4 Formal Definition of Biochemical Space

At the general level, BCS is a complex annotation format for description of a reaction network, including textual annotation and links to existing annotation bases. The rigorous (rule-based) core of the language consists of the declaration of chemical entities and reaction rules. The annotation part has been described in [10]. Here we define the formal core of BCS and associate it with a formal semantics by translating BCS rules into kappas.

A model in BCS is defined in a similar way as a kappas model. First, we formally define the syntax of expressions describing agents in BCS. Next, we define the notion of agent signature, which allows restrictions on the general expressions to be specified. Finally, agents are used as elementary constructs in the definition of BCS rules.

4.1 BCS Agents

Let $\mathcal{N}_a, \mathcal{N}_T, \mathcal{N}_X, \mathcal{N}_c, \mathcal{N}_s$ be mutually disjoint finite sets of atomic, structure, complex, compartment, and state names, respectively. Agents are defined hierarchically starting from atomic agents, which are of two kinds: class atoms representing (abstract) class agents and object atoms representing (concrete) object agents.
Class atomic agents allow us to compactly represent objects that can reside in several selected (or even all possible) states, whereas object atomic agents represent concrete objects specified with a particular state. Every atomic agent must be accompanied by the physical compartment within which it is considered. Atomic agent expressions have the following syntax:

$$
\begin{array}{llcl}
\text{atomic agent} & \alpha & ::= & \alpha_{\square} \mid \alpha_{\circ} \\
\text{class atom} & \alpha_{\square} & ::= & a^{b} :: c \\
\text{object atom} & \alpha_{\circ} & ::= & a\{s\} :: c \\
\text{atom name} & a & ::= & n \in \mathcal{N}_a \\
\text{state signature} & b & ::= & b, s \mid s \\
\text{state} & s & ::= & n \in \mathcal{N}_s \\
\text{compartment} & c & ::= & n \in \mathcal{N}_c
\end{array}
$$

From now on, we restrict ourselves to atomic agents where the state signature can be treated as a set (a state cannot occur more than once in a state signature). This restriction is motivated by the aim to keep the language as simple as possible. Treating state signatures as multisets would lead to confusion and is not needed to clearly represent biological objects.

Definition 4.1 Let $\alpha, \alpha'$ be atomic agents. We define the structural equivalence of atomic agents by claiming $\alpha \equiv \alpha'$ whenever $\alpha, \alpha'$ are (i) two identical object atoms or (ii) two identical class atoms that differ only in the order of states in the state signature.

Notation 4.2
• We write $s \in b$ to denote the fact that $s$ is included in the state signature $b$.
• For better readability of class atomic agents, we enclose non-trivial state signatures in curly brackets, i.e., we write $a^{\{b\}}$ instead of $a^{b}$ whenever $b$ contains more than one state.

Since our notion of atomic agents considers concrete objects as well as general classes of objects, we need to formally relate a class with the concrete objects that instantiate it. To this end, we define a compatibility relation $<$ that is stronger than structural equivalence.

Definition 4.3 Let $\alpha, \alpha'$ be atomic agents.
We say $\alpha$ is compatible with $\alpha'$, written $\alpha < \alpha'$ [...]

$$
\begin{array}{llcl}
\text{rule expression} & r & ::= & \emptyset \mid p\ e :: c \mid p\ e :: c + r \\
\text{stoichiometry} & p & ::= & n \in \mathbb{N}^{+} \\
\text{rule expression item} & e & ::= & e_1 \mid e_2 \mid e_3 \\
\text{basic rule agent} & e_1 & ::= & \alpha \mid T \mid X \\
\text{shallow rule agent} & e_2 & ::= & \alpha :: T \mid T :: X \\
\text{deep rule agent} & e_3 & ::= & \alpha :: T :: X
\end{array}
$$

We assume that a single rule cannot appear more than once in the list $R$ (every rule must be unique). Accordingly, we can use the notation $r \in R$ to refer to rules in $R$. See Section 7 for examples of several rules.

Rule expressions allow a more extensive syntax in terms of the localisation operator '::'. The localisation operator is intended to allow an alternative way of expressing hierarchically constructed agents. The main idea is to allow zooming into individual parts of a complex or a structure agent. E.g., for a structure agent $\tau(a_1\{s\} \mid a_2^{\{s,t\}}) :: c$ residing in compartment $c$ we can use the notation $a_2\{t\} :: \tau(a_1\{s\} \mid a_2^{\{s,t\}}) :: c$ to refer explicitly to a concretisation of its subagent $a_2$. This notation is fully equivalent to the original form $\tau(a_1\{s\} \mid a_2\{t\})$ and can therefore be considered an alternative way to concretise a structure agent. Similarly, the concept of localisation also applies to complex agents. E.g., for a complex agent $A(a_1\{s\}).B(a_2^{\{s,t\}}) :: c$ we can zoom into some of its components and express its concretisation as $B(a_2\{t\}) :: A(a_1\{s\}).B(a_2^{\{s,t\}}) :: c$. In this case, the notation $B(a_2\{t\}) :: A(a_1\{s\}).B(a_2^{\{s,t\}})$ is equivalent to the complex agent $A(a_1\{s\}).B(a_2\{t\})$.

In every rule subexpression $p\ e :: c$ the compartment $c$ is the scope for every agent appearing in $e$. In particular, every agent inside $e$ is assumed to be assigned the compartment $c$.

To restrict the resulting language to reasonable expressions only, we consider only rules where the operator '::' respects the constraints given in Definition 4.15.

Definition 4.15 Let $e$ be a rule expression item that appears in a rule $r \in R$.
The rule expression $e$ is well-defined iff the following constraints are satisfied:
(i) If $\alpha :: \tau(\gamma_p)$ is a subexpression of $e$ for some $\alpha, \tau, \gamma_p$ then there must exist $\alpha' \in \gamma_p$ such that $\alpha < \alpha'$.
(ii) If $T :: X$ is a subexpression of $e$ for some $T, X$ then there must exist $T' \in X$ such that $T < T'$.

Every rule agent in a shallow or deep form can be translated to an equivalent basic form. Formally, this is given in Lemma 4.16.

Lemma 4.16 (Rule Flattening) Let $(\Sigma_T, \Sigma_X)$ be a signature and $R$ a set of rules. Every rule $r \in R$ that includes some rule agents in shallow or deep form can be reduced to a rule $r' \in R$ where every rule agent is in basic form. For every rule agent $e$ in $r$, the reduction is done by replacing $e$ with $e'$ in the following way:
(i) If $e = \alpha :: T$ where $T = \tau(\gamma_p)$ for some $\tau, \gamma_p$ then there must exist $\alpha' \in \gamma_p$ such that $\alpha < \alpha'$. Then we set $e' = \tau(\gamma'_p)$ where $\gamma'_p$ is constructed from $\gamma_p$ by replacing $\alpha' \in \gamma_p$ with $\alpha$.
(ii) If $e = T :: X$ where $X = \gamma_f$ then there must exist $T' \in \gamma_f$ such that $T < T'$. Then we set $e' = \gamma'_f$ where $\gamma'_f$ is constructed from $\gamma_f$ by replacing $T' \in \gamma_f$ with $T$.
(iii) If $e = \alpha :: T :: X$ then steps (i) and (ii) above are applied successively.

Definition 4.17 We say that a rule $r \in R$ satisfies an agent signature $(\Sigma_T, \Sigma_X)$, written $r \models (\Sigma_T, \Sigma_X)$, iff every structure or complex agent that appears as a rule agent in $r$ satisfies the agent signature $(\Sigma_T, \Sigma_X)$.

To increase succinctness, we extend the language with a variable ?v. A variable can be used in any rule in place of an agent. Evaluation of a variable within a rule is carried out for every occurrence of ?v. For a given signature $(\Sigma_T, \Sigma_X)$ we assume that after evaluating the variable, every rule agent satisfies the signature and is well-defined. Moreover, the scope of the compartment is always uniquely given in the rule expression. The domain of a variable is treated as a set (values are not repeated). An example is given in Example 7.2.
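The evaluation of a variable can be pictured as textual expansion: a rule containing ?v stands for one concrete rule per value in the variable's domain. The following minimal Python sketch illustrates this reading. It is an assumption made for illustration, not the authors' implementation; the function name expand_variable is hypothetical, and the sketch works on plain strings without checking signatures or well-definedness.

```python
def expand_variable(equation, variable, domain):
    """Return one concrete rule equation per value in the variable's domain."""
    if variable not in equation:
        raise ValueError(f"{variable} does not occur in the rule equation")
    # The domain is treated as a set: repeated values are dropped, order is kept.
    unique_values = list(dict.fromkeys(domain))
    # str.replace substitutes every occurrence of the variable in the equation.
    return [equation.replace(variable, value) for value in unique_values]


rule = "S{u} :: KaiC :: ?X :: cyt <=> S{p} :: KaiC :: ?X :: cyt"
complexes = ["KaiC6", "KaiA2C6", "KaiB6C6", "KaiC6"]  # "KaiC6" repeated on purpose
expanded = expand_variable(rule, "?X", complexes)
```

Each expanded equation would still have to be checked against the signature and flattened as in Lemma 4.16 before any further processing.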
The extended syntax adds the following syntactic categories: extended rule equation, variable, variable value (atomic, structure, and complex variable values), and extended basic, shallow, and deep rule agents.

[...] where $m$ is the number of connected graphs on $n$ nodes.

Stringency of a rule makes a relevant difference. Stringency stands for the degree of universality or specificity of a rule, i.e., the breadth of its applicability. In both languages, this can be controlled by the context of the rule. However, it is not always suitable to list the whole context. An example is phosphorylation in the circadian clock (Example 7.2). It can occur on each KaiC protein which is included in a complex. For this purpose BNGL offers the 'site!+' notation, which requires the protein to be in a bound state. Since BCS does not provide binding sites, this cannot be used. To this end, we employ the localisation operator '::' in rule agents. It allows rule agents to be nested to strengthen the stringency. Moreover, we have introduced variables in BCS. A variable in a reactant is denoted ?v and can be specified as a set of atomic, structure or complex agents to which the rule can be applied.

The last fact worth noting concerns the construction of complex structures. In BNGL, each complex is identified with an exact structural notation which does not allow hierarchical construction. BCS provides the notions of structure and complex agents, which allow a hierarchy of agents to be formed. Additionally, when defining a rule with quantities of interacting entities, in BNGL it is necessary to enumerate all of them, whereas in BCS stoichiometry is written in the standard way.

6.1 BCS and BNGL translation

It is possible to translate from BCS to BNGL. This can be achieved by applying a finite set of transformation steps. The procedure is analogous to the translation to kappas (Section 5).
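One transformation step of this kind follows from the remark on stoichiometry: a BCS coefficient has to be unfolded into an explicit enumeration of molecules. The Python sketch below illustrates only this single step under stated assumptions. The function name enumerate_terms and the (coefficient, name) input format are hypothetical, the names are left as plain identifiers rather than BNGL molecule patterns, and the actual translation scripts may work differently.

```python
def enumerate_terms(terms):
    """Unfold (coefficient, name) pairs into a list with one entry per molecule."""
    flat = []
    for coefficient, name in terms:
        if coefficient < 1:
            raise ValueError("stoichiometric coefficients must be positive")
        # A coefficient n becomes n explicit copies of the same name.
        flat.extend([name] * coefficient)
    return flat


# Left-hand side of the KaiB6C6 formation rule (Example 7.3), compartments omitted.
substrates = enumerate_terms([(6, "KaiB"), (1, "KaiC6")])
left_hand_side = " + ".join(substrates)
```

Turning each enumerated name into a molecule with binding sites and bond labels is the part that BCS abstracts away, so it is deliberately left out of the sketch.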
Translation from BNGL to BCS is also possible, but the bond information is discarded in the process. In particular, all binding operations have to be removed. The only problem is the '!+' notation in BNGL, which requires an entity to have a bond without naming the binding partner. This kind of bond has a high level of abstraction, and for this reason we cannot translate such a rule directly. However, every BNGL rule with '!+' can be expanded to a finite number of rules where this operator is omitted. Instead of an unknown bond, there are enumerated rules, each accompanied by a known binding partner. In that case, the variable ?v containing all the enumerated binding partners is added to the BCS rule.

7 Case Study

BCS is a part of CMP, is implemented at e-cyanobacterium.org, and currently covers several functional modules of cyanobacteria. To support translation between BCS and BNGL, we have implemented a set of scripts (http://www.e-cyanobacterium.org/downloads/) that translate a BCS model to BNGL and vice versa.

7.1 Metabolism

Metabolism forms the backbone of cyanobacteria cellular processes and in BCS covers the largest part of the cyanobacteria network. We distinguish two groups of entities in metabolism: enzymes and metabolites. Enzymes drive metabolic reactions and are therefore assigned to rules as modifiers. Metabolites, on the other hand, are small molecules that act as substrates or products of metabolic rules and have no enzymatic function. Both groups are involved in rules which occur mostly in the cell cytoplasm, therefore the majority of their entities use cytoplasm as a compartment.

Fig. 2. Part of the reaction scheme of metabolism in cyanobacteria [12].

Example 7.1 A rule from metabolism of cyanobacteria. It is visualised in Figure 2 in the upper left part.
RULE ID: (S)-malate:NAD{+} oxidoreductase
RULE EQUATION: malate :: cyt + NAD{+} :: cyt <=> oxaloacetate :: cyt + NADH :: cyt + H{+} :: cyt
MODIFIER:
RULE NAME: malate oxidation
CLASSIFICATION: oxidation, reduction
DESCRIPTION: Process is involved in citric acid cycle. Malate is oxidised to oxaloacetate producing NADH from NAD{+}.

In metabolism, there are approximately 770 rules. Although there are many molecules, the rules are very specific. In our proposed rule-based language this means that the mapping of reactions to rules is almost one-to-one (reaction-like rules). The stringency of the rules is high, so each applies only to a narrow group of molecules. Consequently, a compact rule-based representation of metabolism brings almost no benefit.

7.2 Circadian clock

The circadian clock is one of the most complex processes in the cyanobacteria BCS. Its core is formed by three proteins: KaiA, KaiB and KaiC. Moreover, KaiC contains two phosphorylation sites, serine S431 and threonine T432. These sites can be phosphorylated independently, but only if KaiC is in a complex. All these proteins can interact with each other in predetermined ways and form specific complexes. All processes inside the cell are then controlled by periodic formation/dissociation and (de)phosphorylation of these complexes.

Fig. 3. Circadian clock cycle constructed by 17 BCS rules including complex formation, translation and phosphorylation.

Example 7.2 Serine (de)phosphorylation on KaiC protein. In Figure 3 it is (together with threonine phosphorylation) responsible for all short cycles.
RULE ID: serine (de)phosph.
RULE EQUATION: S{u} :: KaiC :: ?X :: cyt <=> S{p} :: KaiC :: ?X :: cyt ; ?X = {KaiC6, KaiA2C6, KaiB6C6, KaiA4C6, KaiA6B6C6}
MODIFIER:
RULE NAME: Serine phosphorylation and dephosphorylation
CLASSIFICATION: phosphorylation, dephosphorylation
DESCRIPTION: KaiC molecule is phosphorylated/dephosphorylated on serine amino acid.
This process can occur whenever KaiC is in one of the complexes enumerated in the variable ?X.

Because the proteins can form homohexamers or smaller complexes, and each of these complexes can interact with the others, the number of species grows combinatorially. In total, six different complexes containing KaiC can form: KaiC6, KaiB6C6, KaiA2C6, KaiA4C6, KaiA4B6C6 and KaiA6B6C6. Each KaiC protein can occur in four different states, because each of its two phosphorylation sites is either phosphorylated or unphosphorylated. Considering all six complexes together with the other rules of the circadian clock, we obtain a combinatorial explosion of distinct species in the system. Enumerating every single conformation is therefore an inefficient way to represent the whole system; instead, we rely on the compactness of BCS rules.

Example 7.3 Formation of the KaiB6C6 complex is important for the circadian clock. It can be seen in the upper left part of Figure 3, where it forms the bigger cycle (with all other complex formation rules).

RULE ID: KaiB6C6 form./diss.
RULE EQUATION: 6 KaiB::cyt + KaiC6::cyt ⇔ KaiB6C6::cyt
MODIFIER:
RULE NAME: KaiB6C6 complex formation and dissociation
CLASSIFICATION: complex formation, dissociation
DESCRIPTION: Formation of a complex from six KaiB molecules and a KaiC hexamer, and its dissociation. KaiC6 denotes a complex composed of six KaiC proteins; KaiB6C6 denotes a complex of six KaiC and six KaiB proteins.
LINKS: doi::10.1093/emboj/18.5.1137, doi::10.1016/j.febslet.2009.11.021

In BCS we have achieved a complete, human-readable representation of the circadian clock using only 17 rules (Examples 7.2 and 7.3 show two of them). With the defined agents, these rules give over 500 distinguishable entities, whereas in BNGL a similar number of rules describing the same system gives almost 25000 entities.

7.3 Photosynthesis

Photosynthesis represents part of the BCS of cyanobacteria.
The process takes place in specific folds of the cell membrane called the thylakoid membrane. Photosynthesis captures energy from light and transfers it into the production of ATP and NADPH molecules, with oxygen as a by-product.

Fig. 4. Reaction scheme of photosynthesis in cyanobacteria. The lumen processes are displayed below the thylakoid membrane, the stroma processes above it.

Example 7.4 A rule from photosynthesis: an oxidation reaction on PSII.

RULE ID: PSII oxidation
RULE EQUATION: ps2(oec{3+} | yz{+})::tlm ⇔ ps2(oec{4+} | yz{n})::tlm
MODIFIER:
RULE NAME: oxidation from S3 to S4 of oxygen evolving complex
CLASSIFICATION: oxidation
DESCRIPTION: Oxidation occurring on photosystem II. An electron is transferred from the oxygen evolving complex oec to the active tyrosine yz.

Entities of the photosynthesis BCS are represented by several complex proteins (enzymes) residing on the thylakoid membrane (tlm) in the cell. Since the thylakoid membrane encloses the inner-membrane space called the lumen (lum), where H2O molecules are processed, there are basically three locations defined for this set of entities. Rules occurring in the lumen, in the cytosol, and between the thylakoid membrane and these locations have the classical form. However, electron transfer reactions occurring within the structure of the complex proteins lead to a combinatorial explosion of all possible conformations.

Photosynthesis is constructed from approximately 30 agent definitions interacting in over 60 rules. From the rule-based point of view, this representation lies somewhere between the circadian clock (Section 7.2) and metabolism (Section 7.1): the number of generated distinguishable entities grows relative to the number of defined agents, but not as dramatically as in the circadian clock. Photosynthesis is nevertheless a good example of a rule-based process.

8 Conclusions

We have lifted the annotation format BCS to a formal language compatible with well-established rule-based languages.
We have provided automated support for translating between BCS and BNGL. Currently, BCS is used on the portal e-cyanobacterium.org to describe cyanobacteria processes. In the case study, we have shown that the language is suitable for rule-based systems as well as reaction-based systems.

For future work, we plan to define an operational semantics directly, without an intermediate format. This will enable an implicit description of the model state space, allowing us to benefit from the compact representation and to take advantage of on-the-fly model checking.

References

[1] Brim, L., J. Nižnan and D. Šafránek, Compact representation of photosynthesis dynamics by rule-based models, Electronic Notes in Theoretical Computer Science 316 (2015), pp. 17-27, 5th International Workshop on Static Analysis and Systems Biology (SASB 2014).
[2] Ciocchetta, F. and J. Hillston, Bio-PEPA: A framework for the modelling and analysis of biological systems, Theoretical Computer Science 410 (2009), pp. 3065-3084.
[3] Danos, V., J. Feret, W. Fontana and J. Krivine, Abstract interpretation of cellular signalling networks, in: Verification, Model Checking, and Abstract Interpretation, 9th International Conference, VMCAI 2008, San Francisco, USA, January 7-9, 2008, Proceedings, 2008, pp. 83-97.
[4] Danos, V. and C. Laneve, Formal molecular biology, Theor. Comput. Sci. 325 (2004), pp. 69-110.
[5] Dematté, L., C. Priami and A. Romanel, The BlenX language: A tutorial, in: M. Bernardo, P. Degano and G. Zavattaro, editors, Formal Methods for Computational Systems Biology, Lecture Notes in Computer Science 5016, Springer Berlin Heidelberg, 2008 pp. 313-365.
[6] Faeder, J., M. Blinov and W. Hlavacek, Rule-Based Modeling of Biochemical Systems with BioNetGen, in: I. V. Maly, editor, Systems Biology, Methods in Molecular Biology 500, Humana Press, 2009 pp. 113-167.
[7] Hlavacek, W. S., J. R. Faeder, M. L. Blinov, R. G. Posner, M. Hucka and W.
Fontana, Rules for modeling signal-transduction systems, Sci. STKE 2006 (2006), p. re6.
[8] Hucka, M. et al., The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models, Bioinformatics 19 (2003), pp. 524-531.
[9] Kanehisa, M. and S. Goto, KEGG: Kyoto encyclopedia of genes and genomes, Nucleic Acids Research 28 (2000), pp. 27-30.
[10] Klement, M., T. Děd, D. Šafránek, J. Červený, S. Müller and R. Steuer, Biochemical space: A framework for systemic annotation of biological models, in: Proceedings of the 5th International Workshop on Interactions between Computer Science and Biology (CS2Bio'14) (2014), pp. 31-44.
[11] Klement, M., D. Šafránek, T. Děd, A. Pejznoch, L. Nedbal, R. Steuer, J. Červený and S. Müller, A comprehensive web-based platform for domain-specific biological models, in: Proceedings of the fourth International Workshop on Interactions between Computer Science and Biology (CS2Bio'13) (2013), pp. 61-67.
[12] Knoop, H., M. Gründel, Y. Zilliges, R. Lehmann, S. Hoffmann, W. Lockau and R. Steuer, Flux balance analysis of cyanobacterial metabolism: The metabolic network of Synechocystis sp. PCC 6803, PLoS Comput Biol 9 (2013), p. e1003081.
[13] Le Novère, N. et al., Minimum information requested in the annotation of biochemical models (MIRIAM), Nat Biotech 23 (2005), pp. 1509-1515.
[14] Nakao, M., S. Okamoto, M. Kohara, T. Fujishiro, T. Fujisawa, S. Sato, S. Tabata, T. Kaneko and Y. Nakamura, CyanoBase: the cyanobacteria genome database update 2010, Nucleic Acids Research 38 (2010), pp. D379-D381.
[15] Nedbal, L., J. Červený and H. Schmidt, Scaling and integration of kinetic models of photosynthesis: Towards comprehensive e-photosynthesis, in: Photosynthesis in silico, Advances in Photosynthesis and Respiration 29, Springer, 2009 pp. 17-29.
[16] Pedersen, M. and G. Plotkin, A language for biochemical systems: Design and formal specification, in: C. Priami, R. Breitling, D. Gilbert, M. Heiner and A.
Uhrmacher, editors, Transactions on Computational Systems Biology XII, Lecture Notes in Computer Science 5945, Springer Berlin Heidelberg, 2010 pp. 77-145.
[17] Šafránek, D., J. Červený, M. Klement, J. Pospíšilová, L. Brim, D. Lazar and L. Nedbal, E-photosynthesis: Web-based platform for modeling of complex photosynthetic processes, Biosystems 103 (2011), pp. 115-124.
[18] Zhang, F. and M. Meier-Schellersheim, "SBML Level 3 Package Specification: Multistate, Multicomponent and Multicompartment Species Package for SBML Level 3 (Version 1, Release 0.4 - Draft)," SBML.org (2015).