Derivation, identification and validation of a computational model of a novel synthetic regulatory network in yeast

J. Math. Biol.
DOI 10.1007/s00285-010-0350-z
Mathematical Biology

Lucia Marucci · Stefania Santini · Mario di Bernardo · Diego di Bernardo

Received: 4 November 2009 / Revised: 19 May 2010
© Springer-Verlag 2010

Abstract  Systems biology aims at building computational models of biological pathways in order to study their behaviour in silico and to verify biological hypotheses. Modelling can become a powerful new method in molecular biology, if correctly used. Here we present, step by step, the derivation and identification of the dynamical model of a biological pathway using a novel synthetic network recently constructed in the yeast Saccharomyces cerevisiae for In-vivo Reverse-Engineering and Modelling Assessment. This network consists of five genes regulating each other's transcription. Moreover, it includes one protein–protein interaction, and its genes can be switched on by addition of galactose to the medium. In order to describe the network dynamics, we adopted a deterministic modelling approach based on non-linear differential equations. We show how, through iteration between experiments and modelling, it is possible to derive a semi-quantitative prediction of network behaviour and to better understand the biology of the pathway of interest.

Electronic supplementary material  The online version of this article (doi:10.1007/s00285-010-0350-z) contains supplementary material, which is available to authorized users.

L. Marucci (✉) · D. di Bernardo
Telethon Institute of Genetics and Medicine (TIGEM), 80131 Naples, Italy
e-mail: marucci@tigem.it

D. di Bernardo
e-mail: dibernardo@tigem.it

L. Marucci · S. Santini · M. di Bernardo · D. di Bernardo
Department of Computer and Systems Engineering, Federico II University, 80125 Naples, Italy

M. di Bernardo
e-mail:

Keywords  Mathematical modelling · Synthetic biology · Hill functions · Parameters identification

Mathematics Subject Classification (2000)  92-08 · 93A30 · 93B30

1 Introduction

The emerging discipline of Synthetic Biology can be defined as the engineering of biology. Up to now, two major goals have been actively investigated: the building of new biological networks in the cell that perform a specific task [e.g. periodic expression of a gene (Elowitz and Leibler 2000) or genetic switching (Gardner et al. 2000)] and the modification of networks that occur in nature in order to achieve some desired functionalities (e.g. production of a specific compound useful for medical applications; Ro et al. 2006). Synthetic Biology is an interdisciplinary area requiring a deep synergy between biology, biotechnology and nanotechnology on one side and mathematical modelling, information technology and control theory on the other. Such a combination of disciplines is needed to construct robust and predictable synthetic networks. In particular, quantitative models are needed for a precise and unambiguous description of synthetic circuits (Kaznessis 2007). Mathematical models allow hypotheses and observations to be compared rigorously, thus providing additional insight into the biological mechanisms.

Model derivation from experimental data can be carried out following three major approaches: white-box, black-box and gray-box. In white-box modelling, the model and parameter values are entirely derived from first principles, while in black-box modelling the model is completely derived from input–output data. The third alternative, the so-called gray-box approach (Nelles 2000), combines the two. Specifically, first principles are used to partially derive the model structure, while parameters or terms in the model are determined from measurement data. The approach we use in this paper is a gray-box one.
In this case, modelling entails three main steps to be executed iteratively: (i) derivation of the model equations; (ii) identification of the model parameters from experimental data and/or literature; (iii) validation [or invalidation (Anderson and Papachristodoulou 2009)] of the model.

Step (i) requires introducing simplifying hypotheses and choosing a proper formal framework. A huge variety of mathematical formalisms have been proposed in the literature, such as directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms (see, for example, De Jong 2002; Ventura et al. 2006; Szallasi et al. 2006 and references therein). Deterministic formalisms are commonly used to describe the average behaviour of a population of cells (De Jong 2002). They have been shown to be viable for the analysis of synthetic networks in a number of works (e.g. Elowitz and Leibler 2000; Gardner et al. 2000; Kramer et al. 2004; Tigges et al. 2009; Stricker et al. 2008). The reaction mechanism is described by applying the law of mass action: the rate of any given elementary reaction is proportional to the product of the concentrations of the species reacting in the elementary process (reactants) (Alon 2006). The differential equations (DEs) modelling approach is based on the following biological assumptions: the quantified concentrations do not vary with respect to space and they are continuous functions of time. These assumptions hold for processes evolving on long time scales in which the number of molecules of the species in the reaction volume is sufficiently large. In different experimental cases, approaches based on partial differential equations or stochastic models would be more appropriate (Szallasi et al. 2006).

Step (ii) is required to estimate unknown model parameters from the available experimental data.
A crucial issue that arises when estimating model parameters is structural identifiability (Walter and Pronzato 1997). The notion of identifiability addresses the feasibility of estimating unknown parameters from data collected in well-defined stimulus-response experiments (Cobelli and Distefano 1980). Structural non-identifiability is related to the model structure, independently of the experimental data. In contrast, practical non-identifiability also takes into account the amount and quality of the measured data used for parameter calibration. Of note, a parameter that is structurally identifiable may still be practically non-identifiable, due to the unavoidable presence of noise in biological experimental data (Raue et al. 2009). Unfortunately, while well assessed in the case of linear dynamical systems, the identifiability analysis of highly non-linear systems remains an open problem (Boubaker and Fourati 2004).

The parameter estimation problem can be formulated mathematically as a constrained optimization problem where the goal is to minimize the objective function, defined as the error between model predictions and real data. In biological applications, the objective function usually displays a large number of local optima, as measurements are strongly affected by noise. For this kind of problem, classical optimization methods, based on gradient descent from an arbitrary initial guess of the solution, can be unfeasible and show convergence difficulties. The above considerations suggest looking at stochastic optimization algorithms, like evolutionary strategies, which rely on random explorations of the whole space of solutions, are not sensitive to initial conditions and avoid becoming trapped in local optima. In Moles et al. (2003), the performance of both local and global-search optimization methods is compared in the identification of the 36 unknown parameters of a non-linear biochemical network.
The authors show that only evolutionary strategies are able to successfully solve the parameter estimation problem, while gradient-based methods tend to converge to local minima. Among the stochastic techniques, genetic algorithms (GA) (Mitchell 1998) provide a very flexible approach to non-linear optimization. Their application showed good results in the parametrisation of synthetic networks (Weber et al. 2007; Tigges et al. 2009).

Finally, step (iii) is required to check the validity and usefulness of the model, that is, to evaluate its ability to predict the behaviour of the actual physical system. Theoretically, the modeller should be confident that the formalism is able to describe all input–output behaviours of the system (Smith and Doyle 1992). This condition can never be guaranteed, since it would require an infinite number of experiments. However, it is possible to test a necessary condition: the model is able to describe all observed input–output behaviours of the system (Smith and Doyle 1992). To this aim, one possible approach is to use a cross-validation-like procedure (Arlot and Celisse 2010) by splitting the experimental data into two sets: one of them is used for the parameter identification, while the other one is used to validate the predictive power of the model. If the predictive performance of the model is not satisfactory, it is invalidated (Anderson and Papachristodoulou 2009). Thus, it is necessary to refine the model (for example, by increasing the level of detail) and/or to perform new experiments, going back to step (i) of the modelling procedure.

In what follows, we describe the gray-box modelling of the IRMA network (Fig. 1) as a representative example of the modelling problem for a small biological pathway, and present the detailed derivation of the model whose equations were given in Cantone et al. (2009).
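
As an illustration of steps (ii) and (iii), the following minimal Python sketch fits a model to one half of a data set with an elitist evolutionary search and scores it on the other half. It is not the authors' algorithm or data: the one-species model dx/dt = a − d·x, the synthetic noisy measurements, the even/odd data split, the function names and all numerical settings are assumptions introduced here for illustration only.

```python
import math
import random


def model(t, a, d):
    """Closed-form solution of dx/dt = a - d*x with x(0) = 0 (toy model, assumed)."""
    return (a / d) * (1.0 - math.exp(-d * t))


def sse(params, data):
    """Objective function: sum of squared errors between predictions and data."""
    a, d = params
    return sum((model(t, a, d) - x) ** 2 for t, x in data)


def evolve(data, rng, pop_size=30, generations=60, sigma=0.2):
    """Elitist evolutionary search: keep the best half, refill with mutated copies."""
    pop = [(rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0)) for _ in range(pop_size)]
    history = []  # best objective value seen at each generation
    for _ in range(generations):
        pop.sort(key=lambda p: sse(p, data))
        history.append(sse(pop[0], data))
        parents = pop[: pop_size // 2]
        # Log-normal (multiplicative) mutation keeps both parameters positive.
        children = [
            (a * math.exp(rng.gauss(0.0, sigma)), d * math.exp(rng.gauss(0.0, sigma)))
            for a, d in parents
        ]
        pop = parents + children
    pop.sort(key=lambda p: sse(p, data))
    history.append(sse(pop[0], data))
    return pop[0], history


# Synthetic measurements (hypothetical "true" values a = 2.0, d = 0.5 plus noise).
rng = random.Random(0)
times = [0.5 * i for i in range(1, 21)]
data = [(t, model(t, 2.0, 0.5) + rng.gauss(0.0, 0.05)) for t in times]

# Cross-validation-like split: one set for identification, one for validation.
ident, valid = data[0::2], data[1::2]

best, history = evolve(ident, rng)
validation_error = sse(best, valid)
print("estimated (a, d):", best)
print("identification error:", history[-1], "validation error:", validation_error)
```

Because the best candidates are always carried over unchanged, the identification error cannot increase from one generation to the next. A validation error much larger than the identification error would be the signal, in the procedure described above, to refine the model or collect new data.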
Note that usually, when a genetic circuit is presented to the Synthetic Biology community, only the best performing mathematical model is shown, without providing any detail of how the model was obtained. Here, instead, we provide the “history” of the derivation of the final model and experimental data-set, highlighting the major modelling choices and challenges faced during the process.

The aim is to build a model able to correctly predict the dynamical changes in the mRNA concentrations of the five network genes following both internal and external perturbations (e.g. gene over-expression or galactose addition). We choose differential equations (DEs) to model the dynamics of the genes. The task is challenging since, to our knowledge, up to now quantitative DEs mathematical models have been developed only for synthetic networks composed of a smaller number of genes than IRMA (e.g. Gardner et al. 2000; Elowitz and Leibler 2000; Tigges et al. 2009; Kramer et al. 2004; Stricker et al. 2008).

Fig. 1  Diagram of the network. Schematic diagram of the synthetic gene network. New transcriptional units (rectangles) were built by assembling promoters with non-self coding sequences. Genes were tagged at the 3′ end with the specified sequences. Each cassette encodes for a protein (circle) regulating the transcription of another gene in the network (solid lines)

Regarding the identifiability issue, we adopted the novel approach proposed by Raue and colleagues (see Raue et al. 2009), which is able to deal with non-linear models with a high number of parameters. This approach exploits the profile likelihood and is able to detect both structurally and practically non-identifiable parameters. For the parameter identification, in order to cope with the high number of unknown quantities, the noise of experimental data and the presence of non-linear aspects in the optimisation procedure, we used an ad hoc designed hybrid genetic algorithm (see Sect. 3 for further details).
Finally, for the model validation, we tested the predictions of the model against data not used for the parameter identification.

2 Results and discussion

2.1 Derivation of model equations: step (i)

For each species in the network, i.e. each mRNA (italic capital letters) and corresponding protein concentration (roman small letters), we wrote one equation which expresses its change in time as the result of production and degradation:

\begin{align}
\frac{d[\mathit{CBF1}]}{dt} &= \alpha_1 + v_1 H^{+-}([\mathrm{Swi5}], [\mathrm{Ash1}]; k_1, k_2, h_1, h_2) - d_1 [\mathit{CBF1}], \tag{1}\\
\frac{d[\mathrm{Cbf1}]}{dt} &= \beta_1 [\mathit{CBF1}] - d_2 [\mathrm{Cbf1}], \tag{2}\\
\frac{d[\mathit{GAL4}]}{dt} &= \alpha_2 + v_2 H^{+}([\mathrm{Cbf1}]; k_3, h_3) - d_3 [\mathit{GAL4}], \tag{3}\\
\frac{d[\mathrm{Gal4}]}{dt} &= \beta_2 [\mathit{GAL4}] - d_4 [\mathrm{Gal4}], \tag{4}\\
\frac{d[\mathit{SWI5}]}{dt} &= \alpha_3 + v_3 H^{+}([\mathrm{Gal4}_{\mathrm{free}}]; k_4, h_4) - d_5 [\mathit{SWI5}], \tag{5}\\
\frac{d[\mathrm{Swi5}]}{dt} &= \beta_3 [\mathit{SWI5}] - d_6 [\mathrm{Swi5}], \tag{6}\\
\frac{d[\mathit{GAL80}]}{dt} &= \alpha_4 + v_4 H^{+}([\mathrm{Swi5}]; k_5, h_5) - d_7 [\mathit{GAL80}], \tag{7}\\
\frac{d[\mathrm{Gal80}]}{dt} &= \beta_4 [\mathit{GAL80}] - d_8 [\mathrm{Gal80}], \tag{8}\\
\frac{d[\mathit{ASH1}]}{dt} &= \alpha_5 + v_5 H^{+}([\mathrm{Swi5}]; k_6, h_6) - d_9 [\mathit{ASH1}], \tag{9}\\
\frac{d[\mathrm{Ash1}]}{dt} &= \beta_5 [\mathit{ASH1}] - d_{10} [\mathrm{Ash1}]. \tag{10}
\end{align}

The first two terms on the right-hand side of the mRNA equations represent the production, where $\alpha_i$ are the basal transcription rates and $v_i$ are the maximal transcription rates modulated by the Hill functions,

\[
H^{+}(y; k, h) = \frac{y^{h}}{y^{h} + k^{h}}, \qquad H^{-}(z; k, h) = \frac{k^{h}}{z^{h} + k^{h}}
\]
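
To make the structure of the mRNA and protein equations concrete, the following minimal Python sketch implements the two Hill functions and integrates Eqs. (3) and (4) for GAL4 with the Cbf1 protein level held constant. It is a sketch under stated assumptions, not the authors' implementation: all parameter values are hypothetical and chosen only for readability, the forward-Euler scheme is a simplification, the fixed Cbf1 input decouples the pair from the rest of the network, and the repressing function is written with its own argument z in the denominator.

```python
def hill_act(y, k, h):
    """Activating Hill function H+(y; k, h) = y^h / (y^h + k^h)."""
    return y ** h / (y ** h + k ** h)


def hill_rep(z, k, h):
    """Repressing Hill function H-(z; k, h) = k^h / (z^h + k^h)."""
    return k ** h / (z ** h + k ** h)


def gal4_rhs(state, cbf1, p):
    """Right-hand sides of Eqs. (3) and (4) for a fixed Cbf1 protein level."""
    mrna, protein = state
    d_mrna = p["alpha2"] + p["v2"] * hill_act(cbf1, p["k3"], p["h3"]) - p["d3"] * mrna
    d_protein = p["beta2"] * mrna - p["d4"] * protein
    return d_mrna, d_protein


def simulate(state, cbf1, p, dt=0.01, steps=5000):
    """Forward-Euler integration (adequate here because dt is small vs. 1/d)."""
    for _ in range(steps):
        d_mrna, d_protein = gal4_rhs(state, cbf1, p)
        state = (state[0] + dt * d_mrna, state[1] + dt * d_protein)
    return state


# Hypothetical parameter values, not taken from the paper.
params = {"alpha2": 0.1, "v2": 1.0, "k3": 1.0, "h3": 2, "d3": 0.5, "beta2": 1.0, "d4": 0.5}

# Start from zero mRNA and protein, with Cbf1 fixed at its half-activation level k3.
gal4_mrna, gal4_protein = simulate((0.0, 0.0), cbf1=1.0, p=params)
print("GAL4 mRNA:", gal4_mrna, "Gal4 protein:", gal4_protein)
```

With these assumed values the trajectory settles at the steady state obtained by setting both derivatives to zero: mRNA = (alpha2 + v2·H+)/d3 = (0.1 + 0.5)/0.5 = 1.2 and protein = beta2·mRNA/d4 = 2.4.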