ARL/ICB Crash Course in Systems Biology, August 2010
Latest revision as of 00:59, 10 October 2010
This course is geared toward biologists who want to become familiar with current computational biology software and capabilities, with an emphasis on quantitative applications for understanding and modeling complex biological systems. The course is taught by researchers from the Army Institute for Collaborative Biotechnologies (ICB) and the Army Research Laboratory (ARL).
To register and obtain lodging and transportation information for the workshop, please go to:
Schedule
The course will consist of four sessions, each lasting approximately 3.5 hours (including a break in the middle of the session).
Monday, 9 Aug

Tuesday, 10 Aug

Lecture Outline
Session 1: Modeling and Analysis using Differential Equations
This session will provide an introduction to modeling of core processes in biology using differential equations. The first lecture will focus on the cell as a multilayered feedback system. Scientists need to build ad hoc models to analyze cellular complexity in a quantitative manner. Ordinary differential equations (ODEs) are a good choice when considering high-copy-number molecules in a well-mixed environment. Several transcriptional regulation pathways in bacteria, for instance, have been successfully modeled with ODEs. We will give an overview of the general methods for building macroscopic deterministic models of biological processes, using the trp operon and the iron starvation pathways as application examples. Classical control and dynamical systems analysis tools (equilibria, bifurcations, and frequency analysis) will also be reviewed. Finally, we will provide some fundamental notions from the theory of chemical reaction networks.

The second lecture will close the modeling cycle by covering model identification theory and practice. Once a model structure (a system of equations) is proposed, its validity should be tested by means of an identifiability analysis, e.g. using sensitivity analysis tools that help to identify critical and negligible parameters and to establish a parameter ranking. If experimental data are available, parameter estimation is then carried out, leading to a first model. Otherwise, a set of experiments must be devised by means of optimal experimental design and performed before parameter estimation. The quality of the estimates should be assessed by checking the correlation between them and computing their confidence intervals. This initial model must be validated with new experiments, which in most cases will reveal a number of deficiencies. Thus, a new model structure and/or a new experimental design must be planned, and the process is repeated iteratively until the validation step is considered satisfactory.
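As a rough illustration of the kind of deterministic model this session deals with, the sketch below integrates a single negatively autoregulated gene, dx/dt = alpha / (1 + (x/K)^n) - delta*x, and locates its equilibrium. The model, the parameter values, and the function names are assumptions chosen for illustration; they are not taken from the lecture material.

```python
def rate(x, alpha, K, n, delta):
    """Right-hand side: repressed production minus first-order degradation."""
    return alpha / (1.0 + (x / K) ** n) - delta * x


def simulate(alpha=10.0, K=1.0, n=2.0, delta=1.0, x0=0.0, dt=0.01, t_end=20.0):
    """Integrate the ODE with the classical fourth-order Runge-Kutta scheme."""
    x, t = x0, 0.0
    trajectory = [(t, x)]
    for _ in range(int(round(t_end / dt))):
        k1 = rate(x, alpha, K, n, delta)
        k2 = rate(x + 0.5 * dt * k1, alpha, K, n, delta)
        k3 = rate(x + 0.5 * dt * k2, alpha, K, n, delta)
        k4 = rate(x + dt * k3, alpha, K, n, delta)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
        trajectory.append((t, x))
    return trajectory


def equilibrium(alpha=10.0, K=1.0, n=2.0, delta=1.0, iterations=200):
    """Find the unique non-negative equilibrium by bisection.

    The rate is positive at x = 0 and negative at x = alpha/delta,
    and it decreases monotonically in between.
    """
    lo, hi = 0.0, alpha / delta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rate(mid, alpha, K, n, delta) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the default (assumed) parameters the equilibrium satisfies x^3 + x = 10, so the simulated trajectory should settle at the same value that `equilibrium()` returns.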
Lecture 1: Core Processes in Cells (Elisa Franco, Caltech)
Slides: http://www.cds.caltech.edu/~elisa/CDS27042010/ARLICBWorkshopElisaFranco.pdf
This lecture will provide an introduction to modeling of core processes in biology using differential equations. Specific topics to be covered include:
 The Cell as a Dynamical System with different layers of feedback
 Modeling techniques and ordinary differential equations
 Examples: transcriptional and post-transcriptional regulation
 Control and dynamical systems analysis tools (equilibria, bifurcations, frequency analysis)
 Basic notions of chemical reaction networks theory
Reading list:
 H. de Jong (2002). "Modeling and simulation of genetic regulatory systems: a literature review", J Comput Biol, 9(1):67-103
 M. Santillán and M. C. Mackey (2001). "Dynamic regulation of the tryptophan operon: A modeling study and comparison with experimental data", Proc. Natl. Acad. Sci. USA, 98(4):1364-1369
 E. Levine, Z. Zhang, T. Kuhlman and T. Hwa (2007). "Quantitative Characteristics of Gene Regulation by Small RNA", PLoS Biol, 5(9):e229
 M. Feinberg (1987). "Chemical reaction network structure and the stability of complex isothermal reactors-I. The deficiency zero and deficiency one theorems", Chemical Engineering Science, 42(10):2229-2268
Lecture 2: Model analysis and identification (Maria Rodriguez Fernandez, UCSB)
Slides: http://www.cds.caltech.edu/~murray/tmp/icb_crash_course/MRF_Berkeley_August_2010.pdf
This lecture will provide an introduction to model identification theory and practice. Specific topics to be covered include:
 Global and local sensitivity analysis
 Identifiability analysis
 Optimal experimental design
 Robust parameter identification
 Confidence intervals of the estimated parameters
 Model identification tools
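To make the idea of a local sensitivity analysis and parameter ranking concrete, here is a minimal sketch. It assumes a toy model (the steady state of dx/dt = alpha / (1 + (x/K)^n) - delta*x) and uses central finite differences to estimate normalized sensitivities; the model, parameter values, and function names are illustrative assumptions, not the method used by any particular toolbox.

```python
def steady_state(p):
    """Steady state of the assumed toy model, found by bisection."""
    def rate(x):
        return p["alpha"] / (1.0 + (x / p["K"]) ** p["n"]) - p["delta"] * x

    lo, hi = 0.0, p["alpha"] / p["delta"]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rate(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relative_sensitivities(output, params, rel_step=1e-4):
    """Normalized local sensitivities (dy/dp) * (p/y) by central differences."""
    base = output(params)
    sensitivities = {}
    for name, value in params.items():
        h = rel_step * value
        up = dict(params, **{name: value + h})
        down = dict(params, **{name: value - h})
        derivative = (output(up) - output(down)) / (2.0 * h)
        sensitivities[name] = derivative * value / base
    return sensitivities


def rank_parameters(sensitivities):
    """Order parameter names from most to least influential (by magnitude)."""
    return sorted(sensitivities, key=lambda name: abs(sensitivities[name]), reverse=True)


params = {"alpha": 10.0, "K": 1.0, "n": 2.0, "delta": 1.0}
sens = relative_sensitivities(steady_state, params)
ranking = rank_parameters(sens)
```

A local analysis like this is only valid near the nominal parameter values; global methods explore the whole parameter range instead.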
Reading list:
 M. Joshi, A. Seidel-Morgenstern, and A. Kremling (2006). "Exploiting the bootstrap method for quantifying parameter confidence intervals in dynamical systems", Metabolic Engineering, 8:447–455
 A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, and S. Tarantola (2008). "Global Sensitivity Analysis: The Primer", John Wiley & Sons Ltd.
 M. Rodriguez-Fernandez and J. R. Banga (2010). "SensSB: a software toolbox for the development and sensitivity analysis of systems biology models", Bioinformatics, 26(13):1675-1676
 E. Walter and L. Pronzato (1997). "Identification of Parametric Models from Experimental Data", Springer
 D. E. Zak, G. E. Gonye, J. S. Schwaber, and F. J. Doyle III (2003). "Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: Insights from an identifiability analysis of an in silico network", Genome Research, 13:2396–2405
Session 2: Stochastic Modeling and Simulation
In microscopic systems formed by living cells, the small numbers of some reactant molecules can result in dynamical behavior that is discrete and stochastic rather than continuous and deterministic. An analysis tool that respects these dynamical characteristics is the stochastic simulation algorithm (SSA). Despite recent improvements, as a procedure that simulates every reaction event, the SSA is necessarily inefficient for most realistic problems. There are two main reasons for this, both arising from the multiscale nature of the underlying problem: (1) the presence of multiple timescales (both fast and slow reactions); and (2) the need to include in the simulation both chemical species that are present in relatively small quantities and should be modeled by a discrete stochastic process, and species that are present in larger quantities and are more efficiently modeled by a deterministic differential equation. In the first half of the session, we will first describe the SSA, and then outline methods such as tau-leaping, hybrid methods, the slow-scale SSA, and finite state projection that have been developed to accelerate discrete stochastic simulation of well-mixed chemically reacting systems. Then we will examine the state of the art in algorithms and software for discrete stochastic simulation of spatially dependent biochemical systems.

The second half of the session will focus on StochKit, a software package for simulation of stochastic models. StochKit provides command-line executables for running stochastic simulations using variants of Gillespie’s Stochastic Simulation Algorithm (SSA) and tau-leaping. Among the numerous implementations of the SSA, StochKit provides solvers for the most widely used and efficient methods: the SSA Direct Method, the Optimized Direct Method [Cao et al. 2004], the Logarithmic Direct Method, and a Constant-Time Algorithm [Slepoy et al. 2008]. For tau-leaping, we provide a solver for an adaptive explicit tau-leaping method. StochKit also provides automatic parallelization to further increase computational efficiency, as well as a converter for SBML files. We will give a comprehensive review of the available algorithms and illustrate how to use Matlab functions in StochKit to process output files. For advanced developers, we will briefly illustrate how to build a custom solver for specific needs.
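For readers who have not seen the SSA before, the following is a minimal sketch of Gillespie's direct method in plain Python. It is not StochKit code and does not reflect StochKit's interface; the birth-death model (production at a constant rate, first-order degradation), its rate constants, and all names are assumptions made for illustration.

```python
import math
import random

# Assumed example model: 0 -> X at rate K_PROD, X -> 0 at rate GAMMA * X.
K_PROD, GAMMA = 10.0, 1.0
STOICH = [(+1,), (-1,)]
PROPENSITIES = [lambda x: K_PROD, lambda x: GAMMA * x[0]]


def ssa_direct(x0, stoich, propensities, t_end, rng):
    """Simulate one trajectory with the SSA direct method.

    Each step draws an exponential waiting time from the total propensity
    and then picks which reaction fires in proportion to its propensity.
    """
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        tau = -math.log(1.0 - rng.random()) / a0
        if t + tau > t_end:
            break
        t += tau
        # Select reaction j: first index whose cumulative propensity exceeds r.
        r = rng.random() * a0
        j, cumulative = 0, a[0]
        while cumulative <= r and j < len(a) - 1:
            j += 1
            cumulative += a[j]
        for i, change in enumerate(stoich[j]):
            x[i] += change
        times.append(t)
        states.append(tuple(x))
    return times, states


rng = random.Random(0)
times, states = ssa_direct([0], STOICH, PROPENSITIES, 20.0, rng)
```

Because every reaction event is simulated individually, the cost grows with the total number of firings, which is the inefficiency that the accelerated methods in this session address.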
Lecture 3: Multiscale Discrete Stochastic Simulation of Biochemical Systems (Linda Petzold, UCSB)
Slides: http://www.cds.caltech.edu/~murray/tmp/icb_crash_course/Tutorial_talk_Petzold.ppt.pdf
 Algorithms for well-mixed systems
   SSA
   Tau-Leaping
   Hybrid
   Slow-Scale SSA
   Finite-State Projection
 Algorithms and software for spatially dependent systems
   Inhomogeneous SSA
   Diffusive Finite State Projection
   Complex geometries and URDME software
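The tau-leaping idea listed above can be sketched as follows. This is a deliberately simplified, fixed-step explicit version, not the adaptive method implemented in StochKit: instead of simulating each event, it fires a Poisson-distributed number of each reaction over a step of length tau. The birth-death model, the step size, the crude clipping of negative populations, and all names are assumptions for illustration.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Large means are split into chunks of at most 30, using the fact that a
    sum of independent Poisson variates is again Poisson.
    """
    count = 0
    while lam > 0.0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit, product = math.exp(-chunk), rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
    return count


def tau_leap(x0, stoich, propensities, t_end, tau, rng):
    """Fixed-step explicit tau-leaping; returns the state at t_end."""
    x = list(x0)
    for _ in range(int(round(t_end / tau))):
        # Propensities are frozen at the start of the leap.
        a = [max(f(x), 0.0) for f in propensities]
        firings = [poisson(aj * tau, rng) for aj in a]
        for j, k in enumerate(firings):
            for i, change in enumerate(stoich[j]):
                x[i] += k * change
        # Crude guard (assumption): clip negative populations to zero.
        # Adaptive schemes choose tau so that this rarely or never happens.
        x = [max(v, 0) for v in x]
    return x


# Assumed example model: 0 -> X at rate 50, X -> 0 at rate 1.0 * X.
STOICH = [(+1,), (-1,)]
PROPENSITIES = [lambda x: 50.0, lambda x: 1.0 * x[0]]
final_state = tau_leap([0], STOICH, PROPENSITIES, 8.0, 0.02, random.Random(0))
```

The step size trades accuracy for speed: a smaller tau approaches the exact SSA, while a larger tau fires more reactions per step at the cost of a larger approximation error.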
Reading list:
 Stochastic modelling of gene regulatory networks (http://www3.interscience.wiley.com/cgi-bin/fulltext/112098374/PDFSTART)
Lecture 4: StochKit (Min Roh, UCSB)
Slides: http://www.cds.caltech.edu/~murray/tmp/icb_crash_course/stochkit_presentation.pdf
 Presentation on StochKit
 Available stochastic solvers
 Creating a model
 SBML conversion
 Output processing
 Examples
 Custom models and drivers
Reading list:
Available software:
Session 3: Data Acquisition and Analysis
Since its inception 15 years ago, the DNA microarray has become a staple experimental tool for exploring the effects of biological intervention on gene expression. By measuring the abundance of tens of thousands of mRNA transcripts at once, microarrays provide a genome-wide characterization of biological function. While the high dimensionality of microarray data provides a distinct advantage over smaller-scale experimental platforms, it also requires intelligent use of data processing and analysis techniques to control for sources of noise, systematic bias, and statistical artifacts. This session will provide an overview of the workflow required to transform raw microarray data into biological insight.

In the first lecture, we will begin by introducing the R/Bioconductor analysis platform. We will then describe data preprocessing techniques including background subtraction, dye bias normalization, and scale normalization. These techniques reduce the effects of noise and systematic biases often associated with high-dimensional data. Next, we will discuss methods to identify differentially expressed genes, i.e., genes whose transcripts are expressed at different levels between experimental conditions. Through the use of sophisticated statistical methods, we will obtain subsets of genes showing reproducible expression changes across experimental replicates. These individual genes provide the first clues into the underlying biological processes acting in the experiment of interest.

In the second lecture, we will focus on functional classification and ontological analyses of genes of interest. We will apply pathway enrichment and network analysis to identify functions and networks highly enriched in a set of genes. We will also discuss approaches to functional analysis of time series data sets and ways to present gene expression data.
Lecture 5: Microarray data preprocessing and differential expression analysis (Bernie Daigle, UCSB)
Slides: http://www.cds.caltech.edu/~murray/tmp/icb_crash_course/BD_army_tutorial.pdf
 Introduction to R/Bioconductor
 Preprocessing
 Data import
 Within-array normalization
 Between-array normalization
 Data visualization
 Differential expression analysis
 Computation of test statistics
 Statistical significance and multiple testing
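The last two topics above can be illustrated with a small sketch: a per-gene test statistic, followed by a multiple-testing correction across genes. The code uses a plain Welch t-statistic and the Benjamini-Hochberg procedure; this is an assumption made for illustration and is not necessarily what the lecture or the Bioconductor packages use (limma, for example, computes moderated statistics). The course itself works in R; Python is used here only to keep the example self-contained, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic comparing log-expression values of one gene
    between two conditions (unequal variances allowed)."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    The i-th smallest p-value is scaled by m / i, and monotonicity is
    enforced by taking a running minimum from the largest rank downward.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

Genes whose adjusted p-value falls below a chosen threshold would then be reported as differentially expressed, with that threshold controlling the expected false discovery rate.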
Reading list:
 J. Quackenbush (2002). "Microarray data normalization and transformation", Nat Genet, 32 Suppl:496-501
 X. Cui and G.A. Churchill (2003). "Statistical tests for differential expression in cDNA microarray experiments", Genome Biol, 4(4):210
 DNA microarray data analysis using Bioconductor
Required software:
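 R software environment (http://www.r-project.org/)
 Bioconductor packages for genomic data analysis (http://bioconductor.org/): Biobase, affy, limma, marray, hgug4110b
 Tutorial source code (http://dl.dropbox.com/u/9940131/army_tutorial.R)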
Required data:
Lecture 6: Data mining, ontological classification and pathway analysis of microarray gene expression (Rasha Hammamieh, WRAIR)
 Data presentation
 Annotation
 Functional Analysis
 Pathway Analysis
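One common way to quantify pathway or functional enrichment is a one-sided hypergeometric test: given a gene list, how surprising is the number of its members annotated to a pathway? The sketch below is a generic illustration of that test under assumed inputs; it is not claimed to be the procedure used by the tools listed for this lecture, and the function name and example numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of genes on the array
    annotated:  genes in the population that belong to the pathway
    selected:   size of the gene list of interest
    overlap:    genes in the list that belong to the pathway
    """
    largest = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, largest + 1)
    )
    return tail / total


# Assumed numbers: 20000 genes, 100 in the pathway, 200 selected, 8 overlapping.
p = enrichment_pvalue(20000, 100, 200, 8)
```

When many pathways are tested at once, the resulting p-values need a multiple-testing correction before they are interpreted.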
Available software:
 Cytoscape with plugins: BisoGenet, Bingo, ClusterMaker, Agilent Literature search
 Cluster
 TreeView
 Short Timeseries Expression Miner (STEM)
Required data:
Session 4: Applications
Lecture 7: Polarization in Yeast Mating (Mike Lawson, UCSB)
Slides: http://www.cds.caltech.edu/~murray/tmp/icb_crash_course/Berk_Lawson_2010_2.pdf
One of the most well-studied examples of cell polarization is the growth of the mating projection in Saccharomyces cerevisiae. A single molecular entity located at the front of the cell, termed the polarisome, helps to organize structural, transport, and signaling proteins. We have developed a spatial stochastic model (based on the reaction-diffusion master equation) of polarisome formation in mating yeast, focusing on the tight localization of proteins on the membrane. Prior work has produced deterministic (PDE) mathematical models that describe the spatial dynamics of yeast cell polarization in response to spatial gradients of mating pheromone; however, these required special mechanisms (e.g. high cooperativity) to match the characteristic punctate appearance of the polarisome. The new model is built on simple mechanistic components, yet achieves a highly polarized phenotype even in relatively shallow input gradients. Preliminary results highlight the need for spatial stochastic modeling, because deterministic simulation fails to achieve a sharp break in symmetry.
Lecture 8: Biological variability and model uncertainty (Camilla Luni, UCSB)
Slides: http://www.engr.ucsb.edu/~lunicam/Luni_Berkeley_crash_course.pdf
Structured Singular Value (SSV) analysis is a tool from control theory that is useful for analyzing uncertain biological models. A range of applications will be presented, focusing on its use for the analysis of fragility points in a network, drug screening, model discrimination, and model extension.
Reading list:
 Jacobsen, E. W. and C. Trane (2010). "Using dynamic perturbations to identify fragilities in biochemical networks." International Journal of Robust and Nonlinear Control 20(9): 1027-1046
 Jacobsen, E. W. and G. Cedersund (2008). "Structural robustness of biochemical network models-with application to the oscillatory metabolism of activated neutrophils." IET Systems Biology 2(1): 39-47
 Kim, J. S., N. V. Valeyev, et al. (2010). "Analysis and extension of a biochemical network model using robust control theory." International Journal of Robust and Nonlinear Control 20(9): 1017-1026
 Shoemaker, J. E. and F. J. Doyle (2008). "Identifying fragilities in biochemical networks: Robust performance analysis of Fas signaling-induced apoptosis." Biophysical Journal 95(6): 2610-2623
Lecture 9: Biofuels (Adam Arkin, LBNL)