We asked the Challenge teams that received the “Most Creative Method” prize in the Whole-cell Parameter Estimation DREAM8 Challenge to submit a short write-up explaining how they are working to solve the Challenge. Below are the descriptions from the top 3 teams: winner Team Whole-Sale Modelers, followed by Team Crux, and Team newDREAM.
Team Whole-Sale Modelers
1. Summary of Overall Approach
The Challenge boils down to a complex, high-dimensional regression problem. We are asked to infer 15 perturbations that were delivered to a subset of 30 identified model parameters. These parameters (which can be thought of as dependent variables) are estimated based on large amounts of “high-throughput data.” In this document I describe the statistical techniques I have used to solve this problem. Importantly, the techniques I outline below are complementary to an analysis of the “sub-models”. That is, the general search strategy I describe can be constrained, and thus improved, by insight gained from the sub-models.
I have written code to estimate the parameters of the whole model, given the high-throughput data that is generated for each simulation. This model can be stated very simply:
p = f(x)
where p is the estimated “perturbation vector” (which encodes the perturbation delivered to each of the 30 parameters of interest), f is a non-linear function, and x is a vector that contains all of the high-throughput data for the mutant model (provided freely to contest participants). The vector p has 30 elements, each of which represents the proportional change in the corresponding parameter (e.g. if the third parameter in the list, the kcat of Tmk, is halved, then the third element of p is equal to 0.5).
In practice, I found that the above problem is intractable because the large number of variables in the high-throughput data leads to a very large vector x. Thus I performed a principal components analysis of the high-throughput data before fitting the model. This reduces the dimensionality of x to the order of 50 components.
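The dimensionality-reduction step can be sketched as follows. This is a minimal illustration, not the team's code: it assumes the high-throughput data from each simulation has been flattened into one row of a matrix, and the function name `pca_reduce`, the array shapes, and the random stand-in data are all hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto their leading principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each feature
    # Thin SVD: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T             # reduced representation of each row
    explained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return scores, components, mean, explained

rng = np.random.default_rng(0)
# Stand-in data (assumption): 200 simulations, 1000 features, 5 latent factors.
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 1000)) + 0.01 * rng.normal(size=(200, 1000))

scores, components, mean, explained = pca_reduce(X, n_components=50)
print(scores.shape, explained)
```

The reduced `scores` matrix, rather than the raw data, would then serve as the input x to the regression.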
The nonlinear function f is fitted by a collection of regression trees using the Random Forests technique, a popular method in machine learning. Its popularity stems from the fact that no initial guess for the form of f is needed. Additionally, the algorithm reduces over-fitting by training each tree on a random sample of the training data and averaging the trees' predictions. The random forest was fitted to the high-throughput data of 1128 whole-cell simulations.
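To make the idea concrete, here is a toy random forest for multi-output regression written with NumPy only. It is a simplified sketch under stated assumptions (a few quantile split candidates, fixed depth, synthetic data), not the implementation used for the Challenge; all function names and the data are hypothetical.

```python
import numpy as np

def fit_tree(X, Y, rng, depth=0, max_depth=4, min_leaf=5):
    """Grow one regression tree; each leaf stores the mean target vector."""
    n, d = X.shape
    if depth >= max_depth or n < 2 * min_leaf:
        return Y.mean(axis=0)
    best = None
    # Consider only a random subset of features at each split.
    for j in rng.choice(d, size=max(1, d // 3), replace=False):
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            sse = (((Y[left] - Y[left].mean(axis=0)) ** 2).sum()
                   + ((Y[~left] - Y[~left].mean(axis=0)) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, left)
    if best is None:
        return Y.mean(axis=0)
    _, j, t, left = best
    return (j, t,
            fit_tree(X[left], Y[left], rng, depth + 1, max_depth, min_leaf),
            fit_tree(X[~left], Y[~left], rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):         # internal nodes are tuples
        j, t, left_child, right_child = node
        node = left_child if x[j] <= t else right_child
    return node                            # leaves are mean target vectors

def fit_forest(X, Y, n_trees=30, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample
        forest.append(fit_tree(X[idx], Y[idx], rng))
    return forest

def predict_forest(forest, X):
    return np.array([np.mean([predict_tree(tree, x) for tree in forest], axis=0)
                     for x in X])

# Synthetic stand-in: 6 "principal components" in, 3 perturbation factors out.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
Y = np.column_stack([
    np.where(X[:, 0] > 0, 1.0, 0.5),       # parameter halved in some simulations
    np.ones(300),                          # parameter never perturbed
    np.where(X[:, 1] > 0.5, 0.8, 1.0),
])
X_train, Y_train, X_test, Y_test = X[:200], Y[:200], X[200:], Y[200:]

forest = fit_forest(X_train, Y_train)
pred = predict_forest(forest, X_test)
mse_forest = np.mean((pred - Y_test) ** 2)
mse_baseline = np.mean((Y_train.mean(axis=0) - Y_test) ** 2)
print(mse_forest, mse_baseline)
```

In practice a library implementation of random forests would be preferable; the point here is only to show the bootstrap sampling and the averaging over trees.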
2. Improvements – Compressed Sensing
A substantial improvement, which I have not had the time to implement yet, would be to incorporate the constraint that the perturbation vector p is sparse. This piece of information is critical, and is widely studied in the context of “compressed sensing”. Essentially, one should be able to improve the fit by penalizing the L1 norm of the estimated perturbation vector p (in theory, at least). Additionally, I am in the process of running more simulations which I am now doing in triplicate. Averaging the high-throughput data across these replicates has the potential to improve the fit, since the stochasticity of the model can be quite influential (especially in terms of the data stored in the variable “rxnFluxes”). In fact, I found some models that seemed to fit the data quite well, but failed when submitted to Bitmill. This was apparently due to trial-to-trial variability in the rxnFlux data, which was revealed by averaging over 8 trials.
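The effect of an L1 penalty can be illustrated with a small sketch. It assumes a linear surrogate model (the whole-cell model itself is nonlinear) and expresses perturbations as log-scale deviations from the default, so that an unperturbed parameter corresponds to zero. The solver is iterative soft-thresholding (ISTA); the matrix `A`, the chosen indices, and the penalty weight are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink each entry towards zero by lam, setting small entries to exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A p - y||^2 + lam * ||p||_1 by iterative soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    p = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ p - y)
        p = soft_threshold(p - step * grad, step * lam)
    return p

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 30))                  # surrogate sensitivities (assumption)
p_true = np.zeros(30)
p_true[[2, 7, 19]] = [-0.7, 0.4, -0.5]         # only three parameters perturbed
y = A @ p_true + 0.01 * rng.normal(size=60)

p_hat = ista(A, y, lam=0.5)
support = np.flatnonzero(np.abs(p_hat) > 1e-6)
print(support)
```

The penalty drives most entries of the estimate to exactly zero, which matches the prior knowledge that only a subset of the parameters was changed.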
3. Explanation of Code
Running the script “s007 simplified random forest script” should generate some estimates of the perturbations to the cell. This script can be found in the “analysis” folder. I have added the prefix “alex ” to almost all of my written functions to distinguish them from Jonathan Karr’s original code. Due to severe time constraints, I have not been able to comment and clean up all of my code. Full code can be found at: https://github.com/ahwillia/WholeCell
4. About the Team: Whole-Sale Modelers
- Alex Williams: Alex Williams is a research technician in Eve Marder’s lab at Brandeis University. His research interests are in computational neuroscience. Alex’s work examines how neurons maintain stable activity patterns over long time periods in spite of comparatively rapid protein turnover.
- Jeremy Zucker: Jeremy Zucker has over 10 years of experience in the representation, integration, modeling and simulation of biological pathways to elucidate the complex relationship between genotype and phenotype.
Team Crux
The goal of the DREAM8 Whole-cell Parameter Estimation Challenge is to estimate 30 unknown parameters P of a mathematical model of a cell. The whole-cell model has 1972 parameters in total, and default values for all parameters are known. As prior information, it is also known that 15 of the given set of 30 parameters are identical to their default values. For the remaining 15 modified parameters, the following prior knowledge is available:
1. 4 promoter affinities were modified
2. 7 kcats were modified
3. 4 RNA half lives were modified
4. 5 genes have one changed parameter
5. 5 genes have two changed parameters
6. 13 of the 15 modified parameters were decreased
7. 2 of the 15 modified parameters were increased
8. The decreases range from 2.8% to 93.4%.
9. The increases range from 11.7% to 90.6%.
In our analyses, we restricted the parameter space to satisfy these constraints.
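As an illustration, several of the constraints listed above can be encoded as a simple check on a candidate solution. This sketch makes some assumptions that are not in the text: perturbations are represented as multiplicative factors (1.0 means unchanged), percentages are read as relative changes, and the ordering of parameter types in the vector is invented for the example. The per-gene constraints (items 4 and 5) are omitted because they require a parameter-to-gene mapping.

```python
def satisfies_prior(factors, kinds, tol=1e-9):
    """Check a candidate against the prior knowledge about the 15 modified parameters.

    factors[i]: multiplicative change of parameter i (1.0 = default value).
    kinds[i]:   'promoter', 'kcat', or 'halflife'.
    """
    changed = [i for i, f in enumerate(factors) if abs(f - 1.0) > tol]
    decreased = [i for i in changed if factors[i] < 1.0]
    increased = [i for i in changed if factors[i] > 1.0]
    counts = {k: sum(1 for i in changed if kinds[i] == k)
              for k in ("promoter", "kcat", "halflife")}
    return (
        len(changed) == 15
        and len(decreased) == 13
        and len(increased) == 2
        and counts == {"promoter": 4, "kcat": 7, "halflife": 4}
        # Decreases between 2.8% and 93.4%, increases between 11.7% and 90.6%.
        and all(0.028 <= 1.0 - factors[i] <= 0.934 for i in decreased)
        and all(0.117 <= factors[i] - 1.0 <= 0.906 for i in increased)
    )

# Hypothetical layout: 10 promoter affinities, 10 half-lives, 10 kcats.
kinds = ["promoter"] * 10 + ["halflife"] * 10 + ["kcat"] * 10
factors = [1.0] * 30
for i in (0, 1, 2, 3):
    factors[i] = 0.5            # four promoter affinities decreased
for i in (10, 11, 12, 13):
    factors[i] = 0.7            # four half-lives decreased
for i in (20, 21, 22, 23, 24):
    factors[i] = 0.4            # five kcats decreased
for i in (25, 26):
    factors[i] = 1.5            # two kcats increased

too_large = list(factors)
too_large[25] = 2.5             # a 150% increase lies outside the allowed range

print(satisfies_prior(factors, kinds), satisfies_prior(too_large, kinds))
```

A check like this can be used to reject candidate parameter sets before spending a costly simulation on them.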
To account for strictly positive parameter values, all analyses were performed in a logarithmic parameter space. This also reflects the fact that changes in parameter values usually act multiplicatively rather than additively: changing a parameter by a factor a or 1/a usually has a similar impact, whereas adding a constant a mostly has a qualitatively different effect than subtracting it. Since promoter affinities are normalized, all perturbations of these parameters have to satisfy the normalization condition.
For all perturbations we performed, we renormalized the promoter affinities by adapting a single, fixed parameter that is not in the set of modified parameters P.
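A small sketch of these two ideas follows. It assumes that the normalization means the promoter affinities keep a constant sum, which the text does not state explicitly; the parameter names and values are hypothetical.

```python
import math

def perturb_log(value, log_factor):
    """Multiply a positive parameter by exp(log_factor); the result stays positive."""
    return value * math.exp(log_factor)

def renormalize(original, perturbed, compensator):
    """Adjust only `compensator` so that the affinities keep their original sum."""
    target = sum(original.values())
    others = sum(v for k, v in perturbed.items() if k != compensator)
    fixed = dict(perturbed)
    fixed[compensator] = target - others
    if fixed[compensator] <= 0:
        raise ValueError("compensating parameter would become non-positive")
    return fixed

# Hypothetical promoter affinities; 'c' plays the role of the fixed compensating
# parameter that is not in the set of modified parameters.
affinities = {"a": 0.2, "b": 0.3, "c": 0.5}
perturbed = dict(affinities)
perturbed["a"] = perturb_log(affinities["a"], math.log(0.5))   # halve 'a'
fixed = renormalize(affinities, perturbed, compensator="c")

# A step of +0.7 followed by -0.7 in log space returns to the starting value.
round_trip = perturb_log(perturb_log(2.0, 0.7), -0.7)
print(fixed, round_trip)
```

Working in log space makes a change by a factor a and by 1/a symmetric steps of equal size, which is the multiplicative behaviour described above.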
The parameters were estimated by maximum likelihood. We initially perturbed each parameter θ ∈ P individually to probe the response of the model and to estimate the gradient of the likelihood. For testing purposes, we also modified other parameters θ not in P. In a second stage, we altered sets of more than one parameter at a time, which allows the computation of higher-order derivatives. An iterative procedure then allows us to advance towards a better model fit. Finally, comparing the response of the model for default and estimated parameters allows us to assess which of the 30 candidate parameters were modified.
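The first stage of this procedure can be sketched as a finite-difference gradient in log-parameter space followed by simple gradient steps. Everything specific here is an assumption made for illustration: the two-parameter `toy_model` stands in for the expensive whole-cell simulation, the noise is Gaussian with unit variance, and the fixed step size was chosen for this toy problem only.

```python
import numpy as np

def toy_model(theta):
    # Hypothetical stand-in for an expensive whole-cell simulation.
    return np.array([theta[0] * theta[1], theta[0] / theta[1]])

def neg_log_likelihood(log_theta, data, sigma=1.0):
    """Gaussian negative log-likelihood (up to a constant) in log-parameter space."""
    residual = (toy_model(np.exp(log_theta)) - data) / sigma
    return 0.5 * np.sum(residual ** 2)

def fd_gradient(f, x, h=1e-5):
    """Perturb one parameter at a time and use central differences."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

theta_true = np.array([0.5, 2.0])          # hidden perturbation factors
data = toy_model(theta_true)               # noise-free "measurements"

def objective(log_theta):
    return neg_log_likelihood(log_theta, data)

log_theta = np.zeros(2)                    # start from the default values (factor 1)
nll_start = objective(log_theta)
for _ in range(500):
    log_theta = log_theta - 0.25 * fd_gradient(objective, log_theta)

theta_hat = np.exp(log_theta)
nll_end = objective(log_theta)
print(theta_hat, nll_start, nll_end)
```

With a real whole-cell model each gradient costs two simulations per parameter, so the number of affordable iterations is far smaller than in this toy example.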
2. Implementation of the numerical methods
A major bottleneck in this Challenge is the computational effort required for a single evaluation of the model. It critically limits the number of possible iterations during maximum likelihood estimation of the parameters. In addition to simulations on local computers, simulations were performed on the Bitmill server using Matlab code.
3. About the Team: Team crux
Team crux consists of a group of researchers at the Institute of Physics at the University of Freiburg. Team crux has previously won two DREAM competitions, the DREAM6 Parameter Estimation Challenge and the DREAM7 Network Inference Challenge.
- Dr. Clemens Kreutz: Clemens is a postdoctoral scholar at the Institute of Physics at the University of Freiburg. Clemens’ research focuses on mathematical modelling of cellular signal transduction, experimental design, and statistics.
- Dr. Andreas Raue: Andreas is also a postdoctoral scholar at the Institute of Physics at the University of Freiburg. Andreas’ research focuses on parameter estimation, experimental design, and uncertainty analysis.
- Bernhard Steiert: Bernhard is a PhD candidate at the Institute of Physics at the University of Freiburg. His research focuses on modeling erythropoietic signaling pathways in cancer, including EGF/HGF crosstalk.
- Prof. Jens Timmer: Jens is a professor of mathematics and physics at the University of Freiburg. His research focuses on the development and interdisciplinary application of mathematical methods to analyse and model dynamical processes in biology and medicine based on measured data. His group's final aim is to help turn the life sciences from a qualitative, descriptive science into a quantitative, predictive one.
Team newDREAM
Participants are challenged to estimate the values of 15 unknown parameters from a set of 30 parameters (10 promoter affinities, 10 RNA half-lives, and 10 metabolic reaction kcats) of a recently published whole-cell model of M. genitalium (Karr et al., 2012), given the model’s structure and simulated data.
The solution needs to address two tasks: (1) identify the candidate parameters, and (2) minimize the parameter distance given the candidate parameters. Since the model is too ‘big’ and simulations take a very long time, we could not thoroughly search the entire parameter space for near-perfect solutions within the limited time. Therefore, our strategy includes:
1) Use the wild-type, gold-standard, and downloaded perturbation datasets to identify the (potentially modified) parameters that most strongly affect cell growth.
2) Inspect the high-throughput data and, based on these observations, divide the potentially modified parameters into three groups: A, too time-consuming to optimize; B, hard to optimize; C, easy to optimize.
3) Make educated guesses for the parameters in group A first, then optimize the parameters in group B while keeping group A fixed. Finally, optimize the parameters in group C.
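The staged strategy in step 3 can be sketched as a block-wise search. The code below is only an illustration under stated assumptions: the parameter names, the group assignment, the grid of candidate factors, and the separable toy objective (which stands in for a distance computed from real simulations) are all hypothetical.

```python
import math

def optimize_group(objective, params, names, grid):
    """Coordinate-wise grid search over `names`, holding all other parameters fixed."""
    params = dict(params)
    for name in names:
        params[name] = min(grid, key=lambda v: objective({**params, name: v}))
    return params

# Hidden "true" perturbation factors for the toy problem.
truth = {"p1": 0.5, "p2": 0.3, "p3": 1.5, "p4": 0.8}

def objective(params):
    # Toy distance in log space; a real objective would require simulations.
    return sum((math.log(params[k]) - math.log(truth[k])) ** 2 for k in truth)

grid = [k / 10 for k in range(1, 21)]      # candidate factors 0.1 ... 2.0
groups = {"A": ["p1"], "B": ["p2", "p3"], "C": ["p4"]}

params = {name: 1.0 for name in truth}     # start from default values
params["p1"] = 0.6                         # group A: educated guess, then fixed
params = optimize_group(objective, params, groups["B"], grid)   # A held fixed
params = optimize_group(objective, params, groups["C"], grid)   # A and B held fixed
print(params)
```

Fixing the most expensive group first keeps the number of simulations small, at the cost of carrying any error in the initial guesses through the later stages.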
3. About the Team: newDream
newDream includes a team of researchers from the University of Texas Southwestern. Last year team newDream won the DREAM Drug Sensitivity Prediction Challenge.
- Dr. Jichen Yang: Jichen is a postdoctoral scholar at the University of Texas Southwestern.
- Dr. Hao Tang: Hao is a postdoctoral fellow at the QBCR at the University of Texas Southwestern.
- Tao Wang: Tao is a graduate student at the QBCR at the University of Texas Southwestern.
- Dr. Yueming Liu: Yueming is a mathematician at the University of Texas at Arlington.
- Prof. Yang Xie: Yang is a professor in the Department of Clinical Science at the University of Texas Southwestern.
- Prof. Guanghua Xiao: Guanghua is a professor in the Department of Clinical Science at the University of Texas Southwestern.