HPN Challenge teams HDSystems, IPNet and Tongii write about their winning models for Sub-Challenges 1A and 1B

Below are blog posts from three winning teams (HDSystems, IPNet and Tongii) from Sub-Challenges 1A and 1B of the Heritage Provider Network DREAM Breast Cancer Network Inference Challenge. Sub-Challenge 1A asked participants to infer a causal network from experimental time-course breast cancer proteomic data. Sub-Challenge 1B asked participants to construct a causal network from in silico time-course data generated by a state-of-the-art dynamical model of signaling. We awarded cash prizes to the first three teams within each Sub-Challenge whose model scored 2 standard deviations above the null model. Below, three of the winning teams (HDSystems for Sub-Challenge 1A, and both IPNet and Tongii for Sub-Challenge 1B) share a little about themselves and the rationale behind their winning models.

You can check out the current leaderboards for these sub-Challenges here:

Sub-Challenge-1A: https://www.synapse.org/#!Synapse:syn1720047/WIKI/56830

Sub-Challenge-1B: https://www.synapse.org/#!Synapse:syn1720047/WIKI/56850

Team HD Systems (syn ID2109051)

Dear fellow HPN challenge participants and organizers,

This is Ruth Großeholz, along with Oliver Hahn and Michael Zengerling at Ruprecht-Karls University in Heidelberg, Germany. We are happy to be one of the winning teams of the sub-Challenge 1-A leaderboard incentive prize for the experimental network and would like to thank the organizers for the opportunity to introduce ourselves and our ideas.

The three of us are Master's students in Professor Ursula Kummer’s group for Modeling of Biological Processes at Bioquant Heidelberg, and we are participating in this challenge to gain working experience with network inference. For us, the Challenge offers a chance to work outside the controlled conditions of a practical course and to expand our methodological knowledge. Before this Challenge, network inference was uncharted territory for us, as it is not covered in our Master's program. So far, it has been a great experience to work with such a rich data set.

We come from a biological background and know from a number of practical courses that a model is only as good as the information backing it. Our idea was therefore to build one model that includes all the edges required for this challenge, based on extensive literature research on the roles and interactions of all the given proteins in cellular signaling. Even though the cell lines differ quite drastically from each other, we felt that one basic model describing signaling in a healthy cell would be a good starting point. Only after we had placed all proteins within our universal network did we start to tailor the models to their respective cell lines and growth factors. In our primary network all edges had a score of 1, which we later adjusted according to the dynamics of both source and target.

We thank Thea, Laura and all the other challenge organizers, as well as everyone who contributed to making this challenge possible. The leaderboard not only provided a way to get feedback during the challenge but also gave the challenge a more competitive character.

Team HD Systems

Oliver Hahn, Michael Zengerling & Ruth Großeholz

Master Students, Major Systems Biology

Modelling of Biological Processes

Ruprecht-Karls University, Heidelberg

 

Team IPNet (syn ID2023386)

It is a great honor for us to be highlighted among the top-scoring models in the HPN-DREAM Subchallenge 1B on the August 7th leaderboard. We are very thankful that the organizers have given us the opportunity to present ourselves and to introduce our model.

We are a team that started at the Institute for Medical Informatics and Biometry at the TU Dresden in Germany. The team is composed of myself (Marta Matos), Dr. Bettina Knapp, and Prof. Dr. Lars Kaderali.

The model we used in the HPN-DREAM Challenge was developed during my master’s thesis [1], under the supervision of Dr. Bettina Knapp in the group of Prof. Dr. Lars Kaderali. The model is an extension of the approach previously developed by Knapp and Kaderali [2], which is available as a Bioconductor software package [3]. It is based on linear programming and infers signaling networks from perturbation data. In particular, it was designed to take advantage of RNA interference experiments in combination with steady-state expression measurements of the proteins of interest. In my master’s thesis we expanded this model to take advantage of perturbation time-series data to improve the prediction of causal relations between proteins. The HPN-DREAM Subchallenge 1B is therefore an excellent opportunity to evaluate the performance of the extended model on time-series data after different perturbations.

In our approach the signal is modeled as an information flow which starts at the source nodes and propagates downstream along the network until it reaches the sink nodes. A node which is not active at a given time step interrupts the flow, and we assume that the signal cannot propagate to its child nodes. To distinguish between active and inactive nodes we use a thresholding approach. The choice of the threshold has a big influence on the model performance. For the HPN-DREAM Subchallenge 1A, we got our best results when using the value of each node at the first time point as its threshold to discretize the data. The underlying assumption is that the network nodes are in an inactive state at t=0, since they have not yet been stimulated.
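The thresholding step described above can be illustrated with a minimal sketch. This is not the team's implementation (their model is a linear program); it only shows the discretization idea, assuming a node counts as active when its value exceeds its own value at the first time point. The function names and the toy data are hypothetical.

```python
def discretize_by_baseline(series):
    """Mark a time point active (1) if its value exceeds the value at t=0.

    Assumption: the first measurement is the unstimulated baseline, so the
    node is inactive at t=0 by construction (strict comparison).
    """
    baseline = series[0]
    return [1 if value > baseline else 0 for value in series]


def discretize_all(data):
    """Apply the per-node baseline threshold to every node's time series."""
    return {node: discretize_by_baseline(values) for node, values in data.items()}


def can_propagate(states, parent, t):
    """An inactive parent at time step t blocks the flow to its child nodes."""
    return states[parent][t] == 1


# Hypothetical toy measurements (one value per time point).
data = {
    "receptor": [0.2, 0.9, 1.1, 0.8],
    "kinase": [0.5, 0.4, 0.9, 1.0],
}
states = discretize_all(data)
```

In this toy example the receptor becomes active at the second time point and the kinase at the third, so `can_propagate(states, "kinase", 1)` is false: the kinase would interrupt the flow to its children at that step.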

What we like most about the DREAM Challenge is that it allows different models to be compared in exactly the same setting, and that it makes it possible to evaluate the performance of the models on real, as yet unpublished, data. Furthermore, the Challenge makes it easier to learn from other researchers working in the same field and allows for the exchange of knowledge and expertise. This helps to improve the developed models and to answer complex biological questions in more detail. We thank the challenge organizers and all who contributed to making this competition possible.

[1] Marta Matos. Network Inference: extension of a linear programming model for time-series data. Master’s thesis, University of Minho, 2013.

[2] Bettina Knapp and Lars Kaderali. Reconstruction of cellular signal transduction networks using perturbation assays and linear programming. PLoS ONE, 8(7):e69220, 2013.

[3] Bettina Knapp, Johanna Mazur and Lars Kaderali. lpNet: Linear Programming Model for Network Inference. R package version 1.0.0, 2013.

 

Team Tongii (synID2024139)

This is Su Wang, along with my teammates Xiaoqi Zheng, Chengyang Wang, Yingxiang Li, Haojie Ren and Hanfei Sun at Tongji University, Shanghai, China. It is our honor that our model was highlighted as one of the top-scoring models for the HPN-DREAM Challenge. Xiaoqi is an associate professor at Shanghai Normal University; Yingxiang and I are PhD candidates; Chengyang and Hanfei are master's students; Haojie just graduated from Nankai University. The diversity of our backgrounds gave us the courage to participate in this Challenge. We thank the organizers for giving me the chance to introduce our team and our model.

In our previous study, we focused on developing software to detect the target genes of transcription factors and chromatin regulators. We used a monotonically decreasing function of the distance between the binding site and the transcription start site to measure the contribution of each binding event to a gene, and combined this with differential expression data to identify each factor's direct target genes. We built a transcription network from these direct target relationships and tried to find the relationships among all the factors as well as the co-regulating pairs. We collected available pathways from KEGG and integrated them with our predicted results to corroborate the predicted relationships. Although the mechanism of protein phosphorylation is different from the regulation between transcription factors, some models and ideas can still be applied to reconstructing the network.
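As an illustration of the distance-based weighting mentioned above, here is a minimal sketch. The post does not say which monotonically decreasing function was used, so the exponential decay, the `decay_bp` length scale, and the function names below are all assumptions made for the example.

```python
import math


def binding_contribution(distance_bp, decay_bp=10_000.0):
    """Weight of one binding site, decreasing with distance to the TSS.

    Assumption: exponential decay with a hypothetical length scale decay_bp;
    any monotonically decreasing function would fit the description.
    """
    return math.exp(-abs(distance_bp) / decay_bp)


def gene_score(site_positions, tss_position, decay_bp=10_000.0):
    """Sum the contributions of all binding sites assigned to one gene."""
    return sum(
        binding_contribution(position - tss_position, decay_bp)
        for position in site_positions
    )
```

A site directly at the transcription start site contributes 1.0, and sites further away contribute progressively less. In the workflow described above, a score of this kind would then be combined with differential expression data to call direct targets.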

In our model, we applied a Dynamic Bayesian Network to the data. We combined its result with the mutual information between each pair of genes, using a simple rank average to obtain the causal relationships. This model works well for three reasons. First, time-series data can easily be used with a Dynamic Bayesian Network. Second, the relationships between genes are not linear, so a linear measure such as Pearson’s correlation is not suitable for capturing the information shared between two genes. Last but not least, an information inequality was applied to remove implausible edges, which gave our model better sensitivity and stability.
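The rank-averaging step can be sketched as follows. This is only an illustration under stated assumptions: both methods are assumed to score the same set of candidate edges, a higher score is assumed to mean a more plausible edge, and the edge names and score values are made up for the example.

```python
def ranks(scores):
    """Rank edges by score: the highest score gets rank 1, ties share the mean rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        # Find the end of the run of edges tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[ordered[k]] = mean_rank
        i = j + 1
    return result


def rank_average(scores_a, scores_b):
    """Average the per-method ranks of each edge (lower means stronger support)."""
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    return {edge: (ranks_a[edge] + ranks_b[edge]) / 2 for edge in scores_a}


# Hypothetical edge scores from the two methods.
dbn_scores = {("A", "B"): 0.9, ("B", "C"): 0.4, ("A", "C"): 0.1}
mi_scores = {("A", "B"): 0.7, ("B", "C"): 0.8, ("A", "C"): 0.2}
combined = rank_average(dbn_scores, mi_scores)
```

Averaging ranks rather than raw scores avoids having to put the two methods' outputs on a common scale, which is one reason a simple rank average is a common way to combine heterogeneous predictors.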

There is much room for improvement in our model. We thank the Challenge organizers for giving us the chance to present our model here. The Challenge is very helpful for building a model that addresses a specific question, and the leaderboard is a very good platform for testing how good our model is and for understanding where we should focus our efforts to improve it. We believe that everyone learned a lot by taking part in this Challenge, both about the subject of the Challenge and in terms of their own skills. We hope the best-performing teams can share their models and learn from each other, so that we can all arrive at the best model and solve more questions.
