And The Winner Is…
January 21, 2013
We are very happy to announce that the Attractor Metagenes Team (consisting of Mr. Wei-Yi Cheng, Mr. Tai-Hsien Ou Yang and Professor Dimitris Anastassiou of Columbia University) is the winner of the Sage Bionetworks/DREAM Breast Cancer Prognosis Challenge. Please join all of us at Sage Bionetworks and DREAM in congratulating the team on their winning Challenge model (Syn ID#1417992), which achieved a concordance index of 0.7562. This means that, given any two breast cancer patients, the probability that team Attractor Metagenes will correctly predict which of the two will survive longer is about 76%, a highly statistically significant performance. The performance was also robust: this team was the best performer in most of the 100 instances in which the test set was perturbed by random removal of 20% of the patients. As the winner of the Challenge, the Attractor Metagenes team has been awarded the opportunity to publish an article about the winning Challenge model in Science Translational Medicine and will be invited to the 4th Annual Sage Congress, taking place in April 2013. Please read on to hear from Wei-Yi Cheng, who submitted the winning model on behalf of his team.
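For readers unfamiliar with the concordance index, the following is a minimal sketch of how such a score can be computed from survival data. It follows the common Harrell-style definition and is not the Challenge's own scoring code; the function name, argument names, and the convention that a higher risk score means shorter predicted survival are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index for right-censored survival data.

    times:       observed follow-up time for each patient
    events:      True if death was observed, False if the patient was censored
    risk_scores: model output; higher is assumed to mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the two times differ and the
        # shorter one ends in an observed event rather than censoring.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # the model ranked the pair correctly
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable patient pairs")
    return concordant / comparable
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of every comparable pair.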
Sage / DREAM7 Breast Cancer Prognostic Challenge
Validation Phase Writeup
Wei-Yi Cheng
wc2302@columbia.edu
During the first stages of the Breast Cancer Prognostic Challenge, we showed that breast cancer survival can be well predicted by the “Attractor Metagenes.” The Attractor Metagenes are sets of strongly co-expressed genes found using an iterative algorithm. We have previously identified several such Attractor Metagenes in almost identical forms across various cancer types, namely the mitotic chromosomal instability attractor (CIN), the mesenchymal transition attractor (MES), and the lymphocyte-specific attractor (LYM) [1]. As we have mentioned before, we like to think of these three main attractor metagenes as representing three key “bioinformatic hallmarks of cancer,” reflecting the ability of cancer cells to divide uncontrollably, their ability to invade surrounding tissues, and the ability of the organism to recruit a particular type of immune response to fight the disease. In the METABRIC dataset, we confirmed that, under certain conditions, each of these attractors has strong prognostic power. For instance, the expression of the CIN attractor suggests high grade; the expression of the MES attractor during early stage (no positive lymph node, and tumor size less than 30 mm) indicates the invasiveness of cancer cells and thus suggests bad prognosis; and the expression of the LYM attractor is an indication of good prognosis in ER-negative breast cancer, while, conversely, it is an ominous sign when there are already positive lymph nodes. The attractor approach also identifies some breast-cancer-specific attractors, such as the ER attractor and the HER2 amplicon. We have used these Attractor Metagenes in all our models and reported their association with survival in previous stages of the challenge [2][3][4].
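To give a feel for what an iterative search for strongly co-expressed genes can look like, here is a simplified sketch. It is not the published algorithm, whose details are in [1]: the use of Pearson correlation as the association measure, the exponent value, the convergence test, and all names are assumptions chosen for illustration. Starting from a seed gene, every gene is weighted by its association with the current metagene, the metagene is recomputed as the weighted average, and the process repeats until the weights stop changing.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_attractor(expr, seed_gene, exponent=5, tol=1e-6, max_iter=100):
    """Iterate from a seed gene toward a set of mutually co-expressed genes.

    expr: dict mapping gene name -> list of expression values, one per sample.
    Returns (weights, metagene); high-weight genes form the candidate attractor.
    """
    genes = list(expr)
    n_samples = len(expr[seed_gene])
    metagene = list(expr[seed_gene])
    weights = None
    for _ in range(max_iter):
        # Weight each gene by its non-negative association with the metagene;
        # the exponent pushes weight toward the most strongly associated genes.
        new_weights = {
            g: max(pearson(expr[g], metagene), 0.0) ** exponent for g in genes
        }
        total = sum(new_weights.values())
        if total == 0.0:
            raise ValueError("no gene is positively associated with the seed")
        # The new metagene is the weighted average of all genes.
        metagene = [
            sum(new_weights[g] * expr[g][s] for g in genes) / total
            for s in range(n_samples)
        ]
        converged = weights is not None and max(
            abs(new_weights[g] - weights[g]) for g in genes
        ) < tol
        weights = new_weights
        if converged:
            break
    return weights, metagene
```

In a sketch like this, different seed genes that belong to the same co-expression signature would be expected to converge to nearly the same set of weights.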
We think that the success of our model is due to the lack of overfitting to the training (METABRIC) dataset. Indeed, our features, the attractor metagenes, were not derived from the training set. Instead, they were derived from other cancer datasets from multiple cancer types [1]. We hypothesized that the attractor metagenes represent universal biomolecular events in cancer and would therefore also be useful in the particular case of breast cancer. We therefore used the training set only to find the best ways to combine these features in breast cancer. We were very happy to see that our score for overall survival in a totally new validation dataset was actually higher than the corresponding score that we had achieved in the previous phases of the Challenge, in which the METABRIC dataset itself was split and used for both training and validation.
To select among our existing models, trained on the METABRIC data, for the totally new Oslo validation set, we performed several hold-out tests on all of our submitted models, such as 10-fold cross-validation. In addition, based on what Dr. Huang revealed in “Contours of the Oslo Validation set” [5], we thought it was also important to evaluate the performance of our models using several re-sampled test sets containing only patients who received chemotherapy. Indeed, the top-performing model (syn1417992) had the highest chemotherapy-only test score of all our models.
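The idea of scoring a model on re-sampled test sets restricted to one patient group can be sketched as follows. How the team actually drew its re-sampled sets is not described here, so the sampling scheme (random subsets without replacement), the default number of rounds and drop fraction, and all names, including the `score_fn` callback and the `chemo` field, are assumptions for illustration.

```python
import random


def subset_scores(patients, score_fn, keep=lambda p: True,
                  n_rounds=100, drop_fraction=0.2, seed=0):
    """Score a model on repeated random subsets of the eligible patients.

    patients:      list of patient records (any type accepted by score_fn)
    score_fn:      hypothetical callback mapping a list of patients to a score,
                   for example a concordance index of the model's predictions
    keep:          predicate selecting eligible patients (e.g. chemotherapy only)
    drop_fraction: share of eligible patients randomly removed in each round
    """
    rng = random.Random(seed)  # fixed seed so the comparison is repeatable
    eligible = [p for p in patients if keep(p)]
    n_keep = len(eligible) - int(round(drop_fraction * len(eligible)))
    if n_keep < 1:
        raise ValueError("too few eligible patients to re-sample")
    return [score_fn(rng.sample(eligible, n_keep)) for _ in range(n_rounds)]
```

Candidate models can then be compared by running the same function, with the same seed, once per model and looking at how often each model achieves the best score across rounds.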
The top-performing syn1417992 model contains several subclassifiers that utilize orthogonal information. Based on the universal Attractor Metagenes we found in multiple cancer types [1], and several breast-cancer-specific Attractor Metagenes, we created an “Attractor Metagene Space” of around 15 attractor metagenes to replace the 50,000-gene molecular space. We used Cox regression, generalized boosted models (GBM), and K-nearest neighbors (KNN) to create separate prognostic models on the Attractor Metagene Space and on the clinical features. For feature selection, we used the Akaike information criterion (AIC) when performing Cox regression. The model also includes a subclassifier that uses mixed clinical and molecular features, which include all three of the universal metagenes (CIN, LYM restricted on ER and HER2 low, and MES restricted on lymph-negative and tumor size less than 30 mm), and the SUSD3 metagene, which we found to be strongly associated with good prognosis when over-expressed. In our submissions we did not make use of any code from other Challenge participants.
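One way to read “restricted on” in the mixed clinical and molecular subclassifier is that a metagene contributes to the model only for patients who meet a clinical condition. The sketch below illustrates that reading; it is an interpretation rather than the team's code, and the field names, the boolean ER/HER2 flags, and the choice of 0.0 as the neutral value (which presumes centered metagene values) are all assumptions.

```python
def restricted(value, applies, neutral=0.0):
    """Return the metagene value if the clinical condition holds, else a neutral constant."""
    return value if applies else neutral


def mixed_features(patient):
    """Build clinically restricted metagene features for one patient.

    patient: dict of clinical and metagene values; every key is hypothetical.
    """
    # MES is used only in early-stage disease: no positive lymph nodes
    # and a tumor smaller than 30 mm.
    early_stage = (patient["positive_lymph_nodes"] == 0
                   and patient["tumor_size_mm"] < 30)
    # LYM is used only when both ER and HER2 are low.
    er_her2_low = patient["er_low"] and patient["her2_low"]
    return {
        "CIN": patient["cin"],
        "LYM_restricted": restricted(patient["lym"], er_her2_low),
        "MES_restricted": restricted(patient["mes"], early_stage),
        "SUSD3": patient["susd3"],
    }
```

The resulting feature dictionary could then be passed to any survival model; the restriction keeps a metagene from influencing predictions in patient groups where its association with outcome differs.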
Finally, we would like to thank everyone who made this wonderful challenge possible. We believe that this success validates not only the prognostic power of our model in breast cancer, but also the “pan-cancer” property of the attractor metagenes, since they were defined from other datasets of various cancer types. We hope that we will have the opportunity to collaborate with pharmaceutical companies towards the development of related diagnostic, prognostic and predictive products and, in particular, to scrutinize the underlying biological mechanisms with the aim of identifying potential therapeutic interventions that could be applicable to all types of cancer.
References
[1] W.Y. Cheng, T.H. Ou Yang and D. Anastassiou, “Biomolecular events in cancer revealed by attractor metagenes,” preprint available from arXiv:1204.6538v1, April 30, 2012; PLoS Computational Biology, in press.
[2] http://support.sagebase.org/sagebase/topics/mitotic_chromosomal_instability_attractor_metagene
[3] http://support.sagebase.org/sagebase/topics/mesenchymal_transition_attractor_metagene-znl1g
[4] http://support.sagebase.org/sagebase/topics/lymphocyte_specific_attractor_metagene
[5] http://support.sagebase.org/sagebase/topics/contours_of_the_oslo_validation_set