Breast Cancer Challenge: Team “Attractor Metagenes” nabs top overall Metabric score!

Please join all of us at Sage Bionetworks and DREAM in congratulating Wei-Yi Cheng and the entire Attractor Metagenes team on their winning model (Syn ID#1444444), which received the top overall score for the Metabric phase of this Challenge. It will be interesting to see how this model, and all the others, perform against the final validation data set that is currently being produced! Wei-Yi and his team have been invited to speak at the upcoming DREAM conference in San Francisco (Nov 12-16). Please read on to hear from Wei-Yi Cheng, who submitted the winning model on behalf of his team.

Our models use the information provided by the “attractor metagenes” to evaluate their prognostic value in breast cancer. We previously applied an iterative attractor-finding algorithm to rich expression datasets from multiple cancer types and identified three universal (pan-cancer) attractors: the mitotic chromosomal instability attractor (CIN), the mesenchymal transition attractor (MES), and the lymphocyte-specific attractor (LYM) [1]. We like to think of these three attractors as “bioinformatic hallmarks of cancer.” In our top submission (syn1444444) we used precisely the same three metagenes found in that earlier unsupervised multi-cancer analysis.

Our experience in the Challenge was particularly rewarding because we successively confirmed that each of these attractors would indeed help improve the breast cancer prognostic model, and we reported our observations to the other participants on the Synapse forum. We first found that the CIN attractor is highly prognostic, as evidenced by the fact that it was essentially recreated after ranking the individual genes by their concordance index [2]. The other two main attractors were also found to be highly prognostic once properly conditioned. For example, the MES attractor is most prognostic in early-stage breast cancer (no positive lymph nodes and tumor size less than 30 mm) [3]. The LYM attractor, on the other hand, is protective when ER and HER2 expression is low, while it has the reverse effect on prognosis when there are multiple positive lymph nodes [4]. We also identified a few additional prognostic metagenes: the SUSD3-FGD3 metagene, which is composed of two genomically adjacent genes; the ZMYND10-LRRC48-CASC1 metagene; the PGR-RAI2 metagene; the HER2 amplicon attractor metagene at chr17q11.2-q21; and a chr17p12 meta-CNV. We compiled an attractor metagene space from these metagenes, along with TP53 and VEGFA, which are known to be associated with cancer, and then used it as the molecular feature space for feature selection.
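As a rough illustration of the gene-ranking step mentioned above, here is a minimal sketch of a concordance index computed per gene and used to sort genes. It is not the team's code. It assumes the common Harrell-style definition (a pair of patients is comparable only when the one with the shorter follow-up time had an observed event), treats higher expression as higher predicted risk, and uses hypothetical function names and placeholder gene labels.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Fraction of comparable patient pairs that the score orders correctly.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    scores : risk score per patient; higher means worse predicted prognosis
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip pairs with tied times or where the earlier patient is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_genes(expression, times, events):
    """Sort genes by concordance index, highest first.

    expression : dict mapping a gene label to its per-patient expression values
    """
    cis = {g: concordance_index(times, events, v) for g, v in expression.items()}
    return sorted(cis.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy data only; the gene labels are placeholders, not real results.
    times = [1, 2, 3, 4]
    events = [1, 1, 1, 1]
    expression = {"GENE_A": [4, 3, 2, 1], "GENE_B": [1, 2, 3, 4]}
    for gene, ci in rank_genes(expression, times, events):
        print(gene, ci)
```

Under this convention, genes whose high expression goes with shorter survival land at the top of the list, which is where the genes of a poor-prognosis signature would be expected to cluster.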

In our top submission (syn1444444), we combined several subclassifiers to maximize the information used and to build a robust, generalizable model: Cox regression, generalized boosted model (GBM), and K‑nearest neighbor (KNN). To avoid overfitting, we used the Akaike information criterion (AIC) to select the features passed to Cox regression. We applied the AIC-based Cox regression and the GBM to the metagene space and the clinical features, respectively, and the KNN model to the combined metagene and clinical feature space. We combined the subclassifiers by directly summing the linear predictors they generated. Because the KNN model predicts survival time rather than risk, we summed the reciprocal of its prediction. We also included two subclassifiers that use mixed molecular and clinical features. One of them used all three universal metagenes (CIN, LYM, and MES, properly conditioned), the SUSD3-FGD3 metagene, and the clinical features age, radiation therapy, and chemotherapy. We found that such a simple model provides accurate prognosis and can be treated independently of the other subclassifiers.
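The combination rule described above can be sketched in a few lines. This is an illustration only, not the submitted code: the function name is hypothetical, the subclassifier outputs are assumed to be plain per-patient lists, and no rescaling is applied before summing because the write-up does not mention any.

```python
def combine_predictions(risk_predictors, knn_times):
    """Sum per-patient risk scores from several subclassifiers.

    risk_predictors : list of lists; each inner list holds one subclassifier's
                      linear predictor per patient (higher means higher risk)
    knn_times       : KNN-predicted survival time per patient (higher means
                      lower risk), so its reciprocal is added instead
    """
    n = len(knn_times)
    if any(t <= 0 for t in knn_times):
        raise ValueError("predicted survival times must be positive")
    # Start from the reciprocal of the KNN survival-time prediction.
    combined = [1.0 / t for t in knn_times]
    for scores in risk_predictors:
        if len(scores) != n:
            raise ValueError("all subclassifiers must score the same patients")
        for k, s in enumerate(scores):
            combined[k] += s
    return combined


if __name__ == "__main__":
    # Toy numbers for two patients and two risk-type subclassifiers.
    cox_scores = [0.5, -0.5]
    gbm_scores = [1.0, 0.0]
    knn_times = [2.0, 4.0]
    print(combine_predictions([cox_scores, gbm_scores], knn_times))
```

Taking the reciprocal puts the KNN output on the same orientation as the other predictors, so a larger combined value always means a worse predicted prognosis.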

In our submissions we did not make use of any code from other Challenge participants. We submitted the winning model prior to the October 15 model submission deadline, and the full code of this model was accessible to other participants as soon as it was posted to the leaderboard and thereafter.


  1. W-Y Cheng, D Anastassiou, Biomolecular events in cancer revealed by attractor metagenes. arXiv:1204.6538v1 [q-bio.QM].
