Breast Cancer Challenge – Thoughts from a participant

Rich Savage, a participant in the Sage Bionetworks / DREAM Breast Cancer Challenge, has written a great summary of his experiences, along with recommendations for future challenges, on his blog, 21st Century Scientist. Central to his thinking is how to balance competition and collaboration to make a challenge an engaging experience.


Science is hard

Check out a new clearScience blog post expounding on an article by David Shaywitz in today’s Forbes. Enjoy!

The Challenge’s October 1 Leaderboard Winner: A “Repeat Performance” from the Attractor Metagenes Team!

Please join all of us at Sage Bionetworks and DREAM in congratulating Wei-Yi Cheng and the entire Attractor Metagenes team on winning the October 1 leaderboard. Attractor Metagenes was also the September 1 leaderboard winner. This “repeat performance” is especially impressive given that it was achieved with two different versions of the Metabric data! Please read on to hear from Wei-Yi Cheng, who submitted the winning model on behalf of his team.

Dear fellow BCC challenge participants and others here on Synapse,

I would like to thank the organizers once more for giving the Attractor Metagenes Team another opportunity to share some of our methods and findings. It has been another exciting month, in which many new ideas have been shared among us. It is inspiring that, through these discussions on the Challenge forum, we are all gaining a better understanding of different perspectives on the data and on the disease itself.

The main reason for our continuing high score is that we make use of the three strongest attractor metagenes, which represent universal (multi-cancer) biomolecular events: the mitotic chromosomal instability attractor metagene, the mesenchymal transition attractor metagene, and the lymphocyte-specific attractor metagene. We are particularly excited because these metagenes are present in all solid cancers and can therefore serve as “pancancer” biomarkers, which we expect to be more robust than biomarkers based on individual oncogenes. We have now posted descriptions of each of these three main attractors as items in the Synapse forum.

So far we have not incorporated any code from other submissions, but we will certainly do so where we deem it appropriate, giving prominent credit. Similarly, we welcome others to make use of our code, which is always freely available. The functions and metagene lists used in all our submissions are bundled in an R package that can be downloaded through the link given in the source code we uploaded to the leaderboard. We have also uploaded an R package for finding attractor metagenes, available under Synapse ID syn1123167, for anyone interested in using it, not only in breast cancer but in all types of cancer.

We understand that the main objective in Phase 2 of the Challenge is to build a generalizable model that performs well when evaluated against the Oslo Validation (Oslo-Val) data set. Based on our experience in both phases, we believe that achieving a generalizable model requires survival data that have been “purified” by excluding causes of death unrelated to the disease itself. We understand, however, that this is difficult to achieve in general, and even more so for the Oslo-Val data set. Because Phase 2 uses the same overall survival data as Oslo-Val, we modified our models to include many clinical features that we do not think would otherwise be required for a sharp and generalizable prognostic model.
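When cause-of-death information is available, one common way to obtain such “purified” survival data is to keep only disease-related deaths as events and to censor deaths from other causes at the time they occur. The Python sketch below illustrates that idea under stated assumptions: the `Patient` record, its field names and the cause label `"breast_cancer"` are hypothetical and are not the Metabric or Oslo-Val data formats, and this is not the team's own code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    """Hypothetical patient record; field names are illustrative only."""
    time_days: float        # follow-up time until death or last contact
    died: bool              # overall-survival event indicator
    cause: Optional[str]    # assumed cause-of-death label, None if alive


def disease_specific(patients: List[Patient],
                     disease_cause: str = "breast_cancer") -> List[Tuple[float, int]]:
    """Turn overall-survival records into disease-specific (time, event) pairs.

    A death attributed to the disease stays an event (1). A death from any
    other cause is censored (0) at the time of death: up to that point the
    patient is known not to have died of the disease.
    """
    pairs = []
    for p in patients:
        event = 1 if (p.died and p.cause == disease_cause) else 0
        pairs.append((p.time_days, event))
    return pairs


# Toy cohort (made-up values): one disease death, one unrelated death, one alive.
cohort = [
    Patient(1200, True, "breast_cancer"),
    Patient(800, True, "cardiac"),
    Patient(3000, False, None),
]
print(disease_specific(cohort))  # [(1200, 1), (800, 0), (3000, 0)]
```

Censoring these patients, rather than dropping them, keeps the follow-up time they contribute while no longer counting unrelated deaths as events. The hard part, as noted above, is that a reliable cause-of-death label is often missing.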

To elaborate on this last point: we are excited about building sharp, insightful and powerful “minimalist” disease models that could be used in biomarker products relying on a very small number of features. For example, we believe we have identified such a model in breast cancer that uses nothing other than the three attractor metagenes mentioned above, tumor size, the number of affected lymph nodes, and one more protective feature that we discovered as a result of our participation in the Challenge: the metagene defined as the average of the genes SUSD3 and FGD3, which, as we observed, are genomically adjacent at chr9q22.31. We know that simultaneous silencing of these two genes is strongly associated with poor prognosis, but we are not certain about the underlying biological mechanism (it may not be the result of a copy number variation, or CNV). We suspect that this simultaneous silencing is one of several triggers required for ER-negativity, perhaps the most important one. This is an interesting research question, and we hope that other Challenge participants with expertise in biology and medicine will join us in the effort to decipher this important mechanism in breast cancer!
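The SUSD3/FGD3 metagene described above is a per-sample average of the two genes' expression values. A minimal Python sketch, assuming the values are already on a comparable (for example, log) scale; the dictionary layout, the `metagene` helper and the numbers are hypothetical, made up for illustration rather than taken from Metabric, and this is not the team's R package:

```python
from statistics import mean
from typing import Dict, List


def metagene(expr: Dict[str, List[float]], genes: List[str]) -> List[float]:
    """Per-sample metagene score: the average expression of the given genes.

    `expr` maps a gene symbol to its expression values, one per sample,
    with samples in the same order for every gene (an assumed layout).
    """
    missing = [g for g in genes if g not in expr]
    if missing:
        raise KeyError(f"genes not found: {missing}")
    n = len(expr[genes[0]])
    if any(len(expr[g]) != n for g in genes):
        raise ValueError("every gene must have one value per sample")
    # Average across genes, separately for each sample.
    return [mean(expr[g][i] for g in genes) for i in range(n)]


# Hypothetical expression values for three samples (not real data).
expr = {
    "SUSD3": [8.0, 3.0, 6.5],
    "FGD3": [7.0, 2.0, 6.5],
}
scores = metagene(expr, ["SUSD3", "FGD3"])
print(scores)  # [7.5, 2.5, 6.5]
```

A low score, like the second sample's, corresponds to both genes being silenced at once, the pattern described above as strongly associated with poor prognosis.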

Wei-Yi Cheng

Graduate Research Assistant, Ph.D. Candidate in Electrical Engineering

Genomic Information Systems Laboratory,

Columbia University