Can the crowd provide ‘The Cure’?

The participants in Sage Bionetworks’ Breast Cancer Prognosis Challenge keep surprising and delighting us! Here is a guest post from Benjamin Good, a member of “Team Hive” who is participating in our Challenge. His post is about the online game he and his team launched a few weeks ago to crowdsource ideas that Team Hive can then use to build great models for the Challenge. Please enjoy Ben’s post and try out his cool game!

Serious Games

Serious games are games that have an underlying purpose. When you play a game like Foldit or Phylo, you are entertained as with any other game, but your actions also translate into a useful end product. In Foldit, you contribute to protein structure determination; in Phylo, to multiple sequence alignment. Recasting difficult or time-consuming tasks as components of games opens up a new way to find and motivate volunteer contributors at potentially massive scale. Like the Sage competitions themselves, serious games provide the opportunity to focus widespread community attention on particular challenging problems.

The Cure

The purpose of the game The Cure is to identify sets of genes that can be used to build predictors of breast cancer prognosis that will stand up to validation. The hypothesis is that we can outperform purely data-driven approaches by infusing our gene selection algorithms with the biological knowledge and reasoning abilities of hundreds or even thousands of players. This biological insight is captured through a simple, fun two-player card game where each card corresponds to a gene.

In its current form, the game consists of a series of 100 boards, each containing 25 distinct genes from a precomputed list of interesting genes. In the game, you compete with the computer opponent Barney to find the set of 5 genes from each board that forms the best predictor of 10-year survival. In each turn, the player takes a gene card off the board and puts it in their hand. To inform these decisions, extensive annotation information from resources such as the Gene Ontology, Entrez Gene and PubMed is provided through the game interface, and players are free to conduct their own research.

As each card is added to a player’s hand, a decision tree is constructed automatically using the genes in the hand and the training dataset from the Sage Bionetworks / DREAM challenge. The tree is shown to the player, and the hand is scored based on the performance of the decision tree algorithm, coupled with those genes, in a 10-fold cross-validation test. If the player produces a better gene set than Barney, they score points based on the cross-validation score.

Play begins with a very short training stage that teaches the mechanics of the game as players select features to use to build an animal classifier. Once this stage is passed, players are free to choose which of the gene boards to play. Each board is shown with an indication of how many other players have already defeated it. Once a certain number of players have finished a board, we declare it complete and close it off to encourage the player population to explore the entire board space. The first collection of boards is nearly complete and the next level should be available soon.
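To make the scoring step concrete, here is a minimal sketch of how a hand of genes might be scored by 10-fold cross-validation. It is not the game's actual code. The function names `fit_stump` and `cv_score`, the synthetic expression data, and the choice of gene indices are all assumptions made for illustration, and a one-split decision stump stands in for the full decision tree described above to keep the example short.

```python
import random


def fit_stump(rows, labels):
    """Fit a one-split classifier (a stand-in for the game's decision tree).

    Tries every gene and every observed value as a threshold and keeps the
    split with the fewest training errors.
    """
    best = None
    for g in range(len(rows[0])):
        for t in sorted({r[g] for r in rows}):
            for high_label in (0, 1):
                errors = sum(
                    (high_label if r[g] > t else 1 - high_label) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, g, t, high_label)
    _, g, t, high_label = best
    return lambda r: high_label if r[g] > t else 1 - high_label


def cv_score(data, labels, hand, k=10, seed=0):
    """Return k-fold cross-validated accuracy using only the genes in `hand`."""
    rows = [[sample[g] for g in hand] for sample in data]
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_stump([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(model(rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Synthetic stand-in for the challenge data: 100 samples, 25 "genes".
# By construction (an assumption for this demo), only gene 3 carries signal.
rng = random.Random(42)
data = [[rng.random() for _ in range(25)] for _ in range(100)]
labels = [1 if sample[3] > 0.5 else 0 for sample in data]

informative_score = cv_score(data, labels, hand=[3, 7])  # hand with the signal gene
noise_score = cv_score(data, labels, hand=[0, 1])        # hand of noise genes
player_wins = informative_score > noise_score            # compare two hands
```

In this toy setup, a hand that contains the informative gene should score well above a hand of noise genes, which is the comparison the game makes between the player's hand and Barney's. How the game converts the cross-validation score into points is not specified in the post, so it is left out here.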

Results so far

In the first week that the game was live (Sept. 7-14, 2012), more than 120 players registered and collectively played more than 2000 hands. 60% of the players came from the U.S., 30% from China, and the rest from all over the world. Nearly half of the players have PhDs. While it is too early to tell whether this approach will be a contest winner, we have already used it to identify several small gene sets that have significant predictive power (far better than random). We also know that some of the players are having a great time. One of the top players wrote this about the game: “This is a wonderful game, which can give me happiness and knowledge at the same time.” Whether or not this game manages to raise the bar in cancer prognosis, it seems that it was already worth the effort to create it. Play now at !

About the creators of ‘The Cure’

Team HIVE, from The Scripps Research Institute, includes Benjamin Good, Max Nanis, Salvatore Loguercio, Chunlei Wu, Ian MacLeod and Andrew Su. Their research explores applications of crowdsourcing in biology, such as the Gene Wiki, BioGPS, and an emerging collection of serious games. ‘The Cure’ is specifically focused on accumulating knowledge that can be translated into good performance on the BCC challenge.

