Google Research Blog
The latest news from Research at Google
Facilitating Genomics Research with Google Cloud Platform
Wednesday, July 30, 2014
Posted by Paul C. Boutros, Ontario Institute for Cancer Research; Josh Stuart, UC Santa Cruz; Adam Margolin, Oregon Health & Science University; Nicole Deflaux and Jonathan Bingham, Google Cloud Platform and Google Genomics
The understanding of the origin and progression of cancer remains in its infancy. However, due to rapid advances in the ability to accurately read and identify (i.e., sequence) the DNA of cancerous cells, knowledge in this field is growing rapidly. Several studies have shown that alterations of single base pairs within the DNA, known as Single Nucleotide Variants (SNVs), or duplications, deletions and rearrangements of larger segments of the genome, known as Structural Variants (SVs), are the primary causes of cancer and can influence what drugs will be effective against an individual tumor.
However, one of the major roadblocks hampering progress is the availability of accurate methods for interpreting genome sequence data. Due to the sheer volume of genomics data (the entire genome of just one person produces more than 100 gigabytes of raw data!), the ability to precisely localize a genomic alteration (SNV or SV) and resolve its association with cancer remains a considerable research challenge. Furthermore, preliminary benchmark studies conducted by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have discovered that different mutation-calling software run on the same data can detect different sets of mutations. Clearly, optimization and standardization of mutation detection methods are prerequisites for realizing personalized medicine applications based on a patient’s own genome.
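To make the disagreement between callers concrete, here is a minimal sketch of how two call sets for the same sample might be compared. The `Variant` type, the `compare_call_sets` helper and the example calls are all made up for illustration; they are not taken from the ICGC or TCGA benchmarks.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A hypothetical SNV call: chromosome, 1-based position, reference and alternate base."""
    chrom: str
    pos: int
    ref: str
    alt: str


def compare_call_sets(calls_a, calls_b):
    """Return the shared calls, the calls unique to each set, and the Jaccard index."""
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    # Two empty call sets are treated as identical.
    jaccard = len(shared) / len(union) if union else 1.0
    return {"shared": shared, "only_a": a - b, "only_b": b - a, "jaccard": jaccard}


# Made-up calls from two hypothetical callers run on the same data.
caller_a = [
    Variant("1", 1000, "A", "T"),
    Variant("1", 2000, "G", "C"),
    Variant("2", 500, "C", "T"),
]
caller_b = [
    Variant("1", 1000, "A", "T"),
    Variant("2", 500, "C", "T"),
    Variant("3", 42, "T", "G"),
]

result = compare_call_sets(caller_a, caller_b)
print(len(result["shared"]), len(result["only_a"]), len(result["only_b"]), result["jaccard"])
# prints: 2 1 1 0.5
```

A Jaccard index well below 1.0 on identical input data is the kind of discrepancy that motivates a shared benchmark.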
The ICGC and TCGA are working to address this issue through an open community-based collaborative competition, run in conjunction with leading research institutions: the Ontario Institute for Cancer Research, the University of California Santa Cruz, and Oregon Health & Science University. Together, they are running the DREAM Somatic Mutation Calling Challenge, in which researchers from across the world “compete” to find the most accurate SNV and SV detection algorithms. By creating a living benchmark for mutation detection, the DREAM Challenge aims to improve standard methods for identifying cancer-associated mutations and rearrangements in tumor and normal samples from
Given Google’s recent partnership with the Global Alliance for Genomics and Health, we are excited to provide cloud computing resources on Google Cloud Platform for competitors in the DREAM Challenge, enabling scientists who do not have ready access to large local computer clusters to participate with open access to contest data as well as credits that can be used for Google Compute Engine virtual machines. By leveraging the power of cloud technologies for genomics computing, contestants have access to powerful computational resources and a platform that allows the sharing of data. We hope to democratize research, foster the open access of data, and spur collaboration.
In addition to the core Google Cloud Platform infrastructure, the Google Genomics team has implemented a simple web-based API to store, process, explore, and share genomic data at scale. We have made the Challenge datasets available through the Google Genomics API. The challenge includes both simulated tumor data for which the correct answers are known and real tumor data for which the correct answers are not known.
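Because the correct answers for the simulated data are known, a submission can be compared directly against the truth set. The sketch below assumes precision, recall and F1 as the metrics, which are common choices for this kind of comparison; the Challenge's actual scoring rules are not described here, and `score_submission` and the example calls are hypothetical.

```python
def score_submission(submitted, truth):
    """Score submitted variant calls against a known truth set.

    Returns (precision, recall, f1). Calls are compared by exact equality,
    so each call is assumed to be a hashable (chrom, pos, ref, alt) tuple.
    """
    submitted, truth = set(submitted), set(truth)
    true_pos = len(submitted & truth)    # correct calls
    false_pos = len(submitted - truth)   # calls not in the truth set
    false_neg = len(truth - submitted)   # true mutations that were missed

    precision = true_pos / (true_pos + false_pos) if submitted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up truth set for a simulated tumor, and one made-up submission.
truth_calls = {
    ("1", 1000, "A", "T"),
    ("1", 2000, "G", "C"),
    ("2", 500, "C", "T"),
    ("3", 42, "T", "G"),
}
submission = {
    ("1", 1000, "A", "T"),
    ("1", 2000, "G", "C"),
    ("2", 500, "C", "T"),
    ("5", 777, "G", "A"),  # false positive
}

precision, recall, f1 = score_submission(submission, truth_calls)
print(precision, recall, f1)
# prints: 0.75 0.75 0.75
```

Here the submission finds three of the four true mutations and reports one call that is not in the truth set, so both precision and recall are 0.75.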
Genomics API Browser showing a particular cancer variant position (highlighted) in dataset in silico #1 that was missed by many challenge participants.
Although submissions for the simulated data can be scored immediately, the winners on the real tumor data will not be known immediately when the challenge closes. This is because current DNA sequencing technology does not provide 100% accurate data, which adds to the complexity of the problem these algorithms are attempting to tackle. Therefore, to identify the winners, researchers must turn to alternative laboratory technologies to verify whether a particular mutation found in the sequencing data is actually true, or at least likely to be. As such, additional data will be collected after the Challenge is complete in order to determine the winners. The organizers will re-sequence DNA from the cells of the real tumor using an independent sequencing technology (Ion Torrent), specifically examining regions overlapping the positions of the cancer mutations submitted by the contest participants.
As an analogy, a "scratched magnifying glass" is used to examine the genome the first time around. The second time around, a "stronger magnifying glass with scratches in different places" is used to look at the specific locations in the genome reported by the challenge participants. By combining the data collected by those two different "magnifying glasses", and then comparing that against the cancer mutations submitted by the contest participants, the winner will then be determined.
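The idea of combining two imperfect views can be sketched in code. Everything below is an assumption made for illustration: the read-count dictionaries, the `min_reads` threshold and the three labels are hypothetical, and this is not the organizers' validation procedure.

```python
def validate_calls(submitted, first_pass_support, second_pass_support, min_reads=3):
    """Label each submitted call by how many sequencing passes support it.

    first_pass_support and second_pass_support map a call to the number of
    reads carrying the variant in that pass (hypothetical inputs). A call
    needs at least min_reads supporting reads to count as seen in a pass.
    """
    labels = {}
    for call in submitted:
        seen_first = first_pass_support.get(call, 0) >= min_reads
        seen_second = second_pass_support.get(call, 0) >= min_reads
        if seen_first and seen_second:
            labels[call] = "confirmed"    # visible through both "magnifying glasses"
        elif seen_first or seen_second:
            labels[call] = "ambiguous"    # visible through only one
        else:
            labels[call] = "unsupported"  # visible through neither
    return labels


# Made-up submitted calls and supporting read counts.
calls = [("1", 1000, "A", "T"), ("2", 500, "C", "T"), ("3", 42, "T", "G")]
first_pass = {calls[0]: 12, calls[1]: 5, calls[2]: 1}
second_pass = {calls[0]: 9, calls[1]: 0}

labels = validate_calls(calls, first_pass, second_pass)
for call in calls:
    print(call, labels[call])
```

Because the two technologies make errors in different places, a call supported by both is less likely to be an artifact of either one.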
We believe we are at the beginning of a transformation in medicine and basic research, driven by advances in genome sequencing and computing at scale. With the DREAM Challenge, we are all excited to be part of bringing researchers around the world to focus on this particular cancer research problem. To learn more about how to participate in the challenge