Google Research Blog
The latest news from Research at Google
Introducing the Open Images Dataset
Friday, September 30, 2016
Posted by Ivan Krasin and Tom Duerig, Software Engineers
In the last few years, advances in machine learning have enabled computer vision to progress rapidly, allowing for everything from systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet for supervised learning, and other large image collections for unsupervised learning.
Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch, and the images are listed as having a Creative Commons Attribution license.
The image-level annotations have been populated automatically with a vision model similar to
Google Cloud Vision API
. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:
Annotated images from the Open Images dataset. Both images used under CC BY 2.0.
We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like artistic style transfer, which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images over the coming months, and therefore the quality of models which can be trained.
The dataset is a product of a collaboration between Google, CMU and Cornell universities, and there are a number of research papers built on top of the Open Images dataset in the works. It is our hope that datasets like this, along with the recently released YouTube-8M, will be useful tools for the machine learning community.
While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.
Image Compression with Neural Networks
Thursday, September 29, 2016
Posted by Nick Johnston and David Minnen, Software Engineers
Data compression is used nearly everywhere on the internet - the videos you watch online, the images you share, the music you listen to, even the blog you're reading right now. Compression techniques make sharing the content you want quick and efficient. Without data compression, the time and bandwidth costs for getting the information you need, when you need it, would be exorbitant!
In “Full Resolution Image Compression with Recurrent Neural Networks”, we expand on our earlier research on data compression using neural networks, exploring whether machine learning can provide better results for image compression as it has for other tasks. Furthermore, we are releasing our compression model so you can experiment with compressing your own images with our network.
We introduce an architecture that uses a new variant of the Gated Recurrent Unit (a type of recurrent neural network that allows units to save activations and process sequences) called the Residual Gated Recurrent Unit (Residual GRU). Our Residual GRU combines existing GRUs with the residual connections introduced in "Deep Residual Learning for Image Recognition" to achieve significant image quality gains for a given compression rate. Instead of using a DCT to generate a new bit representation like many compression schemes in use today, we train two sets of neural networks - one to create the codes from the image (encoder) and another to create the image from the codes (decoder).
Our system works by iteratively refining a reconstruction of the original image, with both the encoder and decoder using Residual GRU layers so that additional information can pass from one iteration to the next. Each iteration adds more bits to the encoding, which allows for a higher quality reconstruction. Conceptually, the network operates as follows:
The initial residual, R[0], corresponds to the original image I: R[0] = I.
Set i=1 for the first iteration.
Iteration[i] takes R[i-1] as input and runs the encoder and binarizer to compress the image into B[i].
Iteration[i] runs the decoder on B[i] to generate a reconstructed image P[i].
The residual for Iteration[i] is calculated: R[i] = I - P[i].
Set i=i+1 and go to Step 3 (up to the desired number of iterations).
The residual image represents how different the current version of the compressed image is from the original. This image is then given as input to the network with the goal of removing the compression errors from the next version of the compressed image. The compressed image is now represented by the concatenation of B[1] through B[N]. For larger values of N, the decoder gets more information on how to reduce the errors and generate a higher quality reconstruction of the original image.
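Here is a minimal sketch of that loop, with hypothetical `encoder`, `binarizer`, and `decoder` callables standing in for the Residual GRU networks (the released code contains the real model):

```python
import numpy as np

def compress_iteratively(image, encoder, binarizer, decoder, num_iterations):
    """Sketch of the iterative refinement loop described above. `encoder`,
    `binarizer` and `decoder` are hypothetical stand-ins for the Residual GRU
    networks released with the paper."""
    residual = image.astype(np.float32)       # R[0] = I
    bits = []                                 # will hold B[1] ... B[N]
    reconstruction = np.zeros_like(residual)
    for _ in range(num_iterations):
        b = binarizer(encoder(residual))      # B[i]: bits produced on this pass
        bits.append(b)
        reconstruction = decoder(b)           # P[i]: current reconstruction
        residual = image - reconstruction     # R[i] = I - P[i], input to the next pass
    # The compressed representation is the concatenation of B[1] ... B[N];
    # more iterations mean more bits and a higher quality reconstruction.
    return bits, reconstruction
```

In the actual network the encoder and decoder are recurrent, so state carried across passes lets each reconstruction benefit from everything learned in earlier iterations; the sketch elides that state for brevity.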
To understand how this works, consider the following example of the first two iterations of the image compression network, shown in the figures below. We start with an image of a lighthouse. On the first pass through the network, the original image is given as the input (R[0] = I). P[1] is the reconstructed image. The difference between the original image and the encoded image is the residual, R[1], which represents the error in the compression.
Original image, I = R[0].
Reconstructed image, P[1].
The residual, R[1], which represents the error introduced by compression.
On the second pass through the network, R[1] is given as the network’s input (see figure below). A higher quality image P[2] is then created. So how does the system recreate such a good image (P[2], center panel below) from the residual R[1]? Because the model uses recurrent nodes with memory, the network saves information from each iteration that it can use in the next one. It learned something about the original image in Iteration[1] that is used along with R[1] to generate a better P[2] from B[2]. Lastly, a new residual, R[2] (right), is generated by subtracting P[2] from the original image. This time the residual is smaller since there are fewer differences between the reconstructed image and what we started with.
The second pass through the network.
R[1] is given as input.
A higher quality reconstruction, P[2].
A smaller residual, R[2], is generated by subtracting P[2] from the original image.
At each further iteration, the network gains more information about the errors introduced by compression (which is captured by the residual image). If it can use that information to predict the residuals even a little bit, the result is a better reconstruction. Our models are able to make use of the extra bits up to a point. We see diminishing returns, and at some point the representational power of the network is exhausted.
To demonstrate file size and quality differences, we can take a photo of Vash, a dog, and generate two compressed images, one JPEG and one Residual GRU. Both images target a perceptual similarity of 0.9 MS-SSIM, a perceptual quality metric that reaches 1.0 for identical images. The image generated by our learned model results in a file 25% smaller than JPEG.
Original image (1419 KB PNG) at ~1.0 MS-SSIM.
JPEG (33 KB) at ~0.9 MS-SSIM.
Residual GRU (24 KB) at ~0.9 MS-SSIM. This is 25% smaller for comparable image quality.
Taking a look around his nose and mouth, we see that our method doesn’t have the magenta blocks and noise in the middle of the image as seen in JPEG. This is due to the blocking artifacts produced by JPEG, whereas our compression network works on the entire image at once. However, there's a tradeoff: in our model the details of the whiskers and texture are lost, but the system shows great promise in reducing artifacts.
While today’s commonly used codecs perform well, our work shows that using neural networks to compress images results in a compression scheme with higher quality and smaller file sizes. To learn more about the details of our research and a comparison of other recurrent architectures, check out the full paper. Our future work will focus on even better compression quality and faster models, so stay tuned!
Announcing YouTube-8M: A Large and Diverse Labeled Video Dataset for Video Understanding Research
Wednesday, September 28, 2016
Posted by Sudheendra Vijayanarasimhan and Paul Natsev, Software Engineers
Many recent breakthroughs in machine learning and machine perception have come from the availability of large labeled datasets, such as ImageNet, which has millions of images labeled with thousands of classes. Their availability has significantly accelerated research in image understanding, for example on detecting and classifying objects in static images. Video provides even more information for detecting and recognizing objects, and for understanding human actions and interactions with the world. Improving video understanding can lead to better video search and discovery, similarly to how image understanding
helped re-imagine the photos experience
. However, one of the key bottlenecks for further advancements in this area has been the lack of real-world video datasets with the same scale and diversity as image datasets.
Today, we are excited to announce the release of YouTube-8M, a dataset of 8 million YouTube video URLs (representing over 500,000 hours of video), along with video-level labels from a diverse set of 4800 Knowledge Graph entities. This represents a significant increase in scale and diversity compared to existing video datasets. For example, the largest existing labeled video dataset we are aware of has around 1 million YouTube videos and 500 sports-specific classes; YouTube-8M represents nearly an order of magnitude increase in both the number of videos and the size of the label vocabulary.
In order to construct a labeled video dataset of this scale, we needed to address two key challenges: (1) video is much more time-consuming to annotate manually than images, and (2) video is very computationally expensive to process and store. To overcome (1), we turned to YouTube and its video annotation system, which identifies relevant Knowledge Graph topics for all public YouTube videos. While these annotations are machine-generated, they incorporate powerful user engagement signals from millions of users as well as video metadata and content analysis. As a result, the quality of these annotations is sufficiently high to be useful for video understanding research and benchmarking purposes.
To ensure the stability and quality of the labeled video dataset, we used only public videos with more than 1000 views, and we constructed a diverse vocabulary of entities that are visually observable and sufficiently frequent. The vocabulary construction was a combination of frequency analysis, automated filtering, verification by human raters that the entities are visually observable, and grouping into 24 top-level verticals (more details in our technical report). The figures below depict the entity vocabulary and the distribution of videos along the top-level verticals, and illustrate the dataset’s scale and diversity.
The dataset site allows browsing and searching the full vocabulary of Knowledge Graph entities, grouped in 24 top-level verticals, along with corresponding videos. This screenshot depicts a subset of dataset videos annotated with the entity “Guitar”.
The distribution of videos in the top-level verticals illustrates the scope and diversity of the dataset and reflects the natural distribution of popular YouTube videos.
To address (2), we had to overcome the storage and computational resource bottlenecks that researchers face when working with videos. Pursuing video understanding at YouTube-8M’s scale would normally require a petabyte of video storage and dozens of CPU-years worth of processing. To make the dataset useful to researchers and students with limited computational resources, we pre-processed the videos and extracted frame-level features using a state-of-the-art deep learning model--the publicly available Inception-V3 image annotation model trained on ImageNet. These features are extracted at 1 frame-per-second temporal resolution, from 1.9 billion video frames, and are further compressed to fit on a single commodity hard disk (less than 1.5 TB). This makes it possible to download this dataset and train a baseline model at full scale on a single GPU in less than a day!
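As an illustration of that kind of preprocessing (not the exact pipeline or checkpoint used to build YouTube-8M), frame-level features can be extracted at one frame per second with a publicly available ImageNet-trained network; the `frames_1fps` input below is an assumed array of pre-decoded frames:

```python
import numpy as np
import tensorflow as tf

# A publicly available ImageNet-trained network with the classification head
# removed, so each frame maps to a fixed-length feature vector.
feature_extractor = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")

def frame_features(frames_1fps):
    """frames_1fps: array of shape (num_seconds, 299, 299, 3), one frame per
    second of video, pixel values in [0, 255]. Returns an array of shape
    (num_seconds, 2048) with one feature vector per sampled frame."""
    x = tf.keras.applications.inception_v3.preprocess_input(
        np.asarray(frames_1fps, dtype=np.float32))
    return feature_extractor.predict(x, verbose=0)
```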
We believe this dataset can significantly accelerate research on video understanding, as it enables researchers and students without access to big data or big machines to do their research at previously unprecedented scale. We hope this dataset will spur exciting new research on video modeling architectures and representation learning, especially approaches that deal effectively with noisy or incomplete labels, transfer learning, and domain adaptation. In fact, we show that pre-training models on this dataset and then applying or fine-tuning them on other external datasets leads to state-of-the-art performance on those datasets. You can read all about our experiments using this dataset, along with more details on how we constructed it, in our technical report.
A Neural Network for Machine Translation, at Production Scale
Tuesday, September 27, 2016
Posted by Quoc V. Le & Mike Schuster, Research Scientists, Google Brain Team
Ten years ago, we announced the
launch of Google Translate
, together with the use of
Phrase-Based Machine Translation
as the key algorithm behind this service. Since then, rapid advances in machine intelligence have improved our capabilities in many areas, but improving machine translation remains a challenging goal.
Today we announce the Google Neural Machine Translation system (GNMT), which utilizes state-of-the-art training techniques to achieve the largest improvements to date for machine translation quality. Our full research results are described in a new technical report we are releasing today: “
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”.
A few years ago we started using
Recurrent Neural Networks
(RNNs) to directly learn the mapping from an input sequence (e.g. a sentence in one language) to an output sequence (that same sentence in another language) [2]. Whereas Phrase-Based Machine Translation (PBMT) breaks an input sentence into words and phrases to be translated largely independently, Neural Machine Translation (NMT) considers the entire input sentence as a unit for translation. The advantage of this approach is that it requires fewer engineering design choices than previous Phrase-Based translation systems. When it first came out, NMT showed accuracy equivalent to existing Phrase-Based translation systems on modest-sized public benchmark data sets.
Since then, researchers have proposed many techniques to improve NMT, including work on handling rare words by mimicking an external alignment model [3], using attention to align input words and output words [4], and breaking words into smaller units to cope with rare words [5,6]. Despite these improvements, NMT wasn't fast or accurate enough to be used in a production system such as Google Translate. Our new paper [1] describes how we overcame the many challenges of making NMT work on very large data sets, and how we built a system that is sufficiently fast and accurate to provide better translations for Google’s users and services.
Data from side-by-side evaluations, where human raters compare the quality of translations for a given source sentence. Scores range from 0 to 6, with 0 meaning “completely nonsense translation”, and 6 meaning “perfect translation."
The following visualization shows the progression of GNMT as it translates a Chinese sentence to English. First, the network encodes the Chinese words as a list of vectors, where each vector represents the meaning of all words read so far (“Encoder”). Once the entire sentence is read, the decoder begins, generating the English sentence one word at a time (“Decoder”). To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded Chinese vectors most relevant to generate the English word (“Attention”; the blue link transparency represents how much the decoder pays attention to an encoded word).
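As a rough illustration of the attention step (not GNMT’s actual scoring function), the decoder can score every encoded source vector against its current state, normalize the scores into a weighted distribution, and attend to the weighted sum:

```python
import numpy as np

def attention_context(decoder_state, encoder_vectors):
    """decoder_state: (d,) current decoder hidden state.
    encoder_vectors: (source_len, d) one encoded vector per source word.
    Returns the attention weights and the weighted context vector."""
    scores = encoder_vectors @ decoder_state          # relevance of each source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax -> weighted distribution
    context = weights @ encoder_vectors               # what the decoder "pays attention" to
    return weights, context
```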
Using human-rated side-by-side comparison as a metric, the GNMT system produces translations that are vastly improved compared to the previous phrase-based production system. GNMT reduces translation errors by more than 55%-85% on several major language pairs measured on sampled sentences from Wikipedia and news websites with the help of bilingual human raters.
An example of a translation produced by our system for an input sentence sampled from a news site. See the paper for more examples of translations for input sentences sampled randomly from news sites and books.
In addition to releasing this research paper today, we are announcing the launch of GNMT in production on a notoriously difficult language pair: Chinese to English. The Google Translate mobile and web apps are now using GNMT for 100% of machine translations from Chinese to English—about 18 million translations per day. The production deployment of GNMT was made possible by use of our publicly available machine learning toolkit TensorFlow and our Tensor Processing Units
(TPUs), which provide sufficient computational power to deploy these powerful GNMT models while meeting the stringent latency requirements of the Google Translate product. Translating from Chinese to English is one of the more than 10,000 language pairs supported by Google Translate, and we will be working to roll out GNMT to many more of these over the coming months.
Machine translation is by no means solved. GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page. There is still a lot of work we can do to serve our users better. However, GNMT represents a significant milestone. We would like to celebrate it with the many researchers and engineers—both within Google and the wider community—who have contributed to this direction of research in the past few years.
We thank members of the Google Brain team and the Google Translate team for their help with the project. We thank Nikhil Thorat and the
Big Picture team
for the visualization.
[1] Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. Technical Report, 2016.
[2] Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le. Advances in Neural Information Processing Systems, 2014.
[3] Addressing the rare word problem in neural machine translation
Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics, 2015.
[4] Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. International Conference on Learning Representations, 2015.
[5] Japanese and Korean voice search
Mike Schuster and Kaisuke Nakajima. IEEE International Conference on Acoustics, Speech and Signal Processing, 2012.
[6] Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
Show and Tell: image captioning open sourced in TensorFlow
Thursday, September 22, 2016
Posted by Chris Shallue, Software Engineer, Google Brain Team
In 2014, research scientists on the
Google Brain team
trained a machine learning system to automatically produce captions that accurately describe images
. Further development of that system led to its success in the
Microsoft COCO 2015 image captioning challenge
, a competition to compare the best algorithms for computing accurate image captions, where it tied for first place.
Today, we’re making the latest version of our image captioning system
available as an open source model
. This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge
, published in
IEEE Transactions on Pattern Analysis and Machine Intelligence.
Automatically captioned by our system.
So what’s new?
Our 2014 system used the Inception V1 image classification model to initialize the image encoder, which produces the encodings that are useful for recognizing different objects in the images. This was the best image model available at the time, achieving 89.6% top-5 accuracy on the benchmark ImageNet 2012 image classification task. We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor in its success in the captioning challenge.
Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. Initializing the image encoder with a better vision model gives the image captioning system a better ability to recognize different objects in the images, allowing it to generate more detailed and accurate descriptions. This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.
Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee.
In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human-generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions; otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper. A minimal sketch of this two-phase schedule is shown below.
Using the better image model allows the captioning model to generate more detailed and accurate descriptions.
After fine-tuning the image model, the image captioning system is more likely to describe the colors of objects correctly.
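The two-phase schedule described above can be sketched roughly as follows, assuming a tf.keras-style `captioning_model` that contains `image_encoder` as its vision component and a `train_data` stream of caption examples; these names are illustrative and are not the released model’s API.

```python
import tensorflow as tf

def train_in_two_phases(captioning_model, image_encoder, train_data,
                        pretrain_epochs=10, finetune_epochs=5):
    """Two-phase schedule: first train the language component with the vision
    component frozen, then jointly fine-tune both. All arguments are
    illustrative placeholders, not the released code's interface."""
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    # Phase 1: keep the ImageNet-initialized vision component frozen while the
    # randomly initialized language component learns to generate captions.
    image_encoder.trainable = False
    captioning_model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=loss)
    captioning_model.fit(train_data, epochs=pretrain_epochs)

    # Phase 2: jointly fine-tune vision and language components so the encoder
    # can pass along details (e.g. object colors) that help description but
    # were unnecessary for classification.
    image_encoder.trainable = True
    captioning_model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss=loss)
    captioning_model.fit(train_data, epochs=finetune_epochs)
    return captioning_model
```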
Until recently our image captioning system was implemented in the
DistBelief software framework
. The TensorFlow implementation released today achieves the same level of accuracy with significantly faster performance: time per training step is just 0.7 seconds in TensorFlow compared to 3 seconds in DistBelief on an Nvidia K20 GPU, meaning that total training time is just 25% of the time previously required.
A natural question is whether our captioning system can generate novel descriptions of previously unseen contexts and interactions. The system is trained by showing it hundreds of thousands of images that were captioned manually by humans, and it often re-uses human captions when presented with scenes similar to what it’s seen before.
When the model is presented with scenes similar to what it’s seen before, it will often re-use human generated captions.
So does it really understand the objects and their interactions in each image? Or does it always regurgitate descriptions from the training data? Excitingly, our model does develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images. Moreover, it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.
Our model generates a completely new caption using concepts learned from similar scenes in the training set.
We hope that sharing this model in TensorFlow will help push forward image captioning research and applications, and will also allow interested people to learn and have fun. To get started training your own image captioning system, and for more details on the neural network architecture, navigate to the model’s home-page
. While our system uses the Inception V3 image classification model, you could even try training our system with the
recently released Inception-ResNet-v2 model
to see if it can do even better!
The 280-Year-Old Algorithm Inside Google Trips
Tuesday, September 20, 2016
Posted by Bogdan Arsintescu, Software Engineer & Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos and Andrew Tomkins, Research Scientists
Working on algorithms is a lot of fun because algorithms do not go out of fashion: one never knows when an oldie-but-goodie might come in handy. Case in point: Yesterday, Google
announced Google Trips
, a new app to assist you in your travels by helping you create your own “perfect day” in a city. Surprisingly, deep inside Google Trips, there is an algorithm that was invented 280 years ago.
In 1736, Leonhard Euler authored a brief but beautiful mathematical paper regarding the town of Königsberg and its 7 bridges, shown here:
In the paper, Euler studied the following question: is it possible to walk through the city crossing each bridge exactly once? As it turns out, for the city of Königsberg, the answer is no. To reach this answer, Euler developed a general approach to represent any layout of landmasses and bridges in terms of what he dubbed the “Geometry of Place,” which we now call graph theory. He represented each landmass as a “node” in the graph, and each bridge as an “edge,” like this:
Euler noticed that if all the nodes in the graph have an even number of edges (such graphs are called “Eulerian” in his honor) then, and only then, a cycle can be found that visits every edge exactly once. Keep this in mind, as we’ll rely on this fact later in the post.
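To see the criterion in action on Euler’s own example, we can represent the Königsberg layout as a list of bridges between its four landmasses (labeled A, B, C, D here purely for illustration) and check the node degrees:

```python
from collections import Counter

# The four landmasses of Königsberg and its seven bridges.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

# Euler's criterion: a closed walk crossing every edge exactly once exists
# (in a connected graph) if and only if every node has even degree.
print(degree)                                    # every landmass has odd degree
print(all(d % 2 == 0 for d in degree.values()))  # False -> no such walk exists
```

Every landmass has odd degree, which is exactly why Euler concluded that no such walk exists.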
Our team in Google Research has been fascinated by the “Geometry of Place” for some time, and we started investigating a question related to Euler’s: rather than visiting just the bridges, how can we visit as many interesting places as possible during a particular trip? We call this the “itineraries” problem. Euler didn’t study it, but it is a well-known topic in Optimization, often called the “orienteering problem.”
While Euler’s problem has an efficient and exact solution, the itineraries problem is not just hard to solve, it is hard to even approximately solve! The difficulty lies in the interplay between two conflicting goals: first, we should pick great places to visit, but second, we should pick them to allow a good itinerary: not too much travel time; don’t visit places when they’re closed; don’t visit too many museums, etc. Embedded in such problems is the challenge of finding efficient routes, often referred to as the Travelling Salesman Problem (TSP).
Algorithms for Travel Itineraries
Fortunately, the real world has a property called the “triangle inequality” that says adding an extra stop to a route never makes it shorter. When the underlying geometry satisfies the triangle inequality, the TSP can be approximately solved using an algorithm discovered by Christofides in 1976. This is an important part of our solution, and builds on Euler’s paper, so we’ll give a quick four-step rundown of how it works here:
We start with all our destinations separate, and repeatedly connect together the closest two that aren’t yet connected. This doesn’t yet give us an itinerary, but it does connect all the destinations via a
minimum spanning tree
of the graph.
We take all the destinations that have an odd number of connections in this tree (Euler proved there must be an even number of these), and carefully pair them up.
Because all the destinations now have an even number of edges, we’ve created an Eulerian graph, so we create a route that crosses each edge exactly once.
We now have a great route, but it might visit some places more than once. No problem, we find any double visits and simply bypass them, going directly from the predecessor to the successor.
Christofides gave an elegant proof that the resulting route is always close to the shortest possible. Here’s an example of Christofides’ algorithm in action on a location graph, with the nodes representing places and the edge costs representing the travel time between the places; a code sketch of the four steps appears after the figure.
Construction of an Eulerian Tour in a location graph
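Here is a compact sketch of those four steps using the networkx library. For brevity, step 2’s careful pairing is replaced by a greedy pairing of the odd-degree destinations rather than a true minimum-weight matching, so it illustrates the structure of the algorithm without preserving its exact approximation guarantee.

```python
import itertools
import networkx as nx

def christofides_style_route(points, dist):
    """points: destinations to visit; dist(a, b): symmetric travel cost.
    Follows the four steps above, with a greedy pairing standing in for the
    exact minimum-weight matching of the real Christofides algorithm."""
    g = nx.Graph()
    g.add_weighted_edges_from(
        (a, b, dist(a, b)) for a, b in itertools.combinations(points, 2))

    # Step 1: connect all destinations via a minimum spanning tree.
    tree = nx.minimum_spanning_tree(g)

    # Step 2: pair up the destinations with an odd number of connections
    # (there is always an even number of them), adding one edge per pair.
    multigraph = nx.MultiGraph(tree)
    odd = [v for v, d in tree.degree() if d % 2 == 1]
    while odd:
        v = odd.pop()
        nearest = min(odd, key=lambda u: dist(v, u))
        odd.remove(nearest)
        multigraph.add_edge(v, nearest, weight=dist(v, nearest))

    # Step 3: every node now has even degree, so the multigraph is Eulerian
    # and we can trace a route that crosses each edge exactly once.
    circuit = nx.eulerian_circuit(multigraph, source=points[0])

    # Step 4: bypass repeated visits, keeping only the first visit to each place.
    route, seen = [], set()
    for u, _ in circuit:
        if u not in seen:
            seen.add(u)
            route.append(u)
    return route
```

With a Euclidean `dist` over a handful of points, this returns an ordering of all of them that is typically close to the shortest tour.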
Armed with this efficient route-finding subroutine, we can now start building itineraries one step at a time. At each step, we estimate the benefit to the user of each possible new place to visit, and likewise estimate the cost using the Christofides algorithm. A user’s benefit can be derived from a host of natural factors such as the popularity of the place and how different the place is relative to places already visited on the tour. We then pick whichever new place has the best benefit per unit of extra cost (e.g., time needed to include the new place in the tour). Here’s an example of our algorithm actually building a route in London using the location graph shown above; a rough sketch of this greedy selection step is given below.
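In code, the greedy selection step might look like the sketch below, where `benefit`, `route_cost`, `candidate_places`, and `time_budget` are hypothetical stand-ins; `route_cost` would call a Christofides-style routine like the one sketched earlier.

```python
def build_itinerary(candidate_places, benefit, route_cost, time_budget):
    """Greedily grow an itinerary: at each step, add the place with the best
    benefit per unit of extra cost until nothing more fits in the budget.
    `benefit(place, chosen)` and `route_cost(places)` are illustrative."""
    chosen, current_cost = [], 0.0
    while True:
        best = None  # (ratio, place, new_cost)
        for place in candidate_places:
            if place in chosen:
                continue
            new_cost = route_cost(chosen + [place])
            if new_cost > time_budget:
                continue  # this place would not fit in the day
            extra = max(new_cost - current_cost, 1e-9)
            ratio = benefit(place, chosen) / extra
            if best is None or ratio > best[0]:
                best = (ratio, place, new_cost)
        if best is None:
            break
        chosen.append(best[1])
        current_cost = best[2]
    return chosen
```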
Itineraries in Google Trips
With our first good approximate solution to the itineraries problem in hand, we started working with our colleagues from the Google Trips team, and we realized we’d barely scratched the surface. For instance, even if we produce the absolute perfect itinerary, any particular user of the system will very reasonably say, “That’s great, but all my friends say I also need to visit this other place. Plus, I’m only around for the morning, and I don’t want to miss this place you listed in the afternoon. And I’ve already seen Big Ben twice.” So rather than just producing an itinerary once and calling it a perfect day, we needed a fast dynamic algorithm for itineraries that users can modify on the fly to suit their individual taste. And because many people have bad data connections while traveling, the solution had to be efficient enough to run disconnected on a phone.
Better Itineraries Through the Wisdom of Crowds
While the algorithmic aspects of the problem were highly challenging, we realized that producing high-quality itineraries was just as dependent on our understanding of the many possible stopping points on the itinerary. We had Google’s extensive travel database to identify the interesting places to visit, and we also had great data from Google’s existing systems about how to travel from any place to any other. But we didn’t have a good sense for how people typically move through this geometry of places.
For this, we turned to the wisdom of crowds. This type of wisdom is used by Google to
estimate delays on highways
, and to discover
when restaurants are most busy
. Here, we use the same techniques to learn about common visit sequences that we can stitch together into itineraries that feel good to our users. We combine Google's knowledge of
when places are popular
with the directions between those places to gather an idea of what tourists like to do when travelling.
And the crowd has a lot more wisdom to offer in the future. For example, we noticed that visits to Buckingham Palace spike around 11:30 and stay a bit longer than at other times of the day. This seemed a little strange to us, but when we looked more closely, it turns out to be the time of the
Changing of the Guard
. We’re looking now at ways to incorporate this type of timing information into the itinerary selection algorithms.
So give it a try: Google Trips, available now, has you covered from departure to return.
The 2016 Google Earth Engine User Summit: Turning pixels into insights
Monday, September 19, 2016
Posted by Chris Herwig, Program Manager, Google Earth Engine
"We are trying new methods [of flood modeling] in Earth Engine based on machine learning techniques which we think are cheaper, more scalable, and could exponentially drive down the cost of flood mapping and make it accessible to everyone."
-Beth Tellman, Arizona State University and
Cloud to Street
Recently, Google headquarters hosted the
Google Earth Engine User Summit 2016
, a three-day hands-on technical workshop for scientists and students interested in using Google Earth Engine for planetary-scale cloud-based geospatial analysis. Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with a simple, yet powerful API backed by Google's cloud, which scientists and researchers use to detect, measure, and predict changes to the Earth's surface.
Earth Engine founder Rebecca Moore kicking off the first day of the summit
Summit attendees could choose among twenty-five hands-on workshops over the course of the three-day summit, most of them created specifically for the summit, giving attendees an exclusive introduction to the latest features in our platform. The sessions covered a wide range of topics and Earth Engine experience levels, from image classifiers and classification, through time series analysis and building custom web applications, all the way to arrays, matrices, and linear algebra in Earth Engine.
Product Manager, Kristi Bohl, taught a
session on using SkySat imagery
, like the image above over Sydney, Australia, for change detection. Workshop attendees also learned how to take advantage of the deep temporal stack the SkySat archive offers for change-over-time analyses.
Cross-correlation between Landsat 8 NDVI and the sum of CHIRPS precipitation. Red is high cross-correlation and blue is low. The gap in data is because CHIRPS is masked over water.
Nick Clinton, a developer advocate for Earth Engine, taught
a time series session
that covered statistical techniques as applied to satellite imagery data. Students learned how to make graphics like the above, which shows the cross-correlation between
Landsat 8 NDVI
and the sum of CHIRPS precipitation from the previous month over San Francisco, CA. The correlation should be relatively high for plants like grasses and weeds and relatively low for perennials, shrubs, or forest.
Another session covered how users can upload their own data into Earth Engine and the many different ways to take the results of their analyses with them, including rendering static map tiles hosted on Google Cloud Storage, exporting images, creating new assets, and even making movies, like a timelapse video of imagery captured over Sydney, Australia.
Along with the workshop sessions, we hosted five plenary speakers and 18 lightning talk presenters. These presenters shared how Earth Engine fits into their research, spanning drought monitoring, agriculture, conservation, flood risk mapping, and hydrological analysis.
Agriculture in the Sentinel era: scaling up with Earth Engine
, Guido Lemoine, European Commission's Joint Research Centre
Flood Vulnerability from the Cloud to the Street (and back!) powered by Google Earth Engine
, Beth Tellman, Arizona State University and Cloud to Street
Accelerating Rangeland Conservation
, Brady Allred, University of Montana
Monitoring Drought with Google Earth Engine: From Archives to Answers
, Justin Huntington, Desert Research Institute / Western Regional Climate Center
Automated methods for surface water detection
, Gennadii Donchytes, Deltares
Mapping the Behavior of Rivers
, Alex Bryk, University of California, Berkeley
Climate Data for Crisis and Health Applications
, Pietro Ceccato, Columbia University
Appalachian Communities at Risk
, Matt Wasson, Jeff Deal, Appalachian Voices
Water, Wildlife and Working Lands
, Patrick Donnelly, U.S. Fish and Wildlife Service
Stream-side NDVI and The Salmonid Population Viability Project
, Kurt Fesenmyer, Trout Unlimited
Mapping Evapotranspiration for Water Use and Availability
, Mac Friedrichs, USGS
Dynamic Wildfire Modeling in Earth Engine
, Miranda Gray, Conservation Science Partners
Fishing at Scale, now also in Earth Engine
, David Kroodsma, Skytruth
Mapping crop yields from field to national scales in Earth Engine
, David Lobell, Stanford University
Mapping Pacific Wildfires Impacts with Earth Engine
, Matthew Lucas, University of Hawaii
EarthEnv.org - Environmental layers for accessing status and trends in biodiversity, ecosystems and climate
, Jeremy Malczyk, Map of Life
Building a Landsat 8 Mosaic of Antarctica
, Allen Pope, University of Colorado Boulder
Monitoring Primary Production at Broad Spatial and Temporal Scales
, Nathaniel Robinson, University of Montana
Assessing Urbanization Trends for Public Health: Modelling Nighttime Lights Imagery in Africa with Earth Engine
, David Savory, University of California, San Francisco
National-scale mapping of forest carbon
, Ty Wilson, US Forest Service
Utilizing Google Earth Engine to Enhance Decision-Making Capabilities
, Brittany Zajic, NASA DEVELOP National Program
Keeping our users first
It is always inspiring to see such a diverse group of people come together to celebrate, learn, and share all the amazing and wondrous things people are doing with Earth Engine. It is not only an opportunity for our users to learn the latest techniques; it is also a way for the Earth Engine team to experience the new and exciting ways people are harnessing Earth Engine to solve some of the most pressing environmental issues facing humanity.
We've already begun planning for next year's user summit, and based on the success of this year's, we're hoping to hold an even larger one.