Google Research Blog
The latest news from Research at Google
PhotoScan: Taking Glare-Free Pictures of Pictures
Thursday, April 20, 2017
Posted by Ce Liu, Michael Rubinstein, Mike Krainin and Bill Freeman, Research Scientists
Yesterday, we released an
update to PhotoScan
, an app for
that allows you to digitize photo prints with just a smartphone. One of the key features of PhotoScan is the ability to remove glare from prints, which are often glossy and reflective, as are the plastic album pages or glass-covered picture frames that host them. To create this feature, we developed a unique blend of computer vision and image processing techniques that can carefully align and combine several slightly different pictures of a print to separate the glare from the image underneath.
A regular digital picture of a physical print.
Glare-free digital output from PhotoScan
When taking a single picture of a photo, determining which regions of the picture are the actual photo and which regions are glare is challenging to do automatically. Moreover, the glare may often saturate regions in the picture, rendering it impossible to see or recover the parts of the photo underneath it. But if we take several pictures of the photo while moving the camera, the position of the glare tends to change, covering different regions of the photo. In most cases we found that every pixel of the photo is likely not to be covered by glare in at least one of the pictures. While no single view may be glare-free, we can combine multiple pictures of the printed photo taken at different angles to remove the glare. The challenge is that the images need to be aligned very accurately in order to combine them properly, and this processing needs to run very quickly on the phone to provide a near instant experience.
The captured, input images (5 in total).
If we stabilize the images on the photo, we can see just the glare moving, covering different parts of the photo. Notice no single image is glare-free.
Our technique is inspired by our earlier work published at
, which we dubbed “
”. It uses similar principles to remove various types of obstructions from the field of view. However, the algorithm we originally proposed was based on a generative model where the motion and appearance of both the main scene and the obstruction layer are estimated. While that model is quite powerful and can remove a variety of obstructions, it is too computationally expensive to be run on smartphones. We therefore developed a simpler model that treats glare as an outlier, and only attempts to register the underlying, glare-free photo. While this model is simpler, the task is still quite challenging as the registration needs to be highly accurate and robust.
How it Works
We start from a series of pictures of the print taken by the user while moving the camera. The first picture - the “reference frame” - defines the desired output viewpoint. The user is then instructed to take four additional frames. In each additional frame, we detect
sparse feature points
) and use them to establish
mapping each frame to the reference frame.
Left: Detected feature matches between the reference frame and each other frame (left), and the warped frames according to the estimated homographies (right).
While the technique may sound straightforward, there is a catch - homographies are only able to align flat images. But printed photos are often not entirely flat (as is the case with the example shown above). Therefore, we use
— a fundamental, computer vision representation for motion, which establishes pixel-wise mapping between two images — to correct the non-planarities. We start from the homography-aligned frames, and compute “flow fields” to warp the images and further refine the registration. In the example below, notice how the corners of the photo on the left slightly “move” after registering the frames using only homographies. The right hand side shows how the photo is better aligned after refining the registration using optical flow.
Comparison between the warped frames using homographies (left) and after the additional warp refinement using optical flow (right).
The difference in the registration is subtle, but has a big impact on the end result. Notice how small misalignments manifest themselves as duplicated image structures in the result, and how these artifacts are alleviated with the additional flow refinement.
Comparison between the glare removal result with (right) and without (left) optical flow refinement. In the result using homographies only (left), notice artifacts around the eye, nose and teeth of the person, and duplicated stems and flower petals on the fabric.
Here too, the challenge was to make optical flow, a naturally slow algorithm, work very quickly on the phone. Instead of computing optical flow at each pixel as done traditionally (the number of flow vectors computed is equal to the number of input pixels), we represent a flow field by a smaller number of control points, and express the motion at each pixel in the image as a function of the motion at the control points. Specifically, we divide each image into tiled, non-overlapping cells to form a coarse grid, and represent the flow of a pixel in a cell as the
of the flow at the four corners of the cell that contains it.
The grid setup for grid optical flow. A point p is represented as the
of the four corner points of the cell that encapsulates it.
Illustration of the computed flow field on one of the frames.
The flow color coding: orientation and magnitude represented by hue and saturation, respectively.
This results in a much smaller problem to solve, since the number of flow vectors to compute now equals the number of grid points, which is typically much smaller than the number of pixels. This process is similar in nature to the spline-based image registration described in
Szeliski and Coughlan (1997)
. With this algorithm, we were able to reduce the optical flow computation time by a factor of ~40 on a Pixel phone!
Flipping between the homography-registered frame and the flow-refined warped frame (using the above flow field), superimposed on the (clean) reference frame, shows how the computed flow field “snaps” image parts to their corresponding parts in the reference frame, improving the registration.
Finally, in order to compose the glare-free output, for any given location in the registered frames, we examine the pixel values, and use a soft minimum algorithm to obtain the darkest observed value. More specifically, we compute the expectation of the minimum brightness over the registered frames, assigning less weight to pixels close to the (warped) image boundaries. We use this method rather than computing the minimum directly across the frames due to the fact that corresponding pixels at each frame may have slightly different brightness. Therefore, per-pixel minimum can produce visible seams due to sudden intensity changes at boundaries between overlaid images.
Regular minimum (left) versus soft minimum (right) over the registered frames.
The algorithm can support a variety of scanning conditions — matte and gloss prints, photos inside or outside albums, magazine covers.
To get the final result, the Photos team has developed a method that automatically detects and crops the photo area, and rectifies it to a frontal view. Because of perspective distortion, the scanned rectangular photo usually appears to be a quadrangle on the image. The method analyzes image signals, like color and edges, to figure out the exact boundary of the original photo on the scanned image, then applies a geometric transformation to rectify the quadrangle area back to its original rectangular shape yielding high-quality, glare-free digital version of the photo.
So overall, quite a lot going on under the hood, and all done almost instantaneously on your phone! To give PhotoScan a try, download the app on
Teaching Machines to Draw
Thursday, April 13, 2017
Posted by David Ha, Google Brain Resident
Abstract visual communication is a key part of how people convey ideas to one another. From a young age, children develop the ability to depict objects, and arguably even emotions, with only a few pen strokes. These simple drawings may not resemble reality as captured by a photograph, but they do tell us something about how people represent and reconstruct images of the world around them.
Vector drawings produced by sketch-rnn.
In our recent paper, “
A Neural Representation of Sketch Drawings
”, we present a generative
recurrent neural network
capable of producing sketches of common objects, with the goal of training a machine to draw and generalize abstract concepts in a manner similar to humans. We train our model on a dataset of hand-drawn sketches, each represented as a sequence of motor actions controlling a pen: which direction to move, when to lift the pen up, and when to stop drawing. In doing so, we created a model that potentially has many applications, from assisting the creative process of an artist, to helping teach students how to draw.
While there is a already a large body of existing work on generative modelling of images using neural networks, most of the work focuses on modelling
represented as a 2D grid of pixels. While these models are currently able to generate realistic images, due to the high dimensionality of a 2D grid of pixels, a key challenge for them is to generate images with coherent structure. For example, these models sometimes produce amusing images of cats with three or more eyes, or dogs with multiple heads.
Examples of animals generated with the wrong number of body parts, produced using previous GAN models trained on 128x128 ImageNet dataset. The image above is Figure 29 of
Generative Adversarial Networks
, Ian Goodfellow, NIPS 2016 Tutorial.
In this work, we investigate a lower-dimensional vector-based representation inspired by how people draw. Our model, sketch-rnn, is based on the
(seq2seq) autoencoder framework. It incorporates
as recurrent neural network cells. The goal of a seq2seq autoencoder is to train a network to encode an input sequence into a vector of floating point numbers, called a
vector, and from this latent vector reconstruct an output sequence using a decoder that replicates the input sequence as closely as possible.
Schematic of sketch-rnn.
In our model, we deliberately add noise to the latent vector. In our paper, we show that by inducing noise into the communication channel between the encoder and the decoder, the model is no longer be able to reproduce the input sketch exactly, but instead must learn to capture the essence of the sketch as a noisy latent vector. Our decoder takes this latent vector and produces a sequence of motor actions used to construct a new sketch. In the figure below, we feed several actual sketches of cats into the encoder to produce reconstructed sketches using the decoder.
Reconstructions from a model trained on cat sketches.
It is important to emphasize that the reconstructed cat sketches are
copies of the input sketches, but are instead
sketches of cats with similar characteristics as the inputs. To demonstrate that the model is not simply copying from the input sequence, and that it actually learned something about the way people draw cats, we can try to feed in non-standard sketches into the encoder:
When we feed in a sketch of a three-eyed cat, the model generates a similar looking cat that has two eyes instead, suggesting that our model has learned that cats usually only have two eyes. To show that our model is not simply choosing the closest normal-looking cat from a large collection of memorized cat-sketches, we can try to input something totally different, like a sketch of a toothbrush. We see that the network generates a cat-like figure with long whiskers that mimics the features and orientation of the toothbrush. This suggests that the network has learned to encode an input sketch into a set of abstract cat-concepts embedded into the latent vector, and is also able to reconstruct an entirely new sketch based on this latent vector.
Not convinced? We repeat the experiment again on a model trained on pig sketches and arrive at similar conclusions. When presented with an eight-legged pig, the model generates a similar pig with only four legs. If we feed a truck into the pig-drawing model, we get a pig that looks a bit like the truck.
Reconstructions from a model trained on pig sketches.
To investigate how these latent vectors encode conceptual animal features, in the figure below, we first obtain two latent vectors encoded from two very different pigs, in this case a pig head (in the green box) and a full pig (in the orange box). We want to get a sense of how our model learned to represent pigs, and one way to do this is to interpolate between the two different latent vectors, and visualize each generated sketch from each interpolated latent vector. In the figure below, we visualize how the sketch of the pig head slowly morphs into the sketch of the full pig, and in the process show how the model organizes the concepts of pig sketches. We see that the latent vector controls the relatively position and size of the nose relative to the head, and also the existence of the body and legs in the sketch.
Latent space interpolations generated from a model trained on pig sketches.
We would also like to know if our model can learn representations of multiple animals, and if so, what would they look like? In the figure below, we generate sketches from interpolating latent vectors between a cat head and a full pig. We see how the representation slowly transitions from a cat head, to a cat with a tail, to a cat with a fat body, and finally into a full pig. Like a child learning to draw animals, our model learns to construct animals by attaching a head, feet, and a tail to its body. We see that the model is also able to draw cat heads that are distinct from pig heads.
Latent Space Interpolations from a model trained on sketches of both cats and pigs.
These interpolation examples suggest that the latent vectors indeed encode conceptual features of a sketch. But can we use these features to augment other sketches without such features - for example, adding a body to a cat's head?
Learned relationships between abstract concepts, explored using latent vector arithmetic.
Indeed, we find that sketch drawing analogies are possible for our model trained on both cat and pig sketches. For example, we can subtract the latent vector of an encoded pig head from the latent vector of a full pig, to arrive at a vector that represents the concept of a body. Adding this difference to the latent vector of a cat head results in a full cat (i.e. cat head + body = full cat). These drawing analogies allow us to explore how the model organizes its latent space to represent different concepts in the manifold of generated sketches.
In addition to the research component of this work, we are also super excited about potential creative applications of sketch-rnn. For instance, even in the simplest use case, pattern designers can apply sketch-rnn to generate a large number of similar, but unique designs for textile or wallpaper prints.
Similar, but unique cats, generated from a single input sketch (green and yellow boxes).
As we saw earlier, a model trained to draw pigs can be made to draw pig-like trucks if given an input sketch of a truck. We can extend this result to applications that might help creative designers come up with abstract designs that can resonate more with their target audience.
For instance, in the figure below, we feed sketches of four different chairs into our cat-drawing model to produce four chair-like cats. We can go further and incorporate the interpolation methodology described earlier to explore the latent space of chair-like cats, and produce a large grid of generated designs to select from.
Exploring the latent space of generated chair-cats.
Exploring the latent space between different objects can potentially enable creative designers to find interesting intersections and relationships between different drawings.
Exploring the latent space of generated sketches of everyday objects.
Latent space interpolation from left to right, and then top to bottom.
We can also use the decoder module of sketch-rnn as a standalone model and train it to predict different possible endings of incomplete sketches. This technique can lead to applications where the model assists the creative process of an artist by suggesting alternative ways to finish an incomplete sketch. In the figure below, we draw different incomplete sketches (in red), and have the model come up with different possible ways to complete the drawings.
The model can start with incomplete sketches (the red partial sketches to the left of the vertical line) and automatically generate different completions.
We can take this concept even further, and have different models complete the same incomplete sketch. In the figures below, we see how to make the same circle and square figures become a part of various ants, flamingos, helicopters, owls, couches and even paint brushes. By using a diverse set of models trained to draw various objects, designers can explore creative ways to communicate meaningful visual messages to their audience.
Predicting the endings of the same circle and square figures (center) using various sketch-rnn models trained to draw different objects.
We are very excited about the future possibilities of generative vector image modelling. These models will enable many exciting new creative applications in a variety of different directions. They can also serve as a tool to help us improve our understanding of our own creative thought processes. Learn more about sketch-rnn by reading our paper, “
A Neural Representation of Sketch Drawings
We thank Ian Johnson, Jonas Jongejan, Martin Wattenberg, Mike Schuster, Ben Poole, Kyle Kastner, Junyoung Chung, Kyle McDonald for their help with this project. This work was done as part of the
Google Brain Residency
Introducing tf-seq2seq: An Open Source Sequence-to-Sequence Framework in TensorFlow
Tuesday, April 11, 2017
Posted by Anna Goldie and Denny Britz, Research Software Engineer and Google Brain Resident, Google Brain Team
(Crossposted on the
Google Open Source Blog
Last year, we announced
Google Neural Machine Translation
(GNMT), a sequence-to-sequence (“
”) model which is now used in Google Translate production systems. While GNMT achieved huge improvements in translation quality, its impact was limited by the fact that the framework for training these models was unavailable to external researchers.
Today, we are excited to introduce
, an open source seq2seq framework in TensorFlow that makes it easy to experiment with seq2seq models and achieve state-of-the-art results. To that end, we made the tf-seq2seq codebase clean and modular, maintaining full test coverage and documenting all of its functionality.
Our framework supports various configurations of the standard seq2seq model, such as depth of the encoder/decoder, attention mechanism, RNN cell type, or beam size. This versatility allowed us to discover optimal hyperparameters and outperform other frameworks, as described in our paper, “
Massive Exploration of Neural Machine Translation Architectures
A seq2seq model translating from Mandarin to English. At each time step, the encoder takes in one Chinese character and its own previous state (black arrow), and produces an output vector (blue arrow). The decoder then generates an English translation word-by-word, at each time step taking in the last word, the previous state, and a weighted combination of all the outputs of the encoder (aka attention , depicted in blue) and then producing the next English word. Please note that in our implementation we use wordpieces  to handle rare words.
In addition to machine translation, tf-seq2seq can also be applied to any other sequence-to-sequence task (i.e. learning to produce an output sequence given an input sequence), including machine summarization, image captioning, speech recognition, and conversational modeling. We carefully designed our framework to maintain this level of generality and provide tutorials, preprocessed data, and other utilities for
We hope that you will use tf-seq2seq to accelerate (or kick off) your own deep learning research. We also welcome your contributions to our
, where we have a variety of open issues that we would love to have your help with!
We’d like to thank Eugene Brevdo, Melody Guan, Lukasz Kaiser, Quoc V. Le, Thang Luong, and Chris Olah for all their help. For a deeper dive into how seq2seq models work, please see the resources below.
Massive Exploration of Neural Machine Translation Architectures
Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever, Oriol Vinyals, Quoc V. Le. NIPS, 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. ICLR, 2015
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. Technical Report, 2016
Attention and Augmented Recurrent Neural Networks
Chris Olah, Shan Carter. Distill, 2016
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial
Announcing the 2017 Google PhD Fellows for North America, Europe and the Middle East
Monday, April 10, 2017
Posted by Michael Rennaker, Program Manager
Google created the
PhD Fellowship program
in 2009 to recognize and support outstanding graduate students doing exceptional research in Computer Science and related disciplines. Now in its eighth year, our fellowship program has supported hundreds of future faculty, industry researchers, innovators and entrepreneurs.
Reflecting our continuing commitment to supporting and building relationships with the academic community, we are excited to announce the 33 recipients from North America, Europe and the Middle East. We offer our sincere congratulations to Google’s 2017 Class of Google PhD Fellows.
Algorithms, Optimizations and Markets
Chiu Wai Sam Wong,
University of California, Berkeley
University of Southern California
University of Illinois, Urbana-Champaign
University of Minnesota - Twin Cities
Fondation Sciences Mathématiques de Paris
University College London
New York University
University of Amsterdam
University of Toronto
Machine Perception, Speech Technology and Computer Vision
Saarland University - Saarbrücken GSCS and MPI Institute for Informatics
Imperial College London
University of California, San Diego
University of Texas, Austin
Natural Language Processing
Jianpeng Cheng, The University of Edinburgh
Kevin Clark, Stanford University
Tim Rocktaschel, University College London
Privacy and Security
ENS - École Normale Supérieure
University of Maryland, College Park
Programming Languages and Software Engineering
Christoffer Quist Adamsen,
Muhammad Ali Gulzar,
University of California, Los Angeles
Structured Data and Database Management
University of Illinois, Urbana-Champaign
Systems and Networking
Ahmed M. Said Mohamed Tawfik Issa,
Georgia Institute of Technology
University of California, Irvine
University of California, Berkeley
Predicting Properties of Molecules with Machine Learning
Friday, April 07, 2017
Posted by George Dahl, Research Scientist, Google Brain Team
Recently there have been many exciting applications of machine learning (ML) to chemistry, particularly in chemical search problems, from
to finding better
. Historically, chemists have used numerical approximations to Schrödinger’s equation, such as
Density Functional Theory
(DFT), in these sorts of chemical searches. However, the computational cost of these approximations limits the size of the search. In the hope of enabling larger searches, several research groups have created ML models to predict chemical properties using training data generated by DFT (e.g.
Rupp et al.
Behler and Parrinello
). Expanding upon this previous work, we have been applying various modern ML methods to the
–a public collection of molecules paired with DFT-computed electronic, thermodynamic, and vibrational properties.
We have recently posted two papers describing our research in this area that grew out of a collaboration between the
Google Brain team
, the Google Accelerated Science team,
, and the
University of Basel
includes a new featurization of molecules and a systematic assessment of a multitude of machine learning methods on the QM9 benchmark. After trying many existing approaches on this benchmark, we worked on improving the most promising deep neural network models.
The resulting second paper, “
Neural Message Passing for Quantum Chemistry
,” describes a model family called Message Passing Neural Networks (MPNNs), which are defined abstractly enough to include many previous neural net models that are invariant to graph symmetries. We developed novel variations within the MPNN family which significantly outperform all baseline methods on the QM9 benchmark, with improvements of nearly a factor of four on some targets.
One reason molecular data is so interesting from a machine learning standpoint is that one natural representation of a molecule is as a graph with
atoms as nodes and bonds as edges
. Models that can leverage inherent symmetries in data will tend to generalize better — part of the success of convolutional neural networks on images is due to their ability to incorporate our prior knowledge about the invariances of image data (e.g. a picture of a dog shifted to the left is still a picture of a dog). Invariance to graph symmetries is a particularly desirable property for machine learning models that operate on graph data, and there has been a lot of interesting research in this area as well (e.g.
Li et al.
Duvenaud et al.
Kearnes et al.
Defferrard et al.
). However, despite this progress, much work remains. We would like to find the best versions of these models for chemistry (and other) applications and map out the connections between different models proposed in the literature.
Our MPNNs set a new state of the art for predicting all 13 chemical properties in QM9. On this particular set of molecules, our model can predict 11 of these properties accurately enough to potentially be useful to chemists, but up to 300,000 times faster than it would take to simulate them using DFT. However, much work remains to be done before MPNNs can be of real practical use to chemists. In particular, MPNNs must be applied to a significantly more diverse set of molecules (e.g. larger or with a more varied set of heavy atoms) than exist in QM9. Of course, even with a realistic training set, generalization to very different molecules could still be poor. Overcoming both of these challenges will involve making progress on questions–such as generalization–that are at the heart of machine learning research.
Predicting the properties of molecules is a practically important problem that both benefits from advanced machine learning techniques and presents interesting fundamental research challenges for learning algorithms. Eventually, such predictions could aid the design of new medicines and materials that benefit humanity. At Google, we feel that it’s important to disseminate our research and to help train new researchers in machine learning. As such, we are delighted that the first and second authors of our MPNN paper are
Google Brain residents
Federated Learning: Collaborative Machine Learning without Centralized Training Data
Thursday, April 06, 2017
Posted by Brendan McMahan and Daniel Ramage, Research Scientists
Standard machine learning approaches require centralizing the training data on one machine or in a datacenter. And Google has built one of the most secure and robust cloud infrastructures for processing this data to make our services better. Now for models trained from user interaction with mobile devices, we're introducing an additional approach:
Federated Learning enables mobile phones to collaboratively learn a shared prediction model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. This goes beyond the use of local models that make predictions on mobile devices (like the
Mobile Vision API
On-Device Smart Reply
) by bringing model
to the device as well.
It works like this: your device downloads the current model, improves it by learning from data on your phone, and then summarizes the changes as a small focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on your device, and no individual updates are stored in the cloud.
Your phone personalizes the model locally, based on your usage (A). Many users' updates are aggregated (B) to form a consensus change (C) to the shared model, after which the procedure is repeated.
Federated Learning allows for smarter models, lower latency, and less power consumption, all while ensuring privacy. And this approach has another immediate benefit: in addition to providing an update to the shared model, the improved model on your phone can also be used immediately, powering experiences personalized by the way you use your phone.
We're currently testing Federated Learning in
Gboard on Android
, the Google Keyboard. When Gboard shows a suggested query, your phone locally stores information about the current context and whether you clicked the suggestion. Federated Learning processes that history on-device to suggest improvements to the next iteration of Gboard’s query suggestion model.
To make Federated Learning possible, we had to overcome many algorithmic and technical challenges. In a typical machine learning system, an optimization algorithm like
Stochastic Gradient Descent
(SGD) runs on a large dataset partitioned homogeneously across servers in the cloud. Such highly iterative algorithms require low-latency, high-throughput connections to the training data. But in the Federated Learning setting, the data is distributed across millions of devices in a highly uneven fashion. In addition, these devices have significantly higher-latency, lower-throughput connections and are only intermittently available for training.
These bandwidth and latency limitations motivate our
Federated Averaging algorithm
, which can train deep networks using 10-100x less communication compared to a naively federated version of SGD. The key idea is to use the powerful processors in modern mobile devices to compute higher quality updates than simple gradient steps. Since it takes fewer iterations of high-quality updates to produce a good model, training can use much less communication. As upload speeds are typically
than download speeds, we also developed a novel way to reduce upload communication costs up to another 100x by
using random rotations and quantization. While these approaches are focused on training deep networks, we've also
for high-dimensional sparse convex models which excel on problems like click-through-rate prediction.
Deploying this technology to millions of heterogenous phones running Gboard requires a sophisticated technology stack. On device training uses a miniature version of
. Careful scheduling ensures training happens only when the device is idle, plugged in, and on a free wireless connection, so there is no impact on the phone's performance.
Your phone participates in Federated Learning only
when it won't negatively impact your experience.
The system then needs to communicate and aggregate the model updates in a secure, efficient, scalable, and fault-tolerant way. It's only the combination of research with this infrastructure that makes the benefits of Federated Learning possible.
Federated learning works without the need to store user data in the cloud, but we're not stopping there. We've developed a
Secure Aggregation protocol
that uses cryptographic techniques so a coordinating server can only decrypt the average update if 100s or 1000s of users have participated — no individual phone's update can be inspected before averaging. It's the first protocol of its kind that is practical for deep-network-sized problems and real-world connectivity constraints. We designed Federated Averaging so the coordinating server only needs the average update, which allows Secure Aggregation to be used; however the protocol is general and can be applied to other problems as well. We're working hard on a production implementation of this protocol and expect to deploy it for Federated Learning applications in the near future.
Our work has only scratched the surface of what is possible. Federated Learning can't solve all machine learning problems (for example, learning to
recognize different dog breeds
by training on carefully labeled examples), and for many other models the necessary training data is already stored in the cloud (like training spam filters for Gmail). So Google will continue to advance the state-of-the-art for cloud-based ML, but we are also committed to ongoing research to expand the range of problems we can solve with Federated Learning. Beyond Gboard query suggestions, for example, we hope to improve the language models that power your keyboard based on what you actually type on your phone (which can have a style all its own) and photo rankings based on what kinds of photos people look at, share, or delete.
Applying Federated Learning requires machine learning practitioners to adopt new tools and a new way of thinking: model development, training, and evaluation with no direct access to or labeling of raw data, with communication cost as a limiting factor. We believe the user benefits of Federated Learning make tackling the technical challenges worthwhile, and are publishing our work with hopes of a widespread conversation within the machine learning community.
This post reflects the work of many people in Google Research, including Blaise Agüera y Arcas, Galen Andrew, Dave Bacon, Keith Bonawitz, Chris Brumme, Arlie Davis, Jac de Haan, Hubert Eichner, Wolfgang Grieskamp, Wei Huang, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Nicholas Kong, Ben Kreuter, Alison Lentz, Stefano Mazzocchi, Sarvar Patel, Martin Pelikan, Aaron Segal, Karn Seth, Ananda Theertha Suresh, Iulia Turc, Felix Yu, Antonio Marcedone and our partners in the Gboard team.
Keeping fake listings off Google Maps
Thursday, April 06, 2017
Posted by Doug Grundman, Maps Anti-Abuse, and Kurt Thomas, Security & Anti-Abuse Research
(Crossposted on the
Google Security blog
Google My Business
enables millions of business owners to create listings and share information about their business on Google Maps and Search, making sure everything is up-to-date and accurate for their customers. Unfortunately, some actors attempt to abuse this service to register fake listings in order to defraud legitimate business owners, or to
charge exorbitant service fees for services
Over a year ago, we teamed up with the University of California, San Diego to research the actors behind fake listings, in order to improve our products and keep our users safe. The full report, “
Pinning Down Abuse on Google Maps
”, will be presented tomorrow at the 2017
International World Wide Web Conference
Our study shows that fewer than 0.5% of local searches lead to fake listings. We’ve also improved how we verify new businesses, which has reduced the number of fake listings by 70% from its all-time peak back in June 2015.
What is a fake listing?
For over a year, we tracked the bad actors behind fake listings. Unlike email-based scams
selling knock-off products online
, local listing scams require physical proximity to potential victims. This fundamentally changes both the scale and types of abuse possible.
Bad actors posing as locksmiths, plumbers, electricians, and other contractors were the most common source of abuse—roughly 2 out of 5 fake listings. The actors operating these fake listings would cycle through non-existent postal addresses and disposable VoIP phone numbers even as their listings were discovered and disabled. The purported addresses for these businesses were irrelevant as the contractors would travel directly to potential victims.
Another 1 in 10 fake listings belonged to real businesses that bad actors had improperly claimed ownership over, such as hotels and restaurants. While making a reservation or ordering a meal was indistinguishable from the real thing, behind the scenes, the bad actors would deceive the actual business into paying referral fees for organic interest.
How does Google My Business verify information?
Google My Business currently verifies the information provided by business owners before making it available to users. For freshly created listings, we physically mail a postcard to the new listings’ address to ensure the location really exists. For businesses changing owners, we make an automated call to the listing’s phone number to verify the change.
Unfortunately, our research showed that these processes can be abused to get fake listings on Google Maps. Fake contractors would request hundreds of postcard verifications to non-existent suites at a single address, such as 123 Main St #456 and 123 Main St #789, or to stores that provided PO boxes. Alternatively, a phishing attack could maliciously repurpose freshly verified business listings by tricking the legitimate owner into sharing verification information sent either by phone or postcard.
Keeping deceptive businesses out — by the numbers
Leveraging our study’s findings, we’ve made significant changes to how we verify addresses and are even
piloting an advanced verification process
for locksmiths and plumbers. Improvements we’ve made include prohibiting bulk registrations at most addresses, preventing businesses from relocating impossibly far from their original address without additional verification, and detecting and ignoring intentionally mangled text in address fields designed to confuse our algorithms. We have also adapted our anti-spam machine learning systems to detect data discrepancies common to fake or deceptive listings.
Combined, here’s how these defenses stack up:
We detect and disable 85% of fake listings before they even appear on Google Maps.
We’ve reduced the number of abusive listings by 70% from its peak back in June 2015.
We’ve also reduced the number of impressions to abusive listings by 70%.
As we’ve shown, verifying local information comes with a number of unique anti-abuse challenges. While fake listings may slip through our defenses from time to time, we are constantly improving our systems to better serve both users and business owners.
Security and Privacy
Adaptive Data Analysis
Automatic Speech Recognition
Electronic Commerce and Algorithms
Google Cloud Platform
Google Play Apps
Google Science Fair
Google Voice Search
High Dynamic Range Imaging
Internet of Things
Natural Language Processing
Natural Language Understanding
Optical Character Recognition
Public Data Explorer
Security and Privacy
Site Reliability Engineering
Give us feedback in our
Official Google Blog
Public Policy Blog
Lat Long Blog
Ads Developer Blog
Android Developers Blog