Google Research Blog
The latest news from Research at Google
Keeping fake listings off Google Maps
Thursday, April 06, 2017
Posted by Doug Grundman, Maps Anti-Abuse, and Kurt Thomas, Security & Anti-Abuse Research
(Crossposted on the Google Security blog)
Google My Business enables millions of business owners to create listings and share information about their business on Google Maps and Search, making sure everything is up-to-date and accurate for their customers. Unfortunately, some actors attempt to abuse this service to register fake listings in order to defraud legitimate business owners, or to charge exorbitant fees for their services.
Over a year ago, we teamed up with the University of California, San Diego to research the actors behind fake listings, in order to improve our products and keep our users safe. The full report, “Pinning Down Abuse on Google Maps”, will be presented tomorrow at the 2017 International World Wide Web Conference.
Our study shows that fewer than 0.5% of local searches lead to fake listings. We’ve also improved how we verify new businesses, which has reduced the number of fake listings by 70% from its all-time peak back in June 2015.
What is a fake listing?
For over a year, we tracked the bad actors behind fake listings. Unlike email-based scams selling knock-off products online, local listing scams require physical proximity to potential victims. This fundamentally changes both the scale and types of abuse possible.
Bad actors posing as locksmiths, plumbers, electricians, and other contractors were the most common source of abuse—roughly 2 out of 5 fake listings. The actors operating these fake listings would cycle through non-existent postal addresses and disposable VoIP phone numbers even as their listings were discovered and disabled. The purported addresses for these businesses were irrelevant as the contractors would travel directly to potential victims.
Another 1 in 10 fake listings belonged to real businesses that bad actors had improperly claimed ownership over, such as hotels and restaurants. While making a reservation or ordering a meal was indistinguishable from the real thing, behind the scenes, the bad actors would deceive the actual business into paying referral fees for organic interest.
How does Google My Business verify information?
Google My Business currently verifies the information provided by business owners before making it available to users. For freshly created listings, we physically mail a postcard to the new listings’ address to ensure the location really exists. For businesses changing owners, we make an automated call to the listing’s phone number to verify the change.
Unfortunately, our research showed that these processes can be abused to get fake listings on Google Maps. Fake contractors would request hundreds of postcard verifications to non-existent suites at a single address, such as 123 Main St #456 and 123 Main St #789, or to stores that provided PO boxes. Alternatively, a phishing attack could maliciously repurpose freshly verified business listings by tricking the legitimate owner into sharing verification information sent either by phone or postcard.
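As a concrete illustration of why that pattern is detectable, here is a minimal sketch, not our production system, of how a verification pipeline might normalize street addresses and flag an implausible number of distinct "suites" registered at one base address. The regular expression and threshold below are illustrative assumptions only.

```python
import re
from collections import defaultdict

# Illustrative suite/unit pattern; a real system would use a full address parser.
SUITE_RE = re.compile(r"\s*(#|suite|ste\.?|unit)\s*\S+$", re.IGNORECASE)

def base_address(addr):
    """Strip a trailing suite designator, e.g. '123 Main St #456' -> '123 main st'."""
    return SUITE_RE.sub("", addr).strip().lower()

def flag_bulk_registrations(pending_requests, threshold=10):
    """Group pending verification requests by base street address and flag any
    address with an implausible number of distinct suites (threshold is assumed)."""
    by_base = defaultdict(set)
    for addr in pending_requests:
        by_base[base_address(addr)].add(addr)
    return {base: suites for base, suites in by_base.items() if len(suites) >= threshold}

# Example: dozens of "suites" at one address trip the threshold; a lone listing does not.
requests = [f"123 Main St #{n}" for n in range(400, 420)] + ["77 Oak Ave"]
print(list(flag_bulk_registrations(requests)))   # ['123 main st']
```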
Keeping deceptive businesses out — by the numbers
Leveraging our study’s findings, we’ve made significant changes to how we verify addresses and are even piloting an advanced verification process for locksmiths and plumbers. Improvements we’ve made include prohibiting bulk registrations at most addresses, preventing businesses from relocating impossibly far from their original address without additional verification, and detecting and ignoring intentionally mangled text in address fields designed to confuse our algorithms. We have also adapted our anti-spam machine learning systems to detect data discrepancies common to fake or deceptive listings.
Combined, here’s how these defenses stack up:
We detect and disable 85% of fake listings before they even appear on Google Maps.
We’ve reduced the number of abusive listings by 70% from its peak back in June 2015.
We’ve also reduced the number of impressions to abusive listings by 70%.
As we’ve shown, verifying local information comes with a number of unique anti-abuse challenges. While fake listings may slip through our defenses from time to time, we are constantly improving our systems to better serve both users and business owners.
Helping webmasters re-secure their sites
Monday, April 18, 2016
Posted by Kurt Thomas and Yuan Niu, Spam & Abuse Research
Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are compromised personal blogs or small business pages that have fallen victim due to a weak password or outdated software. Safe Browsing and Google Search protect visitors from dangerous content by displaying browser warnings and labeling search results with ‘this site may harm your computer’. While this helps keep users safe in the moment, the compromised site remains a problem that needs to be fixed.
Unfortunately, many webmasters for compromised sites are unaware anything is amiss. Worse yet, even when they learn of an incident, they may lack the security expertise to take action and address the root cause of compromise. As one webmaster from a survey we conducted put it, “our daily and weekly backups were both infected.” Even after seeking the help of a specialist and “lots of wasted hours/days,” the webmaster abandoned all attempts to restore the site and instead refocused his efforts on “rebuilding the site from scratch”.
In order to find the best way to help webmasters clean up after a compromise, we recently teamed up with the University of California, Berkeley to explore how to quickly contact webmasters and expedite recovery while minimizing the distress involved. We’ve summarized our key lessons below. The full study, which you can read here, was recently presented at the International World Wide Web Conference.
When Google works directly with webmasters during critical moments like security breaches, we can help 75% of webmasters re-secure their content. The whole process takes a median of 3 days. This is a better experience for webmasters and their audience.
How many sites get compromised?
[Chart: Number of freshly compromised sites Google detects every week]
Over the last year Google detected nearly 800,000 compromised websites—roughly 16,500 new sites every week from around the globe. Visitors to these sites are exposed to low-quality scam content and malware via drive-by downloads. While browser and search warnings help protect visitors from harm, these warnings can at times feel punitive to webmasters who learn only after the fact that their site was compromised. To balance the safety of our users with the experience of webmasters, we set out to find the best approach to help webmasters recover from security breaches and ultimately reconnect websites with their audience.
Finding the most effective ways to aid webmasters
Getting in touch with webmasters:
One of the hardest steps on the road to recovery is first getting in contact with webmasters. We tried three notification channels: email, browser warnings, and search warnings. For webmasters who proactively registered their site with Search Console, we found that email communication led to 75% of webmasters re-securing their pages. When we didn’t know a webmaster’s email address, browser warnings and search warnings helped 54% and 43% of sites clean up respectively.
Providing tips on cleaning up harmful content:
Attackers rely on hidden files, easy-to-miss redirects, and remote inclusions to serve scams and malware. This makes clean-up increasingly tricky. When we emailed webmasters, we included tips and samples of exactly which pages contained harmful content. This, combined with expedited notification, helped webmasters clean up 62% faster compared to no tips—usually within 3 days.
Making sure sites stay clean:
Once a site is no longer serving harmful content, it’s important to make sure attackers don’t reassert control. We monitored recently cleaned websites and found 12% were compromised again in 30 days. This illustrates the challenge involved in identifying the root cause of a breach versus dealing with the side-effects.
Making security issues less painful for webmasters—and everyone
We hope that webmasters never have to deal with a security incident. If you are a webmaster, there are some quick steps you can take to reduce your risk. We’ve made it easier to receive security notifications through Google Analytics as well as through Search Console. Make sure to register for both services. Also, we have laid out helpful tips for updating your site’s software and adding additional authentication that will make your site safer.
If you’re a hosting provider or building a service that needs to notify victims of compromise, understand that the entire process is distressing for users. Establish a reliable communication channel before a security incident occurs, make sure to provide victims with clear recovery steps, and promptly reply to inquiries so the process feels helpful, not punitive.
As we work to make the web a safer place, we think it’s critical to empower webmasters and users to make good security decisions. It’s easy for the security community to be pessimistic about incident response being ‘too complex’ for victims, but as our findings demonstrate, even just starting a dialogue can significantly expedite recovery.
Lessons learned while protecting Gmail
Tuesday, March 29, 2016
Posted by Elie Bursztein - anti-abuse & security research, Nicolas Lidzborski - Gmail security engineering, and Vijay Eranti - Gmail anti-abuse engineering
Earlier this year in San Francisco, USENIX hosted their inaugural Enigma Conference, which focused on security, privacy and electronic crime through the lens of emerging threats and novel attacks. We were excited to help make this conference happen and to participate in it.
At the conference, we heard from a variety of terrific speakers, including:
Ron Rivest, Professor at MIT and co-inventor of RSA, who spoke about the consequences of backdooring encryption
Rob Joyce, Chief of the NSA Tailored Access Operations organization, who spoke about defending against state attackers
George “Geohot” Hotz, hacker extraordinaire, who discussed state-of-the-art software debugging
In addition, we were able to share the lessons we’ve learned about protecting Gmail users since it was launched over a decade ago. Those lessons are summarized in the infographic below (the talk slides are also available).
We were proud to sponsor this year's inaugural Enigma conference, and it is our hope that the core lessons that we have learned over the years can benefit other online products and services. We're looking forward to participating again next year when Enigma returns in 2017. We hope to see you there!
Why attend USENIX Enigma?
Monday, January 11, 2016
Posted by Parisa Tabriz, Security Princess & Enigma Program Co-Chair
Last August, we announced USENIX Enigma, a new conference intended to shine a light on great, thought-provoking research in security, privacy, and electronic crime. With Enigma beginning in just a few short weeks, I wanted to share a couple of the reasons I’m personally excited about this new conference.
Enigma aims to bridge the divide that exists between experts working in academia, industry, and public service, explicitly bringing researchers from different sectors together to share their work. Our speakers include those spearheading the defense of digital rights (Electronic Frontier Foundation, Access Now), practitioners at a number of well-known industry leaders (Akamai, Blackberry, Facebook, LinkedIn, Netflix, Twitter), and researchers from multiple universities in the U.S. and abroad. With the diverse session topics and organizations represented, I expect interesting—and perhaps spirited—coffee break and lunchtime discussions among the equally diverse list of conference attendees.
Of course, I’m very proud to have some of my Google colleagues speaking at Enigma:
Adrienne Porter Felt will talk about blending research and engineering to solve usable security problems. You’ll hear how Chrome’s usable security team runs user studies and experiments to motivate engineering and design decisions. Adrienne will share the challenges they’ve faced when trying to adapt existing usable security research to practice, and give insight into how they’ve achieved successes.
Ben Hawkes will be speaking about Project Zero, a security research team dedicated to the mission of “making 0day hard.” Ben will talk about why Project Zero exists, and some of the recent trends and technologies that make vulnerability discovery and exploitation fundamentally harder.
Kostya Serebryany will be presenting a 3-pronged approach to securing C++ code based on his many years of experience wrangling complex, buggy software. Kostya will survey multiple dynamic sanitizing tools he and his team have made publicly available, review control-flow and data-flow guided fuzzing, and explain a method to harden your code in the presence of any bugs that remain.
Elie Bursztein will go through key lessons the Gmail team learned over the past 11 years while protecting users from spam, phishing, malware, and web attacks. Illustrated with concrete numbers and examples from one of the largest email systems on the planet, attendees will gain insight into specific techniques and approaches useful in fighting abuse and securing their online services.
In addition to raw content, my Program Co-Chair, David Brumley, and I have prioritized talk quality. Researchers dedicate months or years of their time to thinking about a problem and conducting the technical work of research, but a common criticism of technical conferences is that the actual presentation of that research seems like an afterthought. Rather than be a regurgitation of a research paper in slide format, a presentation is an opportunity for a researcher to explain the context and impact of their work in their own voice; a chance to inspire the audience to want to learn more or dig deeper. Taking inspiration from the TED conference, Enigma will have shorter presentations, and the program committee has worked with each speaker to help them craft the best version of their talk.
Hope to see some of you at USENIX Enigma later this month!
Say hello to the Enigma conference
Tuesday, August 18, 2015
Posted by Elie Bursztein - Anti-abuse team, Parisa Tabriz - Chrome Security and Niels Provos - Security team
USENIX Enigma is a new conference focused on security, privacy and electronic crime through the lens of emerging threats and novel attacks. The goal of this conference is to help industry, academic, and public-sector practitioners better understand the threat landscape. Enigma will have a single track of 30-minute talks that are curated by a panel of experts, featuring strong technical content with practical applications to current and emerging threats.
Google is excited to both sponsor and help USENIX build Enigma, since we share many of its core principles: transparency, openness, and cutting-edge security research. Furthermore, we are proud to provide Enigma with engineering and design support, as well as volunteer participation in program and steering committees.
The first instantiation of Enigma will be held January 25-27 in San Francisco. You can sign up for more information about the conference or propose a talk through the official conference site at http://enigma.usenix.org.
Call for Research Proposals to participate in the Open Web of Things Expedition
Friday, December 12, 2014
Posted by Vint Cerf, Chief Internet Evangelist, Roy Want and Max Senges, Google Research
Imagine a world in which access to networked technology defies the constraints of desktops, laptops or smartphones. A future where we work seamlessly with connected systems, services, devices and “things” to support work practices, education, and daily interactions. While the Internet of Things (IoT) conjures a vision of “anytime, any place” connectivity for all things, the realization is complex given the need to work across interconnected and heterogeneous systems, and the special considerations needed for security, privacy, and safety.
Google is excited about the opportunities the IoT presents for future products and services. To further the development of open standards, facilitate ease of use, and ensure that privacy and security are fundamental values throughout the evolution of the field, we are in the process of establishing an open innovation and research program around the IoT. We plan to bring together a community of academics, Google experts and potentially other parties to pursue an open and shared mission in this area.
As a first step, we are announcing an open call for research proposals for the Open Web of Things:
Researchers interested in the Expedition Lead Grant should build a team of PIs and put forward a proposal outlining a draft research roadmap, both for their team(s) and for how they propose to integrate related research that is implemented outside their labs (e.g., Individual Project Grants).
For the Individual Project Grants we are seeking research proposals relating to the IoT in the following areas: (1) user interface and application development, (2) privacy & security, and (3) systems & protocols research.
Importantly, we are open to new and unorthodox solutions in all three of these areas, for example, novel interactions, usable security models, and new approaches for open standards and evolution of protocols.
Additionally, to facilitate hands-on work supporting our mission-driven research, we plan to provide participating faculty access to hardware, software and systems from Google. We look forward to your submission by January 21, 2015, and expect to select proposals in early spring. Selected PIs will be invited to participate in a kick-off workshop at Google shortly after.
Learning Statistics with Privacy, aided by the Flip of a Coin
Thursday, October 30, 2014
Posted by Úlfar Erlingsson, Tech Lead Manager, Security Research
(Cross-posted on the Chromium Blog and the Google Online Security Blog)
At Google, we are constantly trying to improve the techniques we use to protect our users' security and privacy. One such project, RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response), provides a new state-of-the-art, privacy-preserving way to learn software statistics that we can use to better safeguard our users’ security, find bugs, and improve the overall user experience.
Building on the concept of randomized response, RAPPOR enables learning statistics about the behavior of users’ software while guaranteeing client privacy. The guarantees of differential privacy, which are widely accepted as being the strongest form of privacy, have almost never been used in practice despite intense research in academia. RAPPOR introduces a practical method to achieve those guarantees.
To understand RAPPOR, consider the following example. Let’s say you wanted to count how many of your online friends were dogs, while respecting the maxim that, on the Internet, nobody should know you’re a dog. To do this, you could ask each friend to answer the question “Are you a dog?” in the following way. Each friend should flip a coin in secret, and answer the question truthfully if the coin came up heads; but, if the coin came up tails, that friend should always say “Yes” regardless. Then you could get a good estimate of the true count from the greater-than-half fraction of your friends that answered “Yes”. However, you still wouldn’t know which of your friends was a dog: each answer “Yes” would most likely be due to that friend’s coin flip coming up tails.
RAPPOR builds on the above concept, allowing software to send reports that are effectively indistinguishable from the results of random coin flips and are free of any unique identifiers. However, by aggregating the reports we can learn the common statistics that are shared by many users. We’re currently testing the use of RAPPOR in Chrome, to learn statistics about how unwanted software is hijacking users’ settings.
We believe that RAPPOR has the potential to be applied for a number of different purposes, so we're making it freely available for all to use. We'll continue development of RAPPOR as a standalone open-source project so that anybody can inspect and test its reporting and analysis mechanisms, and help develop the technology. We’ve written up the technical details of RAPPOR in a report that will be published next week at the ACM Conference on Computer and Communications Security.
We’re encouraged by the feedback we’ve received so far from academics and other stakeholders, and we’re looking forward to additional comments from the community. We hope that everybody interested in preserving user privacy will review the technology and share their feedback at rappor-discuss@googlegroups.com.
Excellent Papers for 2011
Thursday, March 22, 2012
Posted by Corinna Cortes and Alfred Spector, Google Research
UPDATE: Added Theo Vassilakis as an author for "Dremel: Interactive Analysis of Web-Scale Datasets".
Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.
In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a set of papers on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a second round, we highlighted new noteworthy papers from the latter half of 2010. This time we honor influential papers authored or co-authored by Googlers across all of 2011, covering roughly 10% of our total publications. It’s tough choosing, so we may have left out some important papers; do see the publications list to review the complete group.
In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:
Audio processing
“Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function”, Richard F. Lyon, Journal of the Acoustical Society of America, vol. 130 (2011), pp. 3893-3904.
Lyon's long title summarizes a result that he has been working toward over many years of modeling sound processing in the inner ear. This nonlinear cochlear model is shown to be "good" with respect to psychophysical data on masking, physiological data on mechanical and neural response, and computational efficiency. These properties derive from the close connection between wave propagation and filter cascades. This filter-cascade model of the ear is used as an efficient sound processor for several machine hearing projects at Google.
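For readers unfamiliar with filter cascades, the sketch below shows only the generic structure: a chain of two-pole, two-zero (biquad) sections in which each stage's output feeds the next, loosely mirroring wave propagation along the cochlea. The coefficients are placeholders; this is not Lyon's cochlear model or its parameters.

```python
def biquad(x, b0, b1, b2, a1, a2):
    """Apply one two-pole, two-zero (biquad) section in direct form I."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        v = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, s, y1, v
        y.append(v)
    return y

def filter_cascade(x, sections):
    """Feed the output of each section into the next, as in a cochlear cascade."""
    for coeffs in sections:
        x = biquad(x, *coeffs)
    return x

# Example: an impulse through two identical (arbitrary, stable) sections.
impulse = [1.0] + [0.0] * 7
print(filter_cascade(impulse, [(0.2, 0.4, 0.2, -0.5, 0.1)] * 2))
```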
Electronic Commerce and Algorithms
“Online Vertex-Weighted Bipartite Matching and Single-bid Budgeted Allocations”, Gagan Aggarwal, Gagan Goel, Chinmay Karande, Aranyak Mehta, SODA 2011.
The authors introduce an elegant and powerful algorithmic technique to the area of online ad allocation and matching: a hybrid of random perturbations and greedy choice to make decisions on the fly. Their technique sheds new light on classic matching algorithms, and can be used, for example, to pick one among a set of relevant ads, without knowing in advance the demand for ad slots on future web page views.
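To give a flavor of the idea, here is a toy sketch of a perturbed-greedy matcher: each offline vertex (say, an ad) receives a single random perturbation of its weight before any arrivals, and each arriving online vertex (a query) is matched greedily against the perturbed weights. The particular perturbation function and data layout are illustrative assumptions, not the paper's exact algorithm or its analysis.

```python
import math
import random

def perturbed_greedy(offline_weights, arrivals, rng):
    """offline_weights: {vertex: weight}; arrivals: [(online_vertex, [neighbors])]."""
    # One random perturbation per offline vertex, drawn up front.
    perturbed = {v: w * (1 - math.exp(rng.random() - 1))
                 for v, w in offline_weights.items()}
    matched = {}
    for u, nbrs in arrivals:                 # online vertices arrive one by one
        free = [v for v in nbrs if v not in matched.values()]
        if free:                             # greedy choice on perturbed weights
            matched[u] = max(free, key=perturbed.get)
    return matched

ads = {"a": 5.0, "b": 3.0, "c": 1.0}
queries = [("q1", ["a", "b"]), ("q2", ["a", "c"]), ("q3", ["b", "c"])]
print(perturbed_greedy(ads, queries, random.Random(0)))
```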
“Milgram-routing in social networks”, Silvio Lattanzi, Alessandro Panconesi, D. Sivakumar, Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 725-734.
Milgram’s "six-degrees-of-separation experiment" and the fascinating small world hypothesis that follows from it have generated a lot of interesting research in recent years. In this landmark experiment, Milgram showed that people unknown to each other are often connected by surprisingly short chains of acquaintances. In the paper we prove theoretically and experimentally how a recent model of social networks, "Affiliation Networks", offers an explanation of this phenomenon and inspires interesting techniques for local routing within social networks.
“Non-Price Equilibria in Markets of Discrete Goods”, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Noam Nisan, EC, 2011.
We present a correspondence between markets of indivisible items, and a family of auction based n player games. We show that a market has a price based (Walrasian) equilibrium if and only if the corresponding game has a pure Nash equilibrium. We then turn to markets which do not have a Walrasian equilibrium (which is the interesting case), and study properties of the mixed Nash equilibria of the corresponding games.
HCI
“From Basecamp to Summit: Scaling Field Research Across 9 Locations”, Jens Riegelsberger, Audrey Yang, Konstantin Samoylov, Elizabeth Nunge, Molly Stevens, Patrick Larvie, CHI 2011 Extended Abstracts.
The paper reports on our experience with a basecamp research hub to coordinate logistics and ongoing real-time analysis with research teams in the field. We also reflect on the implications for the meaning of research in a corporate context, where much of the value may lie less in a final report than in the curated impressions and memories our colleagues take away from the research trip.
“User-Defined Motion Gestures for Mobile Interaction”, Jaime Ruiz, Yang Li, Edward Lank, CHI 2011: ACM Conference on Human Factors in Computing Systems, pp. 197-206.
Modern smartphones contain sophisticated sensors that can detect rich motion gestures — deliberate movements of the device by end-users to invoke commands. However, little is known about best practices in motion gesture design for the mobile computing paradigm. We systematically studied the design space of motion gestures via a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. The study revealed consensus among our participants on parameters of movement and on mappings of motion gestures onto commands, from which we developed a taxonomy for motion gestures and compiled an end-user inspired motion gesture set. The work lays the foundation of motion gesture design—a new dimension for mobile interaction.
Information Retrieval
“Reputation Systems for Open Collaboration”, B.T. Adler, L. de Alfaro, A. Kulshreshtha, I. Pye, Communications of the ACM, vol. 54 No. 8 (2011), pp. 81-87.
This paper describes content-based reputation algorithms that rely on automated content analysis to derive user and content reputation, and their applications for Wikipedia and Google Maps. The Wikipedia reputation system WikiTrust relies on a chronological analysis of user contributions to articles, metering positive or negative increments of reputation whenever new contributions are made. The Google Maps system Crowdsensus compares the information provided by users on map business listings and computes both a likely reconstruction of the correct listing and a reputation value for each user. Algorithmic-based user incentives ensure the trustworthiness of evaluations of Wikipedia entries and Google Maps business information.
Machine Learning and Data Mining
“Domain adaptation in regression”, Corinna Cortes, Mehryar Mohri, Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011.
Domain adaptation is one of the most important and challenging problems in machine learning. This paper presents a series of theoretical guarantees for domain adaptation in regression, gives an adaptation algorithm based on that theory that can be cast as a semi-definite programming problem, derives an efficient solution for that problem by using results from smooth optimization, shows that the solution can scale to relatively large data sets, and reports extensive empirical results demonstrating the benefits of this new adaptation algorithm.
“On the necessity of irrelevant variables”, David P. Helmbold, Philip M. Long, ICML, 2011.
Relevant variables sometimes do much more good than irrelevant variables do harm, so that it is possible to learn a very accurate classifier using predominantly irrelevant variables. We show that this holds given an assumption that formalizes the intuitive idea that the variables are non-redundant. For problems like this it can be advantageous to add many additional variables, even if only a small fraction of them are relevant.
“Online Learning in the Manifold of Low-Rank Matrices”, Gal Chechik, Daphna Weinshall, Uri Shalit, Neural Information Processing Systems (NIPS 23), 2011, pp. 2128-2136.
Learning measures of similarity from examples of similar and dissimilar pairs is a problem that is hard to scale. LORETA uses retractions, an operator from matrix optimization, to learn low-rank similarity matrices efficiently. This makes it possible to learn similarities between objects like images or texts when they are represented using many more features than was previously possible.
Machine Translation
“Training a Parser for Machine Translation Reordering”, Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Machine translation systems often need to understand the syntactic structure of a sentence to translate it correctly. Traditionally, syntactic parsers are evaluated as standalone systems against reference data created by linguists. Instead, we show how to train a parser to optimize reordering accuracy in a machine translation system, resulting in measurable improvements in translation quality over a more traditionally trained parser.
“Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation”, Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och, Juri Ganitkevitch, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
We propose a general method to watermark and probabilistically identify the structured results of machine learning algorithms with an application in statistical machine translation. Our approach does not rely on controlling or even knowing the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one’s own algorithm, while being robust to limited editing operations.
“Inducing Sentence Structure from Parallel Corpora for Reordering”, John DeNero, Jakob Uszkoreit, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Automatically discovering the full range of linguistic rules that govern the correct use of language is an appealing goal, but extremely challenging. Our paper describes a targeted method for discovering only those aspects of linguistic syntax necessary to explain how two different languages differ in their word ordering. By focusing on word order, we demonstrate an effective and practical application of unsupervised grammar induction that improves a Japanese to English machine translation system.
Multimedia and Computer Vision
“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
The paper proposes a principled way for computers to learn how to segment the foreground from the background of an image given a set of training examples. The technology is built upon a specially designed nonlinear segmentation kernel under the recently proposed structured SVM learning framework.
“Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”, Matthias Grundmann, Vivek Kwatra, Irfan Essa, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos usually consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant and professional looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (youtube.com/editor) since March 2011.
“The Power of Comparative Reasoning”, Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin, International Conference on Computer Vision (2011).
The paper describes a theoretically derived vector space transform that converts vectors into sparse binary vectors such that Euclidean space operations on the sparse binary vectors imply rank space operations in the original vector space. The transform (a) does not need any data-driven supervised or unsupervised learning, (b) can be computed from polynomial expansions of the input space in linear time (in the degree of the polynomial), and (c) can be implemented in 10 lines of code. We show competitive results on similarity search and sparse coding (for classification) tasks.
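The sketch below illustrates the general idea of such a rank-order transform: encode each vector by recording, for a fixed set of random index windows, which coordinate in the window is largest; dot products between the resulting sparse binary codes then count rank agreements. The window size and number of hashes are assumptions for illustration, not the paper's exact transform.

```python
import random

def rank_order_codes(vectors, num_hashes=64, window=4, seed=0):
    """Sparse binary codes: one bit per window marking the largest coordinate."""
    dim = len(vectors[0])
    rng = random.Random(seed)
    windows = [rng.sample(range(dim), window) for _ in range(num_hashes)]
    codes = []
    for v in vectors:
        bits = [0] * (num_hashes * window)
        for h, idx in enumerate(windows):
            winner = max(range(window), key=lambda j: v[idx[j]])
            bits[h * window + winner] = 1
        codes.append(bits)
    return codes

def similarity(c1, c2):
    """Dot product of codes = number of windows whose 'winner' agrees."""
    return sum(a & b for a, b in zip(c1, c2))

codes = rank_order_codes([[0.1, 3.2, 0.5, 2.0, 1.1, 0.7],
                          [0.2, 3.0, 0.4, 2.2, 1.0, 0.9]])
print(similarity(codes[0], codes[1]))
```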
NLP
“Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections”, Dipanjan Das, Slav Petrov, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11), 2011, Best Paper Award.
We would like to have natural language processing systems for all languages, but obtaining labeled data for all languages and tasks is unrealistic and expensive. We present an approach which leverages existing resources in one language (for example English) to induce part-of-speech taggers for languages without any labeled training data. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in a hidden Markov model trained with the Expectation Maximization algorithm.
Networks
“TCP Fast Open”, Sivasankar Radhakrishnan, Yuchung Cheng, Jerry Chu, Arvind Jain, Barath Raghavan, Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2011.
TCP Fast Open enables data exchange during TCP’s initial handshake. It decreases application network latency by one full round-trip time, a significant speedup for today's short Web transfers. Our experiments on popular websites show that Fast Open reduces whole-page load time by over 10% on average, and in some cases by up to 40%.
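For readers who want to experiment, the sketch below shows how an application might opt in to Fast Open on Linux; the numeric constants are the Linux values and are assumptions for platforms whose Python build does not expose them, and a real deployment also needs kernel support and a Fast Open cookie obtained on an earlier connection.

```python
import socket

# Linux constants; fall back to raw values if this Python build doesn't expose them.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

def tfo_server(port):
    """Server willing to accept request data carried in the SYN."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, 16)  # pending-TFO queue length
    srv.bind(("", port))
    srv.listen()
    return srv

def tfo_request(host, port, payload):
    """Client that sends its first bytes in the SYN via sendto(MSG_FASTOPEN)."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.sendto(payload, MSG_FASTOPEN, (host, port))  # connect + send in one step
    return cli
```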
“Proportional Rate Reduction for TCP”, Nandita Dukkipati, Matt Mathis, Yuchung Cheng, Monia Ghobadi, Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011.
Packet losses increase the latency of Web transfers and negatively impact user experience. Proportional rate reduction (PRR) is designed to recover from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs during TCP’s fast recovery. Experiments on Google Web and YouTube servers in the U.S. and India demonstrate that PRR reduces the TCP latency of connections experiencing losses by 3-10% depending on response size.
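A rough sketch of the proportional part of the mechanism (later standardized in RFC 6937) is below: on each ACK during recovery, the sender may transmit an amount proportional to the data newly delivered, scaled by the ratio of the target window to the data that was outstanding when recovery began. The variable names and the conservative catch-up branch are simplified here for illustration.

```python
import math

def prr_allowed_to_send(prr_delivered, prr_out, ssthresh, recover_fs, pipe):
    """Segments fast recovery may send after this ACK (simplified PRR sketch).

    prr_delivered: data newly delivered to the receiver during recovery
    prr_out:       data we have sent during recovery
    ssthresh:      target congestion window at the end of recovery
    recover_fs:    data outstanding when recovery started
    pipe:          current estimate of data in flight
    """
    if pipe > ssthresh:
        # Proportional rate reduction: pace sending so that roughly ssthresh
        # worth of data has been sent by the time recovery completes.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
    else:
        # Conservative catch-up when in-flight data falls below ssthresh.
        sndcnt = min(ssthresh - pipe, prr_delivered - prr_out)
    return max(0, sndcnt)
```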
Security and Privacy
“Automated Analysis of Security-Critical JavaScript APIs”, Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, Jasvir Nagra, IEEE Symposium on Security & Privacy (SP), 2011.
As software is increasingly written in high-level, type-safe languages, attackers have fewer means to subvert system fundamentals, and attacks are more likely to exploit errors and vulnerabilities in application-level logic. This paper describes a generic, practical defense against such attacks, which can protect critical application resources even when those resources are partially exposed to attackers via software interfaces. In the context of carefully-crafted fragments of JavaScript, the paper applies formal methods and semantics to prove that these defenses can provide complete, non-circumventable mediation of resource access; the paper also shows how an implementation of the techniques can establish the properties of widely-used software, and find previously-unknown bugs.
“App Isolation: Get the Security of Multiple Browsers with Just One”, Eric Y. Chen, Jason Bau, Charles Reis, Adam Barth, Collin Jackson, 18th ACM Conference on Computer and Communications Security, 2011.
We find that anecdotal advice to use a separate web browser for sites like your bank is indeed effective at defeating most cross-origin web attacks. We also prove that a single web browser can provide the same key properties, for sites that fit within the compatibility constraints.
Speech
“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
As deep neural networks become state-of-the-art in real-time machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed-up their inference.
“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011.
Voice recognition on the Android platform must contend with many possible target domains - e.g. search, maps, SMS. For each of these, a domain-specific language model was built by linearly interpolating several n-gram LMs from a common set of Google corpora. The current work has found a way to efficiently compute a single n-gram language model with accuracy very close to the domain-specific LMs but with considerably less complexity at recognition time.
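The interpolation step itself is simple; a minimal sketch, with made-up probabilities and mixture weights, of the per-domain combination described above is:

```python
def interpolate_lms(component_probs, weights):
    """Linear interpolation of n-gram LMs: P(w | h) = sum_i lambda_i * P_i(w | h)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "mixture weights must sum to 1"
    return sum(lam * p for lam, p in zip(weights, component_probs))

# Hypothetical probabilities of the same word in context under three corpora
# (e.g. search, maps, SMS), mixed with domain-specific weights.
p_search, p_maps, p_sms = 0.02, 0.001, 0.005
print(interpolate_lms([p_search, p_maps, p_sms], [0.6, 0.1, 0.3]))
```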
Statistics
“Large-Scale Parallel Statistical Forecasting Computations in R”, Murray Stokely, Farzan Rohani, Eric Tassone, JSM Proceedings, Section on Physical and Engineering Sciences, 2011.
This paper describes the implementation of a framework for utilizing distributed computational infrastructure from within the R interactive statistical computing environment, with applications to timeseries forecasting. This system is widely used by the statistical analyst community at Google for data analysis on very large data sets.
Structured Data
“Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Communications of the ACM, vol. 54 (2011), pp. 114-123.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Besides continued growth internally to Google, Dremel now also backs an increasing number of external customers including BigQuery and UIs such as AdExchange front-end.
“Representative Skylines using Threshold-based Preference Distributions”, Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu, International Conference on Data Engineering (ICDE), 2011.
The paper adopts a principled approach towards representative skylines and formalizes the problem of displaying k tuples such that the probability that a random user clicks on one of them is maximized. This requires mathematically modeling (a) the likelihood with which a user is interested in a tuple, as well as (b) how one negotiates the lack of knowledge of an explicit set of users. This work presents theoretical and experimental results showing that the suggested algorithm significantly outperforms previously suggested approaches.
“Hyper-local, directions-based ranking of places”, Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, PVLDB, vol. 4(5) (2011), pp. 290-30.
Click-through information is one of the strongest signals we have for ranking web pages. We propose an equivalent signal for ranking real-world places: the number of times that people ask for precise directions to the address of the place. We show that this signal is competitive in quality with human reviews while being much cheaper to collect, and we also show that the signal can be incorporated efficiently into a location search system.
Systems
“Power Management of Online Data-Intensive Services”, David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, Thomas F. Wenisch, Proceedings of the 38th ACM International Symposium on Computer Architecture, 2011.
Compute and data intensive Web services (such as Search) are a notoriously hard target for energy savings techniques. This article characterizes the statistical hardware activity behavior of servers running Web search and discusses the potential opportunities of existing and proposed energy savings techniques.
“The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary-Lou Soffa, ISCA, 2011.
In this work, the authors expose key characteristics of an emerging class of Google-style workloads and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. The authors find that across datacenter applications, there is both a sizable benefit and a potential degradation from improperly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory). The impact of co-locating threads from multiple applications with diverse memory behavior changes the optimal mapping of thread to cores for each application. By employing an adaptive thread-to-core mapper, the authors improved the performance of the datacenter applications by up to 22% over status quo thread-to-core mapping, achieving performance within 3% of optimal.
“Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code”, Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah Taylor, Brad Chen, Derek Schuff, David Sehr, Cliff L. Biffle, Bennet S. Yee, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.
Since its introduction in the early 90's, Software Fault Isolation, or SFI, has been a static code technique, commonly perceived as incompatible with dynamic libraries, runtime code generation, and other dynamic code. This paper describes how to address this limitation and explains how the SFI techniques in Google Native Client were extended to support modern language implementations based on just-in-time code generation and runtime instrumentation. This work is already deployed in Google Chrome, benefitting millions of users, and was developed over a summer collaboration with three Ph.D. interns; it exemplifies how Research at Google is focused on rapidly bringing significant benefits to our users through groundbreaking technology and real-world products.
“Thialfi: A Client Notification Service for Internet-Scale Applications”, Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek, Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011, pp. 129-142.
This paper describes a notification service that scales to hundreds of millions of users, provides sub-second latency in the common case, and guarantees delivery even in the presence of a wide variety of failures. The service has been deployed in several popular Google applications including Chrome, Google Plus, and Contacts.