Google Research Blog
The latest news from Research at Google
Keeping fake listings off Google Maps
Thursday, April 06, 2017
Posted by Doug Grundman, Maps Anti-Abuse, and Kurt Thomas, Security & Anti-Abuse Research
(Crossposted on the Google Security blog)
Google My Business enables millions of business owners to create listings and share information about their business on Google Maps and Search, making sure everything is up-to-date and accurate for their customers. Unfortunately, some actors attempt to abuse this service to register fake listings in order to defraud legitimate business owners, or to charge exorbitant fees for their services.
Over a year ago, we teamed up with the University of California, San Diego to research the actors behind fake listings, in order to improve our products and keep our users safe. The full report, “Pinning Down Abuse on Google Maps”, will be presented tomorrow at the 2017 International World Wide Web Conference.
Our study shows that fewer than 0.5% of local searches lead to fake listings. We’ve also improved how we verify new businesses, which has reduced the number of fake listings by 70% from its all-time peak back in June 2015.
What is a fake listing?
For over a year, we tracked the bad actors behind fake listings. Unlike email-based scams selling knock-off products online, local listing scams require physical proximity to potential victims. This fundamentally changes both the scale and types of abuse possible.
Bad actors posing as locksmiths, plumbers, electricians, and other contractors were the most common source of abuse—roughly 2 out of 5 fake listings. The actors operating these fake listings would cycle through non-existent postal addresses and disposable VoIP phone numbers even as their listings were discovered and disabled. The purported addresses for these businesses were irrelevant as the contractors would travel directly to potential victims.
Another 1 in 10 fake listings belonged to real businesses that bad actors had improperly claimed ownership over, such as hotels and restaurants. While making a reservation or ordering a meal was indistinguishable from the real thing, behind the scenes, the bad actors would deceive the actual business into paying referral fees for organic interest.
How does Google My Business verify information?
Google My Business currently verifies the information provided by business owners before making it available to users. For freshly created listings, we physically mail a postcard to the new listings’ address to ensure the location really exists. For businesses changing owners, we make an automated call to the listing’s phone number to verify the change.
Unfortunately, our research showed that these processes can be abused to get fake listings on Google Maps. Fake contractors would request hundreds of postcard verifications to non-existent suites at a single address, such as 123 Main St #456 and 123 Main St #789, or to stores that provided PO boxes. Alternatively, a phishing attack could maliciously repurpose freshly verified business listings by tricking the legitimate owner into sharing verification information sent either by phone or postcard.
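To make the suite-enumeration pattern concrete, here is a minimal sketch of how bulk registrations at a single street address could be surfaced. It assumes a simple listings iterable of dicts with an "address" field; the regex, threshold, and function names are illustrative only and not Google's actual defenses.

```python
import re
from collections import defaultdict

# Toy heuristic (not Google's actual system): group new listings by their street
# address with any suite/unit marker stripped, and flag addresses that accumulate
# an implausible number of distinct "suites".
SUITE_PATTERN = re.compile(r"\s*(#|suite|ste\.?|unit|apt\.?)\s*\S+$", re.IGNORECASE)

def base_address(address: str) -> str:
    """Normalize an address by lowercasing and removing a trailing suite marker."""
    return SUITE_PATTERN.sub("", address.strip().lower())

def flag_bulk_registrations(listings, max_suites_per_address=25):
    """Return base addresses whose distinct suite count exceeds the threshold."""
    suites = defaultdict(set)
    for listing in listings:
        suites[base_address(listing["address"])].add(listing["address"])
    return {addr for addr, variants in suites.items()
            if len(variants) > max_suites_per_address}

# Example: hundreds of registrations at "123 Main St #456", "123 Main St #789", ...
# collapse to the same base address and would be flagged for review.
```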
Keeping deceptive businesses out — by the numbers
Leveraging our study’s findings, we’ve made significant changes to how we verify addresses and are even piloting an advanced verification process for locksmiths and plumbers. Improvements we’ve made include prohibiting bulk registrations at most addresses, preventing businesses from relocating impossibly far from their original address without additional verification, and detecting and ignoring intentionally mangled text in address fields designed to confuse our algorithms. We have also adapted our anti-spam machine learning systems to detect data discrepancies common to fake or deceptive listings.
Combined, here’s how these defenses stack up:
We detect and disable 85% of fake listings before they even appear on Google Maps.
We’ve reduced the number of abusive listings by 70% from its peak back in June 2015.
We’ve also reduced the number of impressions to abusive listings by 70%.
As we’ve shown, verifying local information comes with a number of unique anti-abuse challenges. While fake listings may slip through our defenses from time to time, we are constantly improving our systems to better serve both users and business owners.
And the award goes to...
Wednesday, April 05, 2017
Posted by Evgeniy Gabrilovich, Senior Staff Research Scientist, Google Research, and WWW 2017 Technical Program Co-Chair
Today, Google's Andrei Broder, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins, along with their coauthors, Farzin Maghoul, Raymie Stata, and Janet Wiener, have received the prestigious 2017 Seoul Test of Time Award for their classic paper “Graph Structure in the Web”. This award is given to the authors of a previous World Wide Web conference paper that has demonstrated significant scientific, technical, or social impact over the years. The first award, introduced in 2015, was given to Google founders Larry Page and Sergey Brin.
Originally presented in 2000 at the 9th WWW conference in Amsterdam, “Graph Structure in the Web” represents the seminal study of the structure of the World Wide Web. At the time of publication, it received the Best Paper Award from the WWW conference, and in the following 17 years proved to be highly influential, accumulating over 3,500 citations.
The paper made two major contributions to the study of the structure of the Internet. First, it reported the results of a very large scale experiment to confirm that the indegree of Web nodes is distributed according to a power law. To wit, the probability that a node of the Web graph has i incoming links is roughly proportional to 1/i^2.1.
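To illustrate what such a power law means in practice, here is a small, self-contained Python sketch using synthetic data only (not the paper's measurements): it samples indegrees from a truncated distribution proportional to 1/i^2.1 and recovers the exponent with a standard maximum-likelihood approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration only: draw indegrees from a truncated discrete power law
# P(i) proportional to i**(-2.1), then estimate the exponent from the sample.
EXPONENT = 2.1
MAX_DEGREE = 100_000

degrees = np.arange(1, MAX_DEGREE + 1)
weights = degrees.astype(float) ** -EXPONENT
sample = rng.choice(degrees, size=200_000, p=weights / weights.sum())

# Standard discrete power-law approximation with x_min = 1 (so x_min - 0.5 = 0.5).
alpha_hat = 1.0 + len(sample) / np.log(sample / 0.5).sum()
print(f"estimated exponent: {alpha_hat:.2f}")  # should land near 2.1
```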
Second, in contrast to previous research that assumed the Web to be almost fully connected, “Graph Structure in the Web” described a much more elaborate structure of the Web, which since then has been depicted with the iconic “bowtie” shape:
Original “bowtie” schematic from “Graph Structure in the Web”
The authors presented a refined model of the Web graph, and described several characteristic classes of Web pages:
the strongly connected core component, where each page is reachable from any other page,
the so-called IN and OUT clusters, which only have unidirectional paths to or from the core,
tendrils dangling from the two clusters, and tubes connecting the clusters while bypassing the core, and finally
disconnected components, which are isolated from the rest of the graph.
Whereas the core component is fully connected and each node can be reached from any other node, Broder et al. discovered that as a whole the Web is much more loosely connected than previously believed, with the probability that any two given pages can be reached from one another being just under 1/4.
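A rough sketch of how the bowtie classes above can be computed for any directed graph, using the off-the-shelf networkx library (illustrative only; the function name and the tiny example graph are assumptions, not the paper's code):

```python
import networkx as nx

def bowtie_decomposition(G: nx.DiGraph):
    """Split a directed graph into the bowtie classes described above.

    CORE is the largest strongly connected component, IN can reach the core,
    OUT is reachable from the core, and everything else (tendrils, tubes,
    disconnected pieces) is lumped into a single remainder set.
    """
    core = max(nx.strongly_connected_components(G), key=len)
    seed = next(iter(core))                       # any core node works
    out_side = nx.descendants(G, seed) - core     # reachable from the core
    in_side = nx.ancestors(G, seed) - core        # can reach the core
    rest = set(G) - core - in_side - out_side     # tendrils, tubes, disconnected
    return core, in_side, out_side, rest

# Tiny example: pages 1 and 2 link to each other (the core), 0 links in, 3 is linked to.
G = nx.DiGraph([(0, 1), (1, 2), (2, 1), (2, 3)])
print(bowtie_decomposition(G))  # ({1, 2}, {0}, {3}, set())
```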
Ravi Kumar, presenting the original paper in Amsterdam at WWW 2000
Curiously, the original study was done back in 1999 on two AltaVista crawls containing 200 million pages and 1.5 billion links. Today, Google indexes over 100 billion links merely within apps, and overall processes over 130 trillion web addresses in its web crawls.
Over the years, the power law was found to be characteristic of many other Web-related phenomena, including the structure of social networks and the distribution of search query frequencies. The description of the macroscopic structure of the Web graph proposed by Broder et al. provided a solid mathematical foundation for numerous subsequent studies on crawling and searching the Web, which profoundly influenced the architecture of modern search engines.
Hearty congratulations to all the authors on the well-deserved award!
Helping webmasters re-secure their sites
Monday, April 18, 2016
Posted by Kurt Thomas and Yuan Niu, Spam & Abuse Research
Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are compromised personal blogs or small business pages that have fallen victim due to a weak password or outdated software. Safe Browsing and Google Search protect visitors from dangerous content by displaying browser warnings and labeling search results with ‘this site may harm your computer’. While this helps keep users safe in the moment, the compromised site remains a problem that needs to be fixed.
Unfortunately, many webmasters for compromised sites are unaware anything is amiss. Worse yet, even when they learn of an incident, they may lack the security expertise to take action and address the root cause of compromise. Quoting one webmaster from a survey we conducted, “our daily and weekly backups were both infected” and even after seeking the help of a specialist, after “lots of wasted hours/days” the webmaster abandoned all attempts to restore the site and instead refocused his efforts on “rebuilding the site from scratch”.
In order to find the best way to help webmasters clean up after a compromise, we recently teamed up with the University of California, Berkeley to explore how to quickly contact webmasters and expedite recovery while minimizing the distress involved. We’ve summarized our key lessons below. The full study, which you can read here, was recently presented at the International World Wide Web Conference.
When Google works directly with webmasters during critical moments like security breaches, we can help 75% of webmasters re-secure their content. The whole process takes a median of 3 days. This is a better experience for webmasters and their audience.
How many sites get compromised?
Number of freshly compromised sites Google detects every week.
Over the last year Google detected nearly 800,000 compromised websites—roughly 16,500 new sites every week from around the globe. Visitors to these sites are exposed to low-quality scam content and malware via drive-by downloads. While browser and search warnings help protect visitors from harm, these warnings can at times feel punitive to webmasters who learn only after-the-fact that their site was compromised. To balance the safety of our users with the experience of webmasters, we set out to find the best approach to help webmasters recover from security breaches and ultimately reconnect websites with their audience.
Finding the most effective ways to aid webmasters
Getting in touch with webmasters:
One of the hardest steps on the road to recovery is first getting in contact with webmasters. We tried three notification channels: email, browser warnings, and search warnings. For webmasters who proactively registered their site with Search Console, we found that email communication led to 75% of webmasters re-securing their pages. When we didn’t know a webmaster’s email address, browser warnings and search warnings helped 54% and 43% of sites clean up respectively.
Providing tips on cleaning up harmful content:
Attackers rely on hidden files, easy-to-miss redirects, and remote inclusions to serve scams and malware. This makes clean-up increasingly tricky. When we emailed webmasters, we included tips and samples of exactly which pages contained harmful content. This, combined with expedited notification, helped webmasters clean up 62% faster compared to no tips—usually within 3 days.
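As a purely illustrative starting point (not the tooling referenced in our notifications; the patterns, paths, and function names are assumptions), a webmaster could begin a manual sweep with a small script that lists hidden files and a few common obfuscation patterns:

```python
import os
import re

# Toy clean-up aid, not a real malware scanner: surface hidden files and a few
# patterns that commonly appear in injected PHP/JavaScript so you know where to look.
SUSPICIOUS = re.compile(rb"eval\s*\(\s*base64_decode|gzinflate\s*\(|document\.write\s*\(\s*unescape")

def scan_site(root):
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith("."):
                findings.append((path, "hidden file"))
            try:
                with open(path, "rb") as fh:
                    if SUSPICIOUS.search(fh.read()):
                        findings.append((path, "suspicious obfuscation pattern"))
            except OSError:
                pass
    return findings

# Hypothetical document root; point it at your own site's files.
for path, reason in scan_site("/var/www/html"):
    print(f"{reason}: {path}")
```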
Making sure sites stay clean:
Once a site is no longer serving harmful content, it’s important to make sure attackers don’t reassert control. We monitored recently cleaned websites and found 12% were compromised again in 30 days. This illustrates the challenge involved in identifying the root cause of a breach versus dealing with the side-effects.
Making security issues less painful for webmasters—and everyone
We hope that webmasters never have to deal with a security incident. If you are a webmaster, there are some quick steps you can take to reduce your risk. We’ve made it easier to receive security notifications through Google Analytics as well as through Search Console. Make sure to register for both services. Also, we have laid out helpful tips for updating your site’s software and adding additional authentication that will make your site safer.
If you’re a hosting provider or building a service that needs to notify victims of compromise, understand that the entire process is distressing for users. Establish a reliable communication channel before a security incident occurs, make sure to provide victims with clear recovery steps, and promptly reply to inquiries so the process feels helpful, not punitive.
As we work to make the web a safer place, we think it’s critical to empower webmasters and users to make good security decisions. It’s easy for the security community to be pessimistic about incident response being ‘too complex’ for victims, but as our findings demonstrate, even just starting a dialogue can significantly expedite recovery.
Sergey and Larry awarded the Seoul Test-of-Time Award from WWW 2015
Friday, May 22, 2015
Posted by Andrei Broder, Google Distinguished Scientist
Today, at the 24th International World Wide Web Conference (WWW) in Florence, Italy, our company founders, Sergey Brin and Larry Page, received the inaugural Seoul Test-of-Time Award for their 1998 paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, which introduced Google to the world at the 7th WWW conference in Brisbane, Australia. I had the pleasure and honor to accept the award on behalf of Larry and Sergey from Professor Chin-Wan Chung, who led the committee that created the award.
Except for the fact that I was myself in Brisbane, it is hard to believe that Google began just as a two-student research project at Stanford University 17 years ago with the goal to “produce much more satisfying search results than existing systems.” Their paper presented two breakthrough concepts: first, using a distributed system built on inexpensive commodity hardware to deal with the size of the index, and second, using the hyperlink structure of the Web as a powerful new relevance signal. By now these ideas are common wisdom, but their paper continues to be very influential: it has over 13,000 citations so far and more are added every day.
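The second idea, treating link structure as a relevance signal, is the basis of PageRank, which the paper describes. Here is a minimal power-iteration sketch on a dense matrix; it is a toy for intuition, not the production algorithm.

```python
import numpy as np

def pagerank(adjacency, damping=0.85, iterations=50):
    """Minimal PageRank by power iteration on a dense adjacency matrix.

    adjacency[i, j] = 1 means page i links to page j; dangling pages (no
    outlinks) are treated as linking to every page.
    """
    A = np.asarray(adjacency, dtype=float)
    n = A.shape[0]
    A[A.sum(axis=1) == 0] = 1.0                     # dangling pages link everywhere
    transition = A / A.sum(axis=1, keepdims=True)   # row-stochastic link matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * rank @ transition
    return rank

# Three pages: 0 and 1 link to 2, and 2 links back to 0.
print(pagerank([[0, 0, 1],
                [0, 0, 1],
                [1, 0, 0]]))  # page 2 collects the most rank
```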
Since those beginnings Google has continued to grow, with tools that enable small business owners to reach customers, help long lost friends to reunite, and empower users to discover answers. We keep pursuing new ideas and products, generating discoveries that both affect the world and advance the state-of-the-art in Computer Science and related disciplines. From products like Gmail, Google Maps and Google Earth Engine to advances in Machine Intelligence, Computer Vision, and Natural Language Understanding, it is our continuing goal to create useful tools and services that benefit our users.
Larry and Sergey sent a video message to the conference expressing their thanks and their encouragement for future research, in which Sergey said “There is still a ton of work left to do in Search, and on the Web as a whole and I couldn’t think of a more exciting time to be working in this space.” I certainly share this view, and was very gratified by the number of young computer scientists from all over the world that came by the Google booth at the conference to share their thoughts about the future of search, and to explore the possibility of joining our efforts.
Google, the World Wide Web and WWW conference: years of progress, prosperity and innovation
Monday, May 07, 2012
Posted by Prabhakar Raghavan, Vice President of Engineering
More than forty members of Google’s technical staff gathered in Lyon, France in April to participate in the global dialogue around the state of the web at the World Wide Web conference (WWW) 2012. Fourteen years ago, Larry Page and Sergey Brin applied their research to an information retrieval problem and their work—presented at WWW in 1998—led to the invention of today’s most popular search engine.
As I've watched the WWW conference series evolve over the years, a couple of larger trends struck me in this year's edition. First, there seems to be more of a Mobile Web presence in the technical program, relative to recent years. The refereed program included several interesting Mobile papers, including the Best Student Paper Awardee from Stanford University researchers: Who Killed My Battery: Analyzing Mobile Browser Energy Consumption, by Narendran Thiagarajan, Gaurav Aggarwal, Angela Nicoara, Dan Boneh, and Jatinder Singh.
Second, one gets the sense that the WWW community is moving from the classic "bag of words" view of web pages to an entity-centric view. There were a number of papers on identifying and using entities in Web pages. While I'm loath to view this as a vindication of "the Semantic Web" (mainly because this has become an overloaded phrase that people elect to interpret as suits them), the technical capability to get at entities is clearly here. The question is -- what is the killer application? Finally, it’s nice to see that recommendation systems are becoming a major topic of focus at WWW. This paper was a personal favorite: Build Your Own Music Recommender by Modeling Internet Radio Streams, by Natalie Aizenberg, Yehuda Koren, and Oren Somekh.
In keeping with tradition, Google was a major supporter, sponsoring the conference, the Best Paper Award (Counting beyond a Yottabyte, or how SPARQL 1.1 Property Paths will prevent adoption of the standard, by Marcelo Arenas, Sebastián Conca and Jorge Pérez) and four PhD student travel grants. Hundreds of attendees hung out with us at the Google booth to chat and see demos about the latest Google product and research developments (see the full schedule of booth talks).
Googlers were also active members of the vibrant research community at WWW:
David Assouline delivered the keynote for the Demo Track -- to a standing-room-only crowd -- on the Google Art Project, which uses a combination of various Google technologies and expert information provided by our museum partners to create a unique online art experience. Googler Alon Halevy served as a program committee member. Googlers were also co-authors of the following papers:
Risk-Aware Revenue Maximization in Display Advertising
by Ana Radovanovic and William Heavlin (Googlers)
SessionJuggler: Secure Web Login From an Untrusted Terminal Using Session Hijacking
by Elie Bursztein (Googler), Chinmay Soman, Dan Boneh and John Mitchell
Spotting Fake Reviewer Groups in Consumer Reviews
by Arjun Mukherjee, Bing Liu, and Natalie Glance (Googler)
Your Two Weeks of Fame and Your Grandmother’s
by James Cook, Atish Das Sarma, Alexander Fabrikant and Andrew Tomkins (Googlers)
YouTube Around the World: Geographic Popularity of Videos
by Mirjam Wattenhofer (Googler), Anders Brodersen (Googler), and Salvatore Scellato
Who Killed My Battery: Analyzing Mobile Browser Energy Consumption
by Narendran Thiagarajan, Gaurav Aggarwal (Googler), Angela Nicoara, Dan Boneh and Jatinder Singh
A Multimodal Search Engine based on Rich Unified Content Description
by Thomas Steiner (Googler), Lorenzo Sutton, Sabine Spiller, Marilena Lazzaro, Francesco Saverio Nucci, Vincenzo Croce, Alberto Massari, Antonio Camurri, Anne Verroust-Blondet, Laurent Joyeux
Enabling on-the-fly Video Shot Detection on YouTube
by Thomas Steiner (Googler), Ruben Verborgh, Joaquim Gabarro, Michael Hausenblas, Raphael Troncy and Rik Van De Walle
Fixing the Web one page at a time, or actually implementing xkcd #37
by Thomas Steiner (Googler), Ruben Verborgh, and Rik Van de Walle
Googlers co-organized three workshops:
Appification of the Web
by Ed Chi (Googler), Brian Davison, and Evgeniy Gabrilovich (Googler)
Extracting Unambiguous Keywords from Microposts Using Web and Query Logs Data
as part of the Making Sense of Microposts workshop, by Davi Reis, Felipe Portavales Goldstein, and Fred Quintao (Googlers)
Human Computation Must Be Reproducible
as part of the CrowdSearch: Crowdsourcing Web search workshop, by Praveen Paritosh (Googler)
WebQuality 2012: The Anti-Social Web
by Zoltan Gyongyi (Googler), Carlos Castillo, Adam Jatowt, and Katsumi Tanaka
Additionally, a Googler led a tutorial:
The Role of Human-Generated and Automatically-Extracted Lexico-Semantic Resources in Web Search
by Marius Pasca (Googler)
Googlers presented a poster:
Google Image Swirl
by Yushi Jing, Henry Rowley, Jingbin Wang, David Tsai, Chuck Rosenberg, Michele Covell (Googlers)
At the conference, we also paid homage to the founding of the World Wide Web and the strong community and enterprise it’s created since the 1990s, seen in the Euronews report: Web inventor Tim Berners-Lee on imagining worlds. Through our products and support of WWW in 2013, we look forward to continuing to nurture the world wide web’s open ecosystem of knowledge, innovation and progress.
Add Research at Google to your circles on G+ to learn more about our academic conference involvement, view pictures from events, and hear about upcoming programming and presence at conferences.