Google Research Blog
The latest news from Research at Google
Speed Matters
Tuesday, June 23, 2009
Posted by Jake Brutlag, Web Search Infrastructure
At Google, we've gathered hard data to reinforce our intuition that "speed matters" on the Internet. Google runs experiments on the search results page to understand and improve the search experience. Recently, we conducted some experiments to determine how users react when web search takes longer. We've always viewed speed as a competitive advantage, so this research is important to understand the trade-off between speed and other features we might introduce. We wanted to share this information with the public because we hope it will give others greater insight into how important speed can be.
Speed as perceived by the end user is driven by multiple factors, including how fast results are returned and how long it takes a browser to display the content. Our experiments injected server-side delay to model one of these factors: extending the processing time before and during the time that the results are transmitted to the browser. In other words, we purposefully slowed the delivery of search results to our users to see how they might respond.
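The post describes the setup but not the mechanics. As a hedged, minimal sketch of what injecting a server-side delay for an experiment arm might look like, consider the following; the bucketing scheme, arm names, and run_search placeholder are hypothetical, not Google's actual experiment infrastructure.

```python
import hashlib
import time

# Hypothetical experiment arms: extra server-side latency in milliseconds.
EXPERIMENT_ARMS = {"control": 0, "delay_200ms": 200, "delay_400ms": 400}

def assign_arm(user_id):
    """Deterministically bucket a user into an arm by hashing their ID."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(EXPERIMENT_ARMS)
    return list(EXPERIMENT_ARMS)[bucket]

def run_search(query):
    """Stand-in for the real search backend."""
    return "results for " + query

def serve_search(user_id, query):
    """Serve a search result, injecting artificial delay for users in a delayed arm."""
    delay_ms = EXPERIMENT_ARMS[assign_arm(user_id)]
    if delay_ms:
        time.sleep(delay_ms / 1000.0)   # simulate slower server-side processing
    return run_search(query)
```

Because the bucketing is a deterministic hash of the user ID, each user sees a consistent experience for the life of the experiment, which is what makes the long-exposure comparisons below possible.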
All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!
Furthermore, users do fewer and fewer searches the longer they are exposed to the experiment. Users exposed to a 200 ms delay since the beginning of the experiment did 0.22% fewer searches during the first three weeks, but 0.36% fewer searches during the second three weeks. Similarly, users exposed to a 400 ms delay since the beginning of the experiment did 0.44% fewer searches during the first three weeks, but 0.76% fewer searches during the second three weeks. Even if the page returns to the faster state, users who saw the longer delay take time to return to their previous usage level. Users exposed to the 400 ms delay for six weeks did 0.21% fewer searches on average during the five week period after we stopped injecting the delay.
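To make the percentages concrete, here is a tiny worked example of how a relative change in searches per user is computed from experiment and control averages; the absolute numbers are invented for illustration, chosen only so the arithmetic reproduces the reported -0.44% figure.

```python
# Invented averages of searches per user over a measurement window,
# chosen only so the arithmetic reproduces the reported -0.44% figure.
control_searches_per_user = 100.00
delayed_searches_per_user = 99.56   # users in a 400 ms delay arm

relative_change = (delayed_searches_per_user / control_searches_per_user - 1) * 100
print(f"{relative_change:+.2f}% searches per user")   # -> -0.44% searches per user
```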
While these numbers may seem small, a daily impact of 0.5% is of real consequence at the scale of Google web search, or indeed at the scale of most Internet sites. Because the cost of slower performance increases over time and persists, we encourage site designers to think twice about adding a feature that hurts performance if the benefit of the feature is unproven. To learn more about how to improve the performance of your website, visit code.google.com/speed. For more details on our experiments, download this PDF.
A new landmark in computer vision
Monday, June 22, 2009
Posted by Jay Yagnik, Head of Computer Vision Research
[Cross-posted with the Official Google Blog]
Science fiction books and movies have long imagined that computers will someday be able to see and interpret the world. At Google, we think computer vision has tremendous potential benefits for consumers, which is why we're dedicated to research in this area. And today, a Google team is presenting a paper on landmark recognition (think: Statue of Liberty, Eiffel Tower) at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Florida. In the paper, we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy.
To be clear up front, this is a research paper, not a new Google product, but we still think it's cool. For our demonstration, we begin with an unnamed, untagged picture of a landmark, enter its web address into the recognition engine, and poof — the computer identifies and names it: "Recognized Landmark: Acropolis, Athens, Greece." Thanks computer.
How did we do it? It wasn't easy. For starters, where do you find a good list of thousands of landmarks? Even if you have that list, where do you get the pictures to develop visual representations of the locations? And how do you pull that source material together in a coherent model that actually works, is fast, and can process an enormous corpus of data? Think about all the different photographs of the Golden Gate Bridge you've seen — the different perspectives, lighting conditions, and image qualities. Recognizing a landmark can be difficult for a human, let alone a computer.
Our research builds on the vast number of images on the web, the ability to search those images, and advances in object recognition and clustering techniques. First, we generated a list of landmarks relying on two sources: 40 million GPS-tagged photos (from Picasa and Panoramio) and online tour guide webpages. Next, we found candidate images for each landmark using these sources and Google Image Search, which we then "pruned" using efficient image matching and unsupervised clustering techniques. Finally, we developed a highly efficient indexing system for fast image recognition. The following image provides a visual representation of the resulting clustered recognition model:
In the above image, related views of the Acropolis are "clustered" together, allowing for a more efficient image matching system.
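The paper has the real pipeline; as a hedged toy sketch of the "match, then cluster" step described above, the code below groups candidate images whose global feature descriptors are highly similar, using a similarity threshold and union-find. The descriptor format and threshold are illustrative assumptions; the actual system relies on local-feature matching and far more scalable indexing.

```python
import numpy as np

def find(parent, i):
    """Union-find: follow parent pointers to the cluster's root."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i

def cluster_images(descriptors, threshold=0.9):
    """Group images whose unit-normalized descriptors have cosine similarity above a threshold."""
    n = len(descriptors)
    normed = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    similarity = normed @ normed.T
    parent = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i, j] >= threshold:              # treat the pair as a "match"
                parent[find(parent, i)] = find(parent, j)  # merge their clusters
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(parent, i), []).append(i)
    return list(clusters.values())

# Four toy "images", each summarized by a 3-D feature vector.
features = np.array([[1.0, 0.1, 0.0],
                     [0.9, 0.2, 0.0],    # looks like image 0
                     [0.0, 1.0, 0.1],
                     [0.0, 0.9, 0.2]])   # looks like image 2
print(cluster_images(features))          # -> [[0, 1], [2, 3]]
```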
While we've gone a long way towards unlocking the information stored in text on the web, there's still much work to be done unlocking the information stored in pixels. This research demonstrates the feasibility of efficient computer vision techniques based on large, noisy datasets. We expect the insights we've gained will lay a useful foundation for future research in computer vision.
If you're interested in learning more about this research, check out the paper.
Large-scale graph computing at Google
Monday, June 15, 2009
Posted by Grzegorz Czajkowski, Systems Infrastructure Team
If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.
Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.
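To make the "simple analysis of a standard map" concrete, here is a minimal sketch of finding the shortest route between two cities with Dijkstra's algorithm; the road network and distances are made up.

```python
import heapq

def shortest_route(graph, source, target):
    """Dijkstra's algorithm over a dict of {city: {neighbor: distance}}."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, city = heapq.heappop(queue)
        if city == target:
            return dist
        if dist > best.get(city, float("inf")):
            continue                      # stale queue entry
        for neighbor, edge in graph.get(city, {}).items():
            new_dist = dist + edge
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return float("inf")

# A made-up road network with distances in kilometers.
roads = {
    "A": {"B": 120, "C": 200},
    "B": {"C": 60, "D": 180},
    "C": {"D": 90},
    "D": {},
}
print(shortest_route(roads, "A", "D"))    # -> 270, via A -> B -> C -> D
```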
In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).
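Pregel's API is not shown in this post, so the following is only a hedged sketch of the superstep model just described: in each superstep every vertex consumes the messages sent to it in the previous superstep, updates its own state, and sends messages for the next one, with a synchronization barrier between supersteps. The class and function names here are hypothetical, not Pregel's actual interface.

```python
class Vertex:
    """A toy vertex in a superstep-based (BSP-style) computation."""
    def __init__(self, vertex_id, value, out_edges):
        self.id, self.value, self.out_edges = vertex_id, value, out_edges
        self.active = True
        self.outbox = []  # (target_id, message) pairs produced this superstep

    def send(self, target, message):
        self.outbox.append((target, message))

    def compute(self, messages, superstep):
        """Override per algorithm: consume messages, update state, send messages."""
        raise NotImplementedError


def run(vertices, max_supersteps=30):
    """Run lockstep supersteps until all vertices halt and no messages remain."""
    inboxes = {v.id: [] for v in vertices}
    for superstep in range(max_supersteps):
        if not any(v.active or inboxes[v.id] for v in vertices):
            break
        for v in vertices:
            if v.active or inboxes[v.id]:
                v.active = True           # incoming messages reactivate a halted vertex
                v.compute(inboxes[v.id], superstep)
        # Synchronization barrier: deliver messages for the next superstep.
        inboxes = {v.id: [] for v in vertices}
        for v in vertices:
            for target, message in v.outbox:
                inboxes[target].append(message)
            v.outbox = []
    return {v.id: v.value for v in vertices}
```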
Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. Developers of dozens of Pregel applications within Google have found that "thinking like a vertex," which is the essence of programming in Pregel, is intuitive.
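Building on the toy framework sketched above (and again, not Pregel's real API), a vertex-centric PageRank can be written roughly as follows; the damping factor, superstep cutoff, and three-page example graph are illustrative choices.

```python
class PageRankVertex(Vertex):
    """Each vertex holds its rank and spreads it evenly along its out-edges."""
    NUM_VERTICES = 3   # assumed known up front in this toy example
    DAMPING = 0.85

    def compute(self, messages, superstep):
        if superstep > 0:
            # New rank = random-jump term + damped sum of incoming rank shares.
            self.value = (1 - self.DAMPING) / self.NUM_VERTICES + self.DAMPING * sum(messages)
        if superstep < 20:
            share = self.value / len(self.out_edges)
            for target in self.out_edges:
                self.send(target, share)
        else:
            self.active = False  # vote to halt

# A tiny three-page web graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1.
graph = [PageRankVertex(0, 1 / 3, [1]),
         PageRankVertex(1, 1 / 3, [2]),
         PageRankVertex(2, 1 / 3, [0, 1])]
print(run(graph))  # approximate PageRank values for the three vertices
```

The compute method is the only algorithm-specific code, which is the "thinking like a vertex" point: the surrounding machinery for message delivery and synchronization is shared across applications.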
We've been using Pregel internally for a while now, but we are beginning to share information about it outside of Google. Greg Malewicz will be speaking at the joint industrial track between ACM PODC and ACM SPAA this August on the very subject. In case you aren't able to join us there, here's a spoiler: The seven bridges of Königsberg — inspiration for Leonhard Euler's famous theorem that established the basics of graph theory — spanned the Pregel river.
Google Fusion Tables
Tuesday, June 09, 2009
Posted by Alon Halevy, Google Research and Rebecca Shapley, User Experience
Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly.
Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the system as we get feedback from users.
In the version we're launching today, you can upload tabular data sets (right now, we're supporting up to 100 MB per data set, 250 MB of data per user) and share them with your collaborators or with the world. You can choose to share all of your data with your collaborators, or keep parts of it hidden. You can even share different portions of your data with different collaborators.
When you edit the data in place, your collaborators always get the latest version. The attribution feature means your data will get credit for its contribution to any data set built with it. And yes, you can export your data back out of the cloud as CSV files.
Want to understand your data better? You can filter and aggregate the data, and you can visualize it on Google Maps or with other visualizations from the Google Visualization API. In this example, an intensity map of the world shows countries that won more than 10 gold medals in the Summer Olympics. You can then embed these visualizations in other properties on the Web (e.g., blogs and discussion groups) by simply pasting some HTML code we provide you.
The power of data is truly harnessed when you combine data from multiple sources. For example, consider combining data about access to fresh water in various countries with data about malaria rates in those countries, or, as shown here, displaying three sources of GDP data side by side. Fusion Tables enables you to fuse multiple sets of data when they are about the same entities. In database speak, we call this a join on a primary key, but here the data originates from multiple independent sources. This is just the start; more join capabilities will come soon.
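As a rough illustration of that join-on-a-primary-key idea (not how Fusion Tables implements it), the sketch below merges two tables that come from independent sources but share a country key; the table names and values are placeholders, not real statistics.

```python
# Two hypothetical tables keyed by country name, from independent sources.
# The numbers are placeholders for illustration, not real statistics.
freshwater_access = {"CountryA": 74.0, "CountryB": 91.0, "CountryC": 63.0}   # % of population
malaria_rate      = {"CountryB": 12.0, "CountryC": 48.0, "CountryD": 3.0}    # cases per 1,000

# Join on the shared primary key: keep rows whose key appears in both sources.
joined = {
    country: {"freshwater_pct": freshwater_access[country],
              "malaria_per_1000": malaria_rate[country]}
    for country in freshwater_access.keys() & malaria_rate.keys()
}
print(joined)  # -> rows for CountryB and CountryC only
```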
But Fusion Tables doesn't require you and your collaborators to stop there. What if you don't agree on all of the values? Or need to understand the assumptions behind the data better? Fusion Tables enables you to discuss data at different granularity levels -- you can discuss individual rows or columns or even individual cells. If a collaborator with edit permission changes data during the discussion, viewers will see the change as part of the discussion trail.
We hope you find Fusion Tables useful. As usual with first releases, we realize there is much missing, and we look forward to hearing your feedback.
Remembering Rajeev Motwani
Monday, June 08, 2009
Posted by Alfred Spector, VP of Research
Many hundreds of us at Google were fortunate to have been educated, advised, and inspired by Professor Rajeev Motwani. Six of us were his PhD students, and very many others (including our founders) were advised by or took courses from him. Other Googlers, who were not students at Stanford, had close collegial relationships with him. But, no matter what the relationship, we respected Rajeev as a great man. He was not just a mathematically deep computer scientist, and not just an entrepreneurial computer scientist who catalyzed value at the intersection of his work and the real world; he was also a thoughtful, caring, and honorable friend.
The words of just a few of us speak louder than any summary I can make:
Sergey Brin wrote in his blog, “Officially, Rajeev was not my advisor, and yet he played just as big a role in my research, education, and professional development. In addition to being a brilliant computer scientist, Rajeev was a very kind and amicable person and his door was always open. No matter what was going on with my life or work, I could always stop by his office for an interesting conversation and a friendly smile.”
Zoltan Gyongyi wrote, “Not only a great educator and one of the brightest researchers of his generation, Rajeev was also a catalyst of Silicon Valley innovation--Google itself standing as a proof. Moreover, he was a mentor, colleague, role model, friend to many Googlers. I am utterly unable to find words that would properly express my personal gratitude to him and the weight of this loss.”
Mayur Datar wrote, “I was fortunate to have Rajeev as my PhD advisor for five years at Stanford. Beyond graduation, he often helped me with priceless career guidance and professional help in terms of meetings with other people in Silicon Valley. There are only a handful of people I can think of who are such high caliber academics and entrepreneurs. His contributions and impact on CS theory community, Stanford CS Dept, and Silicon Valley enterprises and entrepreneurs is unfathomable. I still find it hard to come to terms with this horrible reality. My deepest condolences and prayers go out to his family. He will be fondly remembered and dearly missed by all of us!”
An Zhu wrote, “I am both fortunate and honored to have Rajeev as my PhD advisor. The 5 years at Stanford is very memorable to me. I’m eternally grateful for his advice and support throughout. It is indeed a sad day for many, including his students.”
Alon Halevy wrote, “Rajeev was an inspiration to me and my colleagues on so many levels. As a young graduate student, I remember him working on some of the toughest theoretical computer science problems of the day. Later, his taste for good theory and ability to apply it to practice had a huge impact on various aspects of data management research. As a professor, and now as a Googler, I am awed at the amazing stream of high-caliber students that he mentored. As an entrepreneur, he gave me some generous and well-timed advice. And most of all, as a person, his kindness and willingness to help anyone was a true inspiration.”
Vibhu Mittal wrote, “He was a brilliant researcher and a great professor. And yet the only thing that I can remember right now is that he was a fun, generous, helpful guy who was always willing to sit down and chat for a few minutes. I hope wherever he is, he is still doing it. And I hope there’ll be more people like him in this world to help people like us. I wish his family well — words cannot express what I feel for them.”
Gagan Aggarwal wrote, “I feel extremely fortunate to have had Rajeev as my PhD advisor. He was a wonderful advisor--always very flexible and willing to let his students work at their own pace, while making sure that things are going alright and providing guidance when needed. One of the several striking features of Rajeev's research was his ability to translate real life problems into clean, well-motivated, abstract questions (that he would promptly pose to his students). He was for me an eternal source of fresh problems and great ideas, a source I could tap into whenever my own ideas dried up (and was planning to, just last week). It is impossible to come to terms with the fact that I am never going to do this again. Rajeev had an unmatched clarity of thought and perceptiveness that was evident not only in doing research with him but also in the invaluable advice he gave me about career choices and life in general. ...Rajeev took on many diverse roles: teacher, entrepreneur, advisor and friend, and filled them all as only he could have. His passing will leave an impossible-to-fill void among all those whose lives he touched.”
There are more notes from Googlers, among those of many others, on the Stanford blog commemorating Rajeev.
I’d like to close by noting that Rajeev Motwani’s work on the intersection of theory and practice inspired not only the way Google processes information, but also Google's core scientific values: we fundamentally believe in the power of applying mathematical analysis and algorithmic thinking to challenging real world problems. This philosophy was inherent in Rajeev’s research, the education he gave PhD students, and the advice and classes he provided to many more.
With his untimely death, and the recent deaths of other influential computer scientists and friends, we are all reminded to seize each day and make the most of it. I think Rajeev would have wanted us to keep this in mind.