Google Research Blog
The latest news from Research at Google
Cloud Computing and the Internet
Tuesday, April 28, 2009
Posted by Vinton Cerf, Chief Internet Evangelist
[adapted from the speech given on the occasion of the honoris causa ceremony
at the Universidad Politécnica de Madrid]
The Internet is largely a software artifact, and a layered one, as my distinguished colleague Sir Tim Berners-Lee has observed on many occasions. The layering has permitted a remarkable versatility in the implementation of the Internet and its applications. New technology can be used to implement each layer and, as long as the interfaces between the layers remain static, the changes do not affect the functionality of the system. In this way, the Internet has evolved and adapted new transmission and switching technology into its lower layers and has supported new upper layers such as the HTTP, HTML and SSL protocols of the World Wide Web.
In recent years, the term “cloud computing” has emerged to make reference to the idea that from the standpoint of a device, say a laptop, on the Internet, many of the applications appear to be operating somewhere in the network “cloud.” Google, Amazon, Microsoft and others, as well as enterprise operators, are constructing these cloud computing centers. Generally, each cloud knows only about itself and is unaware of the existence of other cloud computing facilities. In some ways, cloud computing is like the networks of the 1960s when my colleagues and I began to think about connecting computers together on networks. Each network was typically proprietary. IBM had Systems Network Architecture; Digital Equipment Corporation had its DECNET; Hewlett-Packard had its Distributed System. These networks were specific to each manufacturer and did not interconnect nor even have a way to express the idea of connecting to another network. The Internet was the solution that Robert Kahn and I developed to allow all such networks to be interconnected in a uniform way.
Cloud computing is at the same stage. Each cloud is a system unto itself. There is no way to express the idea of exchanging information between distinct computing clouds because there is no way to express the idea of “another cloud.” Nor is there any way to describe the information that is to be exchanged. Moreover, if the information contained in one computing cloud is protected from access by any but authorized users, there is no way to express how that protection is provided and how information about it should be propagated to another cloud when the data is transferred.
Interestingly, my colleague Sir Tim Berners-Lee has been pursuing ideas that may inform the so-called “inter-cloud” problem. His idea of data linking may prove to be a part of the vocabulary needed to interconnect computing clouds. The semantics of data and of the actions one can take on the data, and the vocabulary in which these actions are expressed, appear to me to constitute the beginning of an inter-cloud computing language. This seems to me to be an extremely open field in which creative minds everywhere can be free to contribute ideas and to experiment with new concepts. It is a new layer in the Internet architecture and, like the many layers that have been invented before, it is an open opportunity to add functionality to an increasingly global network.
There are many unanswered questions that can be posed about this new problem. How should one reference another cloud system? What functions can one ask another cloud system to perform? How can one move data from one cloud to another? Can one request that two or more cloud systems carry out a series of transactions? If a laptop is interacting with multiple clouds, does the laptop become a sort of “cloudlet”? Could the laptop become an unintended channel of information exchange between two clouds? If we implement an inter-cloud system of computing, what abuses may arise? How will information be protected within a cloud and when transferred between clouds? How will we refer to the identity of authorized users of cloud systems? What strong authentication methods will be adequate to implement data access controls?
Because the Internet is primarily a software artifact, there seems to be no end to its possibilities. It is an endless frontier, open to exploration by virtually anyone. I cannot guess what will be discovered in these explorations but I am sure that we will continue to be surprised by the richness of the Internet’s undiscovered territory in the decades ahead.
The Continuing Metamorphosis of the Web
Monday, April 27, 2009
Posted by Alfred Spector, VP Research and Special Initiatives
I just returned from giving a talk at the 18th World Wide Web Conference in Madrid and was pleased to see a healthy and dynamic conference despite difficult economic conditions. Madrid had beautiful spring weather, and magnificent modern architecture abounds throughout the city. I will say, though, that the Madrid subway does not vibrate (shake, rattle, and roll) one’s soul quite as much as does our local NYC subway.
My talk was entitled The Continuing Metamorphosis of the Web. In it, I noted that the initial web standards were so simple and sensible that they engendered a path of stepwise innovations, which taken together have aggregated into amazing accomplishments. Metaphorically, I feel our community has been on a kind of pseudo-random walk that has taken us to remarkable places. The truly great results have included the creation of a virtual Library of Alexandria, the creation of the search engine (to be that library’s super-card catalog), the empowerment of the long tail (in diverse communities), and great innovations in doing business. I argued that the bottom-up evolution is continuing (perhaps even accelerating) today, and that the current stepwise improvements are still leading to broad innovations, which we will come to view as being as extraordinary as any that have occurred to date.
Here are three great achievements currently a-brewing:
“Totally Transparent Processing.” By this, I argued that our use of the web (whether for search, communication, or information access) can increasingly occur in a fluid manner that is independent of the device we are using, independent of the human language we prefer, independent of the modality of the data, and independent of the corpus of information on which our interaction is based. In effect, processing can be transparent ∀d∈D, ∀l∈L, ∀m∈M, ∀c∈C: for every device d, every human language l, every data modality m, and every corpus c. Our barriers to using information technology are fading away and becoming transparent.
“Ideal Distributed Computing.” While we have known the fundamentals of distributed computing for many decades, only today are we reaching a state where we can achieve a powerful and efficient balance of computation between all end-user devices and a vast collection of shared storage and computational resources. Cloud computing is today’s term of art, but I talked more generally about systems with the flexibility that computation and data can move across computers within a cluster, across clusters of computers and, of course, between clusters and all other (say, end-user) devices. The result is the efficient, even awesome, capability to provide communication, computation and data to a vast collection of people and applications.
“Hybrid, Not Artificial, Intelligence.” Systems are regularly augmenting the capability of all of us in day-to-day life, and our collective use of those systems is, in turn, augmenting the capabilities of those systems in a beneficial virtuous circle. The virtuous circle is operating already in the search engine, voice recognition systems, recommendation systems, and more. There is every reason to think the effect will become ever more potent as computers are applied to more domains and used by larger populations. The result may not be artificially intelligent machines that pass the Turing Test, but instead systems that will be ever more capable of helping us achieve our goals in life -- in a kind of partnership. For a related take on this, you might look at a Google Official Blog post, The Intelligent Cloud, that a colleague and I posted last Fall.
More explanation and many examples, based on Google research and services, are available in the slides I used with my talk. A PDF file of those slides is available under the papers and presentations link.
Congratulations to NSF CLuE Grant awardees
Thursday, April 23, 2009
Posted by Jeff Walz and Andrea Held
The first goal of the Academic Cluster Computing Initiative was to familiarize the academic community with the methods necessary to run very large datasets on massive distributed computer networks. By expanding that program to include research grants through the National Science Foundation's Cluster Exploratory (CLuE) program, we're also hoping to enable new and better approaches to data-intensive research across a range of disciplines.
Now that the NSF has announced the 2009 CLuE grants in addition to some previous Small Grant for Exploratory Research (SGER) grants, we're excited to congratulate the recipient researchers and wish them the best as they bring new projects online and continue to run existing SGER projects on the Google/IBM cluster.
The NSF selected projects based on their potential to advance computer science as well as to benefit society as a whole, and researchers at 14 institutions are tackling ambitious problems in everything from computer science to bioinformatics. The institutions receiving CLuE grants are Purdue, UC Santa Barbara, University of Washington, University of Massachusetts-Amherst, UC San Diego, University of Virginia, Yale, MIT, University of Wisconsin-Madison, Carnegie Mellon, University of Maryland-College Park, University of Utah and UC Irvine. Florida International University, Carnegie Mellon and University of Maryland will continue other projects with existing SGER grants. These grantees will run their projects on a Google/IBM-provided cluster running an open source implementation of Google's
We're excited to help foster new approaches to difficult, data-intensive problems across a range of fields, and we can't wait to see more students and researchers come up with creative applications for massive, highly distributed computing.
Socially Adjusted CAPTCHAs
Thursday, April 16, 2009
Posted by Rich Gossweiler, Maryam Kamvar, Shumeet Baluja
Unfortunately, there is a war going on between humans and 'bots. Software
'bots are attempting to generate massive numbers of computer accounts
which are then sold in bulk to spammers. Spammers use these accounts to
inundate emails and discussion boards. Meanwhile humans are trying to
simply create an account and don't want to spend a lot of time proving
that they are not a program.
Typically we use CAPTCHAs -- we present an image of some distorted text
and then ask the applicant to type in the letters. As image processing gets
more sophisticated, these letter sequences tend to get longer and more
distorted, sometimes to the point where humans fail too.
So we switched the game. We show an image, say an airplane, but it
is randomly rotated and we ask the applicant to rotate it to "up." This
is generally hard for computers but easy for people. Well, for the most part.
Since computers are good at faces, skies, text, etc., we sift
through our database of images running state-of-the-art up detectors to
remove those images. But of the images that remain, some are too hard
for people to figure out. What is up for a plate or a piece of
So here is where it gets interesting. We show people several images, one
of which is a "candidate" and we see how people do. If everyone rotates
it the same way, it is a keeper. If there is a lot of variation, we
discard it. As extra credit, it turns out that even if the original image was
taken at an angle, it does not matter, since people, in large numbers,
socially adjust the CAPTCHA.
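The keep-or-discard step described above can be sketched in a few lines. This is only an illustration, not the method from the paper: the function names, the 0.9 agreement threshold, and the use of a circular mean to summarize people's answers are all assumptions made for the example.

```python
import math

def circular_stats(angles_deg):
    """Return (mean direction in degrees, agreement in [0, 1]) for a list of angles.

    Agreement is the length of the average unit vector: close to 1 when
    everyone picks nearly the same direction, close to 0 when answers scatter.
    """
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    mean = math.degrees(math.atan2(s, c)) % 360.0
    return mean, math.hypot(c, s)

def review_candidate(angles_deg, min_agreement=0.9):
    """Decide whether a candidate image is a keeper.

    angles_deg holds the "up" direction each person chose for the image.
    min_agreement is a hypothetical threshold, not a value from the paper.
    """
    mean, agreement = circular_stats(angles_deg)
    return {"keep": agreement >= min_agreement, "up": mean, "agreement": agreement}

# Made-up answers for three candidate images.
airplane = review_candidate([358, 2, 5, 355, 1, 3])      # people agree
plate = review_candidate([10, 95, 181, 270, 44, 300])    # no consensus
tilted = review_candidate([14, 16, 15, 13, 17])          # photo taken at an angle

for name, result in [("airplane", airplane), ("plate", plate), ("tilted", tilted)]:
    print(name, result["keep"], round(result["up"], 1), round(result["agreement"], 3))
```

In this sketch the tilted photo is still a keeper: the direction people agree on becomes the reference "up", which is one way to read the "socially adjusted" idea.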
Read the full paper (posted with the permission of WWW'09).
The Grill: Google's Alfred Spector on the hot seat
Wednesday, April 15, 2009
Posted by Ben Bayer, Google Research
Alfred Spector, Google's VP of Research, tells COMPUTERWORLD the ins and outs of Research at Google and where it's headed for the future. Read the complete interview.
Predicting the Present with Google Trends
Thursday, April 02, 2009
Posted by Hal Varian, Chief Economist and Hyunyoung Choi, Decision Support Engineering Analyst
Can Google queries help predict economic activity?
The answer depends on what you mean by "predict."
Google Insights for Search provides a real-time report on query volume, while economic data is typically released several days after the close of the month. Given this time lag, it is not implausible that Google queries in a category like "Automotive/Vehicle Shopping" during the first few weeks of March may help predict what actual March automotive sales will be like when the official data is released halfway through April.
That famous economist Yogi Berra once said "It's tough to make predictions, especially about the future." This inspired our approach: let us lower the bar and just try to predict the present.
Our work to date is summarized in a paper called Predicting the Present with Google Trends. We find that Google Trends data help improve forecasts of the current level of activity for a number of different economic time series, including
Even predicting the present is useful, since it may help identify "turning points" in economic time series. If people start doing significantly more searches for "Real Estate Agents" in a certain location, it is tempting to think that house sales might increase in that area in the near future.
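For readers who want to try this, here is a minimal sketch of one way to test whether a query index helps. It is not the model from the paper. It fits two least-squares regressions on synthetic, made-up data: a baseline that predicts this month's sales from last month's sales, and a second model that also uses this month's query index. All names, coefficients, and numbers are assumptions for illustration.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, targets):
    """Ordinary least squares via the normal equations (fine for a few regressors)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

def sse(rows, targets, coef):
    """Sum of squared errors of the fitted model on the given data."""
    return sum((t - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, t in zip(rows, targets))

# Synthetic data only: a fake monthly query index and a fake sales series
# that depends on last month's sales and this month's queries.
rng = random.Random(0)
months = 48
queries = [100 + 10 * math.sin(t / 3) + rng.gauss(0, 2) for t in range(months)]
sales = [200.0]
for t in range(1, months):
    sales.append(40 + 0.5 * sales[-1] + 0.6 * queries[t] + rng.gauss(0, 2))

targets = sales[1:]
baseline_rows = [[1.0, sales[t - 1]] for t in range(1, months)]
augmented_rows = [[1.0, sales[t - 1], queries[t]] for t in range(1, months)]

baseline_coef = ols(baseline_rows, targets)
augmented_coef = ols(augmented_rows, targets)
sse_baseline = sse(baseline_rows, targets, baseline_coef)
sse_augmented = sse(augmented_rows, targets, augmented_coef)

print("baseline SSE:", round(sse_baseline, 1))
print("with query index SSE:", round(sse_augmented, 1))
```

An in-sample fit can never get worse when a regressor is added, so a fairer check on real data would hold out the most recent months and compare out-of-sample errors.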
Our paper outlines one approach to short-term economic prediction, but we expect that there are several other interesting ideas out there. So we suggest that forecasting wannabes download some Google Trends data and try to relate it to other economic time series. If you find an interesting pattern, post your findings on a website and send a link to firstname.lastname@example.org. We'll report on the most interesting results in a later blog post.
It has been said that if you put a million monkeys in front of a million computers, you would eventually produce an accurate economic forecast. Let's see how well that theory works ...