Google Research Blog
The latest news from Research at Google
Machine Learning Meeting
Tuesday, May 20, 2008
Machine Learning is a branch of Artificial Intelligence in which, naturally enough, the aim is to get computers to learn: things like improving performance over time, and recognizing general tendencies among a number of specific cases. We have many ways to exploit Machine Learning programs, and a lot of data to give them. Machine Learning helps us to estimate what content users like most, what content is even legitimate, and how to match ads to content. It also plays key roles in products such as
Last week, Google held a Machine Learning Summit in
, gathering together engineers from around the world, including participants from Beijing, Zurich, Boston, Haifa, and Mountain View. The program included invited talks by
, and many shorter presentations by Googlers on topics including broadly applicable Machine Learning software, especially challenging applications, and some mathematical analysis.
Can You Publish at Google?
Tuesday, May 06, 2008
As part of the
at Google we try to describe what it is like to do
here. A common question I get is "How hard is it to publish at Google?" I want to dispel the myth that it is hard.
It is easy to publish
, easy to put code into
, easy to give
, etc. But it is also easy for great research to become great engineering, and that is an incredible lure. Great barriers of despair exist between research and development at many companies; researchers can find it hard to have impact beyond demos, papers, or patents.
Here at Google it is not uncommon for researchers to work on products (it is my own modus operandi); you can come up with something interesting,
to convince yourself and others that it is worthwhile, and then work as part of the team to build it and help create world-changing products. But are you willing to do that? That decision is the hard part, not publishing a paper.
I think from a Google standpoint, we need to make sure these barriers don't form, that making products, experimenting and having a venue for trying bold new approaches continues to be part of the
Thursday, May 01, 2008
Posted by Shumeet Baluja and Yushi Jing
At WWW-2008, in Beijing, China, we presented our paper "PageRank for Product Image Search". In this paper, we presented a system that used visual cues, instead of solely text information, to determine the rank of images. The idea was simple: find common visual themes in a set of images, and then find a small set of images that best represented those themes. The resulting algorithm wound up being PageRank, but on an entirely inferred graph of image similarities. Since the release of the paper, we've noticed lots of
in the press and have received quite a few questions. We thought we could answer a few of them here.
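To make the idea of "PageRank on an inferred graph of image similarities" concrete, here is a minimal sketch. It is not the implementation from the paper: the function name `visual_rank`, the damping value of 0.85, and the toy similarity matrix are all assumptions chosen for illustration. It runs standard power iteration over a symmetric matrix of pairwise similarity scores.

```python
def visual_rank(similarity, damping=0.85, iterations=100, tol=1e-12):
    """Power-iteration PageRank over a square similarity matrix.

    `similarity[i][j]` is a non-negative, hypothetical similarity score
    between image i and image j (list of lists, zero diagonal).
    """
    n = len(similarity)
    if n == 0:
        return []
    # Each image splits its score among neighbours in proportion to similarity.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Images with no similar neighbours spread their score uniformly.
        dangling = sum(rank[j] for j in range(n) if col_sums[j] == 0)
        new_rank = []
        for i in range(n):
            flow = sum(
                similarity[i][j] / col_sums[j] * rank[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_rank.append((1 - damping) / n + damping * (flow + dangling / n))
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy data (assumed): images 0-2 share a visual theme, image 3 is an outlier.
toy_similarity = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.85, 0.1],
    [0.8, 0.85, 0.0, 0.05],
    [0.1, 0.1, 0.05, 0.0],
]
scores = visual_rank(toy_similarity)
```

In this toy example the outlier image receives the lowest score, because little similarity "flows" to it from the other images.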
"Why did we choose to use products for our test case?"
First and foremost, product queries are popular in actual usage; addressing them is important. Second, users have strong expectations of what results we should return for these queries; therefore, this category provides an important set of examples that we need to address especially carefully. Third, on a pragmatic note, they lend themselves well to the type of "image features" that we selected in this study. Since the publication of the paper, we've also extended our results to other query types, including travel-related queries. One of the nice features of the approach is that (we hope) it will be easy to extend to new domains; as research in measuring image or object similarity continues, the advances can easily be incorporated into the similarity calculation to compute the underlying graph; the computations on the graph do not change.
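The separation between the similarity calculation and the graph computation can be sketched as follows. This is an illustration under assumptions, not the paper's code: `build_similarity_graph` is a hypothetical helper, and cosine similarity over made-up feature vectors stands in for whatever image features are actually used. The point is only that the similarity function is a parameter, so a better measure can be swapped in without touching the ranking step.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (a stand-in measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(features, similarity_fn):
    """Build a symmetric similarity matrix using any pairwise measure."""
    n = len(features)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = similarity_fn(features[i], features[j])
            graph[i][j] = score
            graph[j][i] = score
    return graph


# Hypothetical feature vectors: the first two images are identical in
# feature space, the third is orthogonal to both.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
graph = build_similarity_graph(features, cosine_similarity)
```

Replacing `cosine_similarity` with another function of the same shape changes the graph but leaves any downstream ranking code untouched.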
"Where are we going from here?"
Besides broadening the sets of queries (and sets of features) for which we can use this approach, there are three directions we're exploring. First, estimating similarity measures for all of the images on the web is computationally expensive; approximations or alternative computations are needed. Second, we hope to evaluate our approach with respect to the large number of recently proposed alternative clustering methods. Third, many variations of PageRank can be used in quite interesting ways for image search. For example, we can use some of these previously published methods to reintroduce, in a meaningful manner, the textual information that the VisualRank algorithm removed. In the end, we have an approach that integrates easily with both text and visual cues. Stay tuned for more on that in the coming months.
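One well-known PageRank variation that could serve this purpose is a biased (personalized) teleport vector. The post does not say which variation the authors have in mind, so the following is only a sketch under that assumption: `biased_rank` is a hypothetical function, and `prior` is an assumed per-image text-relevance score.

```python
def biased_rank(similarity, prior, damping=0.85, iterations=200):
    """PageRank over a similarity matrix with a non-uniform teleport vector.

    `prior[i]` is a hypothetical non-negative text-relevance score for
    image i; it replaces the uniform jump distribution of plain PageRank.
    """
    n = len(similarity)
    total = sum(prior)
    if n == 0 or total <= 0:
        raise ValueError("need at least one image and a positive prior")
    teleport = [p / total for p in prior]
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    rank = teleport[:]
    for _ in range(iterations):
        # Score held by images without neighbours follows the prior as well.
        dangling = sum(rank[j] for j in range(n) if col_sums[j] == 0)
        rank = [
            (1 - damping) * teleport[i]
            + damping * (
                sum(
                    similarity[i][j] / col_sums[j] * rank[j]
                    for j in range(n)
                    if col_sums[j] > 0
                )
                + dangling * teleport[i]
            )
            for i in range(n)
        ]
    return rank


# Three visually interchangeable images (assumed toy data).
equal_similarity = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
uniform = biased_rank(equal_similarity, [1.0, 1.0, 1.0])
boosted = biased_rank(equal_similarity, [3.0, 1.0, 1.0])
```

With a uniform prior the three images tie; when the text prior favors image 0, that image ranks first even though the visual graph alone cannot separate them.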
And now to answer the most commonly asked question, "Is it live?"
Not yet. Currently, it is research in progress (click here to help speed up the process). In the meantime, though, if you'd like another sneak peek of our research on large graphs, this time in the context of YouTube data mining, just follow the link.
Finally, we want to extend our deepest thanks to the people who helped on this project, especially the image-search team; without their help, this research would not have been possible.