Google Research Blog
The latest news from Research at Google
Tone: An experimental Chrome extension for instant sharing over audio
Tuesday, May 19, 2015
Posted by Alex Kauffmann, Interaction Researcher, and Boris Smus, Software Engineer
Sometimes in the course of exploring new ideas, we'll stumble upon a technology application that gets us excited.
Tone
is a perfect example: it's a
Chrome
extension that broadcasts the URL of the current tab to any machine within earshot that also has the extension installed. Tone is an experiment that we’ve enjoyed and found useful, and we think you will, too.
As digital devices have multiplied, so has the complexity of coordinating them and moving stuff between them. Tone grew out of the idea that while digital communication methods like email and chat have made it infinitely easier, cheaper, and faster to share things with people across the globe, they've actually made it more complicated to share things with the people standing right next to you. Tone aims to make sharing digital things with nearby people as easy as talking to them.
The first version was built in an afternoon for fun (which resulted in numerous
rickrolls
), but we increasingly found ourselves using it to share documents with everyone in a meeting quickly, to exchange design files back and forth while collaborating on UI design, and to contribute relevant links without interrupting conversations.
Tone provides an easy-to-understand broadcast mechanism that behaves like the human voice—it doesn't pass through walls like radio or require pairing or addressing. The initial prototype used an efficient audio transmission scheme that sounded terrible, so we played it beyond the range of human hearing. However, because many laptop microphones and nearly all video conferencing systems are optimized for voice, it improved reliability considerably to also include a minimal
DTMF
-based audible codec. The combination is reliable for short distances in the majority of audio environments even at low volumes, and it even works over Hangouts.
Because it's audio based, Tone behaves like speech in interesting ways. The orientation of laptops relative to each other, the acoustic characteristics of the space, the particular speaker volume and mic sensitivity, and even where you're standing will all affect Tone's reliability. Not every nearby machine will always receive every broadcast, just like not everyone will always hear every word someone says. But resending is painless and debugging generally just requires raising the volume. Many groups at Google have found the tradeoff between ease and reliability worthwhile—it is our hope that small teams, students in classrooms, and families with multiple computers will too.
To get started, first install the
Tone extension for Chrome
. Then simply open a tab with the URL you want to share, make sure your volume is on, and press the Tone button. Your machine will then emit a short sequence of beeps. Nearby machines receive a clickable notification that will open the same tab. Getting everyone on the same page has never been so easy!
Sawasdee ka Voice Search
Wednesday, April 02, 2014
Posted by Keith Hall and Richard Sproat, Staff Research Scientists, Speech
Typing on mobile devices can be difficult, especially when you're on the go. Google Voice Search gives you a fast, easy, and natural way to search by speaking your queries instead of typing them. In Thailand, Voice Search has been one of the most requested services, so we’re excited to now offer users there the ability to speak queries in Thai, adding to over 75 languages and accents in which you can talk to Google.
To power Voice Search, we teach computers to understand the sounds and words that build spoken language. We trained our speech recognizer to understand Thai by collecting speech samples from hundreds of volunteers in Bangkok, which enabled us to build this recognizer in just a fraction of the time it took to build other models. Our volunteers were asked to read popular queries in their native tongue in a variety of acoustic conditions, such as in restaurants, out on busy streets, and inside cars.
Each new language often requires our research team to tackle unique challenges, and Thai was no exception.
Segmentation is a major challenge in Thai, as the Thai script has no spaces between words, so it is harder to know when a word begins and ends. Therefore, we created a Thai segmenter to help our system recognize words better. For example: ตากลม can be segmented to ตาก ลม or ตา กลม. We collected a large corpus of text and asked Thai speakers to manually annotate plausible segmentations. We then trained a sequence segmenter on this data allowing it to generalize beyond the annotated data.
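To see how a segmenter chooses between plausible splits like ตาก ลม and ตา กลม, here is a minimal dynamic-programming decoder over a toy unigram lexicon. The words and probabilities are invented for illustration; the production system trains a sequence model on annotated data, but the decoding idea is similar:

```python
import math

# Toy unigram lexicon with hypothetical probabilities.
LEXICON = {"ตา": 0.4, "กลม": 0.2, "ตาก": 0.1, "ลม": 0.3}

def segment(text):
    """Return the most probable segmentation via dynamic programming."""
    n = len(text)
    best = [(-math.inf, None)] * (n + 1)  # (log-prob, backpointer)
    best[0] = (0.0, None)
    for i in range(n):
        if best[i][0] == -math.inf:
            continue
        for j in range(i + 1, n + 1):
            word = text[i:j]
            if word in LEXICON:
                score = best[i][0] + math.log(LEXICON[word])
                if score > best[j][0]:
                    best[j] = (score, i)
    # Backtrack from the end of the string.
    words, i = [], n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return list(reversed(words))

print(segment("ตากลม"))
```

With these toy probabilities the decoder prefers ตา + กลม over ตาก + ลม; in practice the choice also depends on context, which is what the trained sequence segmenter provides.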
Numbers are an important part of any language: when the string “87” appears in a query or on a web page, we need to know how people would say it. As with over 40 other languages, we included a number grammar for Thai, which tells us that “87” is read as แปดสิบเจ็ด.
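A toy version of such a grammar for numbers up to 99 might look like the sketch below. This is a simplified illustration, not the production grammar, which covers far more forms; note the two Thai irregularities it does handle, ยี่ for the tens digit 2 and เอ็ด for a trailing 1:

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่",
          "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]

def thai_number(n):
    """Verbalize an integer 0-99 in Thai (toy grammar)."""
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    # Irregular forms: 1x is just สิบ, 2x uses ยี่, a trailing 1 is เอ็ด.
    prefix = "สิบ" if tens == 1 else ("ยี่สิบ" if tens == 2 else DIGITS[tens] + "สิบ")
    if ones == 0:
        return prefix
    return prefix + ("เอ็ด" if ones == 1 else DIGITS[ones])

print(thai_number(87))  # แปดสิบเจ็ด
```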
Thai users often mix English words with Thai, such as brand or artist names, in both spoken and written Thai, which adds complexity to our acoustic, lexicon, and segmentation models. We addressed this by adding support for ‘code switching’, which allows Voice Search to recognize when different languages are being spoken interchangeably and adjust its phonetic transliteration accordingly.
Many Thai users leave out accents and tone markers when they search (e.g., โน๊ตบุก instead of โน้ตบุ๊ก, or หมูหยอง instead of หมูหย็อง), so we created a special algorithm to restore accents and tones in the results we return, ensuring that our Thai users see properly formatted text in the majority of cases.
We’re particularly excited that Voice Search can help people find locally relevant information, ranging from travel directions to the nearest restaurant, without having to type long phrases in Thai.
Voice Search is available for Android devices running Jelly Bean and above. It will be available for older Android releases and iOS users soon.
Under the hood of Croatian, Filipino, Ukrainian, and Vietnamese in Google Voice Search
Thursday, July 25, 2013
Posted by Eugene Weinstein and Pedro Moreno, Google Speech Team
Although we’ve been working on speech recognition for several years, every new language requires our engineers and scientists to tackle unique challenges. Our most recent additions - Croatian, Filipino, Ukrainian, and Vietnamese - required creative solutions to reflect how each language is used across devices and in everyday conversations.
For example, since Vietnamese is a
tonal language
, we had to explore how to take tones into consideration. One simple technique is to model the tone and vowel combinations (
tonemes
) directly in our lexicons. This, however, has the side effect of a larger phonetic inventory. As a result we had to come up with special algorithms to handle the increased complexity. Additionally, Vietnamese is a heavily diacritized language, with tone markers on a majority of syllables. Since Google Search is very good at returning valid results even when diacritics are omitted, our Vietnamese users frequently omit the diacritics when typing their queries. This creates difficulties for the speech recognizer, which selects its vocabulary from typed queries. For this purpose, we created a special diacritic restoration algorithm which enables us to present properly formatted text to our users in the majority of cases.
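A minimal sketch of the diacritic-restoration idea: strip combining marks to obtain the undiacritized form, then map each stripped query back to its most frequent fully diacritized spelling. The tiny corpus here is invented for illustration; the production algorithm is considerably more sophisticated:

```python
import unicodedata
from collections import Counter

def strip_diacritics(text):
    """Remove combining marks, e.g. 'tiếng Việt' -> 'tieng Viet'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)

# Hypothetical log of typed queries, some properly diacritized, some not.
corpus = ["tiếng Việt", "tiếng Việt", "tieng Viet", "hà nội"]
table = Counter()
for q in corpus:
    table[(strip_diacritics(q), q)] += 1

def restore(query):
    """Return the most frequent diacritized spelling of the query."""
    key = strip_diacritics(query)
    candidates = [(count, full) for (k, full), count in table.items() if k == key]
    return max(candidates)[1] if candidates else query

print(restore("tieng Viet"))  # tiếng Việt
```

Real Vietnamese restoration must also handle letters like đ that are not combining marks, and disambiguate using context rather than raw frequency alone.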
Filipino also presented interesting challenges. Much like in other multilingual societies such as Hong Kong, India, South Africa, etc., Filipinos often mix several languages in their daily life. This is called
code switching
. Code switching complicates the design of pronunciation, language, and acoustic models. Speech scientists are effectively faced with a dilemma: should we build one system per language, or should we combine all languages into one?
In such situations we prefer to model the reality of daily language use in our speech recognizer design. If users mix several languages, our recognizers should do their best in modeling this behavior. Hence our Filipino voice search system, while mainly focused on the Filipino language, also allows users to mix in English terms.
The algorithms we’re using to model how speech sounds are spoken in each language make use of our distributed large-scale
neural network
learning infrastructure (yes, the same one that spontaneously
discovered cats
on YouTube!). By partitioning the gigantic parameter set of the model, and by evaluating each partition on a separate computation server, we’re able to achieve unprecedented levels of parallelism in training acoustic models.
The more people use Google speech recognition products, the more accurate the technology becomes. These new neural network technologies will help us bring you lots of improvements and many more languages in the future.
Large Scale Language Modeling in Automatic Speech Recognition
Wednesday, October 31, 2012
Posted by Ciprian Chelba, Research Scientist
At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are the anonymized queries on google.com and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter.
The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than to, say, “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, on Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words), and trained on as many as 230 billion words.
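The estimation principle behind such models fits in a few lines. This sketch trains a bigram (n = 2) model on a four-sentence toy corpus; production systems differ mainly in scale and in smoothing for unseen n-grams:

```python
from collections import Counter, defaultdict

# Toy corpus; real models are trained on billions of words.
corpus = [
    "new york pizza delivery",
    "new york pizza places",
    "new york granola",
    "best granola recipe",
]

bigrams = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1

def p_next(prev, word):
    """Maximum-likelihood bigram probability P(word | prev)."""
    total = sum(bigrams[prev].values())
    return bigrams[prev][word] / total if total else 0.0

print(p_next("york", "pizza"), p_next("york", "granola"))
```

Here “pizza” is twice as likely as “granola” after “york”, mirroring the example in the text; a 5-gram model simply conditions on four previous words instead of one.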
The computational effort pays off, as highlighted by the plot above: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models.
A more detailed
summary of results on Voice Search and a few YouTube speech transcription tasks
(authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar) presents our results when increasing both the amount of training data and the size of the language model estimated from that data. Depending on the task, the availability and amount of training data used, as well as the language model size and the performance of the underlying speech recognizer, we observe relative reductions in word error rate of between 6% and 10%, for systems across a wide range of operating points.
Cross-posted with the Research at Google G+ Page
Speech Recognition and Deep Learning
Monday, August 06, 2012
Posted by Vincent Vanhoucke, Research Scientist, Speech Team
The New York Times recently published
an article
about Google’s large scale deep learning project, which learns to discover patterns in large datasets, including... cats on YouTube!
What’s the point of building a gigantic cat detector you might ask? When you combine large amounts of data, large-scale distributed computing and powerful machine learning algorithms, you can apply the technology to address a large variety of practical problems.
With the launch of the latest Android platform release, Jelly Bean, we’ve taken a significant step towards making that technology useful: when you speak to your Android phone, chances are, you are talking to a neural network trained to recognize your speech.
Using neural networks for speech recognition is nothing new: the first proofs of concept were developed in the late 1980s
(1)
, and after what can only be described as a 20-year dry-spell, evidence that the technology could scale to modern computing resources has recently begun to emerge
(2)
. What changed? Access to larger and larger databases of speech, advances in computing power, including GPUs and fast distributed computing clusters such as the
Google Compute Engine
, unveiled at
Google I/O
this year, and a better understanding of how to scale the algorithms to make them effective learners.
The research, which reduces the error rate by over 20%, will be presented
(3)
at a conference this September, but true to our
philosophy of integrated research
, we’re delighted to bring the bleeding edge to our users first.
--
1 Phoneme recognition using time-delay neural networks, A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K.J. Lang. IEEE Transactions on Acoustics, Speech and Signal Processing, vol.37, no.3, pp.328-339, Mar 1989.
2 Acoustic Modeling using Deep Belief Networks, A. Mohamed, G. Dahl and G. Hinton. Accepted for publication in IEEE Transactions on Audio, Speech and Language Processing.
3 Application Of Pretrained Deep Neural Networks To Large Vocabulary Speech Recognition, N. Jaitly, P. Nguyen, A. Senior and V. Vanhoucke, Accepted for publication in the Proceedings of Interspeech 2012.
Natural Language in Voice Search
Tuesday, July 31, 2012
Posted by Jakob Uszkoreit, Software Engineer
On July 26 and 27, we held our eighth annual
Computer Science Faculty Summit
on our Mountain View Campus. During the event, we brought you a series of blog posts dedicated to sharing the Summit's talks, panels and sessions, and we continue with this glimpse into natural language in voice search. --Ed
At this year’s Faculty Summit, I had the opportunity to showcase the newest version of
Google Voice Search
. This version hints at how Google Search, in particular on mobile devices and by voice, will become increasingly capable of responding to natural language queries.
I first outlined the trajectory of Google Voice Search, which was initially released in 2007.
Voice actions
, launched in 2010 for Android devices, made it possible to control your device by speaking to it. For example, if you wanted to set your device alarm for 10:00 AM, you could say “set alarm for 10:00 AM. Label: meeting on voice actions.” To indicate the subject of the alarm, a meeting about voice actions, you would have to use the keyword “label”! Certainly not everyone would think to frame the requested action this way. What if you could speak to your device in a more natural way and have it understand you?
At last month’s
Google I/O 2012
, we announced a version of voice actions that supports much more natural commands. For instance, your device will now set an alarm if you say “my meeting is at 10:00 AM, remind me”. This makes even previously existing functionality, such as sending a text message or calling someone, more discoverable on the device -- that is, if you express a voice command in whatever way feels natural to you, whether it be “let David know I’ll be late via text” or “make sure I buy milk by 3 pm”, there is now a good chance that your device will respond as you anticipated.
I then discussed some of the possibly unexpected decisions we made when designing the system we now use for interpreting natural language queries or requests. For example, as you would expect from Google, our approach to interpreting natural language queries is data-driven and relies heavily on machine learning. In complex machine learning systems, however, it is often difficult to figure out the underlying cause of an error: after supplying them with training and test data, you merely obtain a set of metrics that hopefully give a reasonable indication of the system’s quality, but they fail to explain why a certain input led to a given, possibly wrong, output.
As a result, even understanding why some mistakes were made requires experts in the field and detailed analysis, rendering it nearly impossible to harness non-experts in analyzing and improving such systems. To avoid this, we aim to make every partial decision of the system as interpretable as possible. In many cases, any random speaker of English could look at its possibly erroneous behavior in response to some input and quickly identify the underlying issue - and in some cases even fix it!
We are especially interested in working with our academic colleagues on some of the many fascinating research and engineering challenges in building large-scale, yet interpretable natural language understanding systems and devising the machine learning algorithms this requires.
Excellent Papers for 2011
Thursday, March 22, 2012
Posted by Corinna Cortes and Alfred Spector, Google Research
UPDATE: Added
Theo Vassilakis
as an author for "Dremel: Interactive Analysis of Web-Scale Datasets"
Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our
publications
offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.
In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a
set of papers
on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a
second round
, we highlighted new noteworthy papers from the latter half of 2010. This time we honor influential papers authored or co-authored by Googlers covering all of 2011 -- roughly 10% of our total publications. Choosing was tough, and we may have left out some important papers, so do see the
publications list
to review the complete group.
In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:
Audio processing
“
Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function
”,
Richard F. Lyon
,
Journal of the Acoustical Society of America
, vol. 130 (2011), pp. 3893-3904.
Lyon's long title summarizes a result that he has been working toward over many years of modeling sound processing in the inner ear. This nonlinear cochlear model is shown to be "good" with respect to psychophysical data on masking, physiological data on mechanical and neural response, and computational efficiency. These properties derive from the close connection between wave propagation and filter cascades. This filter-cascade model of the ear is used as an efficient sound processor for several machine hearing projects at Google.
Electronic Commerce and Algorithms
“
Online Vertex-Weighted Bipartite Matching and Single-bid Budgeted Allocations
”,
Gagan Aggarwal
,
Gagan Goel
,
Chinmay Karande
,
Aranyak Mehta
,
SODA 2011
.
The authors introduce an elegant and powerful algorithmic technique to the area of online ad allocation and matching: a hybrid of random perturbations and greedy choice to make decisions on the fly. Their technique sheds new light on classic matching algorithms, and can be used, for example, to pick one among a set of relevant ads, without knowing in advance the demand for ad slots on future web page views.
“
Milgram-routing in social networks
”,
Silvio Lattanzi
, Alessandro Panconesi, D. Sivakumar,
Proceedings of the 20th International Conference on World Wide Web, WWW 2011
, pp. 725-734.
Milgram’s "six-degrees-of-separation experiment" and the fascinating small world hypothesis that follows from it have generated a lot of interesting research in recent years. In this landmark experiment, Milgram showed that people unknown to each other are often connected by surprisingly short chains of acquaintances. In the paper we prove, theoretically and experimentally, how a recent model of social networks, "Affiliation Networks", offers an explanation of this phenomenon and inspires interesting techniques for local routing within social networks.
“
Non-Price Equilibria in Markets of Discrete Goods
”, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Noam Nisan,
EC
, 2011.
We present a correspondence between markets of indivisible items, and a family of auction based n player games. We show that a market has a price based (Walrasian) equilibrium if and only if the corresponding game has a pure Nash equilibrium. We then turn to markets which do not have a Walrasian equilibrium (which is the interesting case), and study properties of the mixed Nash equilibria of the corresponding games.
HCI
“
From Basecamp to Summit: Scaling Field Research Across 9 Locations
”,
Jens Riegelsberger
, Audrey Yang, Konstantin Samoylov, Elizabeth Nunge, Molly Stevens, Patrick Larvie,
CHI 2011 Extended Abstracts
.
The paper reports on our experience with a basecamp research hub to coordinate logistics and ongoing real-time analysis with research teams in the field. We also reflect on the implications for the meaning of research in a corporate context, where much of the value may lie less in a final report and more in the curated impressions and memories our colleagues take away from the research trip.
“
User-Defined Motion Gestures for Mobile Interaction
”, Jaime Ruiz,
Yang Li
, Edward Lank,
CHI 2011: ACM Conference on Human Factors in Computing Systems
, pp. 197-206.
Modern smartphones contain sophisticated sensors that can detect rich motion gestures — deliberate movements of the device by end-users to invoke commands. However, little is known about best practices in motion gesture design for the mobile computing paradigm. We systematically studied the design space of motion gestures via a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. The study revealed consensus among our participants on parameters of movement and on mappings of motion gestures onto commands, from which we developed a taxonomy for motion gestures and compiled an end-user inspired motion gesture set. The work lays the foundation of motion gesture design—a new dimension for mobile interaction.
Information Retrieval
“
Reputation Systems for Open Collaboration
”, B.T. Adler, L. de Alfaro,
A. Kulshreshtha
, I. Pye,
Communications of the ACM
, vol. 54 No. 8 (2011), pp. 81-87.
This paper describes content-based reputation algorithms, which rely on automated content analysis to derive user and content reputation, and their applications to Wikipedia and Google Maps. The Wikipedia reputation system WikiTrust relies on a chronological analysis of user contributions to articles, metering positive or negative increments of reputation whenever new contributions are made. The Google Maps system Crowdsensus compares the information provided by users on map business listings and computes both a likely reconstruction of the correct listing and a reputation value for each user. Algorithm-based user incentives ensure the trustworthiness of evaluations of Wikipedia entries and Google Maps business information.
Machine Learning and Data Mining
“
Domain adaptation in regression
”,
Corinna Cortes
,
Mehryar Mohri
,
Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011
.
Domain adaptation is one of the most important and challenging problems in machine learning. This paper presents a series of theoretical guarantees for domain adaptation in regression, gives an adaptation algorithm based on that theory that can be cast as a semi-definite programming problem, derives an efficient solution for that problem by using results from smooth optimization, shows that the solution can scale to relatively large data sets, and reports extensive empirical results demonstrating the benefits of this new adaptation algorithm.
“
On the necessity of irrelevant variables
”, David P. Helmbold,
Philip M. Long
,
ICML
, 2011
Relevant variables sometimes do much more good than irrelevant variables do harm, so that it is possible to learn a very accurate classifier using predominantly irrelevant variables. We show that this holds given an assumption that formalizes the intuitive idea that the variables are non-redundant. For problems like this it can be advantageous to add many additional variables, even if only a small fraction of them are relevant.
“
Online Learning in the Manifold of Low-Rank Matrices
”,
Gal Chechik
, Daphna Weinshall, Uri Shalit,
Neural Information Processing Systems (NIPS 23)
, 2011, pp. 2128-2136.
Learning measures of similarity from examples of similar and dissimilar pairs is a problem that is hard to scale. LORETA uses retractions, an operator from matrix optimization, to learn low-rank similarity matrices efficiently. This makes it possible to learn similarities between objects such as images or texts represented with many more features than was previously feasible.
Machine Translation
“
Training a Parser for Machine Translation Reordering
”, Jason Katz-Brown,
Slav Petrov
,
Ryan McDonald
,
Franz Och
, David Talbot, Hiroshi Ichikawa, Masakazu Seno,
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11)
.
Machine translation systems often need to understand the syntactic structure of a sentence to translate it correctly. Traditionally, syntactic parsers are evaluated as standalone systems against reference data created by linguists. Instead, we show how to train a parser to optimize reordering accuracy in a machine translation system, resulting in measurable improvements in translation quality over a more traditionally trained parser.
“
Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation
”, Ashish Venugopal,
Jakob Uszkoreit
, David Talbot,
Franz Och
, Juri Ganitkevitch,
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP)
.
We propose a general method to watermark and probabilistically identify the structured results of machine learning algorithms with an application in statistical machine translation. Our approach does not rely on controlling or even knowing the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one’s own algorithm, while being robust to limited editing operations.
“
Inducing Sentence Structure from Parallel Corpora for Reordering
”,
John DeNero
,
Jakob Uszkoreit
,
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP)
.
Automatically discovering the full range of linguistic rules that govern the correct use of language is an appealing goal, but extremely challenging. Our paper describes a targeted method for discovering only those aspects of linguistic syntax necessary to explain how two different languages differ in their word ordering. By focusing on word order, we demonstrate an effective and practical application of unsupervised grammar induction that improves a Japanese to English machine translation system.
Multimedia and Computer Vision
“
Kernelized Structural SVM Learning for Supervised Object Segmentation
”,
Luca Bertelli
,
Tianli Yu
, Diem Vu, Burak Gokturk,
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011
.
The paper proposes a principled way for computers to learn how to segment the foreground from the background of an image, given a set of training examples. The technology is built upon a specially designed nonlinear segmentation kernel within the recently proposed structured SVM learning framework.
“
Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
”,
Matthias Grundmann
,
Vivek Kwatra
, Irfan Essa,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant and professional looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (
youtube.com/editor
) since March 2011.
“
The Power of Comparative Reasoning
”,
Jay Yagnik
, Dennis Strelow,
David Ross
, Ruei-Sung Lin,
International Conference on Computer Vision
(2011).
The paper describes a theory-derived vector space transform that converts vectors into sparse binary vectors such that Euclidean space operations on the sparse binary vectors imply rank space operations in the original vector space. The transform a) does not need any data-driven supervised/unsupervised learning, b) can be computed from polynomial expansions of the input space in linear time (in the degree of the polynomial), and c) can be implemented in 10 lines of code. We show competitive results on similarity search and sparse coding (for classification) tasks.
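A sketch in the spirit of this transform, based on winner-take-all codes: for each random permutation, record the position of the largest element among the first k permuted entries, so the code depends only on relative ranks, not magnitudes. The parameter values here are illustrative, not the paper's:

```python
import random

def wta_hash(vec, n_perms=4, k=3, seed=0):
    """Rank-based codes: for each permutation, the position of the max
    among the first k permuted elements. Invariant to any monotonic
    rescaling of the input vector."""
    rng = random.Random(seed)
    codes = []
    for _ in range(n_perms):
        perm = list(range(len(vec)))
        rng.shuffle(perm)
        window = perm[:k]
        codes.append(max(range(k), key=lambda j: vec[window[j]]))
    return codes

v = [0.2, 0.9, 0.1, 0.4]
w = [2.0, 9.0, 1.0, 4.0]  # same ranks, different scale
assert wta_hash(v) == wta_hash(w)
print(wta_hash(v))
```

Because the codes are small integers, they can be unary-encoded into the sparse binary vectors the paper describes, on which Hamming or Euclidean comparisons reflect rank agreement.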
NLP
“
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
”, Dipanjan Das,
Slav Petrov
,
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11)
, 2011,
Best Paper Award
.
We would like to have natural language processing systems for all languages, but obtaining labeled data for all languages and tasks is unrealistic and expensive. We present an approach which leverages existing resources in one language (for example English) to induce part-of-speech taggers for languages without any labeled training data. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in a hidden Markov model trained with the Expectation Maximization algorithm.
Networks
“
TCP Fast Open
”, Sivasankar Radhakrishnan,
Yuchung Cheng
,
Jerry Chu
,
Arvind Jain
, Barath Raghavan,
Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT)
, 2011.
TCP Fast Open enables data exchange during TCP’s initial handshake. It decreases application network latency by one full round-trip time, a significant speedup for today's short Web transfers. Our experiments on popular websites show that Fast Open reduces whole-page load time by over 10% on average, and in some cases by up to 40%.
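The saved round trip can be illustrated with a toy latency model (our simplification, not the paper's measurement methodology): a cold connection normally spends one RTT on the SYN/SYN-ACK handshake before the request can be sent, while Fast Open piggybacks the request data on the SYN:

```python
def page_load_time(rtts, rtt_ms, fast_open=False):
    """Toy model: handshake RTT (zero with Fast Open) plus the
    request/response round trips, times the path RTT."""
    handshake = 0 if fast_open else 1
    return (handshake + rtts) * rtt_ms

# One request/response exchange over a 100 ms path:
print(page_load_time(1, 100))                  # 200
print(page_load_time(1, 100, fast_open=True))  # 100
```

The model makes clear why the benefit is largest for short transfers: when the payload fits in a round trip or two, the handshake is a large fraction of total latency.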
“
Proportional Rate Reduction for TCP
”,
Nandita Dukkipati
, Matt Mathis,
Yuchung Cheng
, Monia Ghobadi,
Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011
.
Packet losses increase the latency of Web transfers and negatively impact user experience. Proportional rate reduction (PRR) is designed to recover from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs during TCP’s fast recovery. Experiments on Google Web and YouTube servers in the U.S. and India demonstrate that PRR reduces the TCP latency of connections experiencing losses by 3-10%, depending on response size.
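The core of the pacing rule can be sketched as follows (after the formula later standardized in RFC 6937; the segment counts below are hypothetical): transmissions during recovery are clocked out in proportion to the data newly delivered, so the amount in flight converges smoothly toward ssthresh rather than collapsing in one step:

```python
import math

def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
    """Segments allowed out now: the proportional share of ssthresh
    earned by newly delivered data, minus what recovery already sent."""
    return max(0, math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out)

# 10 segments were in flight at loss detection (recover_fs), ssthresh is 5,
# 4 segments have since been ACKed and 1 retransmission already sent:
print(prr_sndcnt(prr_delivered=4, prr_out=1, ssthresh=5, recover_fs=10))  # 1
```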
Security and Privacy
“
Automated Analysis of Security-Critical JavaScript APIs
”, Ankur Taly,
Úlfar Erlingsson
, John C. Mitchell,
Mark S. Miller
, Jasvir Nagra,
IEEE Symposium on Security & Privacy (SP)
, 2011.
As software is increasingly written in high-level, type-safe languages, attackers have fewer means to subvert system fundamentals, and attacks are more likely to exploit errors and vulnerabilities in application-level logic. This paper describes a generic, practical defense against such attacks, which can protect critical application resources even when those resources are partially exposed to attackers via software interfaces. In the context of carefully-crafted fragments of JavaScript, the paper applies formal methods and semantics to prove that these defenses can provide complete, non-circumventable mediation of resource access; the paper also shows how an implementation of the techniques can establish the properties of widely-used software, and find previously-unknown bugs.
“App Isolation: Get the Security of Multiple Browsers with Just One”, Eric Y. Chen, Jason Bau, Charles Reis, Adam Barth, Collin Jackson, 18th ACM Conference on Computer and Communications Security, 2011.
We find that anecdotal advice to use a separate web browser for sites like your bank is indeed effective at defeating most cross-origin web attacks. We also prove that a single web browser can provide the same key properties, for sites that fit within the compatibility constraints.
Speech
“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
As deep neural networks become the state of the art in real-time machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed up their inference.
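One lever in this line of work is fixed-point arithmetic. The toy below shows 8-bit linear quantization of a dot product (a sketch of the general technique, not the paper's exact scheme):

```python
def quantize_int8(values):
    # Map floats onto the signed 8-bit range [-127, 127] with a single
    # linear scale for the whole vector; storage shrinks 4x vs. float32
    # and the multiply-accumulate can run on SIMD integer units.
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dot_int8(a_q, a_scale, b_q, b_scale):
    # Integer multiply-accumulate with one float rescale at the end.
    return sum(x * y for x, y in zip(a_q, b_q)) * a_scale * b_scale

w_q, w_s = quantize_int8([0.5, -1.27, 0.02, 0.9])
x_q, x_s = quantize_int8([1.0, 2.0, -1.0, 0.5])
approx = dot_int8(w_q, w_s, x_q, x_s)   # close to the exact dot product
exact = 0.5 * 1.0 + (-1.27) * 2.0 + 0.02 * (-1.0) + 0.9 * 0.5
```

In practice the quantization error stays small relative to the activations, which is why inference (unlike training) tolerates 8-bit precision well.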
“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011.
Voice recognition on the Android platform must contend with many possible target domains, e.g., search, maps, and SMS. For each of these, a domain-specific language model was built by linearly interpolating several n-gram LMs from a common set of Google corpora. This work found a way to efficiently compute a single n-gram language model with accuracy very close to the domain-specific LMs but with considerably less complexity at recognition time.
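Linear LM interpolation itself is simple; the sketch below shows it with toy unigram models (the paper's contribution lies in choosing the weights in a Bayesian way and folding the domain models into one compact n-gram model):

```python
def interpolate(lms, lambdas):
    # p(w) = sum_i lambda_i * p_i(w). If the lambdas sum to 1 and each
    # lm is a distribution, the mixture is itself a distribution.
    vocab = set().union(*lms)
    return {w: sum(lam * lm.get(w, 0.0) for lam, lm in zip(lambdas, lms))
            for w in vocab}

search_lm = {"pizza": 0.6, "weather": 0.4}   # toy domain models
sms_lm    = {"pizza": 0.1, "lol": 0.9}
mixed = interpolate([search_lm, sms_lm], [0.7, 0.3])
# mixed["pizza"] = 0.7*0.6 + 0.3*0.1 = 0.45
```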
Statistics
“Large-Scale Parallel Statistical Forecasting Computations in R”, Murray Stokely, Farzan Rohani, Eric Tassone, JSM Proceedings, Section on Physical and Engineering Sciences, 2011.
This paper describes the implementation of a framework for utilizing distributed computational infrastructure from within the R interactive statistical computing environment, with applications to time-series forecasting. This system is widely used by the statistical analyst community at Google for data analysis on very large data sets.
Structured Data
“Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Communications of the ACM, vol. 54 (2011), pp. 114-123.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and a columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Besides continued growth inside Google, Dremel now also backs a growing number of external-facing products, including BigQuery, and UIs such as the AdExchange front-end.
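The columnar idea at Dremel's core is easy to illustrate (a toy in-memory sketch; Dremel itself adds nested-record column striping and multi-level serving trees):

```python
# Row layout: each record's fields are stored together, so a
# single-column aggregate still walks every field of every record.
rows = [
    {"url": "a", "clicks": 3, "country": "US"},
    {"url": "b", "clicks": 5, "country": "DE"},
]

# Columnar layout: each field is stored contiguously, so the same
# aggregate reads only the one column it needs. That is why scanning a
# single column of a trillion-row table can finish in seconds.
columns = {
    "url": ["a", "b"],
    "clicks": [3, 5],
    "country": ["US", "DE"],
}

total_from_rows = sum(r["clicks"] for r in rows)
total_from_cols = sum(columns["clicks"])  # same answer, far less data touched
```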
“Representative Skylines using Threshold-based Preference Distributions”, Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu, International Conference on Data Engineering (ICDE), 2011.
The paper adopts a principled approach to representative skylines and formalizes the problem of displaying k tuples such that the probability that a random user clicks on one of them is maximized. This requires mathematically modeling (a) the likelihood that a user is interested in a tuple, and (b) how one negotiates the lack of knowledge of an explicit set of users. The work presents theoretical and experimental results showing that the suggested algorithm significantly outperforms previously suggested approaches.
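To make the objective concrete, here is a toy version in which each user is a threshold vector and clicks any tuple that meets all of their thresholds; a greedy max-coverage pass then picks the k representatives (an illustration of the objective only, not the paper's algorithm):

```python
def greedy_representatives(tuples, users, k):
    # A user "clicks" a tuple when it meets every one of their thresholds.
    def clicks(t, u):
        return all(tv >= uv for tv, uv in zip(t, u))

    chosen, covered = [], set()
    for _ in range(k):
        # Pick the tuple that captures the most not-yet-covered users.
        best = max(tuples,
                   key=lambda t: sum(1 for i, u in enumerate(users)
                                     if i not in covered and clicks(t, u)))
        chosen.append(best)
        covered |= {i for i, u in enumerate(users) if clicks(best, u)}
    return chosen, len(covered) / len(users)

# Three tuples, three users with incomparable thresholds: any two
# tuples can satisfy at most two of the three users.
chosen, frac = greedy_representatives(
    tuples=[(5, 1), (1, 5), (3, 3)],
    users=[(4, 0), (0, 4), (2, 2)],
    k=2)
```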
“Hyper-local, directions-based ranking of places”, Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, PVLDB, vol. 4(5) (2011), pp. 290-30.
Click-through information is one of the strongest signals we have for ranking web pages. We propose an equivalent signal for ranking real-world places: the number of times that people ask for precise directions to the address of the place. We show that this signal is competitive in quality with human reviews while being much cheaper to collect, and that it can be incorporated efficiently into a location search system.
Systems
“Power Management of Online Data-Intensive Services”, David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, Thomas F. Wenisch, Proceedings of the 38th ACM International Symposium on Computer Architecture, 2011.
Compute- and data-intensive Web services (such as Search) are a notoriously hard target for energy-saving techniques. This article characterizes the statistical hardware activity behavior of servers running Web search and discusses the potential of existing and proposed energy-saving techniques.
“The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary-Lou Soffa, ISCA, 2011.
In this work, the authors expose key characteristics of an emerging class of Google-style workloads and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. The authors find that across datacenter applications, there is both a sizable benefit and a potential degradation from improperly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory). Co-locating threads from multiple applications with diverse memory behavior changes the optimal mapping of threads to cores for each application. By employing an adaptive thread-to-core mapper, the authors improved the performance of the datacenter applications by up to 22% over the status quo thread-to-core mapping, achieving performance within 3% of optimal.
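The flavor of such an adaptive mapper can be sketched with a simple heuristic (our toy, driven by a made-up cache-pressure score; the paper's mapper is driven by actual workload characterization, not this score):

```python
def pair_by_pressure(threads):
    # threads: list of (name, cache_pressure) pairs. Pair the heaviest
    # consumer of shared cache/bandwidth with the lightest, so that
    # co-runners sharing a last-level cache have complementary memory
    # behavior instead of thrashing the cache together.
    ordered = sorted(threads, key=lambda t: t[1])
    pairs = []
    while len(ordered) >= 2:
        light = ordered.pop(0)
        heavy = ordered.pop()
        pairs.append((light[0], heavy[0]))
    return pairs

# The two cache-hungry threads ("a", "c") each get a quiet partner.
mapping = pair_by_pressure([("a", 9), ("b", 1), ("c", 7), ("d", 3)])
```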
“Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code”, Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah Taylor, Brad Chen, Derek Schuff, David Sehr, Cliff L. Biffle, Bennet S. Yee, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.
Since its introduction in the early '90s, Software Fault Isolation (SFI) has been a static code technique, commonly perceived as incompatible with dynamic libraries, runtime code generation, and other dynamic code. This paper addresses that limitation, explaining how the SFI techniques in Google Native Client were extended to support modern language implementations based on just-in-time code generation and runtime instrumentation. This work is already deployed in Google Chrome, benefiting millions of users, and was developed during a summer collaboration with three Ph.D. interns; it exemplifies how Research at Google focuses on rapidly bringing significant benefits to our users through groundbreaking technology and real-world products.
“Thialfi: A Client Notification Service for Internet-Scale Applications”, Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek, Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011, pp. 129-142.
This paper describes a notification service that scales to hundreds of millions of users, provides sub-second latency in the common case, and guarantees delivery even in the presence of a wide variety of failures. The service has been deployed in several popular Google applications including Chrome, Google Plus, and Contacts.
Voice Search in Underrepresented Languages
Tuesday, November 09, 2010
Posted by Pedro J. Moreno, Staff Research Scientist and Johan Schalkwyk, Senior Staff Engineer
Welkom*!
Today we’re introducing Voice Search support for Zulu and Afrikaans, as well as South African-accented English. The addition of Zulu in particular represents our first effort in building Voice Search for underrepresented languages.
We define underrepresented languages as those which, while spoken by millions, have little presence in electronic and physical media, e.g., webpages, newspapers and magazines. Underrepresented languages have also often received little attention from the speech research community. Their phonetics, grammar, acoustics, etc., haven’t been extensively studied, making the development of ASR (automatic speech recognition) voice search systems challenging.
We believe that the speech research community needs to start working on many of these underrepresented languages to advance progress and build speech recognition, translation and other Natural Language Processing (NLP) technologies. The development of NLP technologies in these languages is critical for enabling information access for everybody. Indeed, these technologies have the potential to break language barriers.
We also think it’s important that researchers in these countries take a leading role in advancing the state of the art in their own languages. To this end, we’ve collaborated with the Multilingual Speech Technology group at South Africa’s North-West University, led by Prof. Etienne Barnard (also of the Meraka Research Institute), an authority on speech technology for South African languages. Our development effort was spearheaded by Charl van Heerden, a South African intern and a student of Prof. Barnard. With the help of Prof. Barnard’s team, we collected acoustic data in the three languages and developed lexicons and grammars, which Charl and others used to develop the three Voice Search systems. A team of language specialists traveled to several cities collecting audio samples from hundreds of speakers in a range of acoustic conditions, such as street noise and background speech. Speakers were asked to read typical search queries into an Android app specifically designed for audio data collection.
For Zulu, we faced the additional challenge of few text sources on the web. We often analyze the search queries from local versions of Google to build our lexicons and language models. However, for Zulu there weren’t enough queries to build a useful language model. Furthermore, since it has few online data sources, native speakers have learned to use a mix of Zulu and English when searching for information on the web. So for our Zulu Voice Search product, we had to build a truly hybrid recognizer, allowing free mixture of both languages. Our phonetic inventory covers both English and Zulu and our grammars allow natural switching from Zulu to English, emulating speaker behavior.
This is our first release of Voice Search in a native African language, and we hope that it won’t be the last. We’ll continue to work on technology for languages that have until now received little attention from the speech recognition community.
Salani kahle!**
* “Welcome” in Afrikaans
** “Stay well” in Zulu
Making an Impact on a Thriving Speech Research Community
Monday, October 11, 2010
Posted by Vincent Vanhoucke, Google Research
While we continue to launch exciting new speech products--most recently Voice Actions and Google Search by Voice in Russian, Czech and Polish--we also strive to contribute to the academic research community by sharing both innovative techniques and experiences with large-scale systems.
This year’s gathering of the world’s experts in speech technology research, Interspeech 2010 in Makuhari, Japan, which Google co-sponsored, was a fantastic demonstration of the momentum of this community, driven by new challenges such as mobile voice communication, voice search, and the increasing international reach of speech technologies.
Googlers published papers that showcased the breadth and depth of our speech recognition research, addressing both fundamental problems in acoustic and language modeling and the practical issues of building scalable speech interfaces that real people use every day to make their lives easier.
Here is a list of the papers presented by Googlers at the conference:
Direct Construction of Compact Context-Dependency Transducers From Data, David Rybach and Michael Riley (Computer Speech & Language Best Paper Award).
Voice Search for Development, Etienne Barnard, Johan Schalkwyk, Charl van Heerden and Pedro J. Moreno.
Unsupervised Discovery and Training of Maximally Dissimilar Cluster Models, Françoise Beaufays, Vincent Vanhoucke and Brian Strope.
Search by Voice in Mandarin Chinese, Jiulong Shan, Genqing Wu, Zhihong Hu, Xiliu Tang, Martin Jansche and Pedro J. Moreno.
On-Demand Language Model Interpolation for Mobile Speech Input, Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein and Johan Schalkwyk.
Building Transcribed Speech Corpora Quickly and Cheaply for Many Languages, Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno and Mike LeBeau.
Say What? Why Users Choose to Speak their Web Queries, Maryam Kamvar and Doug Beeferman.
Study on Interaction between Entropy Pruning and Kneser-Ney Smoothing, Ciprian Chelba, Thorsten Brants, Will Neveitt and Peng Xu.
Decision Tree State Clustering with Word and Syllable Features, Hank Liao, Chris Alberti, Michiel Bacchiani and Olivier Siohan.