Google Research Blog
The latest news from Research at Google
Large Scale Language Modeling in Automatic Speech Recognition
Wednesday, October 31, 2012
Posted by Ciprian Chelba, Research Scientist
At Google, we’re able to use the large amounts of data made available by the Web’s fast growth. Two such data sources are the anonymized queries on google.com and the web itself. They help improve automatic speech recognition through large language models: Voice Search makes use of the former, whereas YouTube speech transcription benefits significantly from the latter.
The language model is the component of a speech recognizer that assigns a probability to the next word in a sentence given the previous ones. As an example, if the previous words are “new york”, the model would assign a higher probability to “pizza” than to, say, “granola”. The n-gram approach to language modeling (predicting the next word based on the previous n-1 words) is particularly well-suited to such large amounts of data: it scales gracefully, and the non-parametric nature of the model allows it to grow with more data. For example, on Voice Search we were able to train and evaluate 5-gram language models consisting of 12 billion n-grams, built using large vocabularies (1 million words), and trained on as many as 230 billion words.
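To make the n-gram idea concrete, here is a minimal sketch of a count-based trigram model. It is not the system described in this post: the toy corpus is invented, the function names are hypothetical, and it uses plain maximum-likelihood estimates, whereas a production recognizer would also need smoothing and back-off for histories it has never seen.

```python
from collections import Counter, defaultdict


def train_ngram_counts(sentences, n):
    """Count how often each word follows each (n-1)-word history."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start symbols so the first words also have a full history.
        words = ["<s>"] * (n - 1) + sentence.lower().split()
        for i in range(n - 1, len(words)):
            history = tuple(words[i - n + 1:i])
            counts[history][words[i]] += 1
    return counts


def next_word_probability(counts, history, word):
    """Maximum-likelihood estimate of P(word | history); 0.0 if the history is unseen."""
    followers = counts.get(tuple(history))
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


# Toy corpus, invented for illustration only.
corpus = [
    "new york pizza near me",
    "best new york pizza",
    "new york weather",
    "new york granola bars",
]
trigram_counts = train_ngram_counts(corpus, n=3)
p_pizza = next_word_probability(trigram_counts, ["new", "york"], "pizza")
p_granola = next_word_probability(trigram_counts, ["new", "york"], "granola")
print(p_pizza, p_granola)
```

In this toy corpus “pizza” follows “new york” twice out of four times and “granola” once, so the model prefers “pizza”. Adding more sentences simply adds more counts, which is the sense in which the model grows with the data.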
The computational effort pays off, as highlighted by the plot above: both word error rate (a measure of speech recognition accuracy) and search error rate (a metric we use to evaluate the output of the speech recognition system when used in a search engine) decrease significantly with larger language models.
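For readers unfamiliar with word error rate, the sketch below assumes the standard definition: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The function name and the example sentences are illustrative, not taken from our evaluation.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from the hypothesis
            insertion = current[j - 1] + 1  # extra word in the hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: one substitution ("york" -> "work") and one deletion ("me").
wer = word_error_rate("new york pizza near me", "new work pizza near")
print(wer)
```

Here two errors against five reference words give a word error rate of 40%.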
A more detailed summary of results on Voice Search and a few YouTube speech transcription tasks (authors: Ciprian Chelba, Dan Bikel, Maria Shugrina, Patrick Nguyen, Shankar Kumar) presents our results when increasing both the amount of training data and the size of the language model estimated from such data. Depending on the task, the availability and amount of training data used, the language model size, and the performance of the underlying speech recognizer, we observe relative reductions in word error rate of between 6% and 10%, for systems across a wide range of operating points.
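Since the reductions are quoted in relative terms, a short worked example may help separate relative from absolute improvement. The baseline and improved error rates below are hypothetical numbers chosen for easy arithmetic; they are not results from the paper.

```python
def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical error rates, for illustration only.
baseline = 0.20
improved = 0.18
reduction = relative_reduction(baseline, improved)
print(f"{reduction:.0%} relative, {baseline - improved:.1%} absolute")
```

With these made-up numbers, going from 20% to 18% word error rate is a 2-point absolute drop but a 10% relative reduction, which is the kind of figure reported above.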
Cross-posted with the Research at Google G+ Page
Natural Language Processing
Ngram Viewer 2.0
Thursday, October 18, 2012
Posted by Jon Orwant, Engineering Manager
Since launching the Google Books Ngram Viewer, we’ve been overjoyed by the public reception. Co-creator Will Brockman and I hoped that the ability to track the usage of phrases across time would be of interest to professional linguists, historians, and bibliophiles. What we didn’t expect was its popularity among casual users. Since the launch in 2010, the Ngram Viewer has been used about 50 times every minute to explore how phrases have been used in books spanning the centuries. That’s over 45 million graphs created, each one a glimpse into the history of the written word. For instance, comparing
, you can see when each word peaked:
Meanwhile, Google Books reached a milestone, having scanned 20 million books. That’s approximately one-seventh of all the books published since Gutenberg invented the printing press. We’ve updated the Ngram Viewer datasets to include a lot of those new books we’ve scanned, as well as improvements our engineers made in OCR and in hammering out inconsistencies between library and publisher metadata. (We’ve kept the old dataset around for scientists pursuing empirical, replicable language experiments such as the ones Jean-Baptiste Michel and Erez Lieberman Aiden conducted for our
At Google, we’re also trying to understand the meaning behind what people write, and to do that it helps to understand grammar. Last summer Slav Petrov of Google’s Natural Language Processing group and his intern Yuri Lin (who’s since joined Google full-time) built a system that identified parts of speech—nouns, adverbs, conjunctions and so forth—for all of the words in the millions of Ngram Viewer books. Now, for instance, you can compare the verb and noun forms of “cheer” to see how the frequencies have converged over time:
Some users requested the ability to combine Ngrams, and Googler Matthew Gray generalized that notion into what we’re calling Ngram compositions: the ability to add, subtract, multiply, and divide Ngram counts. For instance, you can see how “record player” rose at the expense of “Victrola”:
explains all the details about this curious notion of treating phrases like components of a mathematical expression. We’re guessing they’ll only be of interest to lexicographers, but then again that’s what we thought about Ngram Viewer 1.0.
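The arithmetic behind Ngram compositions can be pictured as combining two time series year by year. The sketch below is only an illustration of that idea: the helper name and the per-year counts are made up, and it does not reproduce the Ngram Viewer's own query syntax or data.

```python
import operator


def compose(series_a, series_b, op):
    """Apply op year by year to two {year: value} series, over the years both cover."""
    return {
        year: op(series_a[year], series_b[year])
        for year in sorted(series_a.keys() & series_b.keys())
    }


# Made-up counts for illustration; not Ngram Viewer data.
record_player = {1930: 2, 1950: 40, 1970: 90}
victrola = {1930: 60, 1950: 30, 1970: 5}

combined = compose(record_player, victrola, operator.add)
difference = compose(record_player, victrola, operator.sub)
print(combined)
print(difference)
```

Passing operator.mul or operator.truediv gives the other two operations; division would additionally need a rule for years in which the second count is zero.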
Oh, and we added Italian too, supplementing our current languages: English, Chinese, Spanish, French, German, Hebrew, and Russian. Buon divertimento!
Natural Language Processing
ReFr: A New Open-Source Framework for Building Reranking Models
Thursday, October 04, 2012
, Research Scientists at Google
We are pleased to announce the release of an open source, general-purpose framework designed for reranking problems, ReFr (Reranker Framework), now available at:
Many types of systems capable of processing speech and human language text produce multiple hypothesized outputs for a given input, each with a score. In the case of machine translation systems, these hypotheses correspond to possible translations from some sentence in a source language to a target language. In the case of speech recognition, the hypotheses are possible word sequences of what was said derived from the input audio. The goal of such systems is usually to produce a single output for a given input, and so they almost always just pick the highest-scoring hypothesis.
A reranker is a system that uses a trained model to rerank these scored hypotheses, possibly inducing a different ranked order. The goal is that by employing a second model after the fact, one can make use of additional information not available to the original model, and produce better overall results. This approach has been shown to be useful for a wide variety of speech and natural language processing problems, and was the subject of one of the groups at the 2011 summer workshop at Johns Hopkins’ Center for Language and Speech Processing. At that workshop, led by Professor Brian Roark of Oregon Health & Science University, we began building a general-purpose framework for training and using reranking models. The result of all this work is ReFr.
From the outset, we designed ReFr with both speed and flexibility in mind. The core implementation is entirely in C++, with a flexible architecture allowing rich experimentation with both features and learning methods. The framework also employs a powerful runtime configuration mechanism to make experimentation even easier. Finally, ReFr leverages the parallel processing power of Hadoop to train and use large-scale reranking models in a distributed computing environment.
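To illustrate the general reranking idea (not ReFr's actual interface, which is in C++), here is a minimal sketch of a linear reranker. The hypothesis list, the feature name in_query_log and the weight are all invented for this example; in practice the weights would be learned from training data rather than set by hand.

```python
from collections import namedtuple

# text: the hypothesized output; base_score: score from the original system;
# features: extra evidence available only to the reranker (hypothetical names).
Hypothesis = namedtuple("Hypothesis", ["text", "base_score", "features"])


def rerank(hypotheses, weights, base_weight=1.0):
    """Sort hypotheses by base score plus a weighted sum of reranker features."""
    def score(h):
        extra = sum(weights.get(name, 0.0) * value
                    for name, value in h.features.items())
        return base_weight * h.base_score + extra
    return sorted(hypotheses, key=score, reverse=True)


# Invented n-best list: the original system slightly prefers the wrong output.
n_best = [
    Hypothesis("new work pizza", -1.0, {"in_query_log": 0.0}),
    Hypothesis("new york pizza", -1.2, {"in_query_log": 1.0}),
]
weights = {"in_query_log": 0.5}  # hand-set here; normally learned

best = rerank(n_best, weights)[0]
print(best.text)
```

The original system would have picked “new work pizza”, but the extra feature lifts “new york pizza” to the top, which is the different ranked order described above.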
EMEA Faculty Summit 2012
Tuesday, October 02, 2012
Posted by Michel Benard, University Relations Manager
Last week we held our fifth Europe, Middle East and Africa (EMEA) Faculty Summit in London, bringing together 94 of EMEA’s foremost computer science academics from 65 universities representing 25 countries, together with more than 60 Googlers.
This year’s jam-packed agenda included a welcome reception at the
(plus a tour of the special exhibition: “Codebreaker - Alan Turing’s life and legacy”), a keynote on “Research at Google” by
, Vice President of Research and Special Initiatives and a welcome address by Nelson Mattos, Vice President of Engineering and Products in EMEA, covering Google’s engineering activity and recent innovations in the region.
The Faculty Summit is a chance for us to meet with academics in Computer Science and other areas to discuss the latest exciting developments in research and education, and to explore ways in which we can collaborate via our University Relations programs.
The two-and-a-half-day program consisted of tech talks, breakout sessions, a panel on online education, and demos. The program covered a variety of computer science topics including Infrastructure, Cloud Computing Applications, Information Retrieval, Machine Translation, Audio/Video, Machine Learning, User Interface, e-Commerce, Digital Humanities, Social Media, and Privacy. For example,
Ed H. Chi summarized how researchers use data analysis to understand the ways users share content with their audiences using the Circle feature in Google+.
summarized how UI design and user experience research is essential to creating a seamless experience on Google Maps.
discussed some of the research challenges - and opportunities - associated with building, managing, and using computer systems at massive scale. Breakout sessions ranged from technical follow-ups on the talk topics to discussing ways to increase the presence of women in computer science.
We also held one-on-one sessions where academics and Googlers could meet privately and discuss topics of personal interest, such as how to develop a compelling research award proposal, how to apply for a sabbatical at Google or how to gain Google support for a conference in a particular research area.
The Summit provides a great opportunity to build and strengthen research and academic collaborations. Our hope is to drive research and education forward by fostering mutually beneficial relationships with our academic colleagues and their universities.