Google Research Blog
The latest news from Research at Google
Education Awards on Google App Engine
Wednesday, March 27, 2013
Posted by Andrea Held, Google University Relations
Google Developers Blog
Last year we invited proposals for innovative projects built on Google’s infrastructure. Today we are pleased to announce the 11 recipients of a Google App Engine Education Award. Professors and their students are using the award in cloud computing courses to study databases, distributed systems, and web mashups, and to build educational applications. Each selected project received $1,000 in Google App Engine credits.
Awarding computational resources to classroom projects is always gratifying. It is impressive to see the creative ideas students and educators bring to these programs.
Below is a brief introduction to each project. Congratulations to the recipients!
John David N. Dionisio, Loyola Marymount University: The objective of this undergraduate database systems course is for students to implement one database application in two technology stacks: a traditional relational database and Google App Engine. Students are asked to study both models and provide concrete comparison points.
Xiaohui (Helen) Gu, North Carolina State University: Advanced Distributed Systems Class. The goal of the project is to let students learn distributed systems concepts by developing real distributed system management systems and testing them on real-world cloud computing infrastructures such as Google App Engine.
, Brown University
WeScheme is a programming environment that runs in the web browser and supports interactive development. WeScheme uses App Engine to handle user accounts, server-side compilation, and file management.
, University of Utah
: A graduate-level course that will be offered in Fall 2013 on the design and implementation of large data management system kernels. The objective is to integrate features from a relational database engine with some of the new features from NoSQL systems to enable efficient and scalable data management over a cluster of commodity machines.
, Illinois Wesleyan University
is a free, simple classroom-response system built on Google App Engine. It lets students give instant, anonymous feedback to teachers about a lecture or discussion from any computer or mobile device with a web browser, facilitating more adaptive class sessions.
, Wellesley College
: Topics in Computer Science: Web Mashups. A CS2 course that combines Google App Engine and MIT App Inventor. Students will learn to build apps with App Inventor to collect data about their life on campus. They will use Google App Engine to build web services and apps to host the data and remix it to create web mashups. Offered in the 2013 Spring semester.
, Rutgers University
: Cloud Computing for Scientific Applications -- Autonomic Cloud Computing teaches students how a hybrid HPC/Grid + Cloud cyberinfrastructure can be effectively used to support real-world science and engineering applications. The goal of our efforts is to explore application formulations and Cloud and hybrid HPC/Grid + Cloud infrastructure usage modes that are meaningful for various classes of science and engineering application workflows.
, Wellesley College
GreenTouch is a collaborative environment that enables novice users to engage in authentic scientific inquiry. It consists of a mobile user interface for capturing data in the field, a web application for data curation in the cloud, and a tabletop user interface for exploratory analysis of heterogeneous data.
, University of Michigan
: WeLearn Mobile Platform: Making Mobile Devices Effective Tools for K-12. The platform makes mobile devices (Android, iOS, WP8) effective, essential tools for all-the-time, everywhere learning. WeLearn’s suite of productivity and communication apps enables learners to work collaboratively; WeLearn’s portal, hosted on Google App Engine, enables teachers to send assignments and to review and grade student artifacts. WeLearn is available to educators at no charge.
, Harding University
: Teaching Cloud Computing in an Introduction to Engineering class for freshmen. We explore how well-designed systems are built to withstand unpredictable stresses, whether that system is a building, a piece of software or even the human body. The grant from Google is allowing us to add an overview of cloud computing as a platform that is robust under diverse loads.
Dr. Jiaofei Zhong, University of Central Missouri: By building an online Course Management System, students will be able to work on their team projects in the cloud. The system allows instructors and students to manage course materials, including the course syllabus, slides, assignments and tests, in the cloud; the tool can be shared with educational institutions worldwide.
Scaling Computer Science Education
Wednesday, March 13, 2013
Posted by Maggie Johnson, Director of Education and University Relations
Last week, I attended the annual SIGCSE (Special Interest Group, Computer Science Education) conference in Denver, CO. Google has been a platinum sponsor of SIGCSE for many years now, and the conference provides an opportunity for hundreds of computer science (CS) educators to share ideas and work on strategies to bring high-quality CS education to K12 and undergraduate students.
Significant accomplishments over the last few years have laid a strong foundation for scaling CS curriculum, professional development (PD) and related programs in this country. The
has been funding curriculum and PD around the new
Advanced Placement course. The
for K12 CS and a
on the limited extent to which schools, districts and states provide CS instruction to their students. The CS advocacy group Computing in the Core even provides a toolkit for communities to follow as they urge legislators to integrate Computer Science education into the core K12 curriculum.
All of this work has made an impact, but there is still more to do.
I see our priorities in CS education as awareness and access. As CS educators, we must continue to raise awareness about the tremendous demand for jobs in the computing sector, and counter misconceptions with accurate data. Many students, parents, teachers and administrators remember the hype and disillusionment of the dot-com period and the myths about outsourcing and dwindling jobs, yet the US Bureau of Labor Statistics (BLS) reports that two-thirds of all job growth in Science and Engineering will be in Computer Science employment over the next decade. (See 2010 BLS report
.) Clearing up this misconception is essential if we hope to satisfy US labor needs with recent graduates over the next several years.
Source: Gianchandani, Erwin. Revisiting ‘Where the Jobs Are’. The Computing Community Consortium Blog post on 23 May 2012.
accessed on 8 March 2013.
Another misconception surrounds the range of CS-focused occupations that exist. The world of CS is expanding rapidly and we should celebrate the diversity of CS applications that are gaining momentum. Instead of the archetype of a sun-starved computer scientist, or of software engineers working in isolation with few opportunities for teamwork or communication, educators can encourage project-based learning, video game development, robotics, and graphic design as more concrete representations of abstract computational thinking.
Google believes that computing and CS are critical to our future, not only in the high tech sector, but for everyone. Our economy is becoming more and more dependent on technology-based solutions, which will require a future workforce with significant levels of CS knowledge and experience. In addition, we anticipate new career opportunities opening up in the next 3-5 years as more businesses move into the cloud and shift the way they run their IT departments.
Help us get the word out about the great opportunities in computing through organizations such as
. Google is doing its part to support CS education and outreach through many programs including
Exploring Computational Thinking
curriculum, and several
programs. So much opportunity, so little time!
Our Commitment to Social Computing Research: Social Interactions Focused Awards Announcement
Tuesday, March 12, 2013
Ed H. Chi, Staff Research Scientist
Social interactions have always been an important part of the human experience. Social interaction research has shown results ranging from influences on our behavior from social networks [Aral2012] to our understanding of social belonging on health [Walton2011], as well as how conflicts and coordination play out in Wikipedia [Kittur2007]. Interestingly, social scientists have studied social interactions for many years, but it wasn’t until very recently that researchers could study these mechanisms through the explosion of services and data available on web-based social systems.
From information dissemination and the spread of innovation and ideas, to scientific discovery, we are seeing how a deep understanding of social interactions is affecting many different fields, such as health and education. For instance, scientists now have strong evidence that social interactions underlie many fundamental learning mechanisms starting from infancy well into adulthood [Meltzoff2009], and that peer discussions are critical in conceptual learning in college classes [Smith2009]. How might these learning science findings be built into social systems and products so that users maximize what they learn on the Web?
We know that interactions on the Web are diverse and people-centered. Google now enables social interactions to occur across many of our products, from
to Search to
. To understand the future of this socially connected web, we need to investigate fundamental patterns, design principles, and laws that shape and govern these social interactions.
We envision research at the intersection of disciplines including Computer Science, Human-Computer Interaction (HCI), Social Science, Social Psychology, Machine Learning, Big Data Analytics, Statistics and Economics. These fields are central to the study of how social interactions work, particularly driven by new sources of data, for example, open data sets from Web2.0 and social media sites, government databases, crowdsourcing, new survey techniques, and crisis management data collections. New techniques from network science and computational modeling, social network and sentiment analysis, application of statistical and machine learning, as well as theories from evolutionary theory, physics, and information theory, are actively being used in social interaction research.
We’re pleased to announce that Google has awarded over $1.2 million to support the Social Interactions Research Awards, which are given to university research groups doing work in social computing and interactions. Research topics include crowdsourcing, social annotations, a social media behavioral study, social learning, conversation curation, and scientific studies of how to start online communities.
The awards go to 15 researchers at 7 universities. We selected these proposals after a rigorous internal review. We believe the results will be broadly useful to product development and will further scientific research.
Joseph Konstan, Loren Terveen, and John Riedl from the University of Minnesota. Precision Crowdsourcing: Closing the Loop to turn Information Consumers into Information Contributors.
Mor Naaman from Rutgers University, and Oded Nov from Polytechnic Institute of New York University. Examining the Impact of Social Traces on Page Visitors’ Opinions and Engagement.
Paul Resnick, Eytan Adar, and Cliff Lampe from the University of Michigan. MTogether: A Living Lab for Social Media Research.
Marti Hearst from UC Berkeley. Understanding Social Learning Among Subgroups Within Large Online Learning Environments.
David Karger and Rob Miller from MIT. Crowdsourced Curation of Conversations.
Robert Kraut, Laura Dabbish, Jason Hong, Aniket Kittur from CMU. Successfully Starting Online Groups.
We look forward to working with these researchers, and we hope that we will jointly push the frontier of social interactions research to the next level.
[Aral2012] Aral, S., & Walker, D. (2012). Identifying Influential and Susceptible Members of Social Networks. Science, 337(6092), 337–341. doi:10.1126/science.1215842
[Walton2011] Walton, G. M., & Cohen, G. L. (2011). A Brief Social-Belonging Intervention Improves Academic and Health Outcomes of Minority Students. Science, 331(6023), 1447–1451. doi:10.1126/science.1198364
[Kittur2007] Kittur, A., Suh, B., Pendleton, B., & Chi, E. H. (2007). He Says, She Says: Conflict and Coordination in Wikipedia. In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453–462, April 2007. ACM Press. San Jose, CA.
[Meltzoff2009] Meltzoff, A. N., Kuhl, P. K., Movellan, J., & Sejnowski, T. J. (2009). Foundations for a New Science of Learning. Science, 325(5938), 284–288. doi:10.1126/science.1175626
[Smith2009] Smith, M. K., Wood, W. B., Adams, W. K., Wieman, C., Knight, J. K., Guild, N., & Su, T. T. (2009). Why Peer Discussion Improves Student Performance on In-Class Concept Questions. Science, 323(5910), 122–124. doi:10.1126/science.1165919
Learning from Big Data: 40 Million Entities in Context
Friday, March 08, 2013
Posted by Dave Orr, Amar Subramanya, and Fernando Pereira, Google Research
When someone mentions Mercury, are they talking about the
, or one of some
89 other possibilities
? This problem is called
(a word that is itself
), and while it’s necessary for communication, and humans are amazingly good at it (when was the last time you confused a
giant tech company
?), computers need help.
To provide that help, we are releasing the Wikilinks Corpus: 40 million total disambiguated mentions within over 10 million web pages -- over 100 times bigger than the next largest corpus (about 100,000 documents; see the table below for mention and entity counts). The mentions are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If we think of each page on Wikipedia as an entity (an idea we’ve discussed before), then the anchor text can be thought of as a mention of the corresponding entity.
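As a rough illustration of the matching idea, here is a minimal Python sketch. The post does not say how "closely matches" was actually measured, so the similarity function (the standard library's `difflib.SequenceMatcher`), the `0.8` threshold, and the helper names `title_from_url` and `is_mention` are all assumptions made for this example, not the method used to build the corpus.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote, urlparse


def title_from_url(url: str) -> str:
    """Recover a readable page title from a Wikipedia URL (illustrative only)."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    return unquote(last_segment).replace("_", " ")


def is_mention(anchor_text: str, url: str, threshold: float = 0.8) -> bool:
    """Treat a link as a mention when its anchor text resembles the page title.

    The threshold is a hypothetical value chosen for this sketch.
    """
    title = title_from_url(url)
    ratio = SequenceMatcher(None, anchor_text.lower(), title.lower()).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    url = "http://en.wikipedia.org/wiki/Buick_Riviera"
    print(is_mention("Buick Riviera", url))  # anchor equals the title
    print(is_mention("click here", url))     # generic anchor text
```

A stricter or looser threshold changes which links count as mentions, so any real use would need to tune it against labeled examples.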
Number of Mentions
Number of Entities
Bentivogli et al.
Day et al.
less than 55,000
Artiles et al.
What might you do with this data? Well, we’ve already written one ACL paper on cross-document co-reference (and received lots of requests for the underlying data, which partly motivates this release). And really, we look forward to seeing what you are going to do with it! But here are a few ideas:
-- when different mentions mention the same entity -- or
-- matching a mention to the underlying entity
Work on the bigger problem of
, which is how to find out if different web pages are talking about the same person or other entity
Learn things about entities by aggregating information across all the documents they’re mentioned in
tries to assign types (they could be broad, like person, location, or specific, like amusement park ride) to entities. To the extent that the Wikipedia pages contain the type information you’re interested in, it would be easy to construct a training set that annotates the Wikilinks entities with types from Wikipedia.
Work on any of the above, or more, on subsets of the data. With existing datasets, it wasn’t possible to work on just musicians or chefs or train stations, because the sample sizes would be too small. But with 10 million Web pages, you can find a decent sampling of almost anything.
How do you actually get the data? It’s right here: Google’s Wikilinks Corpus. Tools and data with extra context can be found on our partners’ page:
. Understanding the corpus, however, is a little bit involved.
For copyright reasons, we cannot distribute actual annotated web pages. Instead, we’re providing an index of URLs, and the tools to create the dataset, or whichever slice of it you care about, yourself. Specifically, we’re providing:
The URLs of all the pages that contain labeled mentions, which are links to English Wikipedia
The anchor text of the link (the mention string), the Wikipedia link target, and the byte offset of the link for every page in the set
The byte offset of the 10 least frequent words on the page, to act as a signature to ensure that the underlying text hasn’t changed -- think of this as a version, or fingerprint, of the page
Software tools (on the
) to: download the web pages; extract the mentions, with ways to recover if the byte offsets don’t match; select the text around the mentions as local context; and compute evaluation metrics over predicted entities.
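The signature described in the list above can be checked with a few lines of Python. This is only a sketch of the idea, not the released tooling: it assumes the offsets index into the raw bytes of the downloaded page and that the words are UTF-8 encoded, and the function name `fingerprint_matches` is hypothetical.

```python
from typing import Iterable, Tuple


def fingerprint_matches(page: bytes, signature: Iterable[Tuple[str, int]]) -> bool:
    """Return True if every (word, byte_offset) pair is found at its offset.

    Assumes offsets count raw bytes from the start of the downloaded page.
    """
    for word, offset in signature:
        encoded = word.encode("utf-8")
        # A slice past the end of the page is simply shorter, so it fails the check.
        if page[offset:offset + len(encoded)] != encoded:
            return False
    return True


if __name__ == "__main__":
    page = b"we felt the desk"
    signature = [("felt", 3), ("desk", 12)]
    print(fingerprint_matches(page, signature))                    # unchanged page
    print(fingerprint_matches(b"we have felt the desk", signature))  # edited page
```

If the check fails, the page has probably changed since it was indexed, and the mention offsets can no longer be trusted without some recovery step.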
The format looks like this:
MENTION Lincoln Continental Mark IV 40110 http://en.wikipedia.org/wiki/Lincoln_Continental_Mark_IV
MENTION 1975 MGB roadster 41481 http://en.wikipedia.org/wiki/MG_MGB
MENTION Buick Riviera 43316 http://en.wikipedia.org/wiki/Buick_Riviera
MENTION Oldsmobile Toronado 43397 http://en.wikipedia.org/wiki/Oldsmobile_Toronado
TOKEN seen 58190
TOKEN crush 63118
TOKEN owners 69290
TOKEN desk 59772
TOKEN relocate 70683
TOKEN promote 35016
TOKEN between 70846
TOKEN re 52821
TOKEN getting 68968
TOKEN felt 41508
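To make the layout concrete, here is a minimal Python sketch that parses lines like the ones above. It assumes the fields are separated by whitespace as displayed here (the distributed files may use a different delimiter), and the names `Mention`, `Token`, and `parse_record` are invented for this example rather than taken from the released tools.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Mention:
    text: str    # anchor text of the link (the mention string)
    offset: int  # byte offset of the link on the page
    target: str  # Wikipedia link target


@dataclass
class Token:
    word: str    # one of the page's least frequent words
    offset: int  # byte offset of that word on the page


def parse_record(lines: Iterable[str]) -> Tuple[List[Mention], List[Token]]:
    """Split MENTION and TOKEN lines into typed records, skipping anything else."""
    mentions: List[Mention] = []
    tokens: List[Token] = []
    for raw in lines:
        parts = raw.strip().split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        kind, rest = parts
        if kind == "MENTION":
            # Anchor text may contain spaces, so peel the last two
            # fields (offset, target) off the right-hand side.
            fields = rest.rsplit(None, 2)
            if len(fields) == 3:
                text, offset, target = fields
                mentions.append(Mention(text, int(offset), target))
        elif kind == "TOKEN":
            fields = rest.rsplit(None, 1)
            if len(fields) == 2:
                word, offset = fields
                tokens.append(Token(word, int(offset)))
    return mentions, tokens


if __name__ == "__main__":
    sample = [
        "MENTION 1975 MGB roadster 41481 http://en.wikipedia.org/wiki/MG_MGB",
        "TOKEN felt 41508",
    ]
    mentions, tokens = parse_record(sample)
    print(mentions[0].text, "->", mentions[0].target)
    print(tokens[0])
```

Splitting from the right keeps multi-word anchor text such as "1975 MGB roadster" intact while still isolating the offset and target.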
We’d love to hear what you’re working on, and look forward to what you can do with 40 million mentions across over 10 million web pages!
Thanks to our collaborators at